plotly
pandas
SQL
pandasql
yammer
case-studies
feasibility-studies
hypothesis-testing
data-visualisation
python
ipynb
]
In this Part 2 of three case studies investigating Yammer–a social network for the workplace, we will be helping the product team determine priorities for their next development cycle. They are considering improving the site’s search functionality, but first need our help deciding whether or not to work on the project in the first place. Thus, it’s our job to consider the impact that search has on users, and see if there’s any specific improvements that can be made. We’ll be making recommendations to the development team, so it’s best that we form quantitative hypotheses that seek to understand the general quality of an individual user’s search experience.
Let’s get started…
In this study, we will be using a few common technologies:
# prerequisites
!pip install pandasql
!pip install plotly
# data tools
import pandas as pd # for querying pandas dataframes
import pandasql
from pandasql import sqldf
# plotting tools
import plotly.graph_objects as go
import plotly.express as px
import plotly.io as pio # output charts to HMTL
#init_notebook_mode()
There are two tables that are relevant to this problem, yammer_events
and yammer_users
.
It is going to be crucial that we look at the following events in our events table:
search_autocomplete
: this event is logged when a user clicks on a search option from autocomplete.search_run
: this event is logged when a user runs a search and sees the results page.search_click_X
: this event is logged when a user clicks on a search result X
, which ranges from 1
to 10
.events_path = '../src/yammer_events.csv'
yammer_events = pd.read_csv(events_path, sep=',', header=0)
yammer_events
user_id | occurred_at | event_type | event_name | location | device | user_type | |
---|---|---|---|---|---|---|---|
0 | 10522.0 | 2014-05-02 11:02:39 | engagement | login | Japan | dell inspiron notebook | 3.0 |
1 | 10522.0 | 2014-05-02 11:02:53 | engagement | home_page | Japan | dell inspiron notebook | 3.0 |
2 | 10522.0 | 2014-05-02 11:03:28 | engagement | like_message | Japan | dell inspiron notebook | 3.0 |
3 | 10522.0 | 2014-05-02 11:04:09 | engagement | view_inbox | Japan | dell inspiron notebook | 3.0 |
4 | 10522.0 | 2014-05-02 11:03:16 | engagement | search_run | Japan | dell inspiron notebook | 3.0 |
... | ... | ... | ... | ... | ... | ... | ... |
340827 | 18815.0 | 2014-08-31 13:41:46 | engagement | like_message | Ireland | dell inspiron notebook | 2.0 |
340828 | 18815.0 | 2014-08-31 13:42:11 | engagement | home_page | Ireland | dell inspiron notebook | 2.0 |
340829 | 18815.0 | 2014-08-31 13:42:43 | engagement | send_message | Ireland | dell inspiron notebook | 2.0 |
340830 | 18815.0 | 2014-08-31 13:43:07 | engagement | home_page | Ireland | dell inspiron notebook | 2.0 |
340831 | 18815.0 | 2014-08-31 13:43:42 | engagement | like_message | Ireland | dell inspiron notebook | 2.0 |
340832 rows × 7 columns
Here’s a re-hash of the above: Yammer’s product team is determining priorities for the next development cycle. They’re considering improving the site’s search functionality. Before they proceed, they want to know whether they should even work on search in the first place, and, if so, how they should modify it.
In particular, you should seek to answer the following questions:
In order to form our hypotheses (by way of metrics we will investigate), we need to use the above questions to better understand the ultimate purpose of search at Yammer. For this study, we will consider a session to be a string of events that occur without a 10-minute break between any two events. That goes to say that if an active users fails to log an event within a 10-minute window, their session is considered over and the next engagement will mark a new session.
The first thing to understand is whether anyone even uses search at all.
q = """
SELECT
week_of,
(with_autocompletes * 1.00 / sessions) * 100 AS perct_autocompletes,
(with_runs * 1.00 / sessions) * 100 AS perct_runs
FROM (
SELECT
strftime('%Y-%m-%d', session_start, 'weekday 0', '-6 days') AS week_of,
COUNT(*) sessions,
COUNT(CASE WHEN autocompletes > 0 THEN session ELSE NULL END) AS with_autocompletes,
COUNT(CASE WHEN runs > 0 THEN session ELSE NULL END) AS with_runs
FROM (
SELECT
strftime('%Y-%m-%d %H:%M:%S', session_start) AS session_start,
session,
user_id,
COUNT(CASE WHEN event_name = 'search_autocomplete' THEN user_id ELSE NULL END) AS autocompletes,
COUNT(CASE WHEN event_name = 'search_run' THEN user_id ELSE NULL END) AS runs,
COUNT(CASE WHEN event_name LIKE 'search_click_result_%' THEN user_id ELSE NULL END) AS clicks
FROM (
SELECT
e.*,
sessions.session,
sessions.session_start
FROM yammer_events e
LEFT JOIN (
SELECT
user_id,
session,
MIN(occurred_at) AS session_start,
MAX(occurred_at) AS session_end
FROM (
SELECT
intervals.*,
CASE WHEN prev_event >= 10.0 THEN id
WHEN prev_event IS NULL THEN id
ELSE LAG(id, 1) OVER (PARTITION BY user_id ORDER BY occurred_at)
END AS session
FROM (
SELECT
user_id,
occurred_at,
event_type,
event_name,
ROUND((JULIANDAY(occurred_at) - JULIANDAY(LAG(occurred_at, 1) OVER (PARTITION BY user_id ORDER BY occurred_at))) * 1440) AS prev_event,
ROUND((JULIANDAY(LEAD(occurred_at, 1) OVER (PARTITION BY user_id ORDER BY occurred_at)) - JULIANDAY(occurred_at)) * 1440) AS next_event,
ROW_NUMBER() OVER () AS id
FROM yammer_events
WHERE event_type = 'engagement'
ORDER BY user_id, occurred_at
) intervals
WHERE prev_event >= 10.0
OR next_event >= 10.0
OR prev_event IS NULL
OR next_event IS NULL
) bounds
GROUP BY 1, 2
) sessions
ON e.user_id = sessions.user_id
AND e.occurred_at >= sessions.session_start
AND e.occurred_at <= sessions.session_end
WHERE e.event_type = 'engagement'
) events
GROUP BY 1, 2, 3
) counts
GROUP BY 1
ORDER BY 1
) counts
GROUP BY 1
ORDER BY 1
"""
searches = pandasql.sqldf(q, globals())
week_of = searches['week_of']
perct_autocompletes = searches['perct_autocompletes']
perct_runs = searches['perct_runs']
fig1 = go.Figure()
fig1.add_trace(go.Scatter(x=week_of, y=perct_autocompletes, mode='lines', name='searches with autocompletes'))
fig1.add_trace(go.Scatter(x=week_of, y=perct_runs, mode='lines', name='searches with runs'))
fig1.update_layout(title='Search Rate by Week', xaxis_title='Week of', yaxis_title='Percent of sessions')
pio.write_html(fig1, file='2021-06-01-Understanding-Search-Functionality-fig1.html', auto_open=True)
q = """
SELECT
autocompletes,
COUNT(*) AS sessions
FROM (
SELECT
strftime('%Y-%m-%d %H:%M:%S', session_start) AS session_start,
session,
user_id,
COUNT(CASE WHEN event_name = 'search_autocomplete' THEN user_id ELSE NULL END) AS autocompletes,
COUNT(CASE WHEN event_name = 'search_run' THEN user_id ELSE NULL END) AS runs,
COUNT(CASE WHEN event_name LIKE 'search_click_result_%' THEN user_id ELSE NULL END) AS clicks
FROM (
SELECT
e.*,
session,
session_start
FROM yammer_events e
LEFT JOIN (
SELECT
user_id,
session,
MIN(occurred_at) AS session_start,
MAX(occurred_at) AS session_end
FROM (
SELECT
intervals.*,
CASE WHEN prev_event >= 10.0 THEN id
WHEN prev_event IS NULL THEN id
ELSE LAG(id, 1) OVER (PARTITION BY user_id ORDER BY occurred_at)
END AS session
FROM (
SELECT
user_id,
occurred_at,
event_type,
event_name,
ROUND((JULIANDAY(occurred_at) - JULIANDAY(LAG(occurred_at, 1) OVER (PARTITION BY user_id ORDER BY occurred_at))) * 1440) AS prev_event,
ROUND((JULIANDAY(LEAD(occurred_at, 1) OVER (PARTITION BY user_id ORDER BY occurred_at)) - JULIANDAY(occurred_at)) * 1440) AS next_event,
ROW_NUMBER() OVER () AS id
FROM yammer_events
WHERE event_type = 'engagement'
) intervals
WHERE prev_event >= 10.0
OR next_event >= 10.0
OR prev_event IS NULL
OR next_event IS NULL
) bounds
GROUP BY 1, 2
) sessions
ON e.user_id = sessions.user_id
AND e.occurred_at >= sessions.session_start
AND e.occurred_at <= sessions.session_end
WHERE e.event_type = 'engagement'
) events
GROUP BY 1, 2, 3
) counts
WHERE autocompletes > 0
GROUP BY 1
ORDER BY 1
"""
autocompletes = pandasql.sqldf(q, globals())
num_autocompletes = autocompletes['autocompletes']
num_sessions = autocompletes['sessions']
fig2 = go.Figure()
fig2.add_trace(go.Bar(x=num_autocompletes, y=num_sessions, text=num_sessions, textposition='auto', name='sessions with autocomplete'))
fig2.update_yaxes(nticks=6)
fig2.update_layout(title='Number of Sessions with Autocompletes', xaxis_title='Autocompletes per session', yaxis_title='Number of sessions')
pio.write_html(fig2, file='2021-06-01-Understanding-Search-Functionality-fig2.html', auto_open=True)
Conclusions:
autocomplete
searches than they are with the search results page.search_autocomplete
rate is ~23% while our search_run
rate is roughly 8%.If users search a lot, it’s likely that they’re getting value out of the feature– with a major exception. If users search repeatedly within a short timeframe, it’s likely that they’re refining their terms because they were unable to find what they initially wanted.
q = """
SELECT
runs,
COUNT(*) AS sessions
FROM (
SELECT
strftime('%Y-%m-%d %H:%M:%S', session_start) AS session_start,
session,
user_id,
COUNT(CASE WHEN event_name = 'search_autocomplete' THEN user_id ELSE NULL END) AS autocompletes,
COUNT(CASE WHEN event_name = 'search_run' THEN user_id ELSE NULL END) AS runs,
COUNT(CASE WHEN event_name LIKE 'search_click_result_%' THEN user_id ELSE NULL END) AS clicks
FROM (
SELECT
e.*,
session,
session_start
FROM yammer_events e
LEFT JOIN (
SELECT
user_id,
session,
MIN(occurred_at) AS session_start,
MAX(occurred_at) AS session_end
FROM (
SELECT
intervals.*,
CASE WHEN prev_event >= 10.0 THEN id
WHEN prev_event IS NULL THEN id
ELSE LAG(id, 1) OVER (PARTITION BY user_id ORDER BY occurred_at)
END AS session
FROM (
SELECT
user_id,
strftime('%Y-%m-%d %H:%M:%S', occurred_at) AS occurred_at,
event_type,
event_name,
ROUND((JULIANDAY(occurred_at) - JULIANDAY(LAG(occurred_at, 1) OVER (PARTITION BY user_id ORDER BY occurred_at))) * 1440) AS prev_event,
ROUND((JULIANDAY(LEAD(occurred_at, 1) OVER (PARTITION BY user_id ORDER BY occurred_at)) - JULIANDAY(occurred_at)) * 1440) AS next_event,
ROW_NUMBER() OVER () AS id
FROM yammer_events
WHERE event_type = 'engagement'
) intervals
WHERE prev_event >= 10.0
OR next_event >= 10.0
OR prev_event IS NULL
OR next_event IS NULL
) bounds
GROUP BY 1, 2
) sessions
ON e.user_id = sessions.user_id
AND e.occurred_at >= sessions.session_start
AND e.occurred_at <= sessions.session_end
WHERE e.event_type = 'engagement'
) events
GROUP BY 1, 2, 3
) counts
WHERE runs > 0
GROUP BY 1
ORDER BY 1
"""
runs = pandasql.sqldf(q, globals())
num_runs = runs['runs']
num_sessions = runs['sessions']
fig3 = go.Figure()
fig3.add_trace(go.Bar(x=num_runs, y=num_sessions, text=num_sessions, textposition='auto', name='runs per session'))
fig3.update_layout(title='Number of Sessions with Runs', xaxis_title='Number of runs', yaxis_title='Number of sessions')
pio.write_html(fig3, file='2021-06-01-Understanding-Search-Functionality-fig3.html', auto_open=True)
Conclusions:
search_run
) per session. This is an important metric to consider, since it appears that users are typically running multiple searches in a single session.search_run
(Yammer’s results page) is a lesser used feature, this indicates that the search results might not be very good.If a user clicks many links in the search results, it’s likely that she isn’t having a great experience. However, the inverse is not necessarily true–clicking only one result does not imply a success. If the user clicks through one result, then refines their search, that’s certainly not a great experience, so search frequently is probably a better way to understand that piece of the puzzle. Clickthroughs are, however, very useful in determining whether search rankings are good. If users frequently click low results or scroll to additional pages, then the ranking algorithm should probably be adjusted.
q = """
SELECT
clicks,
COUNT(*) AS sessions
FROM (
SELECT
strftime('%Y-%m-%d %H:%M:%S', session_start) AS session_start,
session,
user_id,
COUNT(CASE WHEN event_name = 'search_autocomplete' THEN user_id ELSE NULL END) AS autocompletes,
COUNT(CASE WHEN event_name = 'search_run' THEN user_id ELSE NULL END) AS runs,
COUNT(CASE WHEN event_name LIKE 'search_click_result_%' THEN user_id ELSE NULL END) AS clicks
FROM (
SELECT
e.*,
session,
session_start
FROM yammer_events e
LEFT JOIN (
SELECT
user_id,
session,
MIN(occurred_at) AS session_start,
MAX(occurred_at) AS session_end
FROM (
SELECT
intervals.*,
CASE WHEN prev_event >= 10.0 THEN id
WHEN prev_event IS NULL THEN id
ELSE LAG(id, 1) OVER (PARTITION BY user_id ORDER BY occurred_at)
END AS session
FROM (
SELECT
user_id,
occurred_at,
event_type,
event_name,
ROUND((JULIANDAY(occurred_at) - JULIANDAY(LAG(occurred_at, 1) OVER (PARTITION BY user_id ORDER BY occurred_at))) * 1440) AS prev_event,
ROUND((JULIANDAY(LEAD(occurred_at, 1) OVER (PARTITION BY user_id ORDER BY occurred_at)) - JULIANDAY(occurred_at)) * 1440) AS next_event,
ROW_NUMBER() OVER () AS id
FROM yammer_events
WHERE event_type = 'engagement'
) intervals
WHERE prev_event >= 10.0
OR next_event >= 10.0
OR prev_event IS NULL
OR next_event IS NULL
) bounds
GROUP BY 1, 2
) sessions
ON e.user_id = sessions.user_id
AND e.occurred_at >= sessions.session_start
AND e.occurred_at <= sessions.session_end
WHERE e.event_type = 'engagement'
) events
GROUP BY 1, 2, 3
) counts
WHERE runs > 0
GROUP BY 1
ORDER BY 1
"""
clicks = pandasql.sqldf(q, globals())
num_clicks = clicks['clicks']
num_sessions = clicks['sessions']
fig4 = go.Figure()
fig4.add_trace(go.Bar(x=num_clicks, y=num_sessions, text=num_sessions, textposition='auto', name='num clicks per session'))
fig4.update_layout(title='Clicks Per Session During Search Runs', xaxis_title='Clicks per session', yaxis_title='Number of sessions')
pio.write_html(fig4, file='2021-06-01-Understanding-Search-Functionality-fig4.html', auto_open=True)
q = """
SELECT
runs,
ROUND(AVG(clicks), 2) AS avg_clicks
FROM (
SELECT
strftime('%Y-%m-%d %H:%M:%S', session_start) AS session_start,
session,
user_id,
COUNT(CASE WHEN event_name = 'search_autocomplete' THEN user_id ELSE NULL END) AS autocompletes,
COUNT(CASE WHEN event_name = 'search_run' THEN user_id ELSE NULL END) AS runs,
COUNT(CASE WHEN event_name LIKE 'search_click_result_%' THEN user_id ELSE NULL END) AS clicks
FROM (
SELECT
e.*,
session,
session_start
FROM yammer_events e
LEFT JOIN (
SELECT
user_id,
session,
MIN(occurred_at) AS session_start,
MAX(occurred_at) AS session_end
FROM (
SELECT
intervals.*,
CASE WHEN prev_event >= 10.0 THEN id
WHEN prev_event IS NULL THEN id
ELSE LAG(id, 1) OVER (PARTITION BY user_id ORDER BY occurred_at)
END AS session
FROM (
SELECT
user_id,
occurred_at,
event_name,
event_type,
ROUND((JULIANDAY(occurred_at) - JULIANDAY(LAG(occurred_at, 1) OVER (PARTITION BY user_id ORDER BY occurred_at))) * 1440) AS prev_event,
ROUND((JULIANDAY(LEAD(occurred_at, 1) OVER (PARTITION BY user_id ORDER BY occurred_at)) - JULIANDAY(occurred_at)) * 1440) AS next_event,
ROW_NUMBER() OVER () AS id
FROM yammer_events
WHERE event_type = 'engagement'
) intervals
WHERE prev_event >= 10.0
OR next_event >= 10.0
OR prev_event IS NULL
OR next_event IS NULL
) bounds
GROUP BY 1, 2
) sessions
ON e.user_id = sessions.user_id
AND e.occurred_at >= sessions.session_start
AND e.occurred_at <= sessions.session_end
WHERE e.event_type = 'engagement'
) events
GROUP BY 1, 2, 3
) counts
WHERE runs > 0
GROUP BY 1
ORDER BY 1
"""
avg_clicks = pandasql.sqldf(q, globals())
runs = avg_clicks['runs']
avg_clicks = avg_clicks['avg_clicks']
fig5 = go.Figure()
fig5.add_trace(go.Bar(x=runs, y=avg_clicks, text=avg_clicks, textposition='auto', name='avg clicks'))
fig5.update_layout(title='Average Clicks Per Search', xaxis_title='Searches per session', yaxis_title='Average clicks per session')
pio.write_html(fig5, file='2021-06-01-Understanding-Search-Functionality-fig5.html', auto_open=True)
q = """
SELECT
r.*
FROM (
SELECT
CASE event_name
WHEN 'search_click_result_1' THEN 1
WHEN 'search_click_result_2' THEN 2
WHEN 'search_click_result_3' THEN 3
WHEN 'search_click_result_4' THEN 4
WHEN 'search_click_result_5' THEN 5
WHEN 'search_click_result_6' THEN 6
WHEN 'search_click_result_7' THEN 7
WHEN 'search_click_result_8' THEN 8
WHEN 'search_click_result_9' THEN 9
WHEN 'search_click_result_10' THEN 10
END AS result,
COUNT(user_id) AS clicks
FROM yammer_events
GROUP BY 1
) r
WHERE r.result IS NOT NULL
"""
results = pandasql.sqldf(q, globals())
result_num = results['result']
num_clicks = results['clicks']
fig6 = go.Figure()
fig6.add_trace(go.Bar(x=result_num, y=num_clicks, text=num_clicks, textposition='auto', name='Clicks per result'))
fig6.update_layout(title='Clicks by Result Order', xaxis_title='Search result order', yaxis_title='Number of clicks')
pio.write_html(fig6, file='2021-06-01-Understanding-Search-Functionality-fig6.html', auto_open=True)
q = """
WITH intervals AS (
SELECT
user_id,
occurred_at,
event_name,
event_type,
ROUND((JULIANDAY(occurred_at) - JULIANDAY(LAG(occurred_at, 1) OVER (PARTITION BY user_id ORDER BY occurred_at))) * 1440) AS prev_event,
ROUND((JULIANDAY(LEAD(occurred_at, 1) OVER (PARTITION BY user_id ORDER BY occurred_at)) - JULIANDAY(occurred_at)) * 1440) AS next_event,
ROW_NUMBER() OVER () AS id
FROM yammer_events
WHERE event_type = 'engagement'
),
bounds AS (
SELECT
user_id,
occurred_at,
CASE WHEN prev_event >= 10.0 THEN id
WHEN prev_event IS NULL THEN id
ELSE LAG(id, 1) OVER (PARTITION BY user_id ORDER BY occurred_at)
END AS session
FROM intervals
WHERE prev_event >= 10.0
OR next_event >= 10.0
OR prev_event IS NULL
OR next_event IS NULL
),
sessions AS (
SELECT
user_id,
session,
MIN(occurred_at) AS session_start,
MAX(occurred_at) AS session_end
FROM bounds
GROUP BY 1, 2
),
first AS (
SELECT
user_id,
MIN(occurred_at) AS first_run
FROM yammer_events
WHERE event_name = 'search_run'
GROUP BY 1
),
events AS (
SELECT
e.*,
sessions.session,
sessions.session_start,
first.first_run
FROM yammer_events e
JOIN first
ON e.user_id = first.user_id
LEFT JOIN sessions
ON e.user_id = sessions.user_id
AND e.occurred_at >= sessions.session_start
AND e.occurred_at <= sessions.session_end
AND sessions.session_start <= datetime(first.first_run, '1 MONTH')
WHERE e.event_type = 'engagement'
),
counts AS (
SELECT
strftime('%Y-%m-%d %H:%M:%S', session_start) AS session_start,
session,
user_id,
first_run,
COUNT(CASE WHEN event_name = 'search_run' THEN user_id ELSE NULL END) AS runs
FROM events
GROUP BY 1, 2, 3, 4
),
runs AS (
SELECT
user_id,
COUNT(*) AS searches
FROM counts
WHERE runs > 0
GROUP BY 1
ORDER BY 1
)
SELECT
searches,
COUNT(*) AS users
FROM runs
GROUP BY 1
ORDER BY 1
"""
run_retention = pandasql.sqldf(q, globals())
searches = run_retention['searches']
users = run_retention['users']
fig7 = go.Figure()
fig7.add_trace(go.Bar(x=searches, y=users, text=users, textposition='auto', name='sessions with search runs'))
fig7.update_layout(title="Sessions with Search Runs Month After Users' First Search", xaxis_title='Number of sessions with search runs', yaxis_title='Number of users')
pio.write_html(fig7, file='2021-06-01-Understanding-Search-Functionality-fig7.html', auto_open=True)
Conclusions:
search_run
.q = """
WITH intervals AS (
SELECT
user_id,
occurred_at,
event_type,
event_name,
ROUND((JULIANDAY(occurred_at) - JULIANDAY(LAG(occurred_at, 1) OVER (PARTITION BY user_id ORDER BY occurred_at))) * 1440) AS prev_event,
ROUND((JULIANDAY(LEAD(occurred_at, 1) OVER (PARTITION BY user_id ORDER BY occurred_at)) - JULIANDAY(occurred_at)) * 1440) AS next_event,
ROW_NUMBER() OVER () AS id
FROM yammer_events
WHERE event_type = 'engagement'
),
bounds AS (
SELECT
user_id,
occurred_at,
CASE WHEN prev_event >= 10.0 THEN id
WHEN next_event IS NULL THEN id
ELSE LAG(id, 1) OVER (PARTITION BY user_id ORDER BY occurred_at)
END AS session
FROM intervals
WHERE prev_event >= 10.0
OR next_event >= 10.0
OR prev_event IS NULL
OR next_event IS NULL
),
sessions AS (
SELECT
user_id,
session,
MIN(occurred_at) AS session_start,
MAX(occurred_at) AS session_end
FROM bounds
GROUP BY 1, 2
),
first AS (
SELECT
user_id,
MIN(occurred_at) AS first_autocomplete
FROM yammer_events
WHERE event_name = 'search_autocomplete'
GROUP BY 1
),
events AS (
SELECT
e.*,
sessions.session,
sessions.session_start,
first.first_autocomplete
FROM yammer_events e
JOIN first
ON e.user_id = first.user_id
LEFT JOIN sessions
ON e.user_id = sessions.user_id
AND e.occurred_at >= sessions.session_start
AND e.occurred_at <= sessions.session_end
AND sessions.session_start <= datetime(first.first_autocomplete, '1 MONTH')
WHERE e.event_type = 'engagement'
),
counts AS (
SELECT
strftime('%Y-%m-%d %H:%M:%S', session_start) AS session_start,
session,
user_id,
first_autocomplete,
COUNT(CASE WHEN event_name = 'search_autocomplete' THEN user_id ELSE NULL END) AS autocompletes
FROM events
GROUP BY 1, 2, 3, 4
),
searches AS (
SELECT
user_id,
COUNT(*) AS searches
FROM counts
WHERE autocompletes > 0
GROUP BY 1
)
SELECT
searches,
COUNT(*) AS users
FROM searches
GROUP BY 1
ORDER BY 1
"""
autocompletes = pandasql.sqldf(q, globals())
searches = autocompletes['searches']
users = autocompletes['users']
fig8 = go.Figure()
fig8.add_trace(go.Bar(x=searches, y=users, text=users, textposition='auto', name='sessions with autocompletes'))
fig8.update_layout(title="Autocomplete Sessions Month After Users' First Search", xaxis_title='Number of sessions with autocompletes', yaxis_title='Number of users')
pio.write_html(fig8, file='2021-06-01-Understanding-Search-Functionality-fig8.html', auto_open=True)
Conclusions:
After looking at the data more thoroughly, we can safely conclude that Yammer’s autocomplete feature is performing reasonably well. From our analysis, we determine that the best place to start with improvements is our search results page, which appears to be performing poorly. Users seem to have very low click-through rates when running full searches, indicating that they are not finding the result they are looking for. Since Yammer has a goal metric of providing the most value out of every feature, we recommend that the development team begin the task of improving our search ranking algorithm.
The data and problem description used in this study was provided by Mode Analytics.
2021-06-01 00:00:00 -0700 - Written by Jonathan Logan Moran