[ plotly  pandas  SQL  pandasql  yammer  case-studies  feasibility-studies  hypothesis-testing  data-visualisation  python  ipynb  ]

Understanding Search Functionality

In this Part 2 of three case studies investigating Yammer–a social network for the workplace, we will be helping the product team determine priorities for their next development cycle. They are considering improving the site’s search functionality, but first need our help deciding whether or not to work on the project in the first place. Thus, it’s our job to consider the impact that search has on users, and see if there’s any specific improvements that can be made. We’ll be making recommendations to the development team, so it’s best that we form quantitative hypotheses that seek to understand the general quality of an individual user’s search experience.

Let’s get started…

Programming

In this study, we will be using a few common technologies:

# prerequisites
!pip install pandasql
!pip install plotly
# data tools
import pandas as pd					# for querying pandas dataframes
import pandasql
from pandasql import sqldf
# plotting tools
import plotly.graph_objects as go
import plotly.express as px
import plotly.io as pio					# output charts to HMTL
#init_notebook_mode()

Collecting our datasets

There are two tables that are relevant to this problem, yammer_events and yammer_users.

It is going to be crucial that we look at the following events in our events table:

events_path = '../src/yammer_events.csv'
yammer_events = pd.read_csv(events_path, sep=',', header=0)
yammer_events
user_id occurred_at event_type event_name location device user_type
0 10522.0 2014-05-02 11:02:39 engagement login Japan dell inspiron notebook 3.0
1 10522.0 2014-05-02 11:02:53 engagement home_page Japan dell inspiron notebook 3.0
2 10522.0 2014-05-02 11:03:28 engagement like_message Japan dell inspiron notebook 3.0
3 10522.0 2014-05-02 11:04:09 engagement view_inbox Japan dell inspiron notebook 3.0
4 10522.0 2014-05-02 11:03:16 engagement search_run Japan dell inspiron notebook 3.0
... ... ... ... ... ... ... ...
340827 18815.0 2014-08-31 13:41:46 engagement like_message Ireland dell inspiron notebook 2.0
340828 18815.0 2014-08-31 13:42:11 engagement home_page Ireland dell inspiron notebook 2.0
340829 18815.0 2014-08-31 13:42:43 engagement send_message Ireland dell inspiron notebook 2.0
340830 18815.0 2014-08-31 13:43:07 engagement home_page Ireland dell inspiron notebook 2.0
340831 18815.0 2014-08-31 13:43:42 engagement like_message Ireland dell inspiron notebook 2.0

340832 rows × 7 columns

The problem

Here’s a re-hash of the above: Yammer’s product team is determining priorities for the next development cycle. They’re considering improving the site’s search functionality. Before they proceed, they want to know whether they should even work on search in the first place, and, if so, how they should modify it.

Making a recommendation

In particular, you should seek to answer the following questions:

Analysis

In order to form our hypotheses (by way of metrics we will investigate), we need to use the above questions to better understand the ultimate purpose of search at Yammer. For this study, we will consider a session to be a string of events that occur without a 10-minute break between any two events. That goes to say that if an active users fails to log an event within a 10-minute window, their session is considered over and the next engagement will mark a new session.

1. Search Use

The first thing to understand is whether anyone even uses search at all.

Search Use Over Time

q = """
SELECT
  week_of,
  (with_autocompletes * 1.00 / sessions) * 100 AS perct_autocompletes,
  (with_runs * 1.00 / sessions) * 100 AS perct_runs
FROM (
      SELECT
        strftime('%Y-%m-%d', session_start, 'weekday 0', '-6 days') AS week_of,
        COUNT(*) sessions,
        COUNT(CASE WHEN autocompletes > 0 THEN session ELSE NULL END) AS with_autocompletes,
        COUNT(CASE WHEN runs > 0 THEN session ELSE NULL END) AS with_runs
      FROM (
            SELECT
              strftime('%Y-%m-%d %H:%M:%S', session_start) AS session_start,
              session,
              user_id,
              COUNT(CASE WHEN event_name = 'search_autocomplete' THEN user_id ELSE NULL END) AS autocompletes,
              COUNT(CASE WHEN event_name = 'search_run' THEN user_id ELSE NULL END) AS runs,
              COUNT(CASE WHEN event_name LIKE 'search_click_result_%' THEN user_id ELSE NULL END) AS clicks
            FROM (
                  SELECT
                    e.*,
                    sessions.session,
                    sessions.session_start
                  FROM yammer_events e
                  LEFT JOIN (
                              SELECT
                                user_id,
                                session,
                                MIN(occurred_at) AS session_start,
                                MAX(occurred_at) AS session_end
                              FROM (
                                    SELECT
                                      intervals.*,
                                      CASE WHEN prev_event >= 10.0 THEN id
                                           WHEN prev_event IS NULL THEN id
                                           ELSE LAG(id, 1) OVER (PARTITION BY user_id ORDER BY occurred_at)
                                           END AS session
                                    FROM (
                                          SELECT
                                            user_id,
                                            occurred_at,
                                            event_type,
                                            event_name,
                                            ROUND((JULIANDAY(occurred_at) - JULIANDAY(LAG(occurred_at, 1) OVER (PARTITION BY user_id ORDER BY occurred_at))) * 1440) AS prev_event,
                                            ROUND((JULIANDAY(LEAD(occurred_at, 1) OVER (PARTITION BY user_id ORDER BY occurred_at)) - JULIANDAY(occurred_at)) * 1440) AS next_event,
                                            ROW_NUMBER() OVER () AS id
                                          FROM yammer_events
                                          WHERE event_type = 'engagement'
                                          ORDER BY user_id, occurred_at
                                    ) intervals
                                    WHERE prev_event >= 10.0
                                       OR next_event >= 10.0
                                       OR prev_event IS NULL
                                       OR next_event IS NULL
                              ) bounds
                              GROUP BY 1, 2
                  ) sessions
                  ON e.user_id = sessions.user_id
                  AND e.occurred_at >= sessions.session_start
                  AND e.occurred_at <= sessions.session_end
                  WHERE e.event_type = 'engagement'
            ) events
            GROUP BY 1, 2, 3
      ) counts
      GROUP BY 1
      ORDER BY 1
) counts
GROUP BY 1
ORDER BY 1
"""
searches = pandasql.sqldf(q, globals())
week_of = searches['week_of']
perct_autocompletes = searches['perct_autocompletes']
perct_runs = searches['perct_runs']
fig1 = go.Figure()
fig1.add_trace(go.Scatter(x=week_of, y=perct_autocompletes, mode='lines', name='searches with autocompletes'))
fig1.add_trace(go.Scatter(x=week_of, y=perct_runs, mode='lines', name='searches with runs'))
fig1.update_layout(title='Search Rate by Week', xaxis_title='Week of', yaxis_title='Percent of sessions')
pio.write_html(fig1, file='2021-06-01-Understanding-Search-Functionality-fig1.html', auto_open=True)

Search Use Per Session

q = """
SELECT
  autocompletes,
  COUNT(*) AS sessions
FROM (
      SELECT
        strftime('%Y-%m-%d %H:%M:%S', session_start) AS session_start,
        session,
        user_id,
        COUNT(CASE WHEN event_name = 'search_autocomplete' THEN user_id ELSE NULL END) AS autocompletes,
        COUNT(CASE WHEN event_name = 'search_run' THEN user_id ELSE NULL END) AS runs,
        COUNT(CASE WHEN event_name LIKE 'search_click_result_%' THEN user_id ELSE NULL END) AS clicks
      FROM (
            SELECT
              e.*,
              session,
              session_start
            FROM yammer_events e
            LEFT JOIN (
                        SELECT
                          user_id,
                          session,
                          MIN(occurred_at) AS session_start,
                          MAX(occurred_at) AS session_end
                        FROM (
                              SELECT
                                intervals.*,
                                CASE WHEN prev_event >= 10.0 THEN id
                                     WHEN prev_event IS NULL THEN id
                                     ELSE LAG(id, 1) OVER (PARTITION BY user_id ORDER BY occurred_at)
                                     END AS session
                              FROM (
                                    SELECT
                                      user_id,
                                      occurred_at,
                                      event_type,
                                      event_name,
                                      ROUND((JULIANDAY(occurred_at) - JULIANDAY(LAG(occurred_at, 1) OVER (PARTITION BY user_id ORDER BY occurred_at))) * 1440) AS prev_event,
                                      ROUND((JULIANDAY(LEAD(occurred_at, 1) OVER (PARTITION BY user_id ORDER BY occurred_at)) - JULIANDAY(occurred_at)) * 1440) AS next_event,
                                      ROW_NUMBER() OVER () AS id
                                    FROM yammer_events
                                    WHERE event_type = 'engagement'
                              ) intervals
                              WHERE prev_event >= 10.0
                                 OR next_event >= 10.0
                                 OR prev_event IS NULL
                                 OR next_event IS NULL
                        ) bounds
                        GROUP BY 1, 2
            ) sessions
            ON e.user_id = sessions.user_id
            AND e.occurred_at >= sessions.session_start
            AND e.occurred_at <= sessions.session_end
            WHERE e.event_type = 'engagement'
      ) events
      GROUP BY 1, 2, 3
) counts
WHERE autocompletes > 0
GROUP BY 1
ORDER BY 1
"""
autocompletes = pandasql.sqldf(q, globals())
num_autocompletes = autocompletes['autocompletes']
num_sessions = autocompletes['sessions']
fig2 = go.Figure()
fig2.add_trace(go.Bar(x=num_autocompletes, y=num_sessions, text=num_sessions, textposition='auto', name='sessions with autocomplete'))
fig2.update_yaxes(nticks=6)
fig2.update_layout(title='Number of Sessions with Autocompletes', xaxis_title='Autocompletes per session', yaxis_title='Number of sessions')
pio.write_html(fig2, file='2021-06-01-Understanding-Search-Functionality-fig2.html', auto_open=True)

Conclusions:

2. Search Frequency

If users search a lot, it’s likely that they’re getting value out of the feature– with a major exception. If users search repeatedly within a short timeframe, it’s likely that they’re refining their terms because they were unable to find what they initially wanted.

Search Runs Per Session

q = """
SELECT
  runs,
  COUNT(*) AS sessions
FROM (
      SELECT
        strftime('%Y-%m-%d %H:%M:%S', session_start) AS session_start,
        session,
        user_id,
        COUNT(CASE WHEN event_name = 'search_autocomplete' THEN user_id ELSE NULL END) AS autocompletes,
        COUNT(CASE WHEN event_name = 'search_run' THEN user_id ELSE NULL END) AS runs,
        COUNT(CASE WHEN event_name LIKE 'search_click_result_%' THEN user_id ELSE NULL END) AS clicks
      FROM (
            SELECT
              e.*,
              session,
              session_start
            FROM yammer_events e
            LEFT JOIN (
                        SELECT
                          user_id,
                          session,
                          MIN(occurred_at) AS session_start,
                          MAX(occurred_at) AS session_end
                        FROM (
                              SELECT
                                intervals.*,
                                CASE WHEN prev_event >= 10.0 THEN id
                                     WHEN prev_event IS NULL THEN id
                                     ELSE LAG(id, 1) OVER (PARTITION BY user_id ORDER BY occurred_at)
                                     END AS session
                              FROM (
                                    SELECT
                                      user_id,
                                      strftime('%Y-%m-%d %H:%M:%S', occurred_at) AS occurred_at,
                                      event_type,
                                      event_name,
                                      ROUND((JULIANDAY(occurred_at) - JULIANDAY(LAG(occurred_at, 1) OVER (PARTITION BY user_id ORDER BY occurred_at))) * 1440) AS prev_event,
                                      ROUND((JULIANDAY(LEAD(occurred_at, 1) OVER (PARTITION BY user_id ORDER BY occurred_at)) - JULIANDAY(occurred_at)) * 1440) AS next_event,
                                      ROW_NUMBER() OVER () AS id
                                    FROM yammer_events
                                    WHERE event_type = 'engagement'
                              ) intervals
                              WHERE prev_event >= 10.0
                                 OR next_event >= 10.0
                                 OR prev_event IS NULL
                                 OR next_event IS NULL
                        ) bounds
                        GROUP BY 1, 2
            ) sessions
            ON e.user_id = sessions.user_id
            AND e.occurred_at >= sessions.session_start
            AND e.occurred_at <= sessions.session_end
            WHERE e.event_type = 'engagement'
      ) events
      GROUP BY 1, 2, 3
) counts
WHERE runs > 0
GROUP BY 1
ORDER BY 1
"""
runs = pandasql.sqldf(q, globals())
num_runs = runs['runs']
num_sessions = runs['sessions']
fig3 = go.Figure()
fig3.add_trace(go.Bar(x=num_runs, y=num_sessions, text=num_sessions, textposition='auto', name='runs per session'))
fig3.update_layout(title='Number of Sessions with Runs', xaxis_title='Number of runs', yaxis_title='Number of sessions')
pio.write_html(fig3, file='2021-06-01-Understanding-Search-Functionality-fig3.html', auto_open=True)

Conclusions:

3. Clickthroughs

If a user clicks many links in the search results, it’s likely that she isn’t having a great experience. However, the inverse is not necessarily true–clicking only one result does not imply a success. If the user clicks through one result, then refines their search, that’s certainly not a great experience, so search frequently is probably a better way to understand that piece of the puzzle. Clickthroughs are, however, very useful in determining whether search rankings are good. If users frequently click low results or scroll to additional pages, then the ranking algorithm should probably be adjusted.

Clickthroughs in Search Runs

q = """
SELECT
  clicks,
  COUNT(*) AS sessions
FROM (
        SELECT
          strftime('%Y-%m-%d %H:%M:%S', session_start) AS session_start,
          session,
          user_id,
          COUNT(CASE WHEN event_name = 'search_autocomplete' THEN user_id ELSE NULL END) AS autocompletes,
          COUNT(CASE WHEN event_name = 'search_run' THEN user_id ELSE NULL END) AS runs,
          COUNT(CASE WHEN event_name LIKE 'search_click_result_%' THEN user_id ELSE NULL END) AS clicks
        FROM (
              SELECT
                e.*,
                session,
                session_start
              FROM yammer_events e
              LEFT JOIN (
                          SELECT
                            user_id,
                            session,
                            MIN(occurred_at) AS session_start,
                            MAX(occurred_at) AS session_end
                          FROM (
                                SELECT
                                  intervals.*,
                                  CASE WHEN prev_event >= 10.0 THEN id
                                       WHEN prev_event IS NULL THEN id
                                       ELSE LAG(id, 1) OVER (PARTITION BY user_id ORDER BY occurred_at)
                                       END AS session
                                FROM (
                                      SELECT
                                        user_id,
                                        occurred_at,
                                        event_type,
                                        event_name,
                                        ROUND((JULIANDAY(occurred_at) - JULIANDAY(LAG(occurred_at, 1) OVER (PARTITION BY user_id ORDER BY occurred_at))) * 1440) AS prev_event,
                                        ROUND((JULIANDAY(LEAD(occurred_at, 1) OVER (PARTITION BY user_id ORDER BY occurred_at)) - JULIANDAY(occurred_at)) * 1440) AS next_event,
                                        ROW_NUMBER() OVER () AS id
                                      FROM yammer_events
                                      WHERE event_type = 'engagement'
                                ) intervals
                                WHERE prev_event >= 10.0
                                   OR next_event >= 10.0
                                   OR prev_event IS NULL
                                   OR next_event IS NULL
                          ) bounds
                          GROUP BY 1, 2
              ) sessions
              ON e.user_id = sessions.user_id
              AND e.occurred_at >= sessions.session_start
              AND e.occurred_at <= sessions.session_end
              WHERE e.event_type = 'engagement'
        ) events
        GROUP BY 1, 2, 3
) counts
WHERE runs > 0
GROUP BY 1
ORDER BY 1
"""
clicks = pandasql.sqldf(q, globals())
num_clicks = clicks['clicks']
num_sessions = clicks['sessions']
fig4 = go.Figure()
fig4.add_trace(go.Bar(x=num_clicks, y=num_sessions, text=num_sessions, textposition='auto', name='num clicks per session'))
fig4.update_layout(title='Clicks Per Session During Search Runs', xaxis_title='Clicks per session', yaxis_title='Number of sessions')
pio.write_html(fig4, file='2021-06-01-Understanding-Search-Functionality-fig4.html', auto_open=True)
q = """
SELECT
  runs,
  ROUND(AVG(clicks), 2) AS avg_clicks
FROM (
      SELECT
        strftime('%Y-%m-%d %H:%M:%S', session_start) AS session_start,
        session,
        user_id,
        COUNT(CASE WHEN event_name = 'search_autocomplete' THEN user_id ELSE NULL END) AS autocompletes,
        COUNT(CASE WHEN event_name = 'search_run' THEN user_id ELSE NULL END) AS runs,
        COUNT(CASE WHEN event_name LIKE 'search_click_result_%' THEN user_id ELSE NULL END) AS clicks
      FROM (
            SELECT
              e.*,
              session,
              session_start
            FROM yammer_events e
            LEFT JOIN (
                        SELECT
                          user_id,
                          session,
                          MIN(occurred_at) AS session_start,
                          MAX(occurred_at) AS session_end
                        FROM (
                              SELECT
                                intervals.*,
                                CASE WHEN prev_event >= 10.0 THEN id
                                     WHEN prev_event IS NULL THEN id
                                     ELSE LAG(id, 1) OVER (PARTITION BY user_id ORDER BY occurred_at)
                                     END AS session
                              FROM (
                                    SELECT
                                      user_id,
                                      occurred_at,
                                      event_name,
                                      event_type,
                                      ROUND((JULIANDAY(occurred_at) - JULIANDAY(LAG(occurred_at, 1) OVER (PARTITION BY user_id ORDER BY occurred_at))) * 1440) AS prev_event,
                                      ROUND((JULIANDAY(LEAD(occurred_at, 1) OVER (PARTITION BY user_id ORDER BY occurred_at)) - JULIANDAY(occurred_at)) * 1440) AS next_event,
                                      ROW_NUMBER() OVER () AS id
                                    FROM yammer_events
                                    WHERE event_type = 'engagement'
                              ) intervals
                              WHERE prev_event >= 10.0
                                 OR next_event >= 10.0
                                 OR prev_event IS NULL
                                 OR next_event IS NULL
                        ) bounds
                        GROUP BY 1, 2
            ) sessions
            ON e.user_id = sessions.user_id
            AND e.occurred_at >= sessions.session_start
            AND e.occurred_at <= sessions.session_end
            WHERE e.event_type = 'engagement'
      ) events
      GROUP BY 1, 2, 3
) counts
WHERE runs > 0
GROUP BY 1
ORDER BY 1
"""
avg_clicks = pandasql.sqldf(q, globals())
runs = avg_clicks['runs']
avg_clicks = avg_clicks['avg_clicks']
fig5 = go.Figure()
fig5.add_trace(go.Bar(x=runs, y=avg_clicks, text=avg_clicks, textposition='auto', name='avg clicks'))
fig5.update_layout(title='Average Clicks Per Search', xaxis_title='Searches per session', yaxis_title='Average clicks per session')
pio.write_html(fig5, file='2021-06-01-Understanding-Search-Functionality-fig5.html', auto_open=True)

Clicks by Search Result

q = """
SELECT 
  r.*
FROM (
      SELECT
        CASE event_name
          WHEN 'search_click_result_1' THEN 1
          WHEN 'search_click_result_2' THEN 2
          WHEN 'search_click_result_3' THEN 3
          WHEN 'search_click_result_4' THEN 4
          WHEN 'search_click_result_5' THEN 5
          WHEN 'search_click_result_6' THEN 6
          WHEN 'search_click_result_7' THEN 7
          WHEN 'search_click_result_8' THEN 8
          WHEN 'search_click_result_9' THEN 9
          WHEN 'search_click_result_10' THEN 10
          END AS result,
        COUNT(user_id) AS clicks
      FROM yammer_events
      GROUP BY 1
) r
WHERE r.result IS NOT NULL
"""
results = pandasql.sqldf(q, globals())
result_num = results['result']
num_clicks = results['clicks']
fig6 = go.Figure()
fig6.add_trace(go.Bar(x=result_num, y=num_clicks, text=num_clicks, textposition='auto', name='Clicks per result'))
fig6.update_layout(title='Clicks by Result Order', xaxis_title='Search result order', yaxis_title='Number of clicks')
pio.write_html(fig6, file='2021-06-01-Understanding-Search-Functionality-fig6.html', auto_open=True)

Search Retention

q = """
WITH intervals AS (
SELECT
  user_id,
  occurred_at,
  event_name,
  event_type,
  ROUND((JULIANDAY(occurred_at) - JULIANDAY(LAG(occurred_at, 1) OVER (PARTITION BY user_id ORDER BY occurred_at))) * 1440) AS prev_event,
  ROUND((JULIANDAY(LEAD(occurred_at, 1) OVER (PARTITION BY user_id ORDER BY occurred_at)) - JULIANDAY(occurred_at)) * 1440) AS next_event,
  ROW_NUMBER() OVER () AS id
FROM yammer_events
WHERE event_type = 'engagement'
),
bounds AS (
SELECT
  user_id,
  occurred_at,
  CASE WHEN prev_event >= 10.0 THEN id
       WHEN prev_event IS NULL THEN id
       ELSE LAG(id, 1) OVER (PARTITION BY user_id ORDER BY occurred_at)
      END AS session
FROM intervals
WHERE prev_event >= 10.0
  OR next_event >= 10.0
  OR prev_event IS NULL
  OR next_event IS NULL
),
sessions AS (
SELECT
  user_id,
  session,
  MIN(occurred_at) AS session_start,
  MAX(occurred_at) AS session_end
FROM bounds
GROUP BY 1, 2
),
first AS (
SELECT
  user_id,
  MIN(occurred_at) AS first_run
FROM yammer_events
WHERE event_name = 'search_run'
GROUP BY 1
),
events AS (
SELECT
  e.*,
  sessions.session,
  sessions.session_start,
  first.first_run
FROM yammer_events e
JOIN first
ON e.user_id = first.user_id
LEFT JOIN sessions
ON e.user_id = sessions.user_id
AND e.occurred_at >= sessions.session_start
AND e.occurred_at <= sessions.session_end
AND sessions.session_start <= datetime(first.first_run, '1 MONTH')
WHERE e.event_type = 'engagement'
),
counts AS (
SELECT
  strftime('%Y-%m-%d %H:%M:%S', session_start) AS session_start,
  session,
  user_id,
  first_run,
  COUNT(CASE WHEN event_name = 'search_run' THEN user_id ELSE NULL END) AS runs
FROM events
GROUP BY 1, 2, 3, 4
),
runs AS (
SELECT
  user_id,
  COUNT(*) AS searches
FROM counts
WHERE runs > 0
GROUP BY 1
ORDER BY 1
)
SELECT
  searches,
  COUNT(*) AS users
FROM runs
GROUP BY 1
ORDER BY 1
"""
run_retention = pandasql.sqldf(q, globals())
searches = run_retention['searches']
users = run_retention['users']
fig7 = go.Figure()
fig7.add_trace(go.Bar(x=searches, y=users, text=users, textposition='auto', name='sessions with search runs'))
fig7.update_layout(title="Sessions with Search Runs Month After Users' First Search", xaxis_title='Number of sessions with search runs', yaxis_title='Number of users')
pio.write_html(fig7, file='2021-06-01-Understanding-Search-Functionality-fig7.html', auto_open=True)

Conclusions:

4. Autocomplete Clickthroughs

q = """
WITH intervals AS (
SELECT
  user_id,
  occurred_at,
  event_type,
  event_name,
  ROUND((JULIANDAY(occurred_at) - JULIANDAY(LAG(occurred_at, 1) OVER (PARTITION BY user_id ORDER BY occurred_at))) * 1440) AS prev_event,
  ROUND((JULIANDAY(LEAD(occurred_at, 1) OVER (PARTITION BY user_id ORDER BY occurred_at)) - JULIANDAY(occurred_at)) * 1440) AS next_event,
  ROW_NUMBER() OVER () AS id
FROM yammer_events
WHERE event_type = 'engagement'
),
bounds AS (
SELECT
  user_id,
  occurred_at,
  CASE WHEN prev_event >= 10.0 THEN id
       WHEN next_event IS NULL THEN id
       ELSE LAG(id, 1) OVER (PARTITION BY user_id ORDER BY occurred_at)
       END AS session
FROM intervals
WHERE prev_event >= 10.0
   OR next_event >= 10.0
   OR prev_event IS NULL
   OR next_event IS NULL
),
sessions AS (
SELECT
  user_id,
  session,
  MIN(occurred_at) AS session_start,
  MAX(occurred_at) AS session_end
FROM bounds
GROUP BY 1, 2
),
first AS (
SELECT
  user_id,
  MIN(occurred_at) AS first_autocomplete
FROM yammer_events
WHERE event_name = 'search_autocomplete'
GROUP BY 1
),
events AS (
SELECT
  e.*,
  sessions.session,
  sessions.session_start,
  first.first_autocomplete
FROM yammer_events e
JOIN first
ON e.user_id = first.user_id
LEFT JOIN sessions
ON e.user_id = sessions.user_id
AND e.occurred_at >= sessions.session_start
AND e.occurred_at <= sessions.session_end
AND sessions.session_start <= datetime(first.first_autocomplete, '1 MONTH')
WHERE e.event_type = 'engagement'
),
counts AS (
SELECT
  strftime('%Y-%m-%d %H:%M:%S', session_start) AS session_start,
  session,
  user_id,
  first_autocomplete,
  COUNT(CASE WHEN event_name = 'search_autocomplete' THEN user_id ELSE NULL END) AS autocompletes
FROM events
GROUP BY 1, 2, 3, 4
),
searches AS (
SELECT
  user_id,
  COUNT(*) AS searches
FROM counts
WHERE autocompletes > 0
GROUP BY 1
)
SELECT
  searches,
  COUNT(*) AS users
FROM searches
GROUP BY 1
ORDER BY 1
"""
autocompletes = pandasql.sqldf(q, globals())
searches = autocompletes['searches']
users = autocompletes['users']
fig8 = go.Figure()
fig8.add_trace(go.Bar(x=searches, y=users, text=users, textposition='auto', name='sessions with autocompletes'))
fig8.update_layout(title="Autocomplete Sessions Month After Users' First Search", xaxis_title='Number of sessions with autocompletes', yaxis_title='Number of users')
pio.write_html(fig8, file='2021-06-01-Understanding-Search-Functionality-fig8.html', auto_open=True)

Conclusions:

Summary

After looking at the data more thoroughly, we can safely conclude that Yammer’s autocomplete feature is performing reasonably well. From our analysis, we determine that the best place to start with improvements is our search results page, which appears to be performing poorly. Users seem to have very low click-through rates when running full searches, indicating that they are not finding the result they are looking for. Since Yammer has a goal metric of providing the most value out of every feature, we recommend that the development team begin the task of improving our search ranking algorithm.

Credit

The data and problem description used in this study was provided by Mode Analytics.

2021-06-01 00:00:00 -0700 - Written by Jonathan Logan Moran


Page design by Ankit Sultana