[ plotly  pandas  SQL  pandasql  yammer  case-studies  hypothesis-testing  data-visualisation  python  ipynb  ]

Investigating a Drop in User Engagement

In this study, we will be using a few common technologies:

Programming

# prerequisites
!pip install pandasql
!pip install plotly
# data tools
import pandas as pd
import pandasql					# for querying pandas dataframes
from pandasql import sqldf

# plotting tools
import plotly.graph_objects as go
import plotly.express as px
import plotly.io as pio				# output Plotly charts to HMTL

Task overview

This notebook features the first case study in a series of three that dives into the Yammer dataset. For our analysis we will be considering Yammer’s core metrics (user engagement, retention and growth) and analyzing product-specific usage metrics (e.g. number of times someone views another user’s profile). For clarification, Yammer is a social network for the workplace whose primary goal is to drive better product and business decisions using data. While this specific dataset is fabricated due to privacy and security reasons, it is similar in structure to Yammer’s actual data. With all that said–let’s begin our study…

Collecting our dataset

emails_path = '../src/yammer_emails.csv'
events_path = '../src/yammer_events.csv'
users_path = '../src/yammer_users.csv'
yammer_emails = pd.read_csv(emails_path, sep=',', header=0)
yammer_emails
user_id occurred_at action user_type
0 0.0 2014-05-06 09:30:00 sent_weekly_digest 1.0
1 0.0 2014-05-13 09:30:00 sent_weekly_digest 1.0
2 0.0 2014-05-20 09:30:00 sent_weekly_digest 1.0
3 0.0 2014-05-27 09:30:00 sent_weekly_digest 1.0
4 0.0 2014-06-03 09:30:00 sent_weekly_digest 1.0
... ... ... ... ...
90384 18814.0 2014-08-31 12:12:26 email_open 3.0
90385 18814.0 2014-08-31 12:12:57 email_clickthrough 3.0
90386 18815.0 2014-08-31 13:39:56 sent_reengagement_email 2.0
90387 18815.0 2014-08-31 13:40:14 email_open 2.0
90388 18815.0 2014-08-31 13:40:47 email_clickthrough 2.0

90389 rows × 4 columns

yammer_events = pd.read_csv(events_path, sep=',', header=0)
yammer_events
user_id occurred_at event_type event_name location device user_type
0 10522.0 2014-05-02 11:02:39 engagement login Japan dell inspiron notebook 3.0
1 10522.0 2014-05-02 11:02:53 engagement home_page Japan dell inspiron notebook 3.0
2 10522.0 2014-05-02 11:03:28 engagement like_message Japan dell inspiron notebook 3.0
3 10522.0 2014-05-02 11:04:09 engagement view_inbox Japan dell inspiron notebook 3.0
4 10522.0 2014-05-02 11:03:16 engagement search_run Japan dell inspiron notebook 3.0
... ... ... ... ... ... ... ...
340827 18815.0 2014-08-31 13:41:46 engagement like_message Ireland dell inspiron notebook 2.0
340828 18815.0 2014-08-31 13:42:11 engagement home_page Ireland dell inspiron notebook 2.0
340829 18815.0 2014-08-31 13:42:43 engagement send_message Ireland dell inspiron notebook 2.0
340830 18815.0 2014-08-31 13:43:07 engagement home_page Ireland dell inspiron notebook 2.0
340831 18815.0 2014-08-31 13:43:42 engagement like_message Ireland dell inspiron notebook 2.0

340832 rows × 7 columns

yammer_users = pd.read_csv(users_path, sep=',', header=0)
yammer_users
user_id created_at company_id language activated_at state
0 0.0 2013-01-01 20:59:39 5737.0 english 2013-01-01 21:01:07 active
1 1.0 2013-01-01 13:07:46 28.0 english NaN pending
2 2.0 2013-01-01 10:59:05 51.0 english NaN pending
3 3.0 2013-01-01 18:40:36 2800.0 german 2013-01-01 18:42:02 active
4 4.0 2013-01-01 14:37:51 5110.0 indian 2013-01-01 14:39:05 active
... ... ... ... ... ... ...
19061 19061.0 2014-08-31 13:21:16 2156.0 chinese 2014-08-31 13:22:50 active
19062 19062.0 2014-08-31 19:21:23 7520.0 spanish NaN pending
19063 19063.0 2014-08-31 07:10:41 72.0 spanish 2014-08-31 07:12:09 active
19064 19064.0 2014-08-31 17:45:18 2.0 english NaN pending
19065 19065.0 2014-08-31 19:29:19 8352.0 italian NaN pending

19066 rows × 6 columns

The problem

It’s Tuesday, September 2, 2014. You’ve just been informed by the head of the Yammer Product team that user engagement is down. Here’s the chart of weekly active users that you’ve been provided:

q = """
SELECT strftime('%Y-%m-%d', occurred_at, 'weekday 0', '-6 days') AS week, COUNT(DISTINCT user_id) AS weekly_active_users
FROM yammer_events
WHERE occurred_at >= '2014-05-01 00:00:00'
AND occurred_at < '2014-09-01 00:00:00'
AND event_type = 'engagement'
GROUP BY 1
ORDER BY 1;
"""
weekly_active_users = pandasql.sqldf(q, globals())
week = weekly_active_users['week']
count_users = weekly_active_users['weekly_active_users']
fig1 = go.Figure()
fig1.add_trace(go.Scatter(x=week, y=count_users, mode='lines', name='num_users'))
fig1.update_layout(title='Weekly Active Users', xaxis_title='Week of', yaxis_title='Number of users')
pio.write_html(fig1, file='figures/fig1.html', auto_open=True)

The above chart shows the number of engaged users each week. Yammer defines engagement as having made some type of server call by interacting with the product (showed in the data as events of type engagement). Any point in this chart can be interpreted as “the number of users who logged at least one engagement event during the week starting on that date”.

Analysis

We are responsible for determining what caused the dip at the end of the chart shown above, and, if appropriate, recommending solutions for the problem.

In this study, we will be considering the following metrics:

1. Engagement by Factors of Time

“Do we see the engagement levels gradually decreasing over time or only a one time sudden decrease? Is the decrease specific to certain days of the week or certain times of the day?”

Daily Engagement Rate

# engagement rate = num of engagments / num of users
q = """
SELECT strftime('%Y-%m-%d', occurred_at) AS day, COUNT(*) * 1.00 / COUNT(DISTINCT user_id) AS engagement_rate
FROM yammer_events
WHERE occurred_at >= '2014-05-01 00:00:00'
AND occurred_at < '2014-09-01 00:00:00'
AND event_type = 'engagement'
GROUP BY 1
ORDER BY 1
"""
daily_engagement = pandasql.sqldf(q, globals())
daily_engagement.head()
day engagement_rate
0 2014-05-01 9.808874
1 2014-05-02 10.765363
2 2014-05-03 8.903448
3 2014-05-04 8.734177
4 2014-05-05 10.050584
day = daily_engagement['day']
engagement_rate = daily_engagement['engagement_rate']
fig2 = go.Figure()
fig2.add_trace(go.Scatter(x=day, y=engagement_rate, mode='lines', name='engagement_rate'))
fig2.update_layout(title='Daily Engagement', xaxis_title='Day', yaxis_title='Engagement rate')
pio.write_html(fig2, file='figures/fig2.html', auto_open=True)

Engagement by Week

q = """
SELECT strftime('%w', occurred_at) AS dow,
       CASE strftime('%w', occurred_at)
       WHEN '0' THEN 'Sun'
       WHEN '1' THEN 'Mon'
       WHEN '2' THEN 'Tues'
       WHEN '3' THEN 'Wed'
       WHEN '4' THEN 'Thurs'
       WHEN '5' THEN 'Fri'
       WHEN '6' THEN 'Sat'
       ELSE NULL END AS day_of_week,
       COUNT(*) * 1.00 / COUNT(DISTINCT user_id) AS active_users
FROM yammer_events
WHERE occurred_at >= '2014-05-01 00:00:00'
AND occurred_at < '2014-09-01 00:00:00'
AND event_type = 'engagement'
GROUP BY 1
ORDER BY 1
"""
dow_engagement = pandasql.sqldf(q, globals())
dow_engagement.head()
dow day_of_week active_users
0 0 Sun 9.633461
1 1 Mon 14.766006
2 2 Tues 16.405758
3 3 Wed 17.235331
4 4 Thurs 17.961189
day = dow_engagement['day_of_week']
engagement_rate = dow_engagement['active_users']
fig3 = go.Figure()
fig3.add_trace(go.Bar(x=day, y=engagement_rate))
fig3.update_layout(title='Engagement by Day of Week', xaxis_title='Day of Week', yaxis_title='Engagement rate')
pio.write_html(fig3, file='figures/fig3.html', auto_open=True)

Engagement by Time of Day

q = """
SELECT strftime('%H', occurred_at) AS hour,
       COUNT(*) * 1.00 / COUNT(DISTINCT user_id) AS engagement_rate
FROM yammer_events
WHERE occurred_at >= '2014-05-01 00:00:00'
AND occurred_at < '2014-9-01 00:00:00'
AND event_type = 'engagement'
GROUP BY 1
ORDER BY 1
"""
time_day = pandasql.sqldf(q, globals())
time_day.head()
hour engagement_rate
0 00 9.787736
1 01 9.410628
2 02 8.578125
3 03 9.088710
4 04 9.169014
hour = time_day['hour']
engagement_rate = time_day['engagement_rate']
fig4 = go.Figure()
fig4.add_trace(go.Bar(x=hour, y=engagement_rate))
fig4.update_layout(title='Engagement by Time of Day', xaxis_title='Hour', yaxis_title='Engagement rate')
pio.write_html(fig4, file='figures/fig4.html', auto_open=True)

Conclusions:

2. New Users (Activation and Growth Rate)

“Since the drop in users were gradual over time and not localized to a particular day of the week or time of day, we should now move on and identify if there is an issue with acquiring new users or activating new users”

q = """
SELECT
  strftime('%Y-%m-%d', created_at) AS day,
  COUNT(*) AS all_users,
  COUNT(CASE WHEN activated_at IS NOT NULL THEN user_id ELSE NULL END) AS activated_users
FROM yammer_users
WHERE created_at >= '2014-05-01 00:00:00'
AND created_at < '2014-09-01 00:00:00'
GROUP BY 1
ORDER BY 1
"""
growth = pandasql.sqldf(q, globals())
day = growth['day']
all_users = growth['all_users']
activated_users = growth['activated_users']
growth.head()
day all_users activated_users
0 2014-05-01 73 34
1 2014-05-02 57 30
2 2014-05-03 19 8
3 2014-05-04 22 9
4 2014-05-05 58 24
fig5 = go.Figure()
fig5.add_trace(go.Scatter(x=day, y=all_users, mode='lines', name='all_users'))
fig5.add_trace(go.Scatter(x=day, y=activated_users, mode='lines', name='activated_users'))
fig5.update_layout(title='Daily signups', xaxis_title='Date', yaxis_title='Number of users')
pio.write_html(fig5, file='figures/fig5.html', auto_open=True)

Conclusions:

3. Engagement by Event Type

By looking at the assumed UX flow, there are four major types of engagement events:

We want to group these together, and see if there is a more pronounced drop in engagement for either groups

q = """
SELECT DISTINCT event_name
FROM yammer_events
"""
events = pandasql.sqldf(q, globals())
events.head(50)
event_name
0 login
1 home_page
2 like_message
3 view_inbox
4 search_run
5 send_message
6 search_autocomplete
7 search_click_result_10
8 create_user
9 enter_email
10 enter_info
11 complete_signup
12 search_click_result_7
13 search_click_result_8
14 search_click_result_1
15 search_click_result_3
16 search_click_result_2
17 search_click_result_5
18 search_click_result_6
19 search_click_result_9
20 search_click_result_4
q = """
SELECT
  strftime('%Y-%m-%d', occurred_at, 'weekday 0', '-6 days') AS week_of,
  COUNT(DISTINCT CASE WHEN event_name IN ('create_user', 'enter_email', 'enter_info', 'complete_signup') THEN user_id ELSE NULL END) AS signup,
  COUNT(DISTINCT CASE WHEN event_name IN ('login', 'home_page') THEN user_id ELSE NULL END) AS login,
  COUNT(DISTINCT CASE WHEN event_name IN ('view_inbox', 'like_message', 'send_message') THEN user_id ELSE NULL END) AS inbox,
  COUNT(DISTINCT CASE WHEN event_name IN ('search_run', 'search_autocomplete', 'search_click_result_1', 'search_click_result_2', 'search_click_result_3', 'search_click_result_4', 'search_click_result_5', 'search_click_result_6', 'search_click_result_7', 'search_click_result_8', 'search_click_result_9', 'search_click_result_10') THEN user_id ELSE NULL END) AS search_and_click
FROM yammer_events
WHERE occurred_at >= '2014-05-01 00:00:00'
AND occurred_at < '2014-09-01 00:00:00'
GROUP BY 1
ORDER BY 1
"""
event_engagement = pandasql.sqldf(q, globals())
week_of = event_engagement['week_of']
signup = event_engagement['signup']
login = event_engagement['login']
inbox = event_engagement['inbox']
search_and_click = event_engagement['search_and_click']
fig6 = go.Figure()
fig6.add_trace(go.Scatter(x=week_of, y=signup, mode='lines', name='signup'))
fig6.add_trace(go.Scatter(x=week_of, y=login, mode='lines', name='login'))
fig6.add_trace(go.Scatter(x=week_of, y=inbox, mode='lines', name='inbox'))
fig6.add_trace(go.Scatter(x=week_of, y=search_and_click, mode='lines', name='search and click'))
fig6.update_layout(title='Engagement by Event Type', xaxis_title='Week of', yaxis_title='Number of users')
pio.write_html(fig6, file='figures/fig6.html', auto_open=True)

Conclusions:

4. Engagement by Region

“If the drop is localized to a particular region then there could be an issue related to that.”

q = """
-- Selects Top 5 countries of all-time
WITH region AS (SELECT location, COUNT(user_id) AS num_users FROM yammer_events WHERE occurred_at >= '2014-05-01 00:00:00' AND occurred_at < '2014-09-01 00:00:00' AND event_type = 'engagement' GROUP BY 1 ORDER BY 2 DESC),
top AS (SELECT region.location, region.num_users, ROW_NUMBER() OVER (ORDER BY num_users DESC) AS rank FROM region)
SELECT strftime('%Y-%m-%d', e.occurred_at, 'weekday 0', '-6 days') AS week_of,
       e.location,
       COUNT(DISTINCT e.user_id) AS num_users
FROM yammer_events e
INNER JOIN top
ON e.location = top.location
WHERE occurred_at >= '2014-05-01 00:00:00' 
AND occurred_at < '2014-09-01 00:00:00' 
AND event_type = 'engagement'
AND top.rank <=5
GROUP BY 1, 2
ORDER BY 1, 3 DESC
"""
top_regions = pandasql.sqldf(q, globals())
top_regions.head(10)
week_of location num_users
0 2014-04-28 United States 203
1 2014-04-28 Japan 47
2 2014-04-28 Germany 42
3 2014-04-28 France 34
4 2014-04-28 United Kingdom 33
5 2014-05-05 United States 294
6 2014-05-05 Japan 76
7 2014-05-05 Germany 72
8 2014-05-05 France 53
9 2014-05-05 United Kingdom 49
fig7 = px.line(top_regions, x="week_of", y="num_users", color="location", title='Weekly Engagement by Region')
#iplot(fig7)
pio.write_html(fig7, file='figures/fig7.html', auto_open=True)

Conclusions:

5. Engagement by Device/Product Type

“It is also good to check if the drop in engagement rate is localized to a type of device. We’d want to compare laptops vs phones vs tablets and then perhaps iOS vs androids, or macs vs PCs.”

Engagement by Device Type

We’ll categorize our users by their device type, specifically:

q = """
SELECT DISTINCT device
FROM yammer_events
"""
devices = pandasql.sqldf(q, globals())
devices.head(50)
device
0 dell inspiron notebook
1 iphone 5
2 iphone 4s
3 windows surface
4 macbook air
5 iphone 5s
6 macbook pro
7 kindle fire
8 ipad mini
9 nexus 7
10 nexus 5
11 samsung galaxy s4
12 lenovo thinkpad
13 samsumg galaxy tablet
14 acer aspire notebook
15 asus chromebook
16 htc one
17 nokia lumia 635
18 samsung galaxy note
19 acer aspire desktop
20 mac mini
21 hp pavilion desktop
22 dell inspiron desktop
23 ipad air
24 amazon fire phone
25 nexus 10
q = """
SELECT 
  strftime('%Y-%m-%d', occurred_at, 'weekday 0', '-6 days') AS week_of,
  COUNT(DISTINCT user_id) AS weekly_active_users,
  COUNT(DISTINCT CASE WHEN device IN ('dell inspiron notebook', 'macbook air', 'macbook pro', 'lenovo thinkpad', 'acer aspire notebook', 'asus chromebook', 'acer aspire desktop', 'mac mini', 'hp pavillion desktop', 'dell inspiron desktop') THEN user_id ELSE NULL END) AS desktop,
  COUNT(DISTINCT CASE WHEN device IN ('iphone 5', 'iphone 4s', 'iphone 5s', 'nexus 5', 'samsung galaxy s4', 'htc one', 'nokia lumia 635', 'samsung galaxy note', 'amazon fire phone') THEN user_id ELSE NULL END) AS phone,
  COUNT(DISTINCT CASE WHEN device IN ('windows surface', 'kindle fire', 'ipad mini', 'nexus 7', 'samsung galaxy tablet', 'ipad air', 'nexus 10') THEN user_id ELSE NULL END) AS tablet
FROM yammer_events
WHERE occurred_at >= '2014-05-01 00:00:00'
AND occurred_at < '2014-09-01 00:00:00'
AND event_type = 'engagement'
GROUP BY 1
ORDER BY 1
"""
device_engagement = pandasql.sqldf(q, globals())
week_of = device_engagement['week_of']
weekly_active_users = device_engagement['weekly_active_users']
desktop = device_engagement['desktop']
phone = device_engagement['phone']
tablet = device_engagement['tablet']
fig8 = go.Figure()
fig8.add_trace(go.Scatter(x=week_of, y=weekly_active_users, mode='lines', name='weekly active users'))
fig8.add_trace(go.Scatter(x=week_of, y=desktop, mode='lines', name='desktop'))
fig8.add_trace(go.Scatter(x=week_of, y=phone, mode='lines', name='phone'))
fig8.add_trace(go.Scatter(x=week_of, y=tablet, mode='lines', name='tablet'))
fig8.update_layout(title='Engagement by Device Type', xaxis_title='Week of', yaxis_title='Number of users')
pio.write_html(fig8, file='figures/fig8.html', auto_open=True)

Engagement by Product Type

From the above plot, we can see that our engagement decreased most significantly on the mobile (“phone”) Yammer platform.

We’ll further investigate by looking at the differences in engagement across two mobile OS types:

q = """
SELECT
  strftime('%Y-%m-%d', occurred_at, 'weekday 0', '-6 days') AS week_of,
  COUNT(DISTINCT CASE WHEN device IN ('iphone 4s', 'iphone 5', 'iphone 5s') THEN user_id ELSE NULL END) AS "iOS",
  COUNT(DISTINCT CASE WHEN device IN ('nexus 5', 'samsung galaxy s4', 'htc one', 'nokia lumia 635', 'samsung galaxy note', 'amazon fire phone') THEN user_id ELSE NULL END) AS "Other"
FROM yammer_events
WHERE occurred_at >= '2014-05-01 00:00:00' 
AND occurred_at < '2014-09-01 00:00:00'
AND event_type = 'engagement'
GROUP BY 1
ORDER BY 1
"""
os_engagement = pandasql.sqldf(q, globals())
week_of = os_engagement['week_of']
iOS = os_engagement['iOS']
Other = os_engagement['Other']
fig9 = go.Figure()
fig9.add_trace(go.Scatter(x=week_of, y=iOS, mode='lines', name='iOS'))
fig9.add_trace(go.Scatter(x=week_of, y=Other, mode='lines', name='Other'))
fig9.update_layout(title='Engagement by Phone OS', xaxis_title='Week of', yaxis_title='Number of users')
pio.write_html(fig9, file='figures/fig9.html', auto_open=True)

Conclusions:

5. Cohort Analysis

“Since activation rate is normal, we know the issue is not associated with growth (acquiring new users). The drop will likely be due to disengagement from existing users, so we should do a cohort analysis allowing us to compare users at different life stages with Yammer.”

q = """
SELECT 
  strftime('%Y-%m-%d', ue.occurred_at, 'weekday 0', '-6 days') AS week_of,
  COUNT(DISTINCT CASE WHEN ue.user_age < 7 THEN ue.user_id ELSE NULL END) AS '<1 week old',
  COUNT(DISTINCT CASE WHEN ue.user_age >= 7 AND ue.user_age < 14 THEN ue.user_id ELSE NULL END) AS '1 week old',
  COUNT(DISTINCT CASE WHEN ue.user_age >= 14 AND ue.user_age < 21 THEN ue.user_id ELSE NULL END) AS '2 weeks old',
  COUNT(DISTINCT CASE WHEN ue.user_age >= 21 AND ue.user_age < 28 THEN ue.user_id ELSE NULL END) AS '3 weeks old',
  COUNT(DISTINCT CASE WHEN ue.user_age >= 28 AND ue.user_age < 35 THEN ue.user_id ELSE NULL END) AS '4 weeks old',
  COUNT(DISTINCT CASE WHEN ue.user_age >= 35 AND ue.user_age < 42 THEN ue.user_id ELSE NULL END) AS '5 weeks old',
  COUNT(DISTINCT CASE WHEN ue.user_age >= 42 AND ue.user_age < 49 THEN ue.user_id ELSE NULL END) AS '6 weeks old',
  COUNT(DISTINCT CASE WHEN ue.user_age >= 49 AND ue.user_age < 56 THEN ue.user_id ELSE NULL END) AS '7 weeks old',
  COUNT(DISTINCT CASE WHEN ue.user_age >= 56 AND ue.user_age < 63 THEN ue.user_id ELSE NULL END) AS '8 weeks old',
  COUNT(DISTINCT CASE WHEN ue.user_age >= 63 AND ue.user_age < 70 THEN ue.user_id ELSE NULL END) AS '9 weeks old',
  COUNT(DISTINCT CASE WHEN ue.user_age >= 70 THEN ue.user_id ELSE NULL END) AS '10+ weeks old'
FROM (SELECT e.occurred_at, u.user_id, CAST((julianday('2014-09-01') - julianday(u.activated_at)) AS INT) AS user_age
      FROM yammer_users u
      JOIN yammer_events e
      ON e.user_id = u.user_id
      WHERE e.occurred_at >= '2014-05-01 00:00:00'
      AND e.occurred_at < '2014-09-01 00:00:00'
      AND e.event_type = 'engagement'
      AND u.activated_at IS NOT NULL) ue
GROUP BY 1
ORDER BY 1
"""
age_cohort = pandasql.sqldf(q, globals())
age_cohort.head(50)
week_of <1 week old 1 week old 2 weeks old 3 weeks old 4 weeks old 5 weeks old 6 weeks old 7 weeks old 8 weeks old 9 weeks old 10+ weeks old
0 2014-04-28 0 0 0 0 0 0 0 0 0 0 701
1 2014-05-05 0 0 0 0 0 0 0 0 0 0 1054
2 2014-05-12 0 0 0 0 0 0 0 0 0 0 1094
3 2014-05-19 0 0 0 0 0 0 0 0 0 0 1147
4 2014-05-26 0 0 0 0 0 0 0 0 0 0 1113
5 2014-06-02 0 0 0 0 0 0 0 0 0 0 1173
6 2014-06-09 0 0 0 0 0 0 0 0 0 0 1219
7 2014-06-16 0 0 0 0 0 0 0 0 0 0 1263
8 2014-06-23 0 0 0 0 0 0 0 0 0 210 1039
9 2014-06-30 0 0 0 0 0 0 0 0 199 151 921
10 2014-07-07 0 0 0 0 0 0 0 223 130 100 902
11 2014-07-14 0 0 0 0 0 0 215 152 82 62 834
12 2014-07-21 0 0 0 0 0 228 144 95 60 44 792
13 2014-07-28 0 0 0 0 234 156 91 83 43 30 806
14 2014-08-04 0 0 0 189 154 82 52 52 34 24 679
15 2014-08-11 0 0 250 126 94 59 33 39 33 19 562
16 2014-08-18 0 259 163 69 64 40 19 26 26 15 522
17 2014-08-25 266 173 82 48 47 31 20 23 14 15 475
week_of = age_cohort['week_of']
y0 = age_cohort['<1 week old']
y1 = age_cohort['1 week old']
y2 = age_cohort['2 weeks old']
y3 = age_cohort['3 weeks old']
y4 = age_cohort['4 weeks old']
y5 = age_cohort['5 weeks old']
y6 = age_cohort['6 weeks old']
y7 = age_cohort['7 weeks old']
y8 = age_cohort['8 weeks old']
y9 = age_cohort['9 weeks old']
y10 = age_cohort['10+ weeks old']
fig10 = go.Figure()
fig10.add_trace(go.Scatter(x=week_of, y=y0, mode='lines', name='<1 week old'))
fig10.add_trace(go.Scatter(x=week_of, y=y1, mode='lines', name='1 week old'))
fig10.add_trace(go.Scatter(x=week_of, y=y2, mode='lines', name='2 weeks old'))
fig10.add_trace(go.Scatter(x=week_of, y=y3, mode='lines', name='3 weeks old'))
fig10.add_trace(go.Scatter(x=week_of, y=y4, mode='lines', name='4 weeks old'))
fig10.add_trace(go.Scatter(x=week_of, y=y5, mode='lines', name='5 weeks old'))
fig10.add_trace(go.Scatter(x=week_of, y=y6, mode='lines', name='6 weeks old'))
fig10.add_trace(go.Scatter(x=week_of, y=y7, mode='lines', name='7 weeks old'))
fig10.add_trace(go.Scatter(x=week_of, y=y8, mode='lines', name='8 weeks old'))
fig10.add_trace(go.Scatter(x=week_of, y=y9, mode='lines', name='9 weeks old'))
fig10.add_trace(go.Scatter(x=week_of, y=y10, mode='lines', name='10+ weeks old'))
fig10.update_layout(title='Engagement by User Age Cohort', xaxis_title='Week of', yaxis_title='Number of Users')
pio.write_html(fig10, file='figures/fig10.html', auto_open=True)

Conclusions:

6. Email Open and CTR

Yammer sends users two types of email to increase engagement:

We want to see if something has gone wrong with these emails by looking into their open and click-through rates.

q = """
SELECT DISTINCT action
FROM yammer_emails
"""
emails = pandasql.sqldf(q, globals())
emails.head()
action
0 sent_weekly_digest
1 email_open
2 email_clickthrough
3 sent_reengagement_email

Note: Some preliminary analysis done on email_open rates suggest that >97% of all engagement emails sent are opened within the first hour. Furthermore, less than 2% of engagment emails are opened after the 24-hour mark. Because of this, we’ll be looking at email_open and email_clickthrough rates that occur within a 24-hour window.

q = """
SELECT 
  strftime('%Y-%m-%d', e.occurred_at, 'weekday 0', '-6 days') AS week_of,
  COUNT(CASE WHEN e.action = 'sent_weekly_digest' THEN e.user_id ELSE NULL END) AS emails_sent,
  COUNT(CASE WHEN e.action = 'sent_weekly_digest' THEN e1.user_id ELSE NULL END) AS emails_opened,
  COUNT(CASE WHEN e.action = 'sent_weekly_digest' THEN e2.user_id ELSE NULL END) AS emails_ct,
  COUNT(CASE WHEN e.action = 'sent_reengagement_email' THEN e.user_id ELSE NULL END) AS reengagements_sent,
  COUNT(CASE WHEN e.action = 'sent_reengagement_email' THEN e1.user_id ELSE NULL END) AS reengagements_opened,
  COUNT(CASE WHEN e.action = 'sent_reengagement_email' THEN e2.user_id ELSE NULL END) AS reengagements_ct
FROM yammer_emails e
LEFT JOIN yammer_emails e1
  ON e.user_id = e1.user_id
    AND e1.action = 'email_open'
    AND e1.occurred_at >= e.occurred_at
    AND e1.occurred_at < datetime(e.occurred_at, '+24 hours')
LEFT JOIN yammer_emails e2
  ON e.user_id = e2.user_id
    AND e2.action = 'email_clickthrough'
    AND e2.occurred_at >= e.occurred_at
    AND e2.occurred_at < datetime(e.occurred_at, '+24 hours')
WHERE e.occurred_at >= '2014-05-01 00:00:00'
AND e.occurred_at < '2014-09-01 00:00:00'
GROUP BY 1
ORDER BY 1
"""
email_counts = pandasql.sqldf(q, globals())
email_counts.head()
week_of emails_sent emails_opened emails_ct reengagements_sent reengagements_opened reengagements_ct
0 2014-04-28 908 246 105 98 86 82
1 2014-05-05 2602 776 306 164 143 128
2 2014-05-12 2665 809 332 175 162 147
3 2014-05-19 2733 834 348 179 161 150
4 2014-05-26 2822 868 314 179 158 139
week_of = email_counts['week_of']
emails_sent = email_counts['emails_sent']
emails_opened = email_counts['emails_opened']
emails_ct = email_counts['emails_ct']
reengagements_sent = email_counts['reengagements_sent']
reengagements_opened = email_counts['reengagements_opened']
reengagements_ct = email_counts['reengagements_ct']
fig11 = go.Figure()
fig11.add_trace(go.Scatter(x=week_of, y=emails_sent, mode='lines', name='emails sent'))
fig11.add_trace(go.Scatter(x=week_of, y=emails_opened, mode='lines', name='emails opened'))
fig11.add_trace(go.Scatter(x=week_of, y=emails_ct, mode='lines', name='emails clicked-through'))
fig11.add_trace(go.Scatter(x=week_of, y=reengagements_sent, mode='lines', name='reengagements sent'))
fig11.add_trace(go.Scatter(x=week_of, y=reengagements_opened, mode='lines', name='reengagements opened'))
fig11.add_trace(go.Scatter(x=week_of, y=reengagements_ct, mode='lines', name='reengagements clicked-through'))
fig11.update_layout(title='Email Open and Click-Through Rates', xaxis_title='Week of', yaxis_title='Number of emails')
pio.write_html(fig11, file='figures/fig11.html', auto_open=True)

Conclusions:

Summary

From our analysis, we find that our mobile users are becoming less engaged with the platform. We also see email click-through rates decreasing, which tells us that our email reengagement and retention campaigns aren’t going so well. Our next step is to reach out to the Engineering and Marketing teams to check for potential issues (e.g. product releases, marketing campaigns) that have contributed to the decline in engageent. We’ve successfully ruled-out major bugs (e.g., platform outages, regional differences, signup funnel issues) and have a good understanding of what steps to take next (A/B testing, local segmentation, feature roll-backs, improved email campaigns).

Credit

The data and problem description was provided to us by Mode Analytics. The approach was based off the work of Jodi Zhang. A lot of time and energy was spent by all investigating each query and analysing goal and value metrics.

2021-05-26 00:00:00 -0700 - Written by Jonathan Logan Moran


Page design by Ankit Sultana