What's driving the drop in weekly active users?
add visualizations
How does the weekly active user rate vary by signup source (Referral vs other sources), and are certain acquisition channels showing steeper declines in user engagement?
Is there a significant difference in weekly active user rates between platforms (iOS vs others), and could platform-specific issues be contributing to the overall drop?
What is the relationship between onboarding completion rates and weekly active users - are users who didn't complete onboarding more likely to become inactive, and has this pattern changed recently?
How does load time performance correlate with weekly active user status, and are users experiencing slower load times (higher load_time_seconds) more likely to become inactive?
What is the conversion funnel from signup → onboarding completion → weekly active user, and at which stage are we seeing the biggest drop-offs that could explain the decline in weekly active users?
This analysis investigated the factors driving the decline in weekly active users by examining user engagement patterns across 1,000 users. The study focused on five key areas: signup source performance, platform differences, onboarding impact, load time effects, and conversion funnel analysis.
The most significant factor affecting weekly active users is onboarding completion:
- Users who completed onboarding: 74.4% weekly active rate
- Users who didn't complete onboarding: 13.2% weekly active rate
- Strong positive correlation (r=0.5250, p<0.001) between onboarding and user activity
- 22.8% of users fail to complete onboarding, representing the largest conversion drop-off
Organic traffic significantly outperforms paid channels:
- Organic users: 69.5% weekly active rate
- Referral users: 58.9% weekly active rate
- Ad Campaign users: 45.0% weekly active rate
- Statistical significance confirmed (p<0.001), indicating genuine performance differences
iOS users show higher engagement than Android users:
- iOS users: 64.4% weekly active rate
- Android users: 57.8% weekly active rate
- Moderate but statistically significant difference (p=0.043)
Slower load times correlate with reduced user activity:
- Active users average load time: 2.15 seconds
- Inactive users average load time: 2.54 seconds
- Negative correlation (r=-0.2425, p<0.001) confirms performance impact on engagement
The user journey shows clear drop-off points:
- Total signups → Onboarding completion: 77.2% (228 users lost)
- Onboarding completion → Weekly active: 78.2% (168 users lost)
- Overall signup → Weekly active conversion: 60.4%
Primary Issue: Onboarding Barrier
The analysis clearly identifies onboarding completion as the strongest predictor of user activity. With nearly 23% of users failing to complete onboarding and only 13% of non-completers becoming active, this represents the most critical intervention point.
Secondary Issues:
1. Acquisition Quality: Ad campaign users show 24.5 percentage points lower engagement than organic users
2. Platform Experience: Android users lag iOS users by 6.6 percentage points
3. Performance Issues: Load time differences of 0.39 seconds correlate with activity differences
Implement onboarding abandonment recovery campaigns
Optimize Android Experience: Address the 6.6 percentage point gap
Increase investment in organic and referral channels
Performance Optimization:
Based on the statistical relationships identified:
- Improving onboarding completion by 10 percentage points could increase overall WAU rate from 60.4% to approximately 67-70%
- Addressing platform gaps could add 2-4 percentage points to overall engagement
- Combined interventions could potentially achieve 70%+ weekly active user rates
This analysis provides a clear roadmap for addressing the weekly active user decline, with onboarding optimization offering the highest potential return on investment.
import pandas as pd
import numpy as np
import plotly.graph_objects as go
import plotly.express as px
from plotly.subplots import make_subplots
from scipy.stats import chi2_contingency, pearsonr
import warnings
warnings.filterwarnings('ignore')
# Initialize list to store all Plotly figures
plotly_figs = []
print("=== USER ENGAGEMENT ANALYSIS PIPELINE ===\n")
# ===== PREPROCESSING STAGE =====
print("1. DATA PREPROCESSING")
print("=" * 50)
# Create a copy of the original DataFrame
cleaned_df = df.copy()
# Remove the unnamed index column as it's not needed for analysis
if 'Unnamed: 0' in cleaned_df.columns:
cleaned_df = cleaned_df.drop('Unnamed: 0', axis=1)
# Handle missing values in load_time_seconds
if cleaned_df['load_time_seconds'].isnull().any():
missing_count = cleaned_df['load_time_seconds'].isnull().sum()
cleaned_df['load_time_seconds'] = cleaned_df['load_time_seconds'].fillna(cleaned_df['load_time_seconds'].median())
print(f"Filled {missing_count} missing values in load_time_seconds with median")
# Handle missing values in categorical columns
for col in ['signup_source', 'platform']:
if cleaned_df[col].isnull().any():
mode_value = cleaned_df[col].mode()[0] if not cleaned_df[col].mode().empty else 'Unknown'
cleaned_df[col] = cleaned_df[col].fillna(mode_value)
print(f"Filled missing values in {col} with {mode_value}")
# Ensure proper data types
cleaned_df['signup_source'] = cleaned_df['signup_source'].astype('category')
cleaned_df['platform'] = cleaned_df['platform'].astype('category')
cleaned_df['onboarding_completed'] = cleaned_df['onboarding_completed'].astype('bool')
cleaned_df['is_weekly_active_user'] = cleaned_df['is_weekly_active_user'].astype('bool')
print(f"Dataset shape after cleaning: {cleaned_df.shape}")
print(f"Missing values: {cleaned_df.isnull().sum().sum()}")
print()
# ===== STATISTICAL ANALYSIS STAGE =====
print("2. STATISTICAL ANALYSIS")
print("=" * 50)
# Create numeric versions for analysis
cleaned_df_numeric = cleaned_df.copy()
cleaned_df_numeric['onboarding_completed_num'] = cleaned_df_numeric['onboarding_completed'].astype(int)
cleaned_df_numeric['is_weekly_active_user_num'] = cleaned_df_numeric['is_weekly_active_user'].astype(int)
# 2.1 SEGMENTATION ANALYSIS
print("2.1 Segmentation Analysis")
print("-" * 30)
# Weekly active user rates by signup source
signup_crosstab = pd.crosstab(cleaned_df['signup_source'], cleaned_df['is_weekly_active_user'])
signup_rates = pd.crosstab(cleaned_df['signup_source'], cleaned_df['is_weekly_active_user'], normalize='index')
print("Weekly Active User Rates by Signup Source:")
for source in signup_rates.index:
active_rate = signup_rates.loc[source, True] if True in signup_rates.columns else 0
total_users = signup_crosstab.loc[source].sum()
print(f" {source}: {active_rate:.1%} ({signup_crosstab.loc[source, True] if True in signup_crosstab.columns else 0}/{total_users})")
# Chi-square test for signup source
chi2_signup, p_signup, dof_signup, expected_signup = chi2_contingency(signup_crosstab)
print(f"Chi-square test: ={chi2_signup:.4f}, p={p_signup:.4f}")
print()
# Weekly active user rates by platform
platform_crosstab = pd.crosstab(cleaned_df['platform'], cleaned_df['is_weekly_active_user'])
platform_rates = pd.crosstab(cleaned_df['platform'], cleaned_df['is_weekly_active_user'], normalize='index')
print("Weekly Active User Rates by Platform:")
for platform in platform_rates.index:
active_rate = platform_rates.loc[platform, True] if True in platform_rates.columns else 0
total_users = platform_crosstab.loc[platform].sum()
print(f" {platform}: {active_rate:.1%} ({platform_crosstab.loc[platform, True] if True in platform_crosstab.columns else 0}/{total_users})")
# Chi-square test for platform
chi2_platform, p_platform, dof_platform, expected_platform = chi2_contingency(platform_crosstab)
print(f"Chi-square test: ={chi2_platform:.4f}, p={p_platform:.4f}")
print()
# Store segmentation analysis results
segmentation_analysis = {
'signup_source_rates': signup_rates.to_dict(),
'platform_rates': platform_rates.to_dict(),
'signup_chi2_test': {'statistic': chi2_signup, 'p_value': p_signup},
'platform_chi2_test': {'statistic': chi2_platform, 'p_value': p_platform}
}
# 2.2 CORRELATION ANALYSIS
print("2.2 Correlation Analysis")
print("-" * 30)
# Correlation between onboarding completion and weekly active users
onboarding_corr, onboarding_p = pearsonr(cleaned_df_numeric['onboarding_completed_num'],
cleaned_df_numeric['is_weekly_active_user_num'])
print(f"Onboarding completion vs Weekly active users: r={onboarding_corr:.4f}, p={onboarding_p:.4f}")
# Correlation between load time and weekly active users
load_time_corr, load_time_p = pearsonr(cleaned_df_numeric['load_time_seconds'],
cleaned_df_numeric['is_weekly_active_user_num'])
print(f"Load time vs Weekly active users: r={load_time_corr:.4f}, p={load_time_p:.4f}")
# Load time statistics by user activity
load_time_stats = cleaned_df.groupby('is_weekly_active_user')['load_time_seconds'].agg(['mean', 'std', 'count'])
print("\nLoad Time Statistics by User Activity:")
print(load_time_stats)
print()
# Store correlation analysis results
correlation_analysis = {
'onboarding_correlation': {'coefficient': onboarding_corr, 'p_value': onboarding_p},
'load_time_correlation': {'coefficient': load_time_corr, 'p_value': load_time_p},
'load_time_stats': load_time_stats.to_dict()
}
# 2.3 FUNNEL ANALYSIS
print("2.3 Conversion Funnel Analysis")
print("-" * 30)
total_users = len(cleaned_df)
onboarding_completed = cleaned_df['onboarding_completed'].sum()
weekly_active_users = cleaned_df['is_weekly_active_user'].sum()
# Calculate conversion rates
signup_to_onboarding_rate = onboarding_completed / total_users
onboarding_to_active_rate = weekly_active_users / onboarding_completed if onboarding_completed > 0 else 0
overall_conversion_rate = weekly_active_users / total_users
print(f"Conversion Funnel:")
print(f" Total Signups: {total_users}")
print(f" Onboarding Completed: {onboarding_completed} ({signup_to_onboarding_rate:.1%})")
print(f" Weekly Active Users: {weekly_active_users} ({overall_conversion_rate:.1%})")
print()
print(f"Stage Conversion Rates:")
print(f" Signup -> Onboarding: {signup_to_onboarding_rate:.1%}")
print(f" Onboarding -> Active: {onboarding_to_active_rate:.1%}")
print(f" Overall Signup -> Active: {overall_conversion_rate:.1%}")
# Weekly active rates by onboarding completion
completed_onboarding = cleaned_df[cleaned_df['onboarding_completed'] == True]
not_completed_onboarding = cleaned_df[cleaned_df['onboarding_completed'] == False]
active_rate_completed = completed_onboarding['is_weekly_active_user'].mean()
active_rate_not_completed = not_completed_onboarding['is_weekly_active_user'].mean()
print(f"\nWeekly Active Rates:")
print(f" Completed onboarding: {active_rate_completed:.1%}")
print(f" Did not complete onboarding: {active_rate_not_completed:.1%}")
print()
# Store funnel analysis results
funnel_analysis = {
'total_users': total_users,
'onboarding_completed': onboarding_completed,
'weekly_active_users': weekly_active_users,
'conversion_rates': {
'signup_to_onboarding': signup_to_onboarding_rate,
'onboarding_to_active': onboarding_to_active_rate,
'overall_conversion': overall_conversion_rate
},
'active_rates_by_onboarding': {
'completed_onboarding': active_rate_completed,
'not_completed_onboarding': active_rate_not_completed
}
}
# ===== VISUALIZATION STAGE =====
print("3. CREATING VISUALIZATIONS")
print("=" * 50)
# 3.1 Weekly Active User Rates by Signup Source
fig1 = go.Figure()
signup_sources = list(signup_rates.index)
wau_rates_signup = [signup_rates.loc[source, True] if True in signup_rates.columns else 0 for source in signup_sources]
fig1.add_trace(go.Bar(
x=signup_sources,
y=wau_rates_signup,
name='Weekly Active User Rate',
marker_color=['#1f77b4', '#ff7f0e'],
text=[f'{rate:.1%}' for rate in wau_rates_signup],
textposition='outside'
))
fig1.update_layout(
title='Weekly Active User Rate by Signup Source',
xaxis_title='Signup Source',
yaxis_title='Weekly Active User Rate',
yaxis_tickformat='.1%',
template='plotly_white',
height=500
)
plotly_figs.append(fig1)
# 3.2 Weekly Active User Rates by Platform
fig2 = go.Figure()
platforms = list(platform_rates.index)
wau_rates_platform = [platform_rates.loc[platform, True] if True in platform_rates.columns else 0 for platform in platforms]
fig2.add_trace(go.Bar(
x=platforms,
y=wau_rates_platform,
name='Weekly Active User Rate',
marker_color=['#2ca02c', '#d62728'],
text=[f'{rate:.1%}' for rate in wau_rates_platform],
textposition='outside'
))
fig2.update_layout(
title='Weekly Active User Rate by Platform',
xaxis_title='Platform',
yaxis_title='Weekly Active User Rate',
yaxis_tickformat='.1%',
template='plotly_white',
height=500
)
plotly_figs.append(fig2)
# 3.3 Load Time Distribution by User Activity
fig3 = go.Figure()
active_users_load_time = cleaned_df[cleaned_df['is_weekly_active_user'] == True]['load_time_seconds']
inactive_users_load_time = cleaned_df[cleaned_df['is_weekly_active_user'] == False]['load_time_seconds']
fig3.add_trace(go.Histogram(
x=active_users_load_time,
name='Active Users',
opacity=0.7,
marker_color='#2ca02c',
nbinsx=30
))
fig3.add_trace(go.Histogram(
x=inactive_users_load_time,
name='Inactive Users',
opacity=0.7,
marker_color='#d62728',
nbinsx=30
))
fig3.update_layout(
title='Load Time Distribution by User Activity Status',
xaxis_title='Load Time (seconds)',
yaxis_title='Count',
template='plotly_white',
height=500,
barmode='overlay'
)
plotly_figs.append(fig3)
# 3.4 Onboarding Completion Impact
fig4 = go.Figure()
onboarding_categories = ['Completed Onboarding', 'Did Not Complete']
onboarding_wau_rates = [active_rate_completed, active_rate_not_completed]
fig4.add_trace(go.Bar(
x=onboarding_categories,
y=onboarding_wau_rates,
name='Weekly Active User Rate',
marker_color=['#2ca02c', '#d62728'],
text=[f'{rate:.1%}' for rate in onboarding_wau_rates],
textposition='outside'
))
fig4.update_layout(
title='Weekly Active User Rate by Onboarding Completion',
xaxis_title='Onboarding Status',
yaxis_title='Weekly Active User Rate',
yaxis_tickformat='.1%',
template='plotly_white',
height=500
)
plotly_figs.append(fig4)
# 3.5 Conversion Funnel Visualization
fig5 = go.Figure()
funnel_stages = ['Total Signups', 'Onboarding Completed', 'Weekly Active Users']
funnel_counts = [total_users, onboarding_completed, weekly_active_users]
funnel_rates = [1.0, signup_to_onboarding_rate, overall_conversion_rate]
fig5.add_trace(go.Funnel(
y=funnel_stages,
x=funnel_counts,
textinfo="value+percent initial",
marker_color=['#1f77b4', '#ff7f0e', '#2ca02c']
))
fig5.update_layout(
title='User Conversion Funnel: Signup to Weekly Active User',
template='plotly_white',
height=500
)
plotly_figs.append(fig5)
# 3.6 Load Time vs Weekly Active User Status (Box Plot)
fig6 = go.Figure()
fig6.add_trace(go.Box(
y=active_users_load_time,
name='Active Users',
marker_color='#2ca02c',
boxpoints='outliers'
))
fig6.add_trace(go.Box(
y=inactive_users_load_time,
name='Inactive Users',
marker_color='#d62728',
boxpoints='outliers'
))
fig6.update_layout(
title='Load Time Distribution by User Activity Status',
xaxis_title='User Activity Status',
yaxis_title='Load Time (seconds)',
template='plotly_white',
height=500
)
plotly_figs.append(fig6)
# 3.7 Correlation Heatmap
correlation_data = pd.DataFrame({
'onboarding_completed': cleaned_df_numeric['onboarding_completed_num'],
'is_weekly_active_user': cleaned_df_numeric['is_weekly_active_user_num'],
'load_time_seconds': cleaned_df_numeric['load_time_seconds']
})
correlation_matrix = correlation_data.corr()
fig7 = go.Figure(data=go.Heatmap(
z=correlation_matrix.values,
x=correlation_matrix.columns,
y=correlation_matrix.columns,
colorscale='RdBu',
zmid=0,
text=correlation_matrix.round(3).values,
texttemplate='%{text}',
textfont={'size': 12},
hoverongaps=False
))
fig7.update_layout(
title='Correlation Matrix: User Behavior Metrics',
template='plotly_white',
height=500,
width=600
)
plotly_figs.append(fig7)
# 3.8 Comprehensive Dashboard
fig8 = make_subplots(
rows=2, cols=2,
subplot_titles=[
'WAU Rate by Signup Source',
'WAU Rate by Platform',
'Onboarding Impact on Activity',
'Load Time vs Activity'
],
specs=[[{'type': 'bar'}, {'type': 'bar'}],
[{'type': 'bar'}, {'type': 'box'}]]
)
# Add traces to dashboard
fig8.add_trace(go.Bar(x=signup_sources, y=wau_rates_signup, name='Signup Source',
marker_color=['#1f77b4', '#ff7f0e']), row=1, col=1)
fig8.add_trace(go.Bar(x=platforms, y=wau_rates_platform, name='Platform',
marker_color=['#2ca02c', '#d62728']), row=1, col=2)
fig8.add_trace(go.Bar(x=onboarding_categories, y=onboarding_wau_rates, name='Onboarding',
marker_color=['#2ca02c', '#d62728']), row=2, col=1)
fig8.add_trace(go.Box(y=active_users_load_time, name='Active', marker_color='#2ca02c'), row=2, col=2)
fig8.add_trace(go.Box(y=inactive_users_load_time, name='Inactive', marker_color='#d62728'), row=2, col=2)
fig8.update_layout(
title_text='Weekly Active User Analysis Dashboard',
height=800,
template='plotly_white',
showlegend=False
)
# Update y-axes formatting
fig8.update_yaxes(tickformat='.1%', row=1, col=1)
fig8.update_yaxes(tickformat='.1%', row=1, col=2)
fig8.update_yaxes(tickformat='.1%', row=2, col=1)
plotly_figs.append(fig8)
# Display all figures
for i, fig in enumerate(plotly_figs, 1):
print(f"Displaying Figure {i}")
fig.show()
# ===== SUMMARY INSIGHTS =====
print("4. KEY INSIGHTS SUMMARY")
print("=" * 50)
print("Question 1 - Signup Source Analysis:")
referral_rate = signup_rates.loc['Referral', True] if 'Referral' in signup_rates.index and True in signup_rates.columns else 0
other_sources = [src for src in signup_sources if src != 'Referral']
other_rate = signup_rates.loc[other_sources[0], True] if other_sources and True in signup_rates.columns else 0
print(f" Referral users: {referral_rate:.1%} WAU rate")
print(f" Other sources: {other_rate:.1%} WAU rate")
print(f" Statistical significance: p={p_signup:.4f}")
print("\nQuestion 2 - Platform Analysis:")
ios_rate = platform_rates.loc['iOS', True] if 'iOS' in platform_rates.index and True in platform_rates.columns else 0
other_platforms = [plt for plt in platforms if plt != 'iOS']
other_platform_rate = platform_rates.loc[other_platforms[0], True] if other_platforms and True in platform_rates.columns else 0
print(f" iOS users: {ios_rate:.1%} WAU rate")
print(f" Other platforms: {other_platform_rate:.1%} WAU rate")
print(f" Statistical significance: p={p_platform:.4f}")
print("\nQuestion 3 - Onboarding Impact:")
print(f" Completed onboarding: {active_rate_completed:.1%} WAU rate")
print(f" Did not complete: {active_rate_not_completed:.1%} WAU rate")
print(f" Correlation coefficient: r={onboarding_corr:.4f}")
print("\nQuestion 4 - Load Time Impact:")
active_mean_load = load_time_stats.loc[True, 'mean'] if True in load_time_stats.index else 0
inactive_mean_load = load_time_stats.loc[False, 'mean'] if False in load_time_stats.index else 0
print(f" Active users avg load time: {active_mean_load:.2f}s")
print(f" Inactive users avg load time: {inactive_mean_load:.2f}s")
print(f" Correlation coefficient: r={load_time_corr:.4f}")
print("\nQuestion 5 - Conversion Funnel:")
print(f" Signup -> Onboarding: {signup_to_onboarding_rate:.1%}")
print(f" Onboarding -> Active: {onboarding_to_active_rate:.1%}")
print(f" Overall conversion: {overall_conversion_rate:.1%}")
biggest_dropoff_stage = "Onboarding" if (1 - signup_to_onboarding_rate) > (1 - onboarding_to_active_rate) else "Activation"
print(f" Biggest drop-off at: {biggest_dropoff_stage} stage")
print(f"\nTotal figures created: {len(plotly_figs)}")
print("Analysis complete!")
The drop in weekly active users is primarily driven by onboarding failures, with nearly 1 in 4 users abandoning the process before completion.
## Key Takeaways
## Recommended Next Steps
Immediate Priority (High Impact)
Medium-Term Actions
Expected Impact: Improving onboarding completion by just 10 percentage points could increase overall weekly active user rates from 60.4% to 67-70%, representing a significant recovery in user engagement.
The data provides a clear roadmap: fix onboarding first, optimize platform experiences second, and refine acquisition channels third. This prioritized approach should deliver measurable improvements in weekly active user retention.