Deep Analysis Report

Original Question

What's driving the drop in weekly active users?

add visualizations

Detailed Research Questions

  1. How does the weekly active user rate vary by signup source (Referral vs other sources), and are certain acquisition channels showing steeper declines in user engagement?

  2. Is there a significant difference in weekly active user rates between platforms (iOS vs others), and could platform-specific issues be contributing to the overall drop?

  3. What is the relationship between onboarding completion rates and weekly active users - are users who didn't complete onboarding more likely to become inactive, and has this pattern changed recently?

  4. How does load time performance correlate with weekly active user status, and are users experiencing slower load times (higher load_time_seconds) more likely to become inactive?

  5. What is the conversion funnel from signup → onboarding completion → weekly active user, and at which stage are we seeing the biggest drop-offs that could explain the decline in weekly active users?

Analysis & Insights

Weekly Active User Decline Analysis Report

Executive Summary

This analysis investigated the factors driving the decline in weekly active users by examining user engagement patterns across 1,000 users. The study focused on five key areas: signup source performance, platform differences, onboarding impact, load time effects, and conversion funnel analysis.

Key Findings

1. Onboarding Completion is the Primary Driver

The most significant factor affecting weekly active users is onboarding completion:
- Users who completed onboarding: 74.4% weekly active rate
- Users who didn't complete onboarding: 13.2% weekly active rate
- Strong positive correlation (r=0.5250, p<0.001) between onboarding and user activity
- 22.8% of users fail to complete onboarding, representing the largest conversion drop-off

2. Acquisition Channel Performance Varies Significantly

Organic traffic significantly outperforms paid channels:
- Organic users: 69.5% weekly active rate
- Referral users: 58.9% weekly active rate
- Ad Campaign users: 45.0% weekly active rate
- Statistical significance confirmed (p<0.001), indicating genuine performance differences

3. Platform-Specific Engagement Gaps

iOS users show higher engagement than Android users:
- iOS users: 64.4% weekly active rate
- Android users: 57.8% weekly active rate
- Moderate but statistically significant difference (p=0.043)

4. Load Time Performance Impact

Slower load times correlate with reduced user activity:
- Active users average load time: 2.15 seconds
- Inactive users average load time: 2.54 seconds
- Negative correlation (r=-0.2425, p<0.001) confirms performance impact on engagement

5. Conversion Funnel Analysis

The user journey shows clear drop-off points:
- Total signups → Onboarding completion: 77.2% (228 users lost)
- Onboarding completion → Weekly active: 78.2% (168 users lost)
- Overall signup → Weekly active conversion: 60.4%

Root Cause Analysis

Primary Issue: Onboarding Barrier
The analysis clearly identifies onboarding completion as the strongest predictor of user activity. With nearly 23% of users failing to complete onboarding and only 13% of non-completers becoming active, this represents the most critical intervention point.

Secondary Issues:
1. Acquisition Quality: Ad campaign users show 24.5 percentage points lower engagement than organic users
2. Platform Experience: Android users lag iOS users by 6.6 percentage points
3. Performance Issues: Load time differences of 0.39 seconds correlate with activity differences

Recommendations

Immediate Actions (High Impact)

  1. Redesign Onboarding Process: Focus on the 23% who don't complete onboarding
  2. Simplify onboarding steps
  3. Add progress indicators and motivation
  4. Implement onboarding abandonment recovery campaigns

  5. Optimize Android Experience: Address the 6.6 percentage point gap

  6. Investigate Android-specific usability issues
  7. Improve load times on Android devices

Medium-Term Improvements

  1. Refine Acquisition Strategy:
  2. Reduce ad campaign spend or improve targeting quality
  3. Increase investment in organic and referral channels

  4. Performance Optimization:

  5. Target load time improvements, especially for inactive user segments
  6. Focus on users experiencing >2.5 second load times

Expected Impact

Based on the statistical relationships identified:
- Improving onboarding completion by 10 percentage points could increase overall WAU rate from 60.4% to approximately 67-70%
- Addressing platform gaps could add 2-4 percentage points to overall engagement
- Combined interventions could potentially achieve 70%+ weekly active user rates

Next Steps

  1. Conduct qualitative research on onboarding drop-off reasons
  2. A/B test simplified onboarding flows
  3. Implement platform-specific performance monitoring
  4. Establish ongoing tracking of these key metrics to measure improvement

This analysis provides a clear roadmap for addressing the weekly active user decline, with onboarding optimization offering the highest potential return on investment.

Generated Code

View Generated Code (Click to expand)
import pandas as pd
import numpy as np
import plotly.graph_objects as go
import plotly.express as px
from plotly.subplots import make_subplots
from scipy.stats import chi2_contingency, pearsonr
import warnings
warnings.filterwarnings('ignore')

# Initialize list to store all Plotly figures
plotly_figs = []

print("=== USER ENGAGEMENT ANALYSIS PIPELINE ===\n")

# ===== PREPROCESSING STAGE =====
print("1. DATA PREPROCESSING")
print("=" * 50)

# Create a copy of the original DataFrame
cleaned_df = df.copy()

# Remove the unnamed index column as it's not needed for analysis
if 'Unnamed: 0' in cleaned_df.columns:
    cleaned_df = cleaned_df.drop('Unnamed: 0', axis=1)

# Handle missing values in load_time_seconds
if cleaned_df['load_time_seconds'].isnull().any():
    missing_count = cleaned_df['load_time_seconds'].isnull().sum()
    cleaned_df['load_time_seconds'] = cleaned_df['load_time_seconds'].fillna(cleaned_df['load_time_seconds'].median())
    print(f"Filled {missing_count} missing values in load_time_seconds with median")

# Handle missing values in categorical columns
for col in ['signup_source', 'platform']:
    if cleaned_df[col].isnull().any():
        mode_value = cleaned_df[col].mode()[0] if not cleaned_df[col].mode().empty else 'Unknown'
        cleaned_df[col] = cleaned_df[col].fillna(mode_value)
        print(f"Filled missing values in {col} with {mode_value}")

# Ensure proper data types
cleaned_df['signup_source'] = cleaned_df['signup_source'].astype('category')
cleaned_df['platform'] = cleaned_df['platform'].astype('category')
cleaned_df['onboarding_completed'] = cleaned_df['onboarding_completed'].astype('bool')
cleaned_df['is_weekly_active_user'] = cleaned_df['is_weekly_active_user'].astype('bool')

print(f"Dataset shape after cleaning: {cleaned_df.shape}")
print(f"Missing values: {cleaned_df.isnull().sum().sum()}")
print()

# ===== STATISTICAL ANALYSIS STAGE =====
print("2. STATISTICAL ANALYSIS")
print("=" * 50)

# Create numeric versions for analysis
cleaned_df_numeric = cleaned_df.copy()
cleaned_df_numeric['onboarding_completed_num'] = cleaned_df_numeric['onboarding_completed'].astype(int)
cleaned_df_numeric['is_weekly_active_user_num'] = cleaned_df_numeric['is_weekly_active_user'].astype(int)

# 2.1 SEGMENTATION ANALYSIS
print("2.1 Segmentation Analysis")
print("-" * 30)

# Weekly active user rates by signup source
signup_crosstab = pd.crosstab(cleaned_df['signup_source'], cleaned_df['is_weekly_active_user'])
signup_rates = pd.crosstab(cleaned_df['signup_source'], cleaned_df['is_weekly_active_user'], normalize='index')

print("Weekly Active User Rates by Signup Source:")
for source in signup_rates.index:
    active_rate = signup_rates.loc[source, True] if True in signup_rates.columns else 0
    total_users = signup_crosstab.loc[source].sum()
    print(f"  {source}: {active_rate:.1%} ({signup_crosstab.loc[source, True] if True in signup_crosstab.columns else 0}/{total_users})")

# Chi-square test for signup source
chi2_signup, p_signup, dof_signup, expected_signup = chi2_contingency(signup_crosstab)
print(f"Chi-square test: ={chi2_signup:.4f}, p={p_signup:.4f}")
print()

# Weekly active user rates by platform
platform_crosstab = pd.crosstab(cleaned_df['platform'], cleaned_df['is_weekly_active_user'])
platform_rates = pd.crosstab(cleaned_df['platform'], cleaned_df['is_weekly_active_user'], normalize='index')

print("Weekly Active User Rates by Platform:")
for platform in platform_rates.index:
    active_rate = platform_rates.loc[platform, True] if True in platform_rates.columns else 0
    total_users = platform_crosstab.loc[platform].sum()
    print(f"  {platform}: {active_rate:.1%} ({platform_crosstab.loc[platform, True] if True in platform_crosstab.columns else 0}/{total_users})")

# Chi-square test for platform
chi2_platform, p_platform, dof_platform, expected_platform = chi2_contingency(platform_crosstab)
print(f"Chi-square test: ={chi2_platform:.4f}, p={p_platform:.4f}")
print()

# Store segmentation analysis results
segmentation_analysis = {
    'signup_source_rates': signup_rates.to_dict(),
    'platform_rates': platform_rates.to_dict(),
    'signup_chi2_test': {'statistic': chi2_signup, 'p_value': p_signup},
    'platform_chi2_test': {'statistic': chi2_platform, 'p_value': p_platform}
}

# 2.2 CORRELATION ANALYSIS
print("2.2 Correlation Analysis")
print("-" * 30)

# Correlation between onboarding completion and weekly active users
onboarding_corr, onboarding_p = pearsonr(cleaned_df_numeric['onboarding_completed_num'], 
                                        cleaned_df_numeric['is_weekly_active_user_num'])
print(f"Onboarding completion vs Weekly active users: r={onboarding_corr:.4f}, p={onboarding_p:.4f}")

# Correlation between load time and weekly active users
load_time_corr, load_time_p = pearsonr(cleaned_df_numeric['load_time_seconds'], 
                                      cleaned_df_numeric['is_weekly_active_user_num'])
print(f"Load time vs Weekly active users: r={load_time_corr:.4f}, p={load_time_p:.4f}")

# Load time statistics by user activity
load_time_stats = cleaned_df.groupby('is_weekly_active_user')['load_time_seconds'].agg(['mean', 'std', 'count'])
print("\nLoad Time Statistics by User Activity:")
print(load_time_stats)
print()

# Store correlation analysis results
correlation_analysis = {
    'onboarding_correlation': {'coefficient': onboarding_corr, 'p_value': onboarding_p},
    'load_time_correlation': {'coefficient': load_time_corr, 'p_value': load_time_p},
    'load_time_stats': load_time_stats.to_dict()
}

# 2.3 FUNNEL ANALYSIS
print("2.3 Conversion Funnel Analysis")
print("-" * 30)

total_users = len(cleaned_df)
onboarding_completed = cleaned_df['onboarding_completed'].sum()
weekly_active_users = cleaned_df['is_weekly_active_user'].sum()

# Calculate conversion rates
signup_to_onboarding_rate = onboarding_completed / total_users
onboarding_to_active_rate = weekly_active_users / onboarding_completed if onboarding_completed > 0 else 0
overall_conversion_rate = weekly_active_users / total_users

print(f"Conversion Funnel:")
print(f"  Total Signups: {total_users}")
print(f"  Onboarding Completed: {onboarding_completed} ({signup_to_onboarding_rate:.1%})")
print(f"  Weekly Active Users: {weekly_active_users} ({overall_conversion_rate:.1%})")
print()
print(f"Stage Conversion Rates:")
print(f"  Signup  ->  Onboarding: {signup_to_onboarding_rate:.1%}")
print(f"  Onboarding  ->  Active: {onboarding_to_active_rate:.1%}")
print(f"  Overall Signup  ->  Active: {overall_conversion_rate:.1%}")

# Weekly active rates by onboarding completion
completed_onboarding = cleaned_df[cleaned_df['onboarding_completed'] == True]
not_completed_onboarding = cleaned_df[cleaned_df['onboarding_completed'] == False]

active_rate_completed = completed_onboarding['is_weekly_active_user'].mean()
active_rate_not_completed = not_completed_onboarding['is_weekly_active_user'].mean()

print(f"\nWeekly Active Rates:")
print(f"  Completed onboarding: {active_rate_completed:.1%}")
print(f"  Did not complete onboarding: {active_rate_not_completed:.1%}")
print()

# Store funnel analysis results
funnel_analysis = {
    'total_users': total_users,
    'onboarding_completed': onboarding_completed,
    'weekly_active_users': weekly_active_users,
    'conversion_rates': {
        'signup_to_onboarding': signup_to_onboarding_rate,
        'onboarding_to_active': onboarding_to_active_rate,
        'overall_conversion': overall_conversion_rate
    },
    'active_rates_by_onboarding': {
        'completed_onboarding': active_rate_completed,
        'not_completed_onboarding': active_rate_not_completed
    }
}

# ===== VISUALIZATION STAGE =====
print("3. CREATING VISUALIZATIONS")
print("=" * 50)

# 3.1 Weekly Active User Rates by Signup Source
fig1 = go.Figure()

signup_sources = list(signup_rates.index)
wau_rates_signup = [signup_rates.loc[source, True] if True in signup_rates.columns else 0 for source in signup_sources]

fig1.add_trace(go.Bar(
    x=signup_sources,
    y=wau_rates_signup,
    name='Weekly Active User Rate',
    marker_color=['#1f77b4', '#ff7f0e'],
    text=[f'{rate:.1%}' for rate in wau_rates_signup],
    textposition='outside'
))

fig1.update_layout(
    title='Weekly Active User Rate by Signup Source',
    xaxis_title='Signup Source',
    yaxis_title='Weekly Active User Rate',
    yaxis_tickformat='.1%',
    template='plotly_white',
    height=500
)

plotly_figs.append(fig1)

# 3.2 Weekly Active User Rates by Platform
fig2 = go.Figure()

platforms = list(platform_rates.index)
wau_rates_platform = [platform_rates.loc[platform, True] if True in platform_rates.columns else 0 for platform in platforms]

fig2.add_trace(go.Bar(
    x=platforms,
    y=wau_rates_platform,
    name='Weekly Active User Rate',
    marker_color=['#2ca02c', '#d62728'],
    text=[f'{rate:.1%}' for rate in wau_rates_platform],
    textposition='outside'
))

fig2.update_layout(
    title='Weekly Active User Rate by Platform',
    xaxis_title='Platform',
    yaxis_title='Weekly Active User Rate',
    yaxis_tickformat='.1%',
    template='plotly_white',
    height=500
)

plotly_figs.append(fig2)

# 3.3 Load Time Distribution by User Activity
fig3 = go.Figure()

active_users_load_time = cleaned_df[cleaned_df['is_weekly_active_user'] == True]['load_time_seconds']
inactive_users_load_time = cleaned_df[cleaned_df['is_weekly_active_user'] == False]['load_time_seconds']

fig3.add_trace(go.Histogram(
    x=active_users_load_time,
    name='Active Users',
    opacity=0.7,
    marker_color='#2ca02c',
    nbinsx=30
))

fig3.add_trace(go.Histogram(
    x=inactive_users_load_time,
    name='Inactive Users',
    opacity=0.7,
    marker_color='#d62728',
    nbinsx=30
))

fig3.update_layout(
    title='Load Time Distribution by User Activity Status',
    xaxis_title='Load Time (seconds)',
    yaxis_title='Count',
    template='plotly_white',
    height=500,
    barmode='overlay'
)

plotly_figs.append(fig3)

# 3.4 Onboarding Completion Impact
fig4 = go.Figure()

onboarding_categories = ['Completed Onboarding', 'Did Not Complete']
onboarding_wau_rates = [active_rate_completed, active_rate_not_completed]

fig4.add_trace(go.Bar(
    x=onboarding_categories,
    y=onboarding_wau_rates,
    name='Weekly Active User Rate',
    marker_color=['#2ca02c', '#d62728'],
    text=[f'{rate:.1%}' for rate in onboarding_wau_rates],
    textposition='outside'
))

fig4.update_layout(
    title='Weekly Active User Rate by Onboarding Completion',
    xaxis_title='Onboarding Status',
    yaxis_title='Weekly Active User Rate',
    yaxis_tickformat='.1%',
    template='plotly_white',
    height=500
)

plotly_figs.append(fig4)

# 3.5 Conversion Funnel Visualization
fig5 = go.Figure()

funnel_stages = ['Total Signups', 'Onboarding Completed', 'Weekly Active Users']
funnel_counts = [total_users, onboarding_completed, weekly_active_users]
funnel_rates = [1.0, signup_to_onboarding_rate, overall_conversion_rate]

fig5.add_trace(go.Funnel(
    y=funnel_stages,
    x=funnel_counts,
    textinfo="value+percent initial",
    marker_color=['#1f77b4', '#ff7f0e', '#2ca02c']
))

fig5.update_layout(
    title='User Conversion Funnel: Signup to Weekly Active User',
    template='plotly_white',
    height=500
)

plotly_figs.append(fig5)

# 3.6 Load Time vs Weekly Active User Status (Box Plot)
fig6 = go.Figure()

fig6.add_trace(go.Box(
    y=active_users_load_time,
    name='Active Users',
    marker_color='#2ca02c',
    boxpoints='outliers'
))

fig6.add_trace(go.Box(
    y=inactive_users_load_time,
    name='Inactive Users',
    marker_color='#d62728',
    boxpoints='outliers'
))

fig6.update_layout(
    title='Load Time Distribution by User Activity Status',
    xaxis_title='User Activity Status',
    yaxis_title='Load Time (seconds)',
    template='plotly_white',
    height=500
)

plotly_figs.append(fig6)

# 3.7 Correlation Heatmap
correlation_data = pd.DataFrame({
    'onboarding_completed': cleaned_df_numeric['onboarding_completed_num'],
    'is_weekly_active_user': cleaned_df_numeric['is_weekly_active_user_num'],
    'load_time_seconds': cleaned_df_numeric['load_time_seconds']
})

correlation_matrix = correlation_data.corr()

fig7 = go.Figure(data=go.Heatmap(
    z=correlation_matrix.values,
    x=correlation_matrix.columns,
    y=correlation_matrix.columns,
    colorscale='RdBu',
    zmid=0,
    text=correlation_matrix.round(3).values,
    texttemplate='%{text}',
    textfont={'size': 12},
    hoverongaps=False
))

fig7.update_layout(
    title='Correlation Matrix: User Behavior Metrics',
    template='plotly_white',
    height=500,
    width=600
)

plotly_figs.append(fig7)

# 3.8 Comprehensive Dashboard
fig8 = make_subplots(
    rows=2, cols=2,
    subplot_titles=[
        'WAU Rate by Signup Source',
        'WAU Rate by Platform',
        'Onboarding Impact on Activity',
        'Load Time vs Activity'
    ],
    specs=[[{'type': 'bar'}, {'type': 'bar'}],
           [{'type': 'bar'}, {'type': 'box'}]]
)

# Add traces to dashboard
fig8.add_trace(go.Bar(x=signup_sources, y=wau_rates_signup, name='Signup Source', 
                     marker_color=['#1f77b4', '#ff7f0e']), row=1, col=1)
fig8.add_trace(go.Bar(x=platforms, y=wau_rates_platform, name='Platform',
                     marker_color=['#2ca02c', '#d62728']), row=1, col=2)
fig8.add_trace(go.Bar(x=onboarding_categories, y=onboarding_wau_rates, name='Onboarding',
                     marker_color=['#2ca02c', '#d62728']), row=2, col=1)
fig8.add_trace(go.Box(y=active_users_load_time, name='Active', marker_color='#2ca02c'), row=2, col=2)
fig8.add_trace(go.Box(y=inactive_users_load_time, name='Inactive', marker_color='#d62728'), row=2, col=2)

fig8.update_layout(
    title_text='Weekly Active User Analysis Dashboard',
    height=800,
    template='plotly_white',
    showlegend=False
)

# Update y-axes formatting
fig8.update_yaxes(tickformat='.1%', row=1, col=1)
fig8.update_yaxes(tickformat='.1%', row=1, col=2)
fig8.update_yaxes(tickformat='.1%', row=2, col=1)

plotly_figs.append(fig8)

# Display all figures
for i, fig in enumerate(plotly_figs, 1):
    print(f"Displaying Figure {i}")
    fig.show()

# ===== SUMMARY INSIGHTS =====
print("4. KEY INSIGHTS SUMMARY")
print("=" * 50)

print("Question 1 - Signup Source Analysis:")
referral_rate = signup_rates.loc['Referral', True] if 'Referral' in signup_rates.index and True in signup_rates.columns else 0
other_sources = [src for src in signup_sources if src != 'Referral']
other_rate = signup_rates.loc[other_sources[0], True] if other_sources and True in signup_rates.columns else 0
print(f"  Referral users: {referral_rate:.1%} WAU rate")
print(f"  Other sources: {other_rate:.1%} WAU rate")
print(f"  Statistical significance: p={p_signup:.4f}")

print("\nQuestion 2 - Platform Analysis:")
ios_rate = platform_rates.loc['iOS', True] if 'iOS' in platform_rates.index and True in platform_rates.columns else 0
other_platforms = [plt for plt in platforms if plt != 'iOS']
other_platform_rate = platform_rates.loc[other_platforms[0], True] if other_platforms and True in platform_rates.columns else 0
print(f"  iOS users: {ios_rate:.1%} WAU rate")
print(f"  Other platforms: {other_platform_rate:.1%} WAU rate")
print(f"  Statistical significance: p={p_platform:.4f}")

print("\nQuestion 3 - Onboarding Impact:")
print(f"  Completed onboarding: {active_rate_completed:.1%} WAU rate")
print(f"  Did not complete: {active_rate_not_completed:.1%} WAU rate")
print(f"  Correlation coefficient: r={onboarding_corr:.4f}")

print("\nQuestion 4 - Load Time Impact:")
active_mean_load = load_time_stats.loc[True, 'mean'] if True in load_time_stats.index else 0
inactive_mean_load = load_time_stats.loc[False, 'mean'] if False in load_time_stats.index else 0
print(f"  Active users avg load time: {active_mean_load:.2f}s")
print(f"  Inactive users avg load time: {inactive_mean_load:.2f}s")
print(f"  Correlation coefficient: r={load_time_corr:.4f}")

print("\nQuestion 5 - Conversion Funnel:")
print(f"  Signup  ->  Onboarding: {signup_to_onboarding_rate:.1%}")
print(f"  Onboarding  ->  Active: {onboarding_to_active_rate:.1%}")
print(f"  Overall conversion: {overall_conversion_rate:.1%}")

biggest_dropoff_stage = "Onboarding" if (1 - signup_to_onboarding_rate) > (1 - onboarding_to_active_rate) else "Activation"
print(f"  Biggest drop-off at: {biggest_dropoff_stage} stage")

print(f"\nTotal figures created: {len(plotly_figs)}")
print("Analysis complete!")

Conclusion

The drop in weekly active users is primarily driven by onboarding failures, with nearly 1 in 4 users abandoning the process before completion.

## Key Takeaways

  • Onboarding is the critical bottleneck: Users who complete onboarding are 5.6x more likely to become weekly active (74.4% vs 13.2%)
  • Acquisition quality matters: Organic users significantly outperform paid channels (69.5% vs 45.0% weekly active rates)
  • Platform experience gaps exist: iOS users show 6.6 percentage points higher engagement than Android users
  • Performance impacts retention: Slower load times correlate with reduced user activity

## Recommended Next Steps

Immediate Priority (High Impact)

  • Redesign the onboarding experience - Focus on the 23% abandonment rate through simplified flows, progress indicators, and recovery campaigns
  • Optimize Android performance - Address platform-specific issues causing the engagement gap

Medium-Term Actions

  • Refine acquisition strategy - Reduce ad spend efficiency gaps or improve targeting quality
  • Implement performance monitoring - Target load time improvements, especially for users experiencing >2.5 second delays

Expected Impact: Improving onboarding completion by just 10 percentage points could increase overall weekly active user rates from 60.4% to 67-70%, representing a significant recovery in user engagement.

The data provides a clear roadmap: fix onboarding first, optimize platform experiences second, and refine acquisition channels third. This prioritized approach should deliver measurable improvements in weekly active user retention.