Why are users not using Feature X after launch?
How does Feature X usage vary across different user segments (Power User vs others) and plan types (Enterprise, Pro, etc.), and which combinations show the lowest adoption rates?
Is there a correlation between onboarding completion status and Feature X usage scores, suggesting that incomplete onboarding may be hindering feature discovery or understanding?
How does Feature X usage differ between users on the Old UI versus New UI, and could interface design be impacting feature accessibility or visibility?
What is the relationship between days_since_launch and feature_x_usage_score - are users who joined closer to launch using the feature more or less than those who joined later?
What constitutes the feature_x_usage_score scale, and what percentage of users fall into low usage categories (e.g., scores below 3 or 5) that would indicate non-adoption versus light usage?
Our analysis investigated why users are not adopting Feature X after its launch. The data reveals a concerning pattern: 68.9% of users have low usage (below score 3) and 50% have zero usage, indicating significant adoption challenges that require immediate attention.
We conducted a comprehensive multi-dimensional analysis examining user segmentation, onboarding effectiveness, UI design impact, temporal patterns, and overall usage distribution. The analysis processed 1,000 user records across 7 key variables, using statistical correlation analysis, comparative testing, and distribution analysis to identify root causes.
Problem Identified: Specific user combinations show extremely low feature adoption:
- Free Plan + New Users: Average usage score of only 0.08 (165 users)
- Free Plan + Casual Users: Average usage score of 0.24 (172 users)
- Pro Plan + New Users: Average usage score of 0.38 (133 users)
Insight: Free tier users, particularly new users, are almost completely ignoring Feature X, suggesting either lack of awareness, perceived value, or accessibility barriers.
Strong Correlation Discovered: Users who completed onboarding show significantly higher feature usage:
- Completed Onboarding: Average usage score of 2.98
- Incomplete Onboarding: Average usage score of 0.75
- Statistical Significance: Correlation of 0.329 (p < 0.001)
Insight: The onboarding process is critical for feature adoption, but many users aren't completing it or it's not effectively introducing Feature X.
Significant Design Impact: The UI version dramatically affects feature usage:
- Old UI Users: Average usage score of 3.15 (408 users)
- New UI Users: Average usage score of 1.64 (592 users)
- Statistical Significance: Highly significant difference (p < 0.001)
Critical Finding: The new UI design is actually hindering feature adoption compared to the old interface, suggesting usability or discoverability issues.
Declining Adoption Over Time: Analysis reveals a negative correlation (-0.228) between days since launch and feature usage, indicating that:
- Early adopters used the feature more
- Later users are increasingly less likely to adopt it
- The launch momentum has not been sustained
Adoption Reality Check:
- 56.9% of users score below 1 (minimal usage)
- 68.9% of users score below 3 (low usage threshold)
- 80.7% of users score below 5 (light usage threshold)
- 50% of users have zero usage
The convergence of evidence points to four primary factors:
The data clearly shows that Feature X's low adoption is not due to a single factor but rather a combination of onboarding, UI design, and user segment-specific challenges that require coordinated intervention across multiple product areas.
import pandas as pd
import numpy as np
import plotly.graph_objects as go
import plotly.express as px
from plotly.subplots import make_subplots
import statsmodels.api as sm
from scipy import stats
# Initialize list to store all Plotly figures
plotly_figs = []
# ===== DATA PREPROCESSING =====
print("=== DATA PREPROCESSING ===")
# Create a copy of the original DataFrame
cleaned_df = df.copy()
# Remove the unnamed index column as it's redundant
if 'Unnamed: 0' in cleaned_df.columns:
cleaned_df = cleaned_df.drop('Unnamed: 0', axis=1)
# Separate column types for targeted missing value handling
numeric_cols = cleaned_df.select_dtypes(include=['int64', 'float64']).columns
categorical_cols = cleaned_df.select_dtypes(include=['object']).columns
boolean_cols = cleaned_df.select_dtypes(include=['bool']).columns
# Handle missing values for numeric columns (use median)
for col in numeric_cols:
if cleaned_df[col].isnull().any():
cleaned_df[col] = cleaned_df[col].fillna(cleaned_df[col].median())
# Handle missing values for categorical columns (use mode or 'Unknown')
for col in categorical_cols:
if cleaned_df[col].isnull().any():
mode_value = cleaned_df[col].mode()
fill_value = mode_value[0] if not mode_value.empty else 'Unknown'
cleaned_df[col] = cleaned_df[col].fillna(fill_value)
# Handle missing values for boolean columns (use mode or False as default)
for col in boolean_cols:
if cleaned_df[col].isnull().any():
mode_value = cleaned_df[col].mode()
fill_value = mode_value[0] if not mode_value.empty else False
cleaned_df[col] = cleaned_df[col].fillna(fill_value)
# Ensure proper data types
categorical_columns_to_convert = ['plan', 'segment', 'ui_version']
for col in categorical_columns_to_convert:
if col in cleaned_df.columns:
cleaned_df[col] = cleaned_df[col].astype('category')
# Ensure numeric columns are properly typed
if 'days_since_launch' in cleaned_df.columns:
cleaned_df['days_since_launch'] = cleaned_df['days_since_launch'].astype('int64')
if 'feature_x_usage_score' in cleaned_df.columns:
cleaned_df['feature_x_usage_score'] = cleaned_df['feature_x_usage_score'].astype('float64')
# Ensure boolean column is properly typed
if 'onboarding_completed' in cleaned_df.columns:
cleaned_df['onboarding_completed'] = cleaned_df['onboarding_completed'].astype('bool')
print(f"Cleaned dataset shape: {cleaned_df.shape}")
print(f"Missing values per column:\n{cleaned_df.isnull().sum()}")
print(f"Data types:\n{cleaned_df.dtypes}")
# ===== STATISTICAL ANALYSIS =====
print("\n=== STATISTICAL ANALYSIS ===")
# 1. Segmentation Statistics - Feature usage by plan and segment combinations
def analyze_segmentation(df):
segmentation_stats = {}
# Group by plan and segment combinations
segment_groups = df.groupby(['plan', 'segment'])['feature_x_usage_score']
# Calculate statistics for each combination
combinations = []
for (plan, segment), group in segment_groups:
stats_dict = {
'plan': plan,
'segment': segment,
'count': len(group),
'mean': group.mean(),
'median': group.median(),
'std': group.std(),
'min': group.min(),
'max': group.max()
}
combinations.append(stats_dict)
# Convert to DataFrame for easier analysis
combo_df = pd.DataFrame(combinations)
# Identify lowest adoption rates
lowest_adoption = combo_df.nsmallest(3, 'mean')[['plan', 'segment', 'mean', 'count']]
segmentation_stats['combination_statistics'] = combinations
segmentation_stats['lowest_adoption_combinations'] = lowest_adoption.to_dict('records')
segmentation_stats['overall_plan_stats'] = df.groupby('plan')['feature_x_usage_score'].agg(['count', 'mean', 'median', 'std']).to_dict()
segmentation_stats['overall_segment_stats'] = df.groupby('segment')['feature_x_usage_score'].agg(['count', 'mean', 'median', 'std']).to_dict()
return segmentation_stats
# 2. Correlation Analysis
def analyze_correlations(df):
correlation_results = {}
# Convert boolean to numeric for correlation
df_corr = df.copy()
df_corr['onboarding_completed_numeric'] = df_corr['onboarding_completed'].astype(int)
# Correlation between onboarding completion and feature usage
onboarding_corr = df_corr['onboarding_completed_numeric'].corr(df_corr['feature_x_usage_score'])
# Correlation between days since launch and feature usage
days_usage_corr = df_corr['days_since_launch'].corr(df_corr['feature_x_usage_score'])
# Statistical significance tests
onboarding_stat, onboarding_p = stats.pearsonr(df_corr['onboarding_completed_numeric'], df_corr['feature_x_usage_score'])
days_stat, days_p = stats.pearsonr(df_corr['days_since_launch'], df_corr['feature_x_usage_score'])
# Group comparison for onboarding
completed_usage = df[df['onboarding_completed'] == True]['feature_x_usage_score']
not_completed_usage = df[df['onboarding_completed'] == False]['feature_x_usage_score']
# T-test for onboarding groups
if len(not_completed_usage) > 0:
onboarding_ttest_stat, onboarding_ttest_p = stats.ttest_ind(completed_usage, not_completed_usage)
else:
onboarding_ttest_stat, onboarding_ttest_p = None, None
correlation_results['onboarding_correlation'] = {
'correlation_coefficient': onboarding_corr,
'p_value': onboarding_p,
'completed_mean': completed_usage.mean(),
'completed_std': completed_usage.std(),
'not_completed_mean': not_completed_usage.mean() if len(not_completed_usage) > 0 else None,
'not_completed_std': not_completed_usage.std() if len(not_completed_usage) > 0 else None,
'ttest_statistic': onboarding_ttest_stat,
'ttest_p_value': onboarding_ttest_p
}
correlation_results['days_launch_correlation'] = {
'correlation_coefficient': days_usage_corr,
'p_value': days_p
}
return correlation_results
# 3. UI Comparison Statistics
def analyze_ui_comparison(df):
ui_comparison_stats = {}
# Separate groups
old_ui_usage = df[df['ui_version'] == 'Old UI']['feature_x_usage_score']
new_ui_usage = df[df['ui_version'] == 'New UI']['feature_x_usage_score']
# Descriptive statistics
ui_comparison_stats['old_ui_stats'] = {
'count': len(old_ui_usage),
'mean': old_ui_usage.mean(),
'median': old_ui_usage.median(),
'std': old_ui_usage.std(),
'min': old_ui_usage.min(),
'max': old_ui_usage.max()
}
ui_comparison_stats['new_ui_stats'] = {
'count': len(new_ui_usage),
'mean': new_ui_usage.mean(),
'median': new_ui_usage.median(),
'std': new_ui_usage.std(),
'min': new_ui_usage.min(),
'max': new_ui_usage.max()
}
# Statistical tests
# T-test (assuming normal distribution)
ttest_stat, ttest_p = stats.ttest_ind(old_ui_usage, new_ui_usage)
# Mann-Whitney U test (non-parametric alternative)
mannwhitney_stat, mannwhitney_p = stats.mannwhitneyu(old_ui_usage, new_ui_usage, alternative='two-sided')
ui_comparison_stats['statistical_tests'] = {
'ttest_statistic': ttest_stat,
'ttest_p_value': ttest_p,
'mannwhitney_statistic': mannwhitney_stat,
'mannwhitney_p_value': mannwhitney_p
}
return ui_comparison_stats
# 4. Usage Distribution Analysis
def analyze_usage_distribution(df):
usage_distribution = {}
usage_scores = df['feature_x_usage_score']
# Basic distribution statistics
usage_distribution['basic_stats'] = {
'count': len(usage_scores),
'mean': usage_scores.mean(),
'median': usage_scores.median(),
'std': usage_scores.std(),
'min': usage_scores.min(),
'max': usage_scores.max(),
'skewness': stats.skew(usage_scores),
'kurtosis': stats.kurtosis(usage_scores)
}
# Percentiles
percentiles = [10, 25, 50, 75, 90, 95, 99]
usage_distribution['percentiles'] = {f'{p}th_percentile': np.percentile(usage_scores, p) for p in percentiles}
# Low usage categories
total_users = len(usage_scores)
usage_distribution['low_usage_categories'] = {
'below_1': {
'count': (usage_scores < 1).sum(),
'percentage': (usage_scores < 1).mean() * 100
},
'below_3': {
'count': (usage_scores < 3).sum(),
'percentage': (usage_scores < 3).mean() * 100
},
'below_5': {
'count': (usage_scores < 5).sum(),
'percentage': (usage_scores < 5).mean() * 100
},
'zero_usage': {
'count': (usage_scores == 0).sum(),
'percentage': (usage_scores == 0).mean() * 100
}
}
# Usage score scale analysis
usage_distribution['scale_analysis'] = {
'unique_values': sorted(usage_scores.unique()),
'value_counts': usage_scores.value_counts().sort_index().to_dict()
}
return usage_distribution
# Execute all analyses
segmentation_stats = analyze_segmentation(cleaned_df)
correlation_results = analyze_correlations(cleaned_df)
ui_comparison_stats = analyze_ui_comparison(cleaned_df)
usage_distribution = analyze_usage_distribution(cleaned_df)
# ===== COMPREHENSIVE VISUALIZATIONS =====
print("\n=== CREATING VISUALIZATIONS ===")
# Performance optimization - sample if dataset is too large
viz_df = cleaned_df.copy()
if len(viz_df) > 50000:
viz_df = viz_df.sample(5000, random_state=42)
# 1. Heatmap: Feature usage by plan-segment combinations
print("Creating Plan-Segment Heatmap...")
plan_segment_pivot = viz_df.groupby(['plan', 'segment'])['feature_x_usage_score'].mean().unstack(fill_value=0)
fig1 = go.Figure(data=go.Heatmap(
z=plan_segment_pivot.values,
x=plan_segment_pivot.columns,
y=plan_segment_pivot.index,
colorscale='RdYlBu_r',
text=np.round(plan_segment_pivot.values, 2),
texttemplate="%{text}",
textfont={"size": 12},
colorbar=dict(title="Average Feature X Usage Score")
))
fig1.update_layout(
title="Feature X Usage by Plan Type and User Segment
Darker red indicates lower adoption rates",
xaxis_title="User Segment",
yaxis_title="Plan Type",
height=400,
font=dict(size=12)
)
# Add annotations for lowest adoption areas
min_usage = plan_segment_pivot.min().min()
min_locations = np.where(plan_segment_pivot.values == min_usage)
for i, j in zip(min_locations[0], min_locations[1]):
fig1.add_annotation(
x=j, y=i,
text=" Lowest",
showarrow=True,
arrowhead=2,
arrowcolor="red",
font=dict(color="red", size=10)
)
plotly_figs.append(fig1)
fig1.show()
# 2. Scatter plot: Onboarding completion vs Feature usage
print("Creating Onboarding Correlation Plot...")
onboarding_numeric = viz_df['onboarding_completed'].astype(int)
fig2 = px.scatter(
viz_df,
x=onboarding_numeric,
y='feature_x_usage_score',
color='onboarding_completed',
title="Feature X Usage vs Onboarding Completion Status",
labels={
'x': 'Onboarding Completed (0=No, 1=Yes)',
'feature_x_usage_score': 'Feature X Usage Score'
},
opacity=0.6
)
# Add trendline
z = np.polyfit(onboarding_numeric, viz_df['feature_x_usage_score'], 1)
p = np.poly1d(z)
fig2.add_traces(go.Scatter(
x=[0, 1],
y=p([0, 1]),
mode='lines',
name='Trendline',
line=dict(color='red', width=2, dash='dash')
))
# Add correlation annotation
correlation_coeff = correlation_results['onboarding_correlation']['correlation_coefficient']
fig2.add_annotation(
x=0.5, y=viz_df['feature_x_usage_score'].max() * 0.9,
text=f"Correlation: {correlation_coeff:.3f}",
showarrow=False,
bgcolor="white",
bordercolor="black",
borderwidth=1
)
fig2.update_layout(height=400)
plotly_figs.append(fig2)
fig2.show()
# 3. Box plots: UI version comparison
print("Creating UI Version Comparison...")
fig3 = px.box(
viz_df,
x='ui_version',
y='feature_x_usage_score',
color='ui_version',
title="Feature X Usage Distribution by UI Version",
labels={'feature_x_usage_score': 'Feature X Usage Score'}
)
# Add statistical significance annotation
old_ui_scores = viz_df[viz_df['ui_version'] == 'Old UI']['feature_x_usage_score']
new_ui_scores = viz_df[viz_df['ui_version'] == 'New UI']['feature_x_usage_score']
t_stat, p_value = stats.ttest_ind(old_ui_scores, new_ui_scores)
fig3.add_annotation(
x=0.5, y=viz_df['feature_x_usage_score'].max() * 0.95,
text=f"T-test p-value: {p_value:.4f}
{'Significant' if p_value < 0.05 else 'Not significant'} difference",
showarrow=False,
bgcolor="lightyellow",
bordercolor="orange",
borderwidth=1,
xref="paper"
)
fig3.update_layout(height=400)
plotly_figs.append(fig3)
fig3.show()
# 4. Scatter plot: Days since launch vs Feature usage
print("Creating Launch Timing Analysis...")
fig4 = px.scatter(
viz_df,
x='days_since_launch',
y='feature_x_usage_score',
title="Feature X Usage vs Days Since Launch",
labels={
'days_since_launch': 'Days Since Launch',
'feature_x_usage_score': 'Feature X Usage Score'
},
opacity=0.6,
trendline="ols"
)
# Add correlation annotation
launch_correlation = correlation_results['days_launch_correlation']['correlation_coefficient']
fig4.add_annotation(
x=viz_df['days_since_launch'].max() * 0.8,
y=viz_df['feature_x_usage_score'].max() * 0.9,
text=f"Correlation: {launch_correlation:.3f}",
showarrow=False,
bgcolor="white",
bordercolor="black",
borderwidth=1
)
fig4.update_layout(height=400)
plotly_figs.append(fig4)
fig4.show()
# 5. Usage distribution histogram and cumulative distribution
print("Creating Usage Distribution Analysis...")
fig5 = make_subplots(
rows=2, cols=1,
subplot_titles=('Usage Score Distribution', 'Cumulative Distribution'),
vertical_spacing=0.12
)
# Histogram
fig5.add_trace(
go.Histogram(
x=viz_df['feature_x_usage_score'],
nbinsx=20,
name='Usage Distribution',
opacity=0.7
),
row=1, col=1
)
# Add threshold lines for low usage (scores below 3 and 5)
low_usage_3 = usage_distribution['low_usage_categories']['below_3']['percentage']
low_usage_5 = usage_distribution['low_usage_categories']['below_5']['percentage']
fig5.add_vline(x=3, line_dash="dash", line_color="red", row=1, col=1)
fig5.add_vline(x=5, line_dash="dash", line_color="orange", row=1, col=1)
# Cumulative distribution
sorted_scores = np.sort(viz_df['feature_x_usage_score'])
cumulative_pct = np.arange(1, len(sorted_scores) + 1) / len(sorted_scores) * 100
fig5.add_trace(
go.Scatter(
x=sorted_scores,
y=cumulative_pct,
mode='lines',
name='Cumulative %',
line=dict(color='blue', width=2)
),
row=2, col=1
)
fig5.add_vline(x=3, line_dash="dash", line_color="red", row=2, col=1)
fig5.add_vline(x=5, line_dash="dash", line_color="orange", row=2, col=1)
# Add annotations for low usage percentages
fig5.add_annotation(
x=3, y=50,
text=f"<3: {low_usage_3:.1f}%",
showarrow=True,
arrowhead=2,
bgcolor="red",
font=dict(color="white"),
row=1, col=1
)
fig5.add_annotation(
x=5, y=70,
text=f"<5: {low_usage_5:.1f}%",
showarrow=True,
arrowhead=2,
bgcolor="orange",
font=dict(color="white"),
row=1, col=1
)
fig5.update_layout(
title="Feature X Usage Score Distribution Analysis
Red line: Score < 3 (Non-adoption), Orange line: Score < 5 (Light usage)",
height=600,
showlegend=False
)
fig5.update_xaxes(title_text="Feature X Usage Score", row=2, col=1)
fig5.update_yaxes(title_text="Count", row=1, col=1)
fig5.update_yaxes(title_text="Cumulative Percentage", row=2, col=1)
plotly_figs.append(fig5)
fig5.show()
# ===== COMPREHENSIVE RESULTS SUMMARY =====
print("\n" + "="*60)
print("FEATURE X USAGE ANALYSIS - COMPREHENSIVE RESULTS")
print("="*60)
print("\n1. SEGMENTATION ANALYSIS:")
print("Lowest Adoption Rate Combinations:")
for combo in segmentation_stats['lowest_adoption_combinations']:
print(f" {combo['plan']} - {combo['segment']}: Mean Usage = {combo['mean']:.2f} (n={combo['count']})")
print(f"\n2. CORRELATION ANALYSIS:")
print(f"Onboarding Completion vs Feature Usage:")
print(f" Correlation: {correlation_results['onboarding_correlation']['correlation_coefficient']:.3f}")
print(f" P-value: {correlation_results['onboarding_correlation']['p_value']:.3f}")
print(f" Completed Mean: {correlation_results['onboarding_correlation']['completed_mean']:.2f}")
if correlation_results['onboarding_correlation']['not_completed_mean'] is not None:
print(f" Not Completed Mean: {correlation_results['onboarding_correlation']['not_completed_mean']:.2f}")
print(f"\nDays Since Launch vs Feature Usage:")
print(f" Correlation: {correlation_results['days_launch_correlation']['correlation_coefficient']:.3f}")
print(f" P-value: {correlation_results['days_launch_correlation']['p_value']:.3f}")
print(f"\n3. UI VERSION COMPARISON:")
print(f" Old UI: Mean = {ui_comparison_stats['old_ui_stats']['mean']:.2f}, n = {ui_comparison_stats['old_ui_stats']['count']}")
print(f" New UI: Mean = {ui_comparison_stats['new_ui_stats']['mean']:.2f}, n = {ui_comparison_stats['new_ui_stats']['count']}")
print(f" T-test p-value: {ui_comparison_stats['statistical_tests']['ttest_p_value']:.3f}")
print(f" Mann-Whitney U p-value: {ui_comparison_stats['statistical_tests']['mannwhitney_p_value']:.3f}")
print(f"\n4. USAGE DISTRIBUTION:")
print(f" Overall Mean: {usage_distribution['basic_stats']['mean']:.2f}")
print(f" Median: {usage_distribution['basic_stats']['median']:.2f}")
print(f" Standard Deviation: {usage_distribution['basic_stats']['std']:.2f}")
print(f"Low Usage Categories:")
for category, stats in usage_distribution['low_usage_categories'].items():
print(f" {category.replace('_', ' ').title()}: {stats['count']} users ({stats['percentage']:.1f}%)")
print(f"\n5. KEY INSIGHTS:")
lowest_combo = segmentation_stats['lowest_adoption_combinations'][0]
print(f" Lowest adoption: {lowest_combo['plan']} - {lowest_combo['segment']} users ({lowest_combo['mean']:.2f} avg score)")
print(f" Onboarding impact: {'Strong' if abs(correlation_results['onboarding_correlation']['correlation_coefficient']) > 0.3 else 'Moderate' if abs(correlation_results['onboarding_correlation']['correlation_coefficient']) > 0.1 else 'Weak'} correlation ({correlation_results['onboarding_correlation']['correlation_coefficient']:.3f})")
print(f" UI version impact: {'Significant' if ui_comparison_stats['statistical_tests']['ttest_p_value'] < 0.05 else 'Not significant'} difference (p={ui_comparison_stats['statistical_tests']['ttest_p_value']:.4f})")
print(f" Launch timing effect: {'Positive' if correlation_results['days_launch_correlation']['correlation_coefficient'] > 0 else 'Negative'} correlation ({correlation_results['days_launch_correlation']['correlation_coefficient']:.3f})")
print(f" Non-adoption rate (score <3): {usage_distribution['low_usage_categories']['below_3']['percentage']:.1f}%")
print(f" Light usage rate (score <5): {usage_distribution['low_usage_categories']['below_5']['percentage']:.1f}%")
print(f"\nTotal Plotly figures created: {len(plotly_figs)}")
print("Analysis complete!")
Feature X is experiencing a critical adoption failure driven by three interconnected product issues that require immediate intervention.
Root Cause: Triple Product Failure
The low adoption isn't due to lack of user interest but rather systematic product failures:
Key Takeaways
Recommended Next Steps
This is a solvable problem - the data shows clear paths to improvement through onboarding, UI fixes, and targeted user segment strategies.