Causal Inference on Cloud Revenue Lift
AI Coding Tool Spillover Analysis
Challenge
When customers adopt AI-assisted coding tools, does semi-related cloud service revenue actually rise? Rare adopters, unbalanced panels, and selection bias make naive before/after comparisons unreliable for program investment decisions.
Solution
Built a reproducible causal analytics pipeline: panel construction with event-time alignment, K-means segmentation for heterogeneous effects, SMOTE + propensity matching for covariate balance, and difference-in-differences with clustered standard errors, bootstrap CIs, and placebo timing tests. Entire demo runs on synthetic data with a known injected lift.
Impact Metrics
Results
- 5.84% DiD lift on semi-related service revenue (clustered 95% CI: 3.50%–8.23%)
- Bootstrap mean lift of 5.95% (95% CI: 3.55%–8.37%)
- Placebo adoption-date lift of −0.31% (p = 0.84), consistent with no spurious effect
- Recovered the ~5% ground-truth lift built into the synthetic generator
- Covariate balance improved via SMOTE + matching (SMD reports in outputs/)
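The bootstrap interval in the results above can be illustrated with a small account-level resampling sketch. This is not the repo's code: it assumes a balanced panel, an outcome in log revenue, and resampling stratified by treatment arm. The function name `bootstrap_lift` and the synthetic inputs are hypothetical.

```python
import numpy as np

def bootstrap_lift(delta, treated, n_boot=2000, seed=0):
    """Account-level bootstrap of the DiD percent lift.

    delta   : per-account (mean post log revenue - mean pre log revenue)
    treated : 1 for adopters, 0 for controls
    Accounts are resampled with replacement within each arm (an assumption).
    """
    rng = np.random.default_rng(seed)
    d_t, d_c = delta[treated == 1], delta[treated == 0]
    draws = np.empty(n_boot)
    for b in range(n_boot):
        bt = rng.choice(d_t, size=d_t.size, replace=True)
        bc = rng.choice(d_c, size=d_c.size, replace=True)
        # DiD in log points, converted to a percent lift
        draws[b] = np.expm1(bt.mean() - bc.mean())
    return draws.mean(), np.percentile(draws, [2.5, 97.5])

# Synthetic illustration with an injected 5% lift (values are made up).
rng = np.random.default_rng(7)
n_per_arm = 500
treated = np.repeat([1, 0], n_per_arm)
delta = rng.normal(0.06, 0.08, 2 * n_per_arm) + np.log1p(0.05) * treated

mean_lift, ci = bootstrap_lift(delta, treated)
print(f"bootstrap mean lift: {mean_lift:.2%}, 95% CI: {ci[0]:.2%} to {ci[1]:.2%}")
```

Resampling whole accounts rather than account-months keeps each account's serial correlation intact inside every bootstrap draw.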
Business Impact
Demonstrates how to answer “did this initiative work?” with defensible causal methods rather than dashboards alone, so leaders can fund programs based on confidence intervals, not anecdotes.
Architecture
Runnable portfolio MVP (synthetic data only). GitHub-ready repo with PLAN.md spec and case_study.md executive summary.
Causal Inference Pipeline
A four-stage approach that combines statistical rigor with automation.
Panel & Event Time
Merge revenue and activity signals, then align each account to its first coding-tool adoption date within a ±6-month event window.
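A minimal pandas sketch of the event-time alignment described above. The table layout and the column names (`account_id`, `month`, `adoption_month`) are assumptions for illustration, not the repo's schema.

```python
import pandas as pd

# Hypothetical monthly panel: one row per account and month.
panel = pd.DataFrame({
    "account_id": ["a", "a", "a", "b", "b", "b"],
    "month": pd.to_datetime(["2023-01-01", "2023-02-01", "2023-03-01"] * 2),
    "revenue": [100.0, 110.0, 120.0, 90.0, 95.0, 91.0],
})
# Hypothetical adoption table: first adoption month per adopting account.
adoption = pd.DataFrame({
    "account_id": ["a"],
    "adoption_month": pd.to_datetime(["2023-02-01"]),
})

def add_event_time(panel, adoption, window=6):
    """Attach months-since-adoption and keep rows inside the event window."""
    out = panel.merge(adoption, on="account_id", how="left")
    out["treated"] = out["adoption_month"].notna().astype(int)
    # Whole months between the observation and adoption; NaN for never-adopters.
    out["event_time"] = (
        (out["month"].dt.year - out["adoption_month"].dt.year) * 12
        + (out["month"].dt.month - out["adoption_month"].dt.month)
    )
    in_window = out["event_time"].abs() <= window
    # Never-adopters have no event time; keep all their rows as controls.
    return out.loc[in_window | (out["treated"] == 0)].reset_index(drop=True)

aligned = add_event_time(panel, adoption)
print(aligned[["account_id", "month", "treated", "event_time"]])
```

Controls have no adoption date, so a full pipeline would still need a rule for assigning them a reference date (for example, the date of their matched adopter). That choice is left open here.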
Clustering
K-means on pre-treatment features (elbow + silhouette) to segment customer archetypes for heterogeneous DiD.
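One way to pick the number of segments is sketched below with scikit-learn: fit K-means over a range of k, record inertia (for an elbow plot) and the silhouette score, and keep the k with the best silhouette. The helper name `choose_k`, the candidate range, and the toy features are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

def choose_k(features, k_values=range(2, 7), seed=0):
    """Return the silhouette-best k, its labels, and per-k diagnostics."""
    X = StandardScaler().fit_transform(features)
    scores, inertias = {}, {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
        scores[k] = silhouette_score(X, model.labels_)
        inertias[k] = model.inertia_  # plot against k to inspect the elbow
    best_k = max(scores, key=scores.get)
    labels = KMeans(n_clusters=best_k, n_init=10, random_state=seed).fit_predict(X)
    return best_k, labels, scores, inertias

# Toy pre-treatment features: three well-separated account archetypes.
rng = np.random.default_rng(0)
centers = np.array([[0, 0], [10, 0], [0, 10]])
features = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])

best_k, labels, scores, inertias = choose_k(features)
print("chosen k:", best_k)
```

The resulting labels can then be used to estimate a separate DiD per segment, or as an interaction term in a pooled model.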
Matching + SMOTE
Balance treated vs control on pre-period covariates; report standardized mean differences before and after.
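The balance check can be sketched as 1-nearest-neighbor matching on an estimated propensity score, with a standardized mean difference (SMD) computed before and after. This illustration leaves out the SMOTE step and makes its own choices (logistic propensity model, matching with replacement on the logit scale, pooled-SD SMD); none of these are confirmed details of the repo.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

def smd(x_treated, x_control):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_sd = np.sqrt((x_treated.var(ddof=1) + x_control.var(ddof=1)) / 2)
    return (x_treated.mean() - x_control.mean()) / pooled_sd

def match_controls(X, treated):
    """Match each treated unit to its nearest control on the propensity logit."""
    ps = LogisticRegression(max_iter=1000).fit(X, treated).predict_proba(X)[:, 1]
    logit = np.log(ps / (1 - ps))
    t_idx = np.flatnonzero(treated == 1)
    c_idx = np.flatnonzero(treated == 0)
    nn = NearestNeighbors(n_neighbors=1).fit(logit[c_idx].reshape(-1, 1))
    _, pos = nn.kneighbors(logit[t_idx].reshape(-1, 1))
    return t_idx, c_idx[pos.ravel()]  # controls may be reused (with replacement)

# Synthetic pre-period covariates; adoption is rare and depends on covariate 0.
rng = np.random.default_rng(1)
n = 2000
X = rng.normal(size=(n, 2))
p_adopt = 1 / (1 + np.exp(-(-2.0 + 1.0 * X[:, 0])))
treated = (rng.random(n) < p_adopt).astype(int)

t_idx, m_idx = match_controls(X, treated)
smd_before = smd(X[treated == 1, 0], X[treated == 0, 0])
smd_after = smd(X[t_idx, 0], X[m_idx, 0])
print(f"SMD before: {smd_before:.3f}, after matching: {smd_after:.3f}")
```

A common rule of thumb treats an absolute SMD below 0.1 as adequately balanced, though the threshold is a convention rather than a test.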
DiD + Robustness
Estimate treat×post with account-clustered SEs; event-study plot, bootstrap CIs, and placebo adoption timing.
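A hedged sketch of the treat×post estimate with account-clustered standard errors, using the statsmodels formula API, followed by a placebo test that pretends adoption happened midway through the pre-period. The synthetic panel, the log-revenue outcome, the column names, and the helper `did_lift` are assumptions for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic balanced panel with a known 5% lift injected after month 6.
rng = np.random.default_rng(42)
n_accounts, n_months, true_lift = 400, 12, 0.05
acct = np.repeat(np.arange(n_accounts), n_months)
month = np.tile(np.arange(n_months), n_accounts)
treat = (acct < n_accounts // 2).astype(int)
post = (month >= 6).astype(int)
log_rev = (
    5.0
    + rng.normal(0, 0.3, n_accounts)[acct]   # account-level heterogeneity
    + 0.01 * month                            # common time trend
    + np.log1p(true_lift) * treat * post      # injected treatment effect
    + rng.normal(0, 0.05, acct.size)          # idiosyncratic noise
)
df = pd.DataFrame({"account_id": acct, "month": month, "treat": treat,
                   "post": post, "log_rev": log_rev})

def did_lift(data, post_col="post"):
    """Return (percent lift, 95% CI, p-value) for the treat x post term."""
    fit = smf.ols(f"log_rev ~ treat * {post_col}", data=data).fit(
        cov_type="cluster",
        cov_kwds={"groups": data["account_id"].to_numpy()},
    )
    term = f"treat:{post_col}"
    lo, hi = fit.conf_int().loc[term]
    # Coefficients are in log points; expm1 converts them to percent lift.
    return np.expm1(fit.params[term]), (np.expm1(lo), np.expm1(hi)), fit.pvalues[term]

lift, ci, p_value = did_lift(df)

# Placebo timing: use pre-period rows only and a fake adoption at month 3.
pre = df[df["month"] < 6].assign(placebo_post=lambda d: (d["month"] >= 3).astype(int))
placebo_lift, _, placebo_p = did_lift(pre, post_col="placebo_post")

print(f"DiD lift: {lift:.2%} (95% CI {ci[0]:.2%} to {ci[1]:.2%})")
print(f"placebo lift: {placebo_lift:.2%} (p = {placebo_p:.2f})")
```

An event-study version would replace the single `post` indicator with event-time dummies interacted with `treat`, leaving out one pre-period as the reference category.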
Scale & Scope
Technology Stack
- Python 3.11 + pandas / statsmodels
- scikit-learn (K-means) + imbalanced-learn (SMOTE)
- Jupyter walkthrough notebook
- matplotlib / seaborn figures