Overview
This end-to-end data science project transforms transaction-level retail data into actionable customer insights. Using the UCI Online Retail dataset, the analysis focuses on revenue growth, customer retention, and marketing efficiency through behavioral analytics and predictive modeling.
Key Questions
- How concentrated is revenue across customers?
- Are repeat customers more valuable than new customers?
- Which segments drive the most revenue?
- Which customers have the highest future value?
- Which high-value customers are at risk of churn?
Approach
- Cleaned and explored real-world transactional data to analyze revenue distribution and seasonality
- Built a transaction-based customer lifecycle funnel in the absence of session data
- Segmented customers into business-friendly groups using RFM analysis and clustering
- Predicted Customer Lifetime Value (CLV) using time-aware modeling
- Modeled churn risk behaviorally and translated it into revenue at risk
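
As a rough illustration of the RFM step, the sketch below builds a per-customer recency/frequency/monetary table with pandas. The column names follow the UCI Online Retail schema (InvoiceNo, CustomerID, InvoiceDate, Quantity, UnitPrice). The function name `rfm_table`, the cleaning rules, the snapshot date, and the toy rows are assumptions made for this example and may differ from what the project actually does.

```python
import pandas as pd


def rfm_table(tx: pd.DataFrame, snapshot: pd.Timestamp) -> pd.DataFrame:
    """Build a recency/frequency/monetary table, one row per customer.

    Assumed cleaning rules (illustrative only): drop rows without a
    CustomerID and rows with non-positive Quantity or UnitPrice.
    """
    tx = tx.dropna(subset=["CustomerID"]).copy()
    tx = tx[(tx["Quantity"] > 0) & (tx["UnitPrice"] > 0)]
    tx["Revenue"] = tx["Quantity"] * tx["UnitPrice"]

    return tx.groupby("CustomerID").agg(
        # Days between the snapshot date and the customer's last purchase
        recency_days=("InvoiceDate", lambda d: (snapshot - d.max()).days),
        # Number of distinct invoices
        frequency=("InvoiceNo", "nunique"),
        # Total spend
        monetary=("Revenue", "sum"),
    )


# Toy rows, invented purely to show the expected input shape
toy = pd.DataFrame(
    {
        "InvoiceNo": ["A1", "A1", "A2", "B1", "C1"],
        "CustomerID": [1.0, 1.0, 1.0, 2.0, None],
        "InvoiceDate": pd.to_datetime(
            ["2011-01-05", "2011-01-05", "2011-03-01", "2011-02-10", "2011-02-11"]
        ),
        "Quantity": [2, 1, 3, 5, 1],
        "UnitPrice": [10.0, 5.0, 4.0, 2.0, 9.0],
    }
)

rfm = rfm_table(toy, snapshot=pd.Timestamp("2011-03-31"))
print(rfm)
```

The resulting columns can then be binned into scores or passed to a clustering model such as scikit-learn's `KMeans`, typically after scaling.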
Impact & Insights
- Identified that a small subset of customers drives a disproportionate share of revenue
- Ranked customers by future value, not just historical spend
- Quantified churn-driven revenue risk to support targeted retention strategies
- Illustrated how retention efforts focused on high-value, at-risk customers can deliver higher ROI than broad campaigns
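
Two of the quantities above can be sketched in a few lines: the share of revenue coming from the top customers, and revenue at risk from churn. The helper names `top_share` and `revenue_at_risk`, the 20% cutoff, the definition of revenue at risk as churn probability times expected future value, and all numbers below are assumptions for illustration, not results from the project.

```python
import pandas as pd


def top_share(revenue: pd.Series, top_frac: float = 0.2) -> float:
    """Share of total revenue contributed by the top `top_frac` of customers."""
    ordered = revenue.sort_values(ascending=False)
    n_top = max(1, int(round(len(ordered) * top_frac)))
    return float(ordered.iloc[:n_top].sum() / ordered.sum())


def revenue_at_risk(churn_prob: pd.Series, expected_value: pd.Series) -> pd.Series:
    """Per-customer revenue at risk, highest first.

    Assumed definition: churn probability times expected future value.
    """
    return (churn_prob * expected_value).sort_values(ascending=False)


# Invented per-customer revenue, used only to exercise the function
revenue = pd.Series([500, 200, 100, 100, 100], index=list("abcde"))
share = top_share(revenue, top_frac=0.2)

# Invented churn probabilities and expected future values
churn_prob = pd.Series({"a": 0.1, "b": 0.6, "c": 0.5})
expected_value = pd.Series({"a": 1000.0, "b": 200.0, "c": 100.0})
at_risk = revenue_at_risk(churn_prob, expected_value)

print(f"Top 20% revenue share: {share:.0%}")
print(at_risk)
```

Sorting by revenue at risk rather than by churn probability alone keeps retention effort pointed at customers whose loss would cost the most.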
Tools & Techniques
Python, SQL, scikit-learn, EDA, RFM segmentation, CLV modeling, churn prediction
Future Work
- Incorporate clickstream data for true conversion funnels
- Add cost data for profit-based CLV
- Validate retention strategies using A/B testing or uplift modeling
- Deploy insights via an interactive dashboard or API