Welcome to my Customer Churn Prediction Platform project! This project demonstrates a complete data science workflow — from raw data ingestion, transformation, and model training to deployment as an interactive web application.
Customer churn — when users stop subscribing or using a service — is a critical problem for subscription-based businesses. Losing customers not only reduces revenue but also increases the cost of acquiring replacements.
This project aims to build an automated system that can:
- Identify high-risk customers before they churn.
- Provide an interactive tool for business teams to perform "what-if" analysis on various customer profiles.
This project is built using a modern data stack, separating each phase of the data lifecycle. The workflow is as follows:
- Ingestion: Raw data is uploaded to Google Cloud Storage (GCS), acting as the data lake.
- Warehousing: Data is loaded into Google BigQuery for structured storage and querying.
- Transformation: dbt (data build tool) is used to clean, transform, and shape the raw data into analytics-ready models.
- Modeling: A Jupyter notebook pulls the transformed data from BigQuery, trains a classification model using Scikit-learn, and saves it as a model artifact.
- Deployment: An interactive web app is built using Streamlit, loading the trained model and serving predictions to end-users.
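The Modeling step above can be sketched in a few lines. In the real notebook the frame comes from BigQuery (e.g. via `pandas-gbq`); here a tiny synthetic frame with illustrative column names keeps the sketch self-contained:

```python
# Sketch of the Modeling step: pull analytics-ready rows, train a
# classifier, and persist the artifact for the Streamlit app.
# Table and column names are illustrative stand-ins for the dbt output.
import joblib
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# In the real notebook this frame is pulled from BigQuery, e.g.:
#   df = pd.read_gbq("SELECT * FROM churn_analytics.customer_features")
df = pd.DataFrame({
    "tenure_months":  [1, 3, 24, 36, 2, 48, 5, 60],
    "monthly_charge": [70, 85, 30, 25, 90, 20, 80, 15],
    "churned":        [1, 1, 0, 0, 1, 0, 1, 0],
})

X = df.drop(columns="churned")
y = df["churned"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Train and evaluate a simple baseline classifier.
model = LogisticRegression().fit(X_train, y_train)
print(f"test accuracy: {model.score(X_test, y_test):.2f}")

# Save the trained model so the web app can load it later.
joblib.dump(model, "churn_model.joblib")
```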
- Cloud Provider: Google Cloud Platform (GCP)
- Data Storage: Google Cloud Storage, Google BigQuery
- Data Transformation: dbt (data build tool)
- Machine Learning: Python, Pandas, Scikit-learn
- Web Application: Streamlit
- Orchestration & Automation (Local): Python Scripts, Jupyter Notebook
Follow these steps to run the project on your local machine:

1. **Clone the Repository**

   ```bash
   git clone https://github.com/bestoism/ChurnPredictionPlatform.git
   cd ChurnPredictionPlatform
   ```

2. **Install Dependencies**

   Ensure Python 3.8+ and pip are installed, then run:

   ```bash
   pip install -r requirements.txt
   ```

3. **Set Up Google Cloud Authentication**

   ```bash
   gcloud auth application-default login
   ```

4. **Run dbt Transformations**

   ```bash
   cd churn_analytics
   dbt run
   cd ..
   ```

5. **Launch the Streamlit App**

   ```bash
   streamlit run app.py
   ```

   The app will be available at http://localhost:8501.
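Behind the Streamlit UI sits a small prediction helper that scores one "what-if" profile at a time. A minimal sketch of that path (feature names and the artifact filename are hypothetical, not the repo's actual ones; in `app.py` the model would come from `joblib.load` rather than being trained inline):

```python
# Sketch of the prediction path behind the web app: given a trained
# model, score a single customer profile. Feature names are illustrative.
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Stand-in training so the sketch runs without the real artifact;
# in app.py this would instead be: model = joblib.load("churn_model.joblib")
_train = pd.DataFrame({
    "tenure_months":  [1, 3, 24, 36, 2, 48],
    "monthly_charge": [70, 85, 30, 25, 90, 20],
    "churned":        [1, 1, 0, 0, 1, 0],
})
model = LogisticRegression().fit(
    _train.drop(columns="churned"), _train["churned"]
)

def predict_churn_risk(model, tenure_months: float, monthly_charge: float) -> float:
    """Return the predicted churn probability for one customer profile."""
    profile = pd.DataFrame(
        [{"tenure_months": tenure_months, "monthly_charge": monthly_charge}]
    )
    return float(model.predict_proba(profile)[0, 1])

risk = predict_churn_risk(model, tenure_months=2, monthly_charge=95)
print(f"churn risk: {risk:.0%}")
```

In the app, Streamlit widgets (sliders, select boxes) would supply the two arguments, making the "what-if" analysis interactive.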
- The model achieved around 80% accuracy in predicting churn on the test dataset.
- Biggest Challenge: Handling data type inconsistencies between BigQuery, Pandas, and Scikit-learn — a reminder that explicit, consistent data validation pays off at every hand-off between tools.
- This project reinforced the value of modular data architecture and using the right tool for the right task (e.g., dbt for transformations, Streamlit for interactivity).
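The data-type lesson can be made concrete: BigQuery downloads often arrive as object or nullable-integer columns that Scikit-learn rejects, so casting and checking explicitly before training avoids silent failures. A minimal sketch, with hypothetical column names:

```python
# Sketch of explicit dtype validation at the BigQuery -> Pandas hand-off.
# Column names and expected dtypes are illustrative.
import pandas as pd

EXPECTED_DTYPES = {"tenure_months": "int64", "monthly_charge": "float64"}

def validate_features(df: pd.DataFrame) -> pd.DataFrame:
    """Coerce columns to the expected dtypes, failing loudly on bad data."""
    out = df.copy()
    for col, dtype in EXPECTED_DTYPES.items():
        if col not in out.columns:
            raise ValueError(f"missing feature column: {col}")
        # errors="raise" surfaces unparseable values instead of hiding them.
        out[col] = pd.to_numeric(out[col], errors="raise").astype(dtype)
    return out

# BigQuery result sets can hand back numbers as strings:
raw = pd.DataFrame({"tenure_months": ["12", "3"], "monthly_charge": ["29.9", "80.0"]})
clean = validate_features(raw)
print(clean.dtypes.to_dict())
```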
Feel free to fork this project, open an issue, or contribute! 🚀
🚧 This project is under construction — stay tuned for more updates!
