Skip to content

MehmetcanMutlu/Telco-Churn-SQL-ML

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

9 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

Telco Churn Prediction with Advanced SQL & CatBoost

This project implements an end-to-end churn prediction pipeline. It simulates a telecom company's relational database to perform feature engineering using SQL Window Functions and trains a CatBoost classifier to identify high-risk customers.

The primary goal is to demonstrate the ability to handle data transformation at the database level (ELT) rather than relying solely on in-memory processing, followed by state-of-the-art machine learning modeling.

πŸ“‹ Project Overview

  1. Data Simulation: Generates a relational database (SQLite) with realistic patterns for customers, call logs, and complaints.
  2. SQL Feature Engineering: Uses CTEs, Aggregations, and Window Functions (e.g., calculating trend changes over time) to extract features directly from the raw database tables.
  3. Machine Learning: Trains a CatBoost Classifier to predict customer churn, optimizing for Recall to capture as many potential churners as possible.

πŸ›  Tech Stack

  • Language: Python 3.x
  • Database: SQLite
  • Data Manipulation: SQL (Window Functions, Joins), Pandas
  • Machine Learning: CatBoost, Scikit-learn
  • Version Control: Git

πŸ“Š Key Results

The model achieved high performance in distinguishing between loyal and churning customers.

  • ROC-AUC Score: 0.9367
  • Recall (Churn Class): 0.78 (Correctly identified 78% of leaving customers)
  • Precision (Churn Class): 0.70

Top Predictive Features:

  1. calls_last_30_days (Derived via SQL: Significant drop in usage)
  2. total_complaints (Customer dissatisfaction signal)
  3. total_calls (General usage volume)

πŸ“‚ Project Structure

β”œβ”€β”€ data/                  # Stores generated database and model artifacts
β”œβ”€β”€ src/
β”‚   β”œβ”€β”€ db_generator.py    # Simulates customers, calls, and complaints data
β”‚   β”œβ”€β”€ feature_store.py   # Extracts features using complex SQL queries
β”‚   └── train_model.py     # Trains, evaluates, and saves the CatBoost model
β”œβ”€β”€ requirements.txt       # Project dependencies
└── README.md              # Project documentation

About

End-to-end Telco Churn Prediction pipeline using advanced SQL for Feature Engineering (Window Functions) and Machine Learning for classification.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages