This project showcases an end-to-end ETL pipeline and data warehouse built for the Olist Brazilian e-commerce dataset. Using Apache Airflow, PostgreSQL, and Power BI, the pipeline enables analytical reporting and business insights through a structured star schema data model.
- ETL Orchestration: Apache Airflow (with Git-sync auto-deployment)
- Data Source: CSV → MySQL (simulated raw layer)
- Storage & Modeling: PostgreSQL with star schema design
- Languages: Python, SQL
- Visualization: Microsoft Power BI
- Containerization: Docker
- Build a modern data pipeline with clear separation of concerns (bronze, silver, gold layers)
- Design a scalable star schema for analytical queries
- Enable business users to explore key metrics via dashboards
The pipeline follows a layered architecture inspired by the Medallion model:
- Bronze (Raw): Load CSV data into MySQL → extract into PostgreSQL staging.
- Silver (Staging): Clean and transform data (resolve nulls, data types, joins).
- Gold (Warehouse): Load clean data into star schema tables for BI use.
DAGs are orchestrated with Airflow, and Git-sync ensures CI/CD workflow for DAGs deployment.
- Fact Table:
order_fact - Dimension Tables:
dim_customers,dim_products,dim_sellers,dim_dates,dim_geolocation
Visualizations were created using Power BI to help answer key business questions:
- Top-performing states, categories, sellers
- Delivery performance & review scores
- Monthly revenue trends
- Source: Kaggle - Olist Brazilian E-commerce
- Time Period: 2016–2018
- Data Volume: 100K+ orders, 70K+ products, 100K+ customers
Olist is a Brazil-based platform that connects small merchants with large marketplaces. It simplifies operations by centralizing product listings, logistics, and customer communication.
For any questions or collaboration, feel free to reach out!




