⚡ EnergyZero ETL & Automation Pipeline

📌 Project Overview

This project is an ETL (Extract, Transform, Load) pipeline for hourly energy price data from the EnergyZero API. Dockerized Apache Airflow schedules and monitors each run, taking the data from raw API responses to analysis-ready Parquet storage.


🚀 Key Features

| Phase | Description | Tools |
|---|---|---|
| Extract | Retrieval of hourly energy prices from the EnergyZero API. | Requests, JSON |
| Transform | Data cleaning, VAT calculation (21%), and date/time engineering. | Pandas |
| Load | Compressed and schema-enforced storage in Parquet format. | PyArrow |
| Orchestrate | Fully automated scheduling and monitoring. | Airflow |
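
The Extract phase could look roughly like the sketch below. It is an illustration rather than the repository's actual script: the helper names (`last_n_days_window`, `build_query`, `fetch_prices`), the query parameter names `fromDate` and `tillDate`, and the timestamp format are assumptions, and the endpoint URL is left as an argument because this README does not state it.

```python
from datetime import datetime, timedelta, timezone


def last_n_days_window(now, days=7):
    """Return (start, end) in UTC covering the `days` days before `now`.

    `now` must be timezone-aware. The end is truncated to the full hour
    because the prices are hourly.
    """
    end = now.astimezone(timezone.utc).replace(minute=0, second=0, microsecond=0)
    start = end - timedelta(days=days)
    return start, end


def build_query(start, end):
    """Build query parameters for the price request.

    The parameter names and timestamp format are assumptions; check the
    EnergyZero API documentation before relying on them.
    """
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return {"fromDate": start.strftime(fmt), "tillDate": end.strftime(fmt)}


def fetch_prices(url, params, timeout=30):
    """GET the price data and return the decoded JSON body."""
    import requests  # imported lazily so the helpers above need only the stdlib

    response = requests.get(url, params=params, timeout=timeout)
    response.raise_for_status()  # fail loudly on 4xx/5xx so Airflow marks the task failed
    return response.json()
```

Keeping the window calculation separate from the HTTP call makes the date logic easy to unit test without network access.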

🏗️ Architecture & Workflow

  1. Extract: A Python script fetches the last 7 days of electricity prices.
  2. Transform:
     • Splits ReadingDate into separate Date and Time columns.
     • Calculates Price_with_VAT (Base Price * 1.21).
     • Enforces correct data types for downstream analytics.
  3. Load: Saves the resulting dataframe as a .parquet file in the data/processed/ directory.
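
The Transform and Load steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the input column name `Price`, the function names `transform` and `load`, the choice to store `Date` and `Time` as strings, and the Snappy compression setting are all hypothetical. Only `ReadingDate`, `Price_with_VAT`, the 21% rate, and the Parquet target come from this README.

```python
from pathlib import Path

import pandas as pd

VAT_RATE = 0.21  # 21% VAT, as stated in the feature table


def transform(records):
    """Turn raw price records into the analysis-ready layout.

    `records` is a list of dicts with a `ReadingDate` ISO 8601 string and a
    base `Price` (the `Price` name is an assumption).
    """
    df = pd.DataFrame(records)
    timestamps = pd.to_datetime(df["ReadingDate"], utc=True)

    # Split the timestamp into separate Date and Time columns.
    df["Date"] = timestamps.dt.strftime("%Y-%m-%d")
    df["Time"] = timestamps.dt.strftime("%H:%M:%S")

    # Enforce a numeric type before calculating the VAT-inclusive price.
    df["Price"] = df["Price"].astype("float64")
    df["Price_with_VAT"] = (df["Price"] * (1 + VAT_RATE)).round(5)

    return df[["Date", "Time", "Price", "Price_with_VAT"]]


def load(df, path):
    """Write the dataframe to a compressed Parquet file (requires pyarrow)."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    df.to_parquet(path, engine="pyarrow", compression="snappy", index=False)
    return path
```

A call such as `load(transform(records), "data/processed/prices.parquet")` would chain the two steps; the file name here is only a placeholder.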

📂 Folder Structure

energyzero_etl/
├── 📁 dags/                # Airflow DAG definitions
├── 📁 scripts/             # Python ETL logic
├── 📁 data/
│   ├── 📁 raw/             # Raw JSON landing zone
│   └── 📁 processed/       # Optimized Parquet files
├── 🐳 Dockerfile           # Custom Airflow image
├── 🚢 docker-compose.yml   # Infrastructure as Code
└── 📄 requirements.txt     # Dependency list
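
One way the scripts might address the `data/raw/` and `data/processed/` directories is shown below. The date-stamped file naming (`prices_<date>.json` / `.parquet`) and the function names are assumptions made for illustration; the README does not specify how files are named.

```python
import json
from datetime import date
from pathlib import Path


def raw_path(base_dir, run_date):
    """Location of the raw JSON landing file for a given run date."""
    return Path(base_dir) / "data" / "raw" / f"prices_{run_date.isoformat()}.json"


def processed_path(base_dir, run_date):
    """Location of the processed Parquet file for a given run date."""
    return Path(base_dir) / "data" / "processed" / f"prices_{run_date.isoformat()}.parquet"


def save_raw(payload, base_dir, run_date):
    """Persist the untouched API response so a run can be replayed later."""
    path = raw_path(base_dir, run_date)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return path
```

Writing the raw response before transforming it means a failed Transform step can be rerun without calling the API again.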

🛠️ Quick Start Guide

Prerequisite: Ensure Docker Desktop is installed and running.

1. Deployment

git clone https://github.com/BeyzaNurSarikaya/energyzero-etl.git
cd energyzero-etl
docker-compose up --build -d

2. Monitoring

Access the Airflow Dashboard at http://localhost:8080:

  • User: admin
  • Pass: admin

📈 Future Enhancements

  • Integrate a PostgreSQL database as the final "Load" destination.
  • Add a Streamlit dashboard for real-time price visualization.
  • Implement Slack/Email alerts for failed pipeline tasks.

👩‍💻 Author

Beyza Nur Sarıkaya
