This project predicts electricity demand for rural areas using machine learning and time-series forecasting models. By analyzing weather conditions, solar availability, and time-based patterns, it estimates the load profile of a village, which supports smarter energy planning, sustainability evaluation, and testing of interventions such as solar panels and batteries.

The project supports utility providers and policy makers in:
- Understanding hourly/daily electricity demand.
- Testing the impact of renewable energy setups.
- Predicting future power usage based on historical trends.

Tools and libraries used:

- Python
- Pandas, NumPy – Data handling & numerical computing
- Matplotlib, Seaborn – Data visualization
- Scikit-learn – Machine learning (Random Forest)
- Keras, TensorFlow – Deep learning (LSTM)
- Statsmodels – Time series forecasting (ARIMA)
- Google Colab – Interactive environment

The workflow consisted of the following steps:

- Data Sources:
  - Historical electricity consumption data (final.csv)
  - External testing dataset (gani.csv)
- Data Preprocessing:
  - Combined date and time into a single timestamp
  - Extracted temporal features (hour, day of week, month, quarter)
  - Applied cyclical encoding for time variables
  - Generated lag features and smoothed power readings
  - Scaled data using MinMaxScaler
- Models Trained:
  - Random Forest Regressor: Handles structured data well with minimal tuning.
  - LSTM (Long Short-Term Memory): Deep learning model for sequence forecasting.
  - ARIMA (AutoRegressive Integrated Moving Average): Statistical time-series model.
- Evaluation Metrics:
  - MAE (Mean Absolute Error)
  - RMSE (Root Mean Square Error)
- Testing: Predictions were made on unseen data (gani.csv) and results were visualized.
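
The preprocessing steps listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the column names `Date`, `Time`, and `Power`, the lag sizes, and the smoothing window are all assumptions, and the lags are taken from the raw readings, which the source does not specify. Min-max scaling is written out by hand here; it uses the same formula as scikit-learn's `MinMaxScaler` with its default (0, 1) range.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data: the column names "Date", "Time", and "Power"
# are assumptions and may not match final.csv.
raw = pd.DataFrame({
    "Date": ["2023-01-01"] * 6,
    "Time": ["00:00", "01:00", "02:00", "03:00", "04:00", "05:00"],
    "Power": [120.0, 110.0, 105.0, 100.0, 130.0, 180.0],
})


def preprocess(df, lags=(1, 2), smooth_window=3):
    """Build time features, cyclical encodings, lags, and a smoothed series."""
    out = df.copy()

    # Combine date and time into a single timestamp and sort chronologically.
    out["timestamp"] = pd.to_datetime(out["Date"] + " " + out["Time"])
    out = out.sort_values("timestamp").reset_index(drop=True)

    # Temporal features.
    out["hour"] = out["timestamp"].dt.hour
    out["dayofweek"] = out["timestamp"].dt.dayofweek
    out["month"] = out["timestamp"].dt.month
    out["quarter"] = out["timestamp"].dt.quarter

    # Cyclical encoding of the hour (period of 24).
    out["hour_sin"] = np.sin(2 * np.pi * out["hour"] / 24)
    out["hour_cos"] = np.cos(2 * np.pi * out["hour"] / 24)

    # Smoothed power: trailing rolling mean (window size is an assumption).
    out["power_smooth"] = out["Power"].rolling(smooth_window, min_periods=1).mean()

    # Lag features: power observed `lag` steps earlier.
    for lag in lags:
        out[f"power_lag_{lag}"] = out["Power"].shift(lag)

    # The first max(lags) rows have no lag values, so drop them.
    return out.dropna().reset_index(drop=True)


features = preprocess(raw)

# Min-max scaling to [0, 1], the same formula MinMaxScaler applies by default.
cols = ["power_smooth", "power_lag_1", "power_lag_2"]
col_min, col_max = features[cols].min(), features[cols].max()
scaled = (features[cols] - col_min) / (col_max - col_min)
```

In practice the minimum and maximum would be computed on the training data only and then reused for the test set (gani.csv), so that no information from the test data leaks into training.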

| Model | MAE ↓ | RMSE ↓ |
|---|---:|---:|
| Random Forest | 262.67 | 327.93 |
| ARIMA | 789.35 | 988.36 |
| LSTM | 2975.67 | 3081.67 |

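
For reference, the two metrics in the table can be computed as in the sketch below. The numbers in the usage lines are made-up toy values, not project data; the helper names `mae` and `rmse` are chosen here for illustration.

```python
import numpy as np


def mae(y_true, y_pred):
    """Mean Absolute Error: average magnitude of the errors."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))


def rmse(y_true, y_pred):
    """Root Mean Square Error: penalizes large errors more heavily than MAE."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Toy values for illustration only.
actual = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 330.0]

toy_mae = mae(actual, predicted)    # (10 + 10 + 30) / 3
toy_rmse = rmse(actual, predicted)  # sqrt((100 + 100 + 900) / 3)
```

RMSE is never smaller than MAE on the same predictions, and the gap widens when a few errors are much larger than the rest, which is why reporting both is informative.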
Random Forest performed best, balancing speed and accuracy. ARIMA captured temporal trends moderately well but had no access to the engineered features. LSTM underperformed, likely because of too few time steps or insufficient hyperparameter tuning. Visual plots showed that Random Forest predictions closely followed actual demand trends.
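
To make the "time steps" point concrete: an LSTM layer in Keras expects input shaped `(samples, timesteps, features)`, so a univariate series has to be cut into overlapping windows first. The sketch below shows one common way to do that; the source does not say how the project built its windows, and `make_windows` and `n_steps` are hypothetical names.

```python
import numpy as np


def make_windows(series, n_steps):
    """Turn a 1-D series into (X, y) pairs for sequence models.

    X has shape (samples, n_steps, 1); y[i] is the value that
    immediately follows window X[i].
    """
    series = np.asarray(series, dtype=float)
    if n_steps < 1 or n_steps >= len(series):
        raise ValueError("n_steps must be between 1 and len(series) - 1")

    n_samples = len(series) - n_steps
    X = np.stack([series[i:i + n_steps] for i in range(n_samples)])
    y = series[n_steps:]

    # Add a trailing feature axis: (samples, timesteps) -> (samples, timesteps, 1).
    return X[..., np.newaxis], y


# Toy series 0..9 with a window of 3 past values.
X, y = make_windows(np.arange(10), n_steps=3)
```

With hourly data, a window shorter than 24 steps cannot show the model a full daily cycle, which is one plausible reading of "too few time steps".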

Key takeaways:

- Time-based features (hour, weekday) and lag values are crucial for accurate demand prediction.
- Classical machine learning models (especially Random Forest) can outperform deep learning on small, tabular datasets.
- Cyclical encoding of time (e.g., using sine/cosine for hours) helps the model capture periodic behavior.
- Scaling inputs is essential for neural networks such as LSTM to converge properly.
- A hybrid approach (combining ML and ARIMA) could offer even more robust forecasting.
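
The benefit of cyclical encoding can be seen in a small worked example. With raw hour values, 23:00 and 00:00 are 23 units apart even though they are adjacent in time; after a sine/cosine encoding they are as close as any other pair of neighboring hours. The helper names below are illustrative, not taken from the project.

```python
import numpy as np


def encode_hour(hour):
    """Map an hour (0-23) to a point on the unit circle."""
    angle = 2 * np.pi * np.asarray(hour, dtype=float) / 24
    return np.stack([np.sin(angle), np.cos(angle)], axis=-1)


def encoded_distance(h1, h2):
    """Euclidean distance between two hours in the encoded space."""
    return float(np.linalg.norm(encode_hour(h1) - encode_hour(h2)))


raw_gap = abs(23 - 0)                      # 23: looks far apart
wrap_gap = encoded_distance(23, 0)         # same as any one-hour step
step_gap = encoded_distance(0, 1)          # one-hour step within the day
opposite_gap = encoded_distance(0, 12)     # midnight vs. noon: the maximum
```

Both components are needed: sine alone would map 00:00 and 12:00 to the same value.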
