A Linear Regression project to analyze and predict sales based on TV, Radio, and Newspaper advertisement budgets.
This project demonstrates how linear regression can be applied to a marketing dataset to determine the relationship between advertising spend and sales performance.
- Linear Regression using Scikit-learn
- Data visualization with Seaborn and Matplotlib
- Model evaluation using MAE, MSE, R² score, and cross-validation
- Outlier detection and comparison of model performance before and after removing outliers
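
The modeling and evaluation steps listed above could look roughly like the sketch below. It is not the notebook's actual code: the data is synthetic (generated so that TV and Radio drive sales and Newspaper does not), and the 80/20 split, `random_state` values, and 5-fold cross-validation are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in for the advertising data (NOT the project's dataset).
rng = np.random.default_rng(0)
n = 200
feature_names = ["TV", "Radio", "Newspaper"]
X = np.column_stack([
    rng.uniform(0, 300, n),   # TV budget
    rng.uniform(0, 50, n),    # Radio budget
    rng.uniform(0, 100, n),   # Newspaper budget
])
# Assumed relationship: sales depend on TV and Radio only, plus noise.
y = 3.0 + 0.045 * X[:, 0] + 0.19 * X[:, 1] + rng.normal(0, 1.0, n)

# Hold out 20% of the rows for testing (assumed split).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# The three hold-out metrics named in the feature list.
mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

# 5-fold cross-validated R² on the full data (fold count is an assumption).
cv_r2 = cross_val_score(LinearRegression(), X, y, cv=5, scoring="r2")

# Fitted coefficient per advertising channel.
coefficients = dict(zip(feature_names, model.coef_))

print(f"MAE={mae:.2f}  MSE={mse:.2f}  R2={r2:.4f}")
print(f"CV R2 mean={cv_r2.mean():.4f}  std={cv_r2.std():.4f}")
print(coefficients)
```

The coefficients are per unit of budget, so comparing them across channels is only meaningful when the budgets share the same units and scale.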
| Metric | With Outliers | Without Outliers |
|---|---|---|
| Accuracy (R²) | 89.94% | 87.82% |
| MAE | 0.28 | 1.39 |
| MSE | 0.12 | 3.26 |
Insights:
- TV and Radio have the strongest influence on sales.
- Newspaper budget has minimal impact.
- Removing outliers lowers R² slightly (89.94% to 87.82%) and raises MAE and MSE, but the resulting model is less influenced by extreme points.
- Use the model trained without outliers when you want a more stable and fair predictor.
- Use the model trained with outliers when you can accept higher sensitivity to extreme values and a possible risk of overfitting.
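
A before-and-after comparison like the one summarized above can be sketched as follows. The source does not say which outlier rule the notebook uses; the 1.5 × IQR rule here is an assumption, `iqr_mask` and `evaluate` are hypothetical helper names, and the data is synthetic with a skewed Newspaper column so that some rows get flagged.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


def iqr_mask(data, k=1.5):
    """True for rows where every column lies within [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = np.percentile(data, 25, axis=0)
    q3 = np.percentile(data, 75, axis=0)
    iqr = q3 - q1
    inside = (data >= q1 - k * iqr) & (data <= q3 + k * iqr)
    return inside.all(axis=1)


def evaluate(X, y, seed=42):
    """Fit a linear model on an 80/20 split and return hold-out metrics."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    pred = LinearRegression().fit(X_train, y_train).predict(X_test)
    return {
        "r2": r2_score(y_test, pred),
        "mae": mean_absolute_error(y_test, pred),
        "mse": mean_squared_error(y_test, pred),
    }


# Synthetic stand-in data (NOT the project's dataset).
rng = np.random.default_rng(1)
n = 200
X = np.column_stack([
    rng.uniform(0, 300, n),   # TV budget
    rng.uniform(0, 50, n),    # Radio budget
    rng.exponential(25, n),   # Newspaper budget, skewed so it has a long tail
])
y = 3.0 + 0.045 * X[:, 0] + 0.19 * X[:, 1] + rng.normal(0, 1.0, n)

# Flag rows that are outliers in any feature or in the target.
mask = iqr_mask(np.column_stack([X, y]))

with_outliers = evaluate(X, y)
without_outliers = evaluate(X[mask], y[mask])

print(f"rows kept: {mask.sum()} of {n}")
print("with outliers:   ", with_outliers)
print("without outliers:", without_outliers)
```

Because the two models are scored on different hold-out rows, small metric differences between them should be read cautiously; scoring both on one shared test set would give a fairer comparison.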
- Clone the repository
- Open the Jupyter/Colab notebook
- Run all cells to view analysis and predictions