IMDb Movies Analysis and Dashboard

Project Overview

This project covers web scraping, data preprocessing, database storage, and visualization of IMDb data for movies released from 2014 to 2024. The dataset contains 10,000 movies with fields such as name, release year, genre, ratings, actors, directors, and Metascore, among others. The project culminates in an interactive Power BI dashboard for in-depth insights and reporting.

Technologies Used

Web Scraping: BeautifulSoup, Selenium, Requests, and Python's built-in time and random modules
Data Processing: Pandas, NumPy
Machine Learning: Scikit-learn (TfidfVectorizer and RandomForestClassifier for genre prediction)
Database Management: MySQL, SQLAlchemy, MySQL Connector/Python
Visualization: Power BI

Project Steps

1. Web Scraping

Used Selenium for automated navigation (clicking, entering data, filtering, and loading more results).
Extracted 10,000 movie links from IMDb using BeautifulSoup.
Iterated through each movie link to collect metadata such as genres, ratings, directors, actors, and Metascore.
Implemented exception handling so that missing fields or failed page loads did not halt the scrape.
Created a structured dataset and saved it as a CSV file.
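The per-link loop described above might look roughly like the sketch below. It illustrates the steps under stated assumptions and is not the notebook's code. fetch and parse are hypothetical callables that stand in for the Requests/Selenium download and the BeautifulSoup extraction, and the column names in FIELDS are assumed. The sketch shows three things: a randomized pause between requests using time and random, per-movie exception handling so that one failing page does not stop the run, and CSV output.

```python
import csv
import random
import time
from typing import Callable, Iterable

# Hypothetical column layout; the real notebook may use different names.
FIELDS = ["link", "name", "year", "genre", "rating", "metascore", "director"]


def scrape_movies(
    links: Iterable[str],
    fetch: Callable[[str], str],   # hypothetical: returns a page's HTML
    parse: Callable[[str], dict],  # hypothetical: extracts fields from HTML
    out_path: str,
    min_delay: float = 1.0,
    max_delay: float = 3.0,
) -> int:
    """Write one CSV row per link; return how many links failed."""
    failures = 0
    with open(out_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        writer.writeheader()
        for link in links:
            row = {field: None for field in FIELDS}  # missing fields stay empty
            try:
                parsed = parse(fetch(link))
                # Keep only known columns; absent keys remain None.
                row.update({k: v for k, v in parsed.items() if k in FIELDS})
            except Exception as exc:  # network error, changed page layout, etc.
                failures += 1
                print(f"Skipping details for {link}: {exc}")
            row["link"] = link
            writer.writerow(row)
            # Randomized pause so requests are not sent at a fixed rate.
            time.sleep(random.uniform(min_delay, max_delay))
    return failures
```

For example, scrape_movies(links, fetch_page, parse_movie, "imdb_movies.csv") would write one row per link and leave any field it could not extract empty. fetch_page, parse_movie, and the file name are placeholders.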

2. Data Preprocessing & Cleaning

Handled missing values and inconsistencies (e.g., correcting values misplaced between the duration and rated columns).
Filled in missing movie names with the actual movie titles.
Predicted missing genres from movie descriptions using a RandomForestClassifier trained on TF-IDF features.
Finalized the cleaned dataset and stored it in a MySQL database.
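Two of these cleaning steps can be sketched as follows, under stated assumptions. The swap check assumes that a misplaced row is one where the duration field holds a content rating (e.g. PG-13) and the rated field holds a duration (e.g. 2h 15m). The genre model pairs scikit-learn's TfidfVectorizer with RandomForestClassifier, the tools listed above, and assumes one genre label per movie. The function names, hyperparameters, and sample data are illustrative only.

```python
import re
from typing import List, Optional, Sequence

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Durations such as "2h 15m", "1h", or "95m" (an assumed IMDb-style format).
DURATION_RE = re.compile(r"^\s*(?:\d+h)?\s*(?:\d+m)?\s*$")


def looks_like_duration(value) -> bool:
    """True for non-empty strings matching the duration pattern."""
    return (
        isinstance(value, str)
        and value.strip() != ""
        and DURATION_RE.match(value) is not None
    )


def fix_misplaced(row: dict) -> dict:
    """Return a copy of row with 'duration' and 'rated' swapped if misplaced."""
    duration, rated = row.get("duration"), row.get("rated")
    if looks_like_duration(rated) and not looks_like_duration(duration):
        return {**row, "duration": rated, "rated": duration}
    return dict(row)


def fill_missing_genres(
    descriptions: Sequence[str],
    genres: Sequence[Optional[str]],
    random_state: int = 42,
) -> List[Optional[str]]:
    """Predict a genre, from its description, for every entry whose genre is None."""
    known = [i for i, g in enumerate(genres) if g is not None]
    missing = [i for i, g in enumerate(genres) if g is None]
    filled = list(genres)
    if not missing or not known:
        return filled  # nothing to predict, or nothing to learn from
    model = make_pipeline(
        TfidfVectorizer(stop_words="english"),  # description text -> TF-IDF features
        RandomForestClassifier(n_estimators=200, random_state=random_state),
    )
    model.fit([descriptions[i] for i in known], [genres[i] for i in known])
    predictions = model.predict([descriptions[i] for i in missing])
    for i, label in zip(missing, predictions):
        filled[i] = str(label)
    return filled
```

Training only on rows whose genre is known and predicting only the missing ones leaves every observed label unchanged. A multi-genre setup would need a multi-label model instead.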

3. Database Management (MySQL)

Created an imdb database and imported the cleaned dataset.
Executed various SQL queries for insights, such as:
- Movies released in 2024
- Top 5 highest-rated movies
- Top 10 horror movies
- Movies directed by Christopher Nolan
- Highest-rated movie per genre
- Ranking movies by Metascore and ratings
- Most reviewed movies
- Longest-duration movie per genre
- Best-rated movie for each content rating
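Several of these queries, such as the highest-rated movie per genre and the best-rated movie for each content rating, follow the same top-1-per-group pattern. The sketch below shows one way to express that pattern with ROW_NUMBER(), a window function that MySQL 8.0 and later support. To run without a MySQL server, it uses Python's built-in sqlite3 module with an in-memory table, which requires SQLite 3.25 or newer for window functions. The table and column names are hypothetical and may differ from the project's schema.

```python
import sqlite3
from typing import List

# Hypothetical schema; the project's MySQL table may use other names/types.
SCHEMA = """
CREATE TABLE movies (
    name TEXT, genre TEXT, rated TEXT,
    ratings REAL, number_of_ratings INTEGER
)
"""

# Top-1-per-group query; the same SQL also runs on MySQL 8.0+.
BEST_PER_GROUP = """
SELECT {group_col}, name, ratings
FROM (
    SELECT {group_col}, name, ratings,
           ROW_NUMBER() OVER (
               PARTITION BY {group_col}
               ORDER BY ratings DESC, number_of_ratings DESC
           ) AS rn
    FROM movies
) AS ranked
WHERE rn = 1
ORDER BY {group_col}
"""

# Column names cannot be bound as parameters, so only whitelisted ones are used.
ALLOWED_GROUPS = {"genre", "rated"}


def best_per_group(conn: sqlite3.Connection, group_col: str) -> List[tuple]:
    """Return (group, name, ratings) for the top-rated movie in each group."""
    if group_col not in ALLOWED_GROUPS:
        raise ValueError(f"unsupported grouping column: {group_col}")
    return conn.execute(BEST_PER_GROUP.format(group_col=group_col)).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO movies VALUES (?, ?, ?, ?, ?)",
        [  # made-up placeholder rows
            ("Movie A", "Horror", "R", 7.1, 1000),
            ("Movie B", "Comedy", "PG-13", 8.0, 50),
        ],
    )
    print(best_per_group(conn, "genre"))
```

Sorting by number of ratings as a secondary key is an assumed tie-breaker. Without one, ties would resolve arbitrarily. Replacing ROW_NUMBER() with RANK() would return every tied movie instead of exactly one.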

4. Power BI Dashboard

The Power BI dashboard consists of four interactive pages:

Page 1: Individual Movie Insights

Slicer for selecting a movie name.
Display of key metrics such as duration, ratings, number of ratings, release year, genre, description, user reviews, critic reviews, and metascore.
Treemap showing total ratings by genre.

Page 2: Genre and Year-wise Trends

Slicers for selecting genre and release year.
Bar chart showing total movies by genre.
Gauge chart displaying average Metascore.
Donut chart visualizing content rating breakdown.
Line chart illustrating the trend of total movie releases per year.
Key performance indicators (KPIs): total movies, average duration, average ratings, total number of ratings.
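To cross-check the Page 2 KPI cards against the cleaned data, the same numbers can be recomputed outside Power BI. This plain-Python sketch treats an unset slicer as no filter and skips missing durations and ratings rather than counting them as zero. The field names (genre, year, duration_min, rating, number_of_ratings) are assumptions, and genre is assumed to hold a single value per movie.

```python
from statistics import mean
from typing import Iterable, Optional


def _avg(values) -> Optional[float]:
    """Mean of the non-missing values, or None if there are none."""
    present = [v for v in values if v is not None]
    return mean(present) if present else None


def page2_kpis(
    movies: Iterable[dict],
    genre: Optional[str] = None,  # None mirrors an empty genre slicer
    year: Optional[int] = None,   # None mirrors an empty year slicer
) -> dict:
    """Recompute total movies, average duration/rating, and total ratings."""
    rows = [
        m for m in movies
        if (genre is None or m.get("genre") == genre)
        and (year is None or m.get("year") == year)
    ]
    return {
        "total_movies": len(rows),
        "avg_duration": _avg(m.get("duration_min") for m in rows),
        "avg_rating": _avg(m.get("rating") for m in rows),
        "total_ratings": sum(m.get("number_of_ratings") or 0 for m in rows),
    }
```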

Page 3: Ratings & Metascore Analysis

Slicers for genre and release year.
Measures displaying the best-rated movie and the movie with the highest Metascore, each with an image.
Scatter plot showing the relationship between user ratings and Metascore.
Table ranking movies by Metascore.
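The two "best movie" measures reduce to an arg-max over the rows left after the genre and year slicers are applied. The Python sketch below shows that logic and is not the DAX used in the .pbix file. The field names and the tie-break on number of ratings are assumptions.

```python
from typing import Optional, Sequence


def best_by(
    movies: Sequence[dict],
    key: str,
    tiebreak: str = "number_of_ratings",  # assumed tie-breaker
) -> Optional[dict]:
    """Return the movie with the highest value of key, breaking ties by tiebreak.

    Movies missing key are ignored; returns None when nothing qualifies.
    """
    candidates = [m for m in movies if m.get(key) is not None]
    if not candidates:
        return None
    return max(candidates, key=lambda m: (m[key], m.get(tiebreak) or 0))
```

Calling best_by(filtered_rows, "rating") and best_by(filtered_rows, "metascore") gives the two cards' movies. Here, filtered_rows is a hypothetical list of the rows that remain after slicing.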

Page 4: Actors, Directors & Duration Insights

Slicers for genre and release year.
Measure displaying the longest movie by duration, with an image.
Bar chart showing top actors by the number of movies they appeared in.
Line chart comparing average user reviews vs. critic reviews by genre.
Column charts:
- Top 5 movies by ratings.
- Top 10 directors by the number of movies directed.
- Longest duration movies by genre.
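The actor chart counts movies per person, and that count depends on a preprocessing step the chart hides. A movie's cast is typically stored as a single multi-valued text field, so the names must be split out and counted once per movie. In Power Query, splitting a column by delimiter into rows handles this kind of job. The sketch assumes comma-separated names and breaks ties alphabetically. The same function also works for the top-10 directors chart.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple


def top_people(
    cells: Iterable[Optional[str]],
    n: int = 10,
    sep: str = ",",  # assumed delimiter between names
) -> List[Tuple[str, int]]:
    """Count the movies each person appears in and return the top n.

    Each cell is one movie's actor (or director) field, e.g. "Name One, Name Two".
    Names are trimmed and de-duplicated within a movie before counting.
    """
    counts: Counter = Counter()
    for cell in cells:
        if not cell:
            continue  # movie with no listed people
        names = {name.strip() for name in cell.split(sep) if name.strip()}
        counts.update(names)
    # Sort by count descending, then name, so ties come out in a stable order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
```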

5. Interactive Features

Filters for genre and release year applied across all pages.
Buttons to clear all slicer filters.
Navigation buttons to move between different pages.

How to Run the Project

1. Web Scraping & Data Processing

Open and run Scrapping_IMDb_movies.ipynb in Jupyter Notebook to extract movie data.
Run Preprocessing-Modelling-Data Ingestion.ipynb to clean, preprocess, and store data in MySQL.
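The ingestion step of the second notebook can be sketched with pandas and SQLAlchemy, both listed under Technologies Used. This is an assumption about how the step might look, not the notebook's actual code. The table name and the connection URL, including its credentials, are placeholders.

```python
import pandas as pd
from sqlalchemy import text
from sqlalchemy.engine import Engine

# Placeholder URL for MySQL via MySQL Connector/Python; pass it to
# sqlalchemy.create_engine() in a real run (needs a running MySQL server).
MYSQL_URL = "mysql+mysqlconnector://user:password@localhost:3306/imdb"


def store_movies(df: pd.DataFrame, engine: Engine, table: str = "movies") -> int:
    """Write df to table, replacing any existing one, and return the stored row count."""
    if not table.isidentifier():
        # The table name is interpolated into SQL below, so allow only plain identifiers.
        raise ValueError(f"invalid table name: {table!r}")
    df.to_sql(table, engine, if_exists="replace", index=False)
    with engine.connect() as conn:
        return conn.execute(text(f"SELECT COUNT(*) FROM {table}")).scalar()
```

With a real server, engine = create_engine(MYSQL_URL) followed by store_movies(cleaned_df, engine) would load the data. For a quick local check, create_engine("sqlite://") provides an in-memory database instead.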

2. Power BI Dashboard

Open IMDb Dashboard.pbix in Power BI Desktop to explore interactive visualizations.

Future Enhancements

Expand the dataset beyond 10,000 movies for deeper analysis.
Improve genre prediction using advanced NLP techniques.
Automate data updates to keep the dashboard current.
Deploy the dashboard online for wider accessibility.

Author

This project was created by Rafi Qamar. For any inquiries or collaborations, feel free to reach out!
