This repository provides a large-scale content recommendation system for movies and web series.
- Large Synthetic Dataset: Generates 100,000 entries that blend realistic movie and web series titles with diverse genres, casts, directors, and other attributes.
- Enhanced Diversity: Minimizes repetition through broad lists of unique titles, actors, and directors.
- TF-IDF Based Similarity: Merges metadata (genres, keywords, cast, director, and tagline) into a unified text field, then applies TF-IDF and cosine similarity to find similar content.
- 1 x N Computation: Computes similarity between the selected item and all others efficiently, avoiding the overhead of a full NxN comparison matrix.
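
As a rough illustration of the TF-IDF and 1 x N ideas above, the sketch below joins a few metadata columns into one text field, vectorizes it, and compares a single row against all others. The column names, the toy rows, and the helper `similar_to` are assumptions made for illustration; they may not match what `recommendation_engine.py` actually does.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy data; the column names are assumed, not taken from the real CSV.
df = pd.DataFrame({
    "title": ["Space Raiders", "Galaxy Patrol", "Love in Paris"],
    "genres": ["sci-fi action", "sci-fi adventure", "romance drama"],
    "keywords": ["space alien war", "space alien rescue", "paris love letter"],
    "cast": ["Ann Lee", "Ann Lee", "Marc Roy"],
    "director": ["J. Doe", "J. Doe", "P. Martin"],
    "tagline": ["War among the stars", "Heroes among the stars",
                "A letter changes everything"],
})

# Merge the metadata columns into one text field per row.
text_columns = ["genres", "keywords", "cast", "director", "tagline"]
df["combined"] = df[text_columns].fillna("").agg(" ".join, axis=1)

# Fit TF-IDF once; the result is a sparse matrix with one row per title.
vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(df["combined"])

def similar_to(index, top_n=2):
    """Hypothetical helper: rank all other rows against row `index`."""
    # One row against all rows gives a 1 x N result, not an N x N matrix.
    scores = cosine_similarity(tfidf[index], tfidf).ravel()
    # Sort by descending score and skip the query row itself.
    order = [i for i in scores.argsort()[::-1] if i != index][:top_n]
    return [(df["title"].iloc[i], float(scores[i])) for i in order]

results = similar_to(0)
```

Because only one row of the matrix is compared at a time, memory use grows with N rather than N squared, which is what makes this approach practical for 100,000 rows.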
├── generate_data.py # Script to generate the 100K-row dataset
├── recommendation_engine.py # Script to run the interactive recommendation system
├── final_100k_movies_webseries.csv # Generated CSV file (after you run generate_data.py)
├── MovieRecommendationSystem.ipynb # Optional Jupyter Notebook version, if present
└── README.md # Project documentation
- Clone this repository.
- Ensure you have a Python 3.x environment set up.
Run:

    python generate_data.py

This creates a file called final_100k_movies_webseries.csv in the same directory, containing 100,000 rows of synthetic data.
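
For a sense of how synthetic rows like these can be produced, here is a minimal sketch that samples from small word lists. The lists, the column names, and the function `generate_rows` are all hypothetical; the real `generate_data.py` may use different attributes and a different strategy for keeping titles unique.

```python
import random

import pandas as pd

def generate_rows(n_rows, seed=0):
    """Hypothetical generator: build n_rows synthetic title records."""
    rng = random.Random(seed)  # seeded so repeated runs give the same data
    adjectives = ["Silent", "Crimson", "Lost", "Electric"]
    nouns = ["Horizon", "Empire", "Signal", "Garden"]
    genres = ["Drama", "Comedy", "Thriller", "Sci-Fi"]
    directors = ["A. Rivera", "B. Chen", "C. Okafor"]

    rows = []
    for i in range(n_rows):
        rows.append({
            # Appending the row index is a simple way to keep titles unique.
            "title": f"{rng.choice(adjectives)} {rng.choice(nouns)} {i}",
            "type": rng.choice(["Movie", "Webseries"]),
            "genres": " ".join(rng.sample(genres, k=2)),
            "director": rng.choice(directors),
        })
    return pd.DataFrame(rows)

sample = generate_rows(1000)
```

Calling `sample.to_csv("sample.csv", index=False)` would then write the rows to disk in the same way the full dataset is stored as a CSV file.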
After generating the dataset, run:

    python recommendation_engine.py

The program will:
- Ask you to choose Movie or Webseries.
- Display some titles from that category.
- Ask you to enter your favorite title.
- Find the closest match.
- Compute cosine similarity (1 x N) between your chosen item and all others.
- Print the top recommendations (up to 30).
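
The closest-match and top-recommendation steps above could look roughly like the following. This sketch assumes fuzzy matching via the standard library's `difflib` and takes precomputed similarity scores as input; the function names, the cutoff value, and the sample titles are illustrative assumptions, not the script's actual code.

```python
import difflib

import numpy as np

def closest_title(query, titles, cutoff=0.5):
    """Return the title most similar to `query`, or None if nothing is close."""
    # Compare in lowercase so the match ignores capitalization.
    lowered = {t.lower(): t for t in titles}
    matches = difflib.get_close_matches(query.lower(), list(lowered),
                                        n=1, cutoff=cutoff)
    return lowered[matches[0]] if matches else None

def top_recommendations(scores, query_index, titles, limit=30):
    """Return up to `limit` titles ranked by score, excluding the query."""
    scores = np.asarray(scores, dtype=float)
    order = [i for i in np.argsort(scores)[::-1] if i != query_index]
    return [titles[i] for i in order[:limit]]

titles = ["Space Raiders", "Galaxy Patrol", "Love in Paris"]
match = closest_title("space raidrs", titles)  # tolerates the typo
picks = top_recommendations([1.0, 0.8, 0.1], titles.index(match), titles)
```

Excluding the query index explicitly matters because an item always has similarity 1.0 with itself and would otherwise appear first in its own recommendations.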
- Python 3.x
- NumPy
- Pandas
- scikit-learn
Install them with:

    pip install -r requirements.txt

Or individually:

    pip install numpy pandas scikit-learn