AI-powered video analysis tool — search inside any video using natural language.
Upload any video and describe a scene in plain English — the system finds the exact timestamp where that scene appears.
Example queries:
"a boat on the ocean at sunset""person walking in a forest""city street with traffic"
The system processes video frames using OpenAI's CLIP model (zero-shot image-text alignment) and returns the best-matching timestamp; playback then jumps directly to that moment.
```
User uploads video + types scene description
        ↓
OpenCV extracts 1 frame/sec from video
        ↓
CLIP encodes each frame → image embeddings
CLIP encodes text query → text embedding
        ↓
Cosine similarity computed between text ↔ all frames
        ↓
Top matching frame returned with timestamp
        ↓
React frontend seeks video to that timestamp
```
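The matching step above is a normalized dot product over embeddings. A minimal sketch with toy 2-D vectors standing in for CLIP's real embeddings (`best_frame` and `to_timestamp` are illustrative helpers, not the repo's actual code):

```python
import numpy as np

def to_timestamp(sec):
    """Format seconds as HH:MM:SS, matching the API's timestamp field."""
    sec = int(sec)
    return "%02d:%02d:%02d" % (sec // 3600, (sec % 3600) // 60, sec % 60)

def best_frame(text_emb, frame_embs):
    """Return (frame_index, cosine_score) for the closest frame embedding."""
    # L2-normalize so the dot product equals cosine similarity
    t = text_emb / np.linalg.norm(text_emb)
    f = frame_embs / np.linalg.norm(frame_embs, axis=1, keepdims=True)
    scores = f @ t
    idx = int(np.argmax(scores))
    return idx, float(scores[idx])

# Toy embeddings; CLIP ViT-B/32 would produce 512-D vectors instead
frames = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
query = np.array([0.6, 0.8])
idx, score = best_frame(query, frames)
timestamp = to_timestamp(idx)  # at 1 frame/sec, frame index == seconds
```

Because frames are sampled at one per second, the winning frame's index converts directly to a playback timestamp.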
| Layer | Technology |
|---|---|
| Frontend | React.js, Bootstrap, react-router-dom |
| Backend | Python, Flask, Flask-CORS |
| AI / ML | OpenAI CLIP (ViT-B/32), Sentence Transformers |
| Video Processing | OpenCV (1 frame/sec extraction) |
| Similarity | Cosine Similarity (normalized dot product) |
```
Video-Scene-Classification-System/
├── Backend/
│   ├── app.py                     # Flask API server
│   ├── process_video_frames.py    # OpenCV frame extraction
│   ├── clip_model.py              # CLIP image & text embeddings
│   ├── nlp_model.py               # Sentence Transformer integration
│   └── requirements.txt
├── Frontend/
│   └── my-project/
│       ├── src/
│       │   ├── App.js             # Root component + routing
│       │   └── components/
│       │       ├── Home.js        # Landing page
│       │       ├── Upload.js      # Video upload + scene search (core)
│       │       ├── Team.js
│       │       └── FAQ.js
│       └── package.json
├── start-servers.bat              # One-click start (Windows)
└── README.md
```
- Python 3.8+
- Node.js 16+
```bash
git clone https://github.com/RishabThapliyal/Video-Scene-Classification-System.git
cd Video-Scene-Classification-System
cd Backend
python -m venv venv
venv\Scripts\activate        # Windows
# source venv/bin/activate   # Mac/Linux
pip install -r requirements.txt
python app.py
```
Backend runs at http://localhost:5000
```bash
cd Frontend/my-project
npm install
npm start
```
Frontend runs at http://localhost:3000
```bash
# Starts both servers at once:
start-servers.bat
```

| Method | Endpoint | Description |
|---|---|---|
| POST | `/api/upload_video` | Upload video, receive `video_id` |
| POST | `/api/search_scene` | Query with `video_id` + query string |
| GET | `/api/thumbnail/<video_id>/<ts>` | Get frame thumbnail at timestamp |
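A minimal Flask sketch of the search endpoint — a stub, not the repo's actual `app.py`; the `SEARCH_INDEX` store stands in for real CLIP similarity results:

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

# Stub results keyed by video_id; the real app would run CLIP similarity here
SEARCH_INDEX = {"abc123": ("00:00:26", 0.84)}

@app.route("/api/search_scene", methods=["POST"])
def search_scene():
    data = request.get_json()
    ts, score = SEARCH_INDEX.get(data["video_id"], ("00:00:00", 0.0))
    return jsonify({"timestamp": ts, "confidence_score": score})
```

The request and response bodies below show the JSON shapes this route would accept and return.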
Search request:
```json
{
  "video_id": "abc123",
  "query": "a boat on calm water"
}
```
Response:
```json
{
  "timestamp": "00:00:26",
  "confidence_score": 0.84
}
```

Processing every frame at 30fps with CLIP is computationally expensive. Sampling at 1 frame/sec gives a 30x reduction in CLIP inference calls while preserving sufficient semantic coverage for scene-level search.
| Video (10 min @ 30fps) | Frames processed |
|---|---|
| Full processing | 18,000 |
| 1 frame/sec sampling | 600 ✅ |
- Not deployed — requires local setup (ML models too large for free hosting)
- GPU recommended for videos longer than 5 minutes; CPU inference is slow
- 1 frame/sec sampling can miss very short (sub-second) events
- Purely visual — no audio analysis
Built as B.Tech Major Project at Graphic Era Hill University, Dehradun (June 2025).
| Name | Roll No |
|---|---|
| Rishab Thapliyal | 2119013 |
| Shubham Singh Karki | 2119234 |
| Vimal Singh Panwar | 2119423 |
| Yugraj | 2119460 |
Guide: Dr. Amrish Sharma, Professor of Practice, CSE Dept.
MIT License — open source, free to use.