
Optimize vid_vec_rep_clip operator for long videos and add profiling …#624

Merged: aatmanvaidya merged 11 commits into tattle-made:development from Ishaswm:feat/optimize-vid-vec-rep on May 19, 2025

Conversation

@Ishaswm (Contributor) commented May 2, 2025

Optimization Notes for vid_vec_rep_clip

Problem Statement

The current implementation of the vid_vec_rep_clip operator lacks support for processing longer videos efficiently and reliably. Specifically, we wanted to investigate:

  • Can the operator process longer videos (1 min to 1 hr) without breaking or exhausting system resources?
  • Is the model itself a bottleneck, or is the limitation due to code inefficiencies?
  • How does the operator perform in terms of CPU and memory usage for large video inputs?

Goals

  • Determine if the operator can process videos of varying lengths (1, 5, 10, 20, 30, 45, 60 mins).
  • Profile memory and CPU usage during execution.
  • Fix inefficiencies (if any) in the original implementation.
  • Ensure output vector correctness post-refactor.

Findings from Original Implementation

[Image: 1]

Inconsistencies: Memory usage for the 60-min video is unexpectedly lower than for the 30-min video, suggesting inefficient memory handling or potential leaks in intermediate steps.

Results After Refactor

[Image: 2]

Note: Longer videos show memory increase due to more efficient baseline measurement.

Performance Comparison

Memory-Optimized vs Original Implementation

[Image: 3]

Memory Optimization Highlights:

  • 81% reduction for 30-minute videos (1917MB → 365MB)

  • 73.7% savings for 10-minute videos (1858MB → 488MB)

  • More stable memory profile across all durations.
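The percentages above follow directly from the before/after figures. A quick check, where `percent_reduction` is a helper name introduced here purely for illustration:

```python
def percent_reduction(before_mb: float, after_mb: float) -> float:
    """Return the relative memory reduction as a percentage of the original."""
    return (before_mb - after_mb) / before_mb * 100


# Figures quoted in the highlights above (MB before -> MB after).
print(round(percent_reduction(1917, 365), 1))  # 30-minute video
print(round(percent_reduction(1858, 488), 1))  # 10-minute video
```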

Processing Tradeoffs:

  • 37-121% longer processing for videos ≤30 minutes.

  • 14-22% faster for very long videos (>45 minutes)

  • More accurate performance measurements

Key Improvements

Efficient I-Frame Sampling

  • Switched to extracting only I-frames using ffmpeg, reducing unnecessary frame processing and improving memory efficiency.
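As a rough illustration of I-frame-only extraction, the sketch below builds an ffmpeg command around the `select` filter and runs it with `subprocess.run()`. The function names and the `iframe_%05d.jpg` naming scheme are assumptions made for this example and are not taken from the operator code; recent ffmpeg releases also deprecate `-vsync` in favour of `-fps_mode`, so the flag may need adjusting.

```python
import subprocess
from pathlib import Path


def build_iframe_command(video_path: str, out_dir: str) -> list[str]:
    """Build an ffmpeg argument list that writes only I-frames as JPEGs."""
    pattern = str(Path(out_dir) / "iframe_%05d.jpg")  # hypothetical naming scheme
    return [
        "ffmpeg",
        "-i", video_path,
        # Keep only frames whose picture type is I (keyframes).
        "-vf", "select='eq(pict_type,I)'",
        # Emit frames at their own timestamps instead of duplicating to fill gaps.
        "-vsync", "vfr",
        pattern,
    ]


def extract_iframes(video_path: str, out_dir: str) -> list[Path]:
    """Run ffmpeg and return the extracted frame paths in order."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(
        build_iframe_command(video_path, out_dir),
        check=True, capture_output=True, text=True,
    )
    return sorted(Path(out_dir).glob("iframe_*.jpg"))
```

Passing the arguments as a list avoids shell quoting differences between platforms.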

Built-in Memory Profiling:

  • Integrated psutil and tracemalloc to monitor memory usage before and after processing.

  • Reports net memory change, helping diagnose scaling issues.
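A minimal sketch of how psutil and tracemalloc can be combined to report memory before and after a call, plus the net change. `profile_memory` and the keys of the returned dictionary are illustrative names, not the ones used in the test suite, and psutil is treated as optional so the sketch still runs without it.

```python
import os
import tracemalloc

try:
    import psutil  # third-party; used for process-level (RSS) memory
except ImportError:  # keep the sketch usable without it
    psutil = None


def _rss_mb():
    """Resident set size of this process in MB, or None without psutil."""
    if psutil is None:
        return None
    return psutil.Process(os.getpid()).memory_info().rss / (1024 * 1024)


def profile_memory(fn, *args, **kwargs):
    """Call fn and report memory before/after plus the tracemalloc peak."""
    rss_before = _rss_mb()
    tracemalloc.start()
    try:
        result = fn(*args, **kwargs)
        _, peak_bytes = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    rss_after = _rss_mb()
    stats = {
        "rss_before_mb": rss_before,
        "rss_after_mb": rss_after,
        "net_rss_mb": None if rss_before is None else rss_after - rss_before,
        "peak_traced_mb": peak_bytes / (1024 * 1024),
    }
    return result, stats
```

Note that tracemalloc only sees allocations made through Python's allocator, so memory used by native libraries or by a separate ffmpeg process shows up (if at all) only in the RSS figures.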

Scalable to Long Videos:

  • Successfully tested on videos up to 1 hour, showing stable memory growth.


Enhanced Test Coverage:

  • Includes test cases for:
    • Local long videos (e.g., 1 hr)
    • Sample short videos
    • Remote video URLs

Summary of Changes

This PR introduces the following enhancements to the vid_vec_rep_clip operator:

  1. I-Frame Sampling Strategy:

    Instead of decoding every frame or relying on precomputed metadata, the updated operator uses ffmpeg to extract only I-frames for vector representation. This reduces redundancy and improves scalability.

  2. Streaming Feature Extraction:

    Frames are now loaded and processed in a streaming manner (one at a time) using temporary storage, preventing memory bloat.

  3. Detailed Profiling Added to Tests:

    The unittest suite has been enhanced to capture:

    • Memory usage before/after processing
    • Net memory consumption
    • CPU time and usage
    • Peak memory (from tracemalloc)
    • Total I-frames and vectors generated
  4. Average Vector Addition:

    The final output includes a mean vector of all I-frame features, maintaining consistency with prior behavior.
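To illustrate how streaming (item 2) and the mean vector (item 4) fit together, here is a sketch that averages per-frame vectors one at a time without keeping them all in memory. It uses plain Python lists and a stand-in generator; `running_mean` and `fake_frame_vectors` are hypothetical names, and real feature vectors would more likely be NumPy or torch arrays.

```python
from typing import Iterable, Iterator, List


def running_mean(vectors: Iterable[List[float]]) -> List[float]:
    """Average equally sized vectors one at a time, holding only the mean."""
    mean: List[float] = []
    count = 0
    for vec in vectors:
        count += 1
        if not mean:
            mean = [0.0] * len(vec)
        # Incremental update: mean_n = mean_{n-1} + (x_n - mean_{n-1}) / n
        mean = [m + (x - m) / count for m, x in zip(mean, vec)]
    if count == 0:
        raise ValueError("no vectors to average")
    return mean


def fake_frame_vectors() -> Iterator[List[float]]:
    """Stand-in for per-frame feature extraction; yields one vector per frame."""
    yield [1.0, 2.0]
    yield [3.0, 4.0]
    yield [5.0, 6.0]


print(running_mean(fake_frame_vectors()))
```

Because the generator yields one vector at a time, peak memory depends on the vector size rather than on the number of frames.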

Limitations

  • I-Frame Distribution: The number of I-frames is determined by video encoding, so a shorter video could occasionally have more I-frames than a longer one. This is expected and valid behavior.

  • Processing Time: The new implementation takes longer for videos of 30 minutes or less (37-121% in the comparison above) due to I-frame extraction overhead, but this tradeoff is acceptable given the lower memory usage and improved scalability.

  • No Parallelism Yet: Current implementation processes frames sequentially. There’s room for future speedup via batching or multithreading.
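One possible starting point for the batching mentioned above is a small helper that groups frames before they reach the model. This is only a sketch of the idea, not something the PR implements; `batched` is a name chosen for this example.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of up to batch_size items from any iterable."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch
```

Since it consumes the input lazily, it can sit on top of a streaming frame source without loading every frame first.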

Checklist

  • ✅ Code handles long videos (1 min to 1 hour)
  • ✅ Memory and CPU profiling included
  • ✅ Documented tradeoffs (time vs. memory)
  • ✅ Old and new results clearly documented
  • ✅ Known limitations acknowledged

@aatmanvaidya aatmanvaidya self-requested a review May 2, 2025 11:10
Comment thread operators/vid_vec_rep_clip/vid_vec_rep_clip.py Outdated
@aatmanvaidya (Collaborator) commented

hello @Ishaswm, thank you for the fix. there are still some changes that need to be made
please give me some time, I will get back with more detailed feedback soon

Comment thread operators/vid_vec_rep_clip/vid_vec_rep_clip.py Outdated
Comment thread operators/vid_vec_rep_clip/vid_vec_rep_clip.py Outdated
Comment thread operators/vid_vec_rep_clip/vid_vec_rep_clip.py Outdated
Comment thread operators/vid_vec_rep_clip/test.py
@aatmanvaidya (Collaborator) commented

hi @Ishaswm I have left some more comments above

since all the code changes you have made are related to benchmarking, I think you should do the following

  • create a folder called "benchmark" at root
  • create a folder for the operator there and put all your profiling code in it
  • don't add the operator file to the benchmark folder, instead call the run() function from the operator in the profiling code in the benchmark folder
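A sketch of what such a profiling script could look like: it takes the operator's `run()` as a callable and times it from outside, so the operator file itself stays untouched. The import path and video file name in the comments are assumptions for illustration.

```python
import time
from typing import Any, Callable, Dict, Tuple


def benchmark(run_fn: Callable[..., Any], *args: Any) -> Tuple[Any, Dict[str, float]]:
    """Time a single call to run_fn with wall-clock and CPU timers."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    result = run_fn(*args)
    timings = {
        "wall_seconds": time.perf_counter() - wall_start,
        "cpu_seconds": time.process_time() - cpu_start,
    }
    return result, timings


if __name__ == "__main__":
    # Hypothetical wiring: the import path and file name are assumptions.
    # from operators.vid_vec_rep_clip.vid_vec_rep_clip import run
    # vectors, timings = benchmark(run, "sample_1min.mp4")
    result, timings = benchmark(sum, range(1_000_000))
    print(timings)
```

`time.process_time()` counts CPU time of the current process only, so work done in a child process such as ffmpeg is reflected in the wall-clock figure but not in the CPU figure.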

- Create dedicated benchmark module for profiling
- Move performance tests to operators/benchmark/
- Keep operator code focused on core functionality
Comment thread operators/benchmark/__init__.py Outdated
Comment thread operators/vid_vec_rep_clip/1.png Outdated
Isha Swami added 2 commits May 11, 2025 07:13
…benchmarking

- Replaced shell-based ffmpeg call with cross-platform subprocess.run() for robust I-frame extraction. Improved error handling with check=True and stderr capture.
- Added comprehensive benchmark documentation with performance stats table.
…add benchmarking docs

- Replaced shell-based ffmpeg call with subprocess.run() for robust I-frame extraction.
- Improved error handling with check=True and stderr capture.
- Added benchmark README with performance stats table.
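For reference, the `check=True` plus stderr-capture pattern these commits describe looks roughly like the sketch below. `run_checked` is a hypothetical wrapper, and the Python interpreter stands in for ffmpeg so the example runs anywhere.

```python
import subprocess
import sys


def run_checked(cmd: list[str]) -> str:
    """Run cmd without a shell; raise RuntimeError with stderr on failure."""
    try:
        completed = subprocess.run(
            cmd,
            check=True,           # non-zero exit raises CalledProcessError
            capture_output=True,  # collect stdout and stderr
            text=True,            # decode bytes to str
        )
    except subprocess.CalledProcessError as exc:
        raise RuntimeError(
            f"command failed with exit code {exc.returncode}: {exc.stderr.strip()}"
        ) from exc
    return completed.stdout


# Demonstration with the Python interpreter standing in for ffmpeg.
print(run_checked([sys.executable, "-c", "print('ok')"]).strip())
```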
@Ishaswm (Contributor, Author) commented May 11, 2025

@aatmanvaidya I would love to join the Tattle Slack to stay in the loop. Could you please send me an invite to your workspace at my email address: ishaswami52003@gmail.com

@dennyabrain (Contributor) commented

Hi @Ishaswm, have sent you an invite.

Comment thread operators/vid_vec_rep_clip/1.png Outdated
Comment thread operators/vid_vec_rep_clip/OPTIMIZATION_NOTES.md Outdated
Comment thread operators/vid_vec_rep_clip/TECHNICAL_ANALYSIS.md
Comment thread operators/vid_vec_rep_clip/test.py Outdated
Comment thread operators/vid_vec_rep_clip/test.py Outdated
Comment on lines +9 to +10
import sys
sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), '../../..')))
Collaborator

this is a Windows-specific change, right? If yes, let's remove it

Comment thread operators/vid_vec_rep_clip/1.png Outdated
Collaborator

can you move this file to the benchmark folder?

Comment thread operators/vid_vec_rep_clip/TECHNICAL_ANALYSIS.md
Comment thread operators/vid_vec_rep_clip/vid_vec_rep_clip.py Outdated
@aatmanvaidya
@aatmanvaidya (Collaborator) commented

hi @Ishaswm have left some minor comments above

@omkar-334 (Contributor) commented May 18, 2025

Tests are failing here because frame_sample_rate is not defined in the class's __init__() but is used later on in run().

__init__()
Screenshot 2025-05-18 192230

run
Screenshot 2025-05-18 192245
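A minimal reproduction of this kind of failure, using stand-in classes rather than the actual operator code (the class names and the default value of 1 are assumptions):

```python
class BrokenOperator:
    """Hypothetical stand-in: the attribute is never set in __init__."""

    def __init__(self, video_path):
        self.video_path = video_path

    def run(self):
        # Raises AttributeError because frame_sample_rate was never assigned.
        return self.frame_sample_rate


class FixedOperator:
    """Same shape, with the attribute initialised up front."""

    def __init__(self, video_path, frame_sample_rate=1):
        self.video_path = video_path
        self.frame_sample_rate = frame_sample_rate  # default value is an assumption

    def run(self):
        return self.frame_sample_rate


try:
    BrokenOperator("video.mp4").run()
except AttributeError as exc:
    print(f"broken: {exc}")

print("fixed:", FixedOperator("video.mp4").run())
```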

@aatmanvaidya aatmanvaidya changed the base branch from main to development May 19, 2025 06:31
@aatmanvaidya aatmanvaidya merged commit 03df960 into tattle-made:development May 19, 2025
3 of 4 checks passed
@aatmanvaidya aatmanvaidya linked an issue May 19, 2025 that may be closed by this pull request


Development

Successfully merging this pull request may close these issues.

Video Operator should process video of any length and size

4 participants