Optimize vid_vec_rep_clip operator for long videos and add profiling …#624
Hello @Ishaswm, thank you for the fix. There are still some changes that need to be made.
Hi @Ishaswm, I have left some more comments above. Since all the code changes you have made are related to benchmarking, I think you should do the following:
- Create dedicated benchmark module for profiling
- Move performance tests to operators/benchmark/
- Keep operator code focused on core functionality
…benchmarking
- Replaced shell-based ffmpeg call with cross-platform subprocess.run() for robust I-frame extraction. Improved error handling with check=True and stderr capture.
- Added comprehensive benchmark documentation with performance stats table.
…add benchmarking docs
- Replaced shell-based ffmpeg call with subprocess.run() for robust I-frame extraction.
- Improved error handling with check=True and stderr capture.
- Added benchmark README with performance stats table.
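A minimal sketch of what a shell-free, cross-platform I-frame extraction call could look like, assuming ffmpeg is on the PATH. The helper names `build_iframe_cmd` and `extract_iframes`, the PNG output pattern, and the use of ffmpeg's `select='eq(pict_type,I)'` filter are illustrative assumptions, not the PR's actual code. Passing arguments as a list avoids shell quoting differences between platforms, and `check=True` plus captured stderr surfaces ffmpeg's own error message on failure.

```python
import subprocess
from pathlib import Path


def build_iframe_cmd(video_path: str, out_dir: str) -> list:
    """Build an ffmpeg argument list that writes only I-frames as PNG files (hypothetical helper)."""
    return [
        "ffmpeg",
        "-hide_banner",
        "-loglevel", "error",
        "-i", video_path,
        # Keep only intra-coded (I) frames; the quotes protect the comma inside the filter.
        "-vf", "select='eq(pict_type,I)'",
        # Write one image per selected frame (recent ffmpeg releases prefer -fps_mode vfr).
        "-vsync", "vfr",
        str(Path(out_dir) / "frame_%05d.png"),
    ]


def extract_iframes(video_path: str, out_dir: str) -> list:
    """Run ffmpeg without a shell and return the extracted frame paths in order."""
    cmd = build_iframe_cmd(video_path, out_dir)
    try:
        # A list of arguments (no shell=True) behaves the same on Windows, macOS and Linux.
        subprocess.run(cmd, check=True, capture_output=True, text=True)
    except FileNotFoundError as exc:
        raise RuntimeError("ffmpeg executable not found on PATH") from exc
    except subprocess.CalledProcessError as exc:
        # check=True raises on a non-zero exit code; stderr holds ffmpeg's message.
        raise RuntimeError(f"ffmpeg failed: {exc.stderr.strip()}") from exc
    return sorted(Path(out_dir).glob("frame_*.png"))
```

Because the error is re-raised as a single `RuntimeError`, callers such as a benchmark script only need one `except` clause whether ffmpeg is missing or the input is unreadable.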
@aatmanvaidya I would love to join the Tattle Slack workspace to stay in the loop. Could you please send an invite to my email address, ishaswami52003@gmail.com?
Hi @Ishaswm, I have sent you an invite.
import sys
sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), '../../..')))
This is a Windows-specific change, right? If so, let's remove it.
Can you move this file to the benchmark folder?
Hi @Ishaswm, I have left some minor comments above.


Optimization Notes for vid_vec_rep_clip
Problem Statement
The current implementation of the vid_vec_rep_clip operator lacks support for processing longer videos efficiently and reliably. Specifically, we wanted to investigate:
Goals
Findings from Original Implementation
Inconsistencies: Memory usage for the 60-min video is unexpectedly lower than for the 30-min video, suggesting inefficient memory handling or potential leaks in intermediate steps.
Results After Refactor
Note: Longer videos show higher memory usage here, attributed to the more efficient baseline measurement.
Performance Comparison
Memory-Optimized vs Original Implementation
Memory Optimization Highlights:
81% reduction for 30-minute videos (1917 MB → 365 MB)
73.7% reduction for 10-minute videos (1858 MB → 488 MB)
More stable memory profile across all tested durations
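The percentages above follow from the before/after figures as (before - after) / before. A small hypothetical helper, used only to check the arithmetic, reproduces them:

```python
def percent_reduction(before_mb: float, after_mb: float) -> float:
    """Relative memory reduction in percent: (before - after) / before * 100."""
    if before_mb <= 0:
        raise ValueError("baseline memory must be positive")
    return (before_mb - after_mb) / before_mb * 100


# Figures taken from the highlights above.
print(f"30-minute video: {percent_reduction(1917, 365):.1f}%")  # 81.0%
print(f"10-minute video: {percent_reduction(1858, 488):.1f}%")  # 73.7%
```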
Processing Tradeoffs:
37-121% longer processing time for videos of 30 minutes or less
14-22% faster processing for very long videos (over 45 minutes)
More accurate performance measurements
Key Improvements
Efficient I-Frame Sampling:
Uses ffmpeg to extract only I-frames, reducing unnecessary frame processing and improving memory efficiency.
Built-in Memory Profiling:
Integrated psutil and tracemalloc to monitor memory usage before and after processing. Reports net memory change, helping diagnose scaling issues.
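A minimal sketch of this before/after measurement, assuming a callable `fn` stands in for the operator's processing step. `psutil.Process().memory_info().rss` gives the process's resident set size, and `tracemalloc.get_traced_memory()` returns the current and peak bytes allocated by Python since tracing started. The `profile_memory` name and the dictionary keys are hypothetical; psutil is a third-party package, so the sketch reports only the tracemalloc figure when it is not installed.

```python
import tracemalloc

try:
    import psutil  # third-party; optional in this sketch
except ImportError:
    psutil = None

MB = 1024 * 1024


def _rss_mb():
    """Resident set size of the current process in MB, or None without psutil."""
    if psutil is None:
        return None
    return psutil.Process().memory_info().rss / MB


def profile_memory(fn, *args, **kwargs):
    """Run fn(*args, **kwargs); return its result plus before/after RSS and traced peak."""
    rss_before = _rss_mb()
    tracemalloc.start()
    try:
        result = fn(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    rss_after = _rss_mb()
    stats = {
        "rss_before_mb": rss_before,
        "rss_after_mb": rss_after,
        # Net change in resident memory, the figure used to spot scaling issues.
        "net_rss_mb": None if rss_before is None else rss_after - rss_before,
        # Peak Python-level allocation while fn was running.
        "traced_peak_mb": peak / MB,
    }
    return result, stats


if __name__ == "__main__":
    data, stats = profile_memory(lambda: [0] * 1_000_000)
    print(stats)
```

RSS includes memory held by native libraries (for example a model runtime), while tracemalloc only sees Python allocations, so reporting both helps tell the two apart.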
Scalable to Long Videos:
Successfully tested on videos up to 1 hour, showing stable memory growth.
Enhanced Test Coverage:
Summary of Changes
This PR introduces the following enhancements to the vid_vec_rep_clip operator:
I-Frame Sampling Strategy:
Instead of decoding every frame or relying on precomputed metadata, the updated operator uses ffmpeg to extract only I-frames for vector representation. This reduces redundancy and improves scalability.
Streaming Feature Extraction:
Frames are now loaded and processed in a streaming manner (one at a time) using temporary storage, preventing memory bloat.
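The streaming approach can be sketched as a generator: frames sit on disk in a temporary directory and are loaded one at a time, so only one frame is held in memory at once. This assumes the frames were already written to a directory (for example by an ffmpeg I-frame extraction step); `iter_frames`, `stream_features`, and the `embed` callable are hypothetical stand-ins for the operator's CLIP feature extraction, and reading raw bytes stands in for image decoding.

```python
import tempfile
from pathlib import Path
from typing import Callable, Iterator, List


def iter_frames(frame_dir: str, pattern: str = "*.png") -> Iterator[bytes]:
    """Yield frame files one at a time, in filename order, instead of loading all of them."""
    for path in sorted(Path(frame_dir).glob(pattern)):
        # Only the current frame's bytes are held in memory here.
        yield path.read_bytes()


def stream_features(frame_dir: str, embed: Callable[[bytes], List[float]]) -> Iterator[List[float]]:
    """Lazily map each frame to a feature vector using the supplied embed function."""
    for frame in iter_frames(frame_dir):
        yield embed(frame)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Stand-in frames; a real pipeline would have ffmpeg write these files.
        for i, payload in enumerate([b"a", b"bb", b"ccc"]):
            Path(tmp, f"frame_{i:05d}.png").write_bytes(payload)
        for vector in stream_features(tmp, lambda frame: [float(len(frame))]):
            print(vector)
```

Because `stream_features` is a generator, the caller decides what to keep; aggregating as vectors arrive, rather than collecting them all in a list, keeps memory flat regardless of video length.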
Detailed Profiling Added to Tests:
The unittest suite has been enhanced to capture memory usage, including tracemalloc measurements.
Average Vector Addition:
The final output includes a mean vector of all I-frame features, maintaining consistency with prior behavior.
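The mean vector can be computed without storing every frame's features by keeping a running (incremental) mean, updated as mean_k = mean_(k-1) + (x_k - mean_(k-1)) / k. This is a sketch assuming features arrive as equal-length lists of floats; the PR text does not say whether the operator averages incrementally or stacks the vectors first, and `running_mean` is a hypothetical name.

```python
from typing import Iterable, List


def running_mean(vectors: Iterable[List[float]]) -> List[float]:
    """Average equal-length vectors one at a time, keeping only the current mean in memory."""
    mean: List[float] = []
    count = 0
    for vec in vectors:
        count += 1
        if count == 1:
            mean = [float(x) for x in vec]
            continue
        if len(vec) != len(mean):
            raise ValueError("all feature vectors must have the same length")
        # Incremental update: mean_k = mean_(k-1) + (x_k - mean_(k-1)) / k
        mean = [m + (x - m) / count for m, x in zip(mean, vec)]
    if count == 0:
        raise ValueError("no feature vectors to average")
    return mean


print(running_mean(iter([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])))  # [3.0, 4.0]
```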
Limitations
I-Frame Distribution: The number of I-frames is determined by the video's encoding settings (such as the keyframe interval), so a shorter video can occasionally have more I-frames than a longer one. This is expected and valid behavior.
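To check how many I-frames a particular file contains, and so roughly how much work the operator will do on it, ffprobe can list each frame's picture type. This is a sketch assuming ffprobe is installed; the helper names are hypothetical, and the parsing step is separated out so it can be checked without a video file.

```python
import subprocess
from typing import Iterable


def count_iframes_from_lines(lines: Iterable[str]) -> int:
    """Count ffprobe output lines whose first CSV field is the picture type 'I'."""
    return sum(1 for line in lines if line.strip().split(",")[0] == "I")


def count_iframes(video_path: str) -> int:
    """Ask ffprobe for each video frame's picture type and count the I-frames.

    ffprobe decodes every frame for this, so it can be slow on long videos.
    """
    cmd = [
        "ffprobe", "-v", "error",
        "-select_streams", "v:0",
        "-show_entries", "frame=pict_type",
        "-of", "csv=p=0",
        video_path,
    ]
    proc = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return count_iframes_from_lines(proc.stdout.splitlines())
```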
Processing Time: The new implementation can take noticeably longer for short videos (37-121% longer for videos of 30 minutes or less) due to I-frame extraction overhead, but this tradeoff is acceptable given the lower memory usage and improved scalability.
No Parallelism Yet: The current implementation processes frames sequentially. There is room for future speedup via batching or multithreading.
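As a possible starting point for the batching idea (not part of this PR), frames could be grouped into fixed-size batches before embedding, so a model that accepts batched input processes several frames per call. The `batched` helper and `batch_size` parameter are hypothetical; Python 3.12 also ships `itertools.batched`, which yields tuples instead of lists.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of up to batch_size items, consuming the input lazily."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


print(list(batched(range(7), 3)))  # [[0, 1, 2], [3, 4, 5], [6]]
```

Since the input is consumed lazily, batching composes with a streaming frame generator: at most `batch_size` frames are in memory at a time.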
Checklist