VTrim is a lightweight, efficient video analysis and trimming tool. It automatically finds segments containing people or speech and can output a trimmed video instantly—without re-encoding, preserving original quality at blazing speed.
• ⚡ Lossless • 🎥 Professional edit-ready XML • 🔍 AI-powered detection • 🎤 Voice Activity Detection
- 🚀 Fast Analysis: Model caching and batch inference for 50-80% faster processing
- ✂️ Lossless Trimming: FFmpeg stream copy (-c copy) - no quality degradation
- 🎬 Professional XML: Export FCP7 XML for DaVinci Resolve/Premiere Pro
- 🤖 AI Detection: YOLOv8 human detection with configurable sensitivity
- 🎤 Voice Activity Detection: Silero VAD for detecting speech segments
- ⚙️ Flexible Configuration: Centralized config for easy customization
- 📊 JSON Output: Machine-readable results for automation
Install via pip:
```bash
pip install vtrim
```

Analyze a video:

```bash
vtrim --input video.mp4
# or use the short form:
vtrim -i video.mp4
```

Output:

```json
{"segments": [{"start": 2.3, "end": 5.8}, {"start": 10.1, "end": 14.7}]}
```

By default, VTrim runs both human detection (YOLOv8) and voice activity detection (Silero VAD). This ensures comprehensive coverage:
- Segments where people are visible on camera
- Segments where someone is speaking, even if not visible
- Perfect for lectures, meetings, interviews, and podcasts
To disable VAD and use only human detection:
```bash
vtrim -i video.mp4 --no-vad
```

To trim a video directly:

```bash
vtrim --input your_video.mp4 --output output.mp4
# or use short forms:
vtrim -i your_video.mp4 -o output.mp4
```

- Uses FFmpeg stream copy (`-c copy`) → no re-encoding, no quality loss.
- Automatically merges nearby detections and adds padding for smooth transitions.
```bash
vtrim --input lecture.mp4 --output complete_trim.mp4
```

By default, this keeps segments where either:
- A person is visible on camera (human detection), OR
- Someone is speaking (VAD detection)
Ideal for lectures, meetings, or any content where important audio might occur without visual presence.
To use only human detection (disable VAD):
```bash
vtrim --input video.mp4 --no-vad --output human_only.mp4
```

Preserve the full timeline (including gaps) as an FCP7 XML for professional editing:

```bash
vtrim --input your_video.mp4 --export-xml timeline.xml
# or:
vtrim -i your_video.mp4 --export-xml timeline.xml
```

Audio and video are perfectly synchronized and split per segment.
```bash
vtrim --input video.mp4 \
  --conf-threshold 0.15 \
  --output sensitive_trim.mp4
```

A lower threshold yields more detections (including more false positives).
```bash
vtrim --input video.mp4 \
  --conf-threshold 0.4 \
  --padding 3.0 \
  --output conservative_trim.mp4
```

A higher threshold plus more padding yields fewer, longer segments.
```bash
vtrim --input video.mp4 \
  --gap-tolerance 10.0 \
  --output merged_trim.mp4
```

A large gap tolerance merges nearby segments into continuous blocks.
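The gap-tolerance behavior can be sketched in plain Python. This is illustrative only, not VTrim's actual `merge_segments` implementation:

```python
# Illustrative sketch of gap-tolerance merging (not VTrim's actual code)
def merge_with_gap_tolerance(segments, gap_tolerance):
    """Merge (start, end) segments whose gap is <= gap_tolerance seconds."""
    merged = []
    for start, end in sorted(segments):
        if merged and start - merged[-1][1] <= gap_tolerance:
            merged[-1][1] = max(merged[-1][1], end)  # bridge the gap
        else:
            merged.append([start, end])
    return [tuple(seg) for seg in merged]

# The 4.3 s gap between the two detections is within the 10.0 s tolerance
print(merge_with_gap_tolerance([(2.3, 5.8), (10.1, 14.7)], 10.0))
# [(2.3, 14.7)]
```

With the default `--gap-tolerance 4.0`, the same two segments would stay separate, since the 4.3 s gap exceeds the tolerance.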
```bash
vtrim --input podcast.mp4 \
  --vad-threshold 0.3 \
  --output complete_podcast.mp4
```

A lower VAD threshold captures quieter speech, combined with human detection for comprehensive coverage.
```bash
vtrim --input interview.mp4 \
  --vad-threshold 0.7 \
  --padding 0.5 \
  --output focused_interview.mp4
```

A higher VAD threshold ensures only clear speech is added to the human-detected segments.
```bash
vtrim --input video.mp4 --no-vad --output human_only.mp4
```

Use this when you only want to detect the visual presence of people, without audio detection.
```bash
vtrim --input your_video.mp4 \
  --output output.mp4 \
  --export-xml timeline.xml
# or use short forms for faster typing:
vtrim -i your_video.mp4 -o output.mp4 --export-xml timeline.xml
```

Print detected time segments to stdout for scripting or integration:

```bash
vtrim --input meeting.mp4
```

Output:
```json
{
  "segments": [
    { "start": 2.3, "end": 5.8 },
    { "start": 10.1, "end": 14.7 }
  ]
}
```

You can also use the Python API directly:

```python
from vtrim.analyzer import detect_human
from vtrim.vad_analyzer import detect_speech
from vtrim.segment_utils import merge_segments, apply_padding
from vtrim.ffmpeg_utils import cut_video_with_ffmpeg
from vtrim.xml_export import export_fcp7_xml
from vtrim import Config

# Detect humans
raw_segments = detect_human("video.mp4", conf_threshold=0.25)

# Detect speech using VAD (can be combined with human detection)
speech_segments = detect_speech("video.mp4", vad_threshold=0.5)

# Combine both detection results
all_segments = raw_segments + speech_segments

# Process segments
merged = merge_segments(all_segments, gap_tolerance=4.0)
padded = apply_padding(merged, padding=1.0)

# Cut video
cut_video_with_ffmpeg("video.mp4", padded, "output.mp4")

# Export XML
export_fcp7_xml("video.mp4", padded, "timeline.xml", video_duration=120.5)

# Access configuration
print(f"Default threshold: {Config.CONF_THRESHOLD}")
print(f"Default VAD threshold: {Config.VAD_THRESHOLD}")
print(f"Default padding: {Config.PADDING}")
```

| Option | Type | Default | Description |
|---|---|---|---|
| `--input`, `-i` | Required | - | Path to input video file |
| `--output`, `-o` | Optional | - | Path to save trimmed video |
| `--export-xml` | String | - | Path to export FCP7 XML |
| `--no-vad` | Flag | Off | Disable Voice Activity Detection (VAD is enabled by default) |
| `--conf-threshold` | Float | 0.25 | Human detection confidence (0.0-1.0) |
| `--vad-threshold` | Float | 0.5 | Speech detection confidence (0.0-1.0) |
| `--padding` | Float | 1.0 | Seconds added before/after segments |
| `--gap-tolerance` | Float | 4.0 | Max gap (seconds) to merge segments |
| `--verbose` | Flag | Off | Show detailed progress |
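The JSON segment output is easy to consume from a wrapper script. A minimal sketch that totals the kept duration (the JSON is inlined here for illustration; in practice you would capture vtrim's stdout):

```python
import json

# Example of VTrim's JSON segment output, inlined for illustration;
# in practice, capture stdout from the vtrim CLI.
raw = '{"segments": [{"start": 2.3, "end": 5.8}, {"start": 10.1, "end": 14.7}]}'

data = json.loads(raw)
kept = sum(seg["end"] - seg["start"] for seg in data["segments"])
print(f"{len(data['segments'])} segments, {kept:.1f} s kept")
# 2 segments, 8.1 s kept
```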
📌 Note: By default, VTrim runs both human detection (YOLOv8) and voice activity detection (Silero VAD). Use `--no-vad` to disable speech detection if you only want visual presence.
- Model Caching: 50-80% faster on subsequent runs (singleton pattern)
- Batch Inference: 20-30% faster processing (batch size = 4)
- Dynamic Resolution: Automatic video metadata detection
- Enhanced Error Handling: Better validation and error messages
For a 10-minute video at 30 FPS:
- Before: ~3-4 minutes total
- After: ~2-2.5 minutes (with cached model)
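The caching speedup comes from loading model weights once per process and reusing the instance. A minimal singleton sketch (illustrative; `load_weights` is a hypothetical stand-in, and VTrim's real loader lives in vtrim/model.py):

```python
# Illustrative singleton-style model cache; load_weights is a hypothetical
# stand-in for the expensive YOLO weight loading done in vtrim/model.py.
_model = None

def load_weights(path):
    return {"weights": path}  # placeholder for the real, slow load

def get_model(weights="yolov8n.pt"):
    """Return a cached model, loading it only on the first call."""
    global _model
    if _model is None:
        _model = load_weights(weights)
    return _model

assert get_model() is get_model()  # second call reuses the cached instance
```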
All defaults are defined in vtrim/config.py:
```python
from vtrim import Config

# Customize settings
Config.CONF_THRESHOLD = 0.15  # Lower threshold = higher sensitivity
Config.PADDING = 2.0          # More padding
Config.GAP_TOLERANCE = 10.0   # Merge nearby detections
Config.SAMPLE_FPS = 2.0       # Analysis sample rate (2 FPS)
Config.BATCH_SIZE = 4         # Inference batch size
```

Machine-readable format for scripting:
```json
{
  "segments": [
    {"start": 2.3, "end": 5.8},
    {"start": 10.1, "end": 14.7}
  ]
}
```

Compatible with:
- DaVinci Resolve
- Adobe Premiere Pro
- Final Cut Pro 7
Features:
- Full timeline (valid + invalid segments)
- Color-coded clips (blue=keep, gray=skip)
- Synchronized audio/video
- Frame-accurate timing
- Format: MP4 (same as input)
- Codec: Unchanged (stream copy)
- Quality: Lossless (no re-encoding)
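Conceptually, each lossless cut corresponds to an FFmpeg stream-copy invocation. A rough sketch of building such a command (the helper name and exact flag layout are assumptions; VTrim's real command construction lives in vtrim/ffmpeg_utils.py):

```python
# Hypothetical helper showing the shape of a stream-copy cut command;
# VTrim's actual FFmpeg invocation is built in vtrim/ffmpeg_utils.py.
def stream_copy_cmd(src, start, end, dst):
    return [
        "ffmpeg", "-y",
        "-ss", str(start),       # seek to the segment start
        "-i", src,
        "-t", str(end - start),  # keep only the segment duration
        "-c", "copy",            # stream copy: no re-encoding
        dst,
    ]

print(" ".join(stream_copy_cmd("video.mp4", 2.3, 5.8, "part1.mp4")))
```

Note that stream-copy cuts can only begin cleanly at keyframes, which is part of why padding around detections is useful.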
Solution: Install FFmpeg:
```bash
# macOS
brew install ffmpeg

# Ubuntu/Debian
sudo apt-get install ffmpeg

# Windows
# Download from https://ffmpeg.org/download.html
```

Solution: Check that the file path is correct (absolute or relative to the current directory).
Solutions:
- Lower `--conf-threshold` (e.g., 0.15 for higher sensitivity)
- Verify the video actually contains people
- Check that the `vtrim/yolov8n.pt` model file exists
Tips:
- First run downloads the model (one-time delay)
- Subsequent runs are 50-80% faster (model cached)
- Reduce `Config.SAMPLE_FPS` for faster but less accurate analysis
- Test with short videos first: Verify settings before processing long videos
- Keep original backups: Always preserve source files until satisfied
- Use verbose mode for debugging: `vtrim --input video.mp4 --verbose`
- Combine outputs for flexibility: Generate both the trimmed video AND the XML timeline
- Python 3.7+
- FFmpeg (must be in PATH)
- Dependencies:
- opencv-python
- ultralytics
- silero-vad
- torch
- torchaudio
- onnxruntime
- setuptools
| Variable | Values | Effect |
|---|---|---|
| `ANALYZER_PROGRESS_JSON` | `"0"` (default), `"1"` | Output progress as JSON to stderr |

Example:

```bash
ANALYZER_PROGRESS_JSON=1 vtrim --input video.mp4
```

Project layout:

```
vtrim/
├── __init__.py       # Package initialization, exports Config
├── analyzer.py       # Human detection logic
├── vad_analyzer.py   # Voice Activity Detection logic
├── cli.py            # Command-line interface
├── config.py         # Configuration settings
├── ffmpeg_utils.py   # FFmpeg video processing
├── model.py          # YOLO model loading
├── segment_utils.py  # Segment merging/padding
├── xml_export.py     # FCP7 XML export
└── yolov8n.pt        # Pre-trained YOLO model
```
- The underlying model is YOLOv8n (PyTorch format), optimized for CPU inference.
- By default, VTrim uses both YOLOv8 (human detection) and Silero VAD (speech detection) for comprehensive coverage.
- Use `--no-vad` to disable speech detection if you only need visual presence detection.
- Video trimming uses FFmpeg stream copy (`-c copy`), so it's fast and lossless with no quality degradation.
- Progress updates are printed to `stderr` during analysis (every 5% for known-length videos).
- For automation, set the environment variable `ANALYZER_PROGRESS_JSON=1` to receive machine-readable progress messages on `stderr`.
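A wrapper script can consume those machine-readable progress messages. A minimal sketch, assuming each progress message arrives as one JSON object per stderr line (the exact message schema is not documented here, so treat the fields as assumptions):

```python
import json

# Hypothetical parser for JSON progress lines read from vtrim's stderr.
# Assumes one JSON object per line; non-JSON diagnostics are skipped.
def parse_progress_lines(lines):
    events = []
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue
        try:
            events.append(json.loads(line))
        except json.JSONDecodeError:
            pass
    return events

sample = ['{"progress": 5}', "loading model...", '{"progress": 10}']
print(parse_progress_lines(sample))
# [{'progress': 5}, {'progress': 10}]
```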
- README.md: This file - comprehensive overview and quick start guide
- CHANGELOG.md: Detailed version history, optimizations, and upgrade notes
For more detailed usage examples and advanced configurations, see the inline documentation in vtrim/config.py and individual module docstrings.
- GitHub: https://github.com/chiaweilee/vtrim
- Issues: https://github.com/chiaweilee/vtrim/issues
- License: Apache License v2
Current version: 0.3.0
See CHANGELOG.md for the latest updates and migration notes.