The Problem: Creating engaging video content from long-form footage is painfully time-consuming. A typical 60-minute recording requires 4-6 hours of manual editing: watching every frame, identifying interesting moments, cutting boring sections, adjusting speeds, adding music, and polishing transitions. For hobbyists creating scale model builds, DIY projects, or tutorial content, this workload is unsustainable. Videos pile up unedited, creative momentum dies, and content never reaches an audience.
The Solution: This AI-powered pipeline compresses weeks of manual editing into minutes of automated processing. By leveraging vision-language models, computer vision, and intelligent scene classification, the system watches your footage for you, identifies what's worth keeping, eliminates dead time, and generates broadcast-ready timelines, complete with music, transitions, and dynamic speed ramping.
The Value:
- Time Savings: 60 min → 15 min final video in ~20 minutes of processing (vs. 6 hours of manual editing)
- Consistency: AI applies uniform quality standards across all footage, eliminating subjective editing fatigue
- Discoverability: Automatic teaser generation highlights the best moments upfront, boosting viewer retention
- Scalability: Process entire video backlogs overnight; edit 10 videos as easily as 1
- Creative Freedom: Spend time creating content, not editing it
This pipeline isn't just a tool; it's a force multiplier for solo creators who want to share their work without drowning in post-production.
```bash
# Complete automated pipeline: raw video → edited timeline
python run_pipeline.py
```

The pipeline supports three modes, set via `"mode"` in `project_config.json` or `--mode` on the command line (the CLI flag overrides the config; a short sketch of this resolution logic follows the mode table):
| Mode | Audio | Speed | Boring detection | Use case |
|---|---|---|---|---|
| `build` | Muted → background music | Variable (1x–6x by scene rating) | LLM visual analysis only | Silent build/craft videos (no narration) |
| `unboxing` | Kept (narration preserved) | 1.0x always | Audio silence + video freeze + LLM | Voice-over videos: unboxing, reviews, tutorials |
| `reels` | Muted → music overlay | 1.0x | N/A (uses existing analysis) | Short-form 9:16 vertical clips |
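The precedence rule is easy to mis-remember, so here is a minimal sketch of it. The `resolve_mode` helper is hypothetical, not `run_pipeline.py`'s actual code:

```python
# Hypothetical sketch of mode resolution; run_pipeline.py's internals may differ.
import argparse
import json

def resolve_mode(config_path: str = "project_config.json") -> str:
    """CLI --mode overrides the "mode" key in the config; default is "build"."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--mode", choices=["build", "unboxing", "reels"])
    args, _ = parser.parse_known_args()

    with open(config_path) as fh:
        config = json.load(fh)

    # Precedence: CLI flag > config file > built-in default.
    return args.mode or config.get("mode", "build")

if __name__ == "__main__":
    print(f"Pipeline mode: {resolve_mode()}")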
| Feature | Build | Unboxing | Reels |
|---|---|---|---|
| AI scene classification (Qwen2.5-VL) | ✅ | ✅ | ❌ |
| Speed ramping (1x–6x) | ✅ | ❌ | ❌ |
| Audio silence detection (ffmpeg) | ❌ | ✅ | ❌ |
| Video freeze detection (ffmpeg) | ❌ | ✅ | ❌ |
| Original narration preserved | ❌ | ✅ | ❌ |
| Background music overlay | ✅ | ❌ | ✅ |
| Teaser section generated | ✅ | ✅ | ❌ |
| 9:16 vertical crop | ❌ | ❌ | ✅ |
| Duplicate scene detection | ✅ | ✅ | ❌ |
| Watermark overlay | ✅ | ✅ | ❌ |
Build mode processes silent workshop footage, speed-ramped with background music:
| Stage | What happens |
|---|---|
| 1 – Analysis | Frames sampled every 2 s → ResNet-50 + CLIP + Qwen2.5-VL classify each scene (boring / low / moderate / interesting) and assign a speed of 1x–6x |
| 2 – Extraction | FFmpeg (NVENC) renders each clip at its assigned speed; audio is discarded (speeds > 1x use an atempo chain) |
| 3 – Timeline | FCPXML built with teaser + intro + main + outro; video audio muted (−96 dB); background music shuffled on lane 2; cross-dissolves + watermark |
Unboxing mode handles narrated video: audio is preserved, and boring (silent + static) segments are cut:
| Stage | What happens |
|---|---|
| 1 – Analysis | Same AI vision pass, plus analyze_audio.py runs ffmpeg silencedetect (< −35 dB, ≥ 3 s) and freezedetect (threshold 0.02) on each video. LLM prompt tuned for narration quality, reveals, close-ups. All speeds forced to 1.0x |
| 1b – Boring merge | Scenes where silence AND freeze overlap ≥ 60% are downgraded to boring and excluded (see the sketch after this table) |
| 2 – Extraction | FFmpeg renders at 1.0x; audio stays intact (no atempo, no mute) |
| 3 – Timeline | FCPXML keeps the original audio on every clip (no −96 dB mute); no background music added; teaser, intro/outro, and watermark still included |
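One plausible reading of the 1b boring-merge rule, as a hedged Python sketch: a scene is downgraded when silence intervals and freeze intervals each cover at least 60% of it. The function names and interval format are assumptions, not the actual analyze_audio.py internals:

```python
# Illustrative sketch of the "boring merge" rule; interval format is assumed.
def overlap_ratio(scene, intervals):
    """Fraction of a (start, end) scene covered by any of the given intervals."""
    start, end = scene
    covered = sum(max(0.0, min(end, b) - max(start, a)) for a, b in intervals)
    return covered / (end - start) if end > start else 0.0

def is_boring(scene, silence, freezes, threshold=0.60, require_both=True):
    """Downgrade a scene when silence and static video each cover >= 60% of it."""
    silent = overlap_ratio(scene, silence) >= threshold
    frozen = overlap_ratio(scene, freezes) >= threshold
    return (silent and frozen) if require_both else (silent or frozen)

# Example: a 10 s scene that is mostly silent and mostly static is excluded.
scene = (120.0, 130.0)
print(is_boring(scene, silence=[(119.0, 128.5)], freezes=[(121.0, 130.0)]))  # True
```

Setting `require_both=False` mirrors the `boring_requires_both` config key described below: either signal alone is then enough to exclude the scene.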
Reels mode builds short vertical clips from existing analysis:
| Stage | What happens |
|---|---|
| 1–2 | Skipped (reuses existing scene_analysis_*.json + ai_clips/) |
| 3 – Timeline | Builds timeline_reels.fcpxml with 9:16 crop, vertical layout, music from assets/music-teaser/ |
Build mode (default) processes silent workshop footage, speed-ramped with background music:

```json
{
  "mode": "build"
}
```

Unboxing mode preserves narration audio and cuts boring (silent + static) segments:

```json
{
"mode": "unboxing",
"unboxing": {
"keep_audio": true,
"keep_speed": true,
"silence_threshold_db": -35,
"silence_min_duration": 3.0,
"motion_threshold": 0.02,
"boring_requires_both": true
}
}
```

| Config key | Purpose | Default |
|---|---|---|
| `keep_audio` | Preserve original narration in extracted clips | `true` |
| `keep_speed` | Force all scenes to 1.0x (no speedup) | `true` |
| `silence_threshold_db` | dB level below which audio counts as "silent" | `-35` |
| `silence_min_duration` | Minimum seconds of silence to flag a segment | `3.0` |
| `motion_threshold` | Freeze-detect pixel-diff threshold (0 = identical frames) | `0.02` |
| `boring_requires_both` | Require both silence + freeze to mark a scene boring (`false` = either) | `true` |
Reels mode skips analysis/extraction and builds only the vertical shorts timeline:

```json
{
  "mode": "reels"
}
```

```bash
# Run as build (default)
python run_pipeline.py
# Run as unboxing: keep narration audio, no speedup
python run_pipeline.py --mode unboxing
# Run as reels only
python run_pipeline.py --mode reels
# or equivalently:
python run_pipeline.py --reels-only
# Non-interactive (auto-confirm all prompts)
python run_pipeline.py --mode unboxing --yes
# Full unboxing pipeline + reels, no prompts
python run_pipeline.py --mode unboxing --reels-only --yes
```

In unboxing mode the pipeline additionally runs audio silence detection (ffmpeg silencedetect) and freeze/static-frame detection (ffmpeg freezedetect) on each video. Segments where both silence and static video overlap are marked as boring and excluded. All other scenes keep their original 1.0x speed and narration audio intact; no background music is added.
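For readers curious what that detection step looks like, here is a minimal sketch of driving ffmpeg's silencedetect filter from Python and parsing its stderr log. The thresholds mirror the unboxing defaults above; the function name and sample filename are illustrative, and analyze_audio.py's actual parsing may differ:

```python
# Minimal sketch of running ffmpeg silencedetect and parsing its log output.
import re
import subprocess

def detect_silence(path, noise_db=-35, min_dur=3.0):
    cmd = [
        "ffmpeg", "-hide_banner", "-i", path,
        "-af", f"silencedetect=noise={noise_db}dB:d={min_dur}",
        "-f", "null", "-",
    ]
    # ffmpeg writes filter logs to stderr, e.g. "silence_start: 12.3".
    out = subprocess.run(cmd, capture_output=True, text=True).stderr
    starts = [float(m) for m in re.findall(r"silence_start: ([\d.]+)", out)]
    ends = [float(m) for m in re.findall(r"silence_end: ([\d.]+)", out)]
    return list(zip(starts, ends))

print(detect_silence("GOPR0001.MOV"))  # e.g. [(12.3, 20.1), (45.0, 51.2)]
```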
What it does:
- Stage 1: Analyzes all videos with AI (ResNet-50, CLIP, Qwen2.5-VL)
- Stage 2: Extracts scenes and creates speed-adjusted clips
- Stage 3: Generates DaVinci Resolve timeline with music and effects
Output: timeline_davinci_resolve.fcpxml ready to import into DaVinci Resolve
💡 That's it! One command turns hours of footage into an edit-ready timeline in ~20 minutes.
An intelligent video editing automation system that uses computer vision and large language models to analyze, classify, and automatically edit long-form videos into engaging, compressed timelines ready for DaVinci Resolve.
This pipeline transforms lengthy raw footage (30-60+ minutes) into polished, watchable videos by automatically detecting scene quality, adjusting playback speeds, extracting highlight moments, generating professional timelines with music/transitions/watermarks, rendering in DaVinci Resolve, and uploading to YouTube.
Key Features:
- AI-powered scene classification (boring, low, moderate, interesting)
- Automated speed ramping (1x-6x) based on content quality
- Showcase moment extraction for teaser sections
- Intelligent duplicate scene detection across multiple videos
- DaVinci Resolve FCPXML timeline generation
- Optional LUT application in Resolve Media Pool
- YouTube rendering (H.265, 4K, bitrate control)
- YouTube upload with OAuth 2.0, playlist support, and thumbnails
- YouTube Shorts / Reels vertical 9:16 pipeline
- Instagram photo carousel and Reel upload (Meta Graph API)
- Facebook Page photo post and Reel upload (Meta Graph API)
- Auto-transcoding HEVC → H.264 for Instagram compatibility
- Multi-track audio with background music and teaser soundtracks
- Configurable watermarks with opacity and positioning
- GPU-accelerated video processing (NVENC)
```mermaid
graph TB
    %% Input Stage
    RAW[📹 Raw Video Files<br/>MOV/MP4 30-60 min]

    %% Stage 1: Analysis
    subgraph S1[" 🧠 STAGE 1: AI ANALYSIS "]
        ANALYZE[analyze_advanced5.py]
        MODELS[ResNet-50 + CLIP + Qwen2.5-VL<br/>Frame sampling every 2s]
        CLASSIFY[Scene Classification<br/>Quality rating 1-10<br/>Speed assignment 1x-6x]
        JSON[scene_analysis_*.json]
    end

    %% Stage 2: Extraction
    subgraph S2[" ✂️ STAGE 2: CLIP EXTRACTION "]
        EXTRACT[extract_scenes.py]
        FFMPEG[FFmpeg + NVENC H.265<br/>Speed-adjusted clips<br/>Showcase highlights]
        CLIPS[ai_clips/ folder]
    end

    %% Stage 3: Timeline
    subgraph S3[" 🎬 STAGE 3: TIMELINE GENERATION "]
        TIMELINE[export_resolve.py]
        BUILD[Teaser + Intro + Main + Outro<br/>Audio mix + Watermark<br/>Cross-dissolves]
        FCPXML[timeline_davinci_resolve.fcpxml]
    end

    %% Stage 4: Resolve
    subgraph S4[" 🎨 STAGE 4: DAVINCI RESOLVE "]
        IMPORT[Import Timeline<br/>File → Import → Timeline]
        LUT[apply_lut_resolve.py<br/>Optional LUT application]
        RENDER[render_youtube.py<br/>H.265 4K @ 30 Mbps]
        MP4[Final MP4]
    end

    %% Stage 5: Upload
    subgraph S5[" ☁️ STAGE 5: YOUTUBE UPLOAD "]
        UPLOAD[upload_youtube.py]
        AUTH[OAuth 2.0 + Thumbnail<br/>Playlist + Metadata]
        YT[▶️ YouTube Video]
    end

    %% Reels Pipeline
    subgraph SR[" 📱 REELS / SHORTS PIPELINE "]
        REELS_EXP[export_reels.py<br/>9:16 vertical 1080x1920]
        REELS_XML[timeline_reels.fcpxml]
        REELS_RENDER[render_reels.py<br/>H.265 NVIDIA @ 15 Mbps]
        REELS_UP[upload_youtube.py --shorts]
        SHORTS[📱 YouTube Shorts]
    end

    %% Social Media
    subgraph SS[" 📣 SOCIAL MEDIA DISTRIBUTION "]
        IG_REEL[upload_instagram.py --video<br/>Reel via Resumable Upload]
        IG_PHOTO[upload_instagram.py --photo<br/>Carousel via CDN Relay]
        FB_REEL[upload_facebook.py --video<br/>Reel via Graph API]
        FB_PHOTO[upload_facebook.py --all<br/>Multi-Photo Post]
        IG[📸 Instagram Reel + Carousel]
        FB[📘 Facebook Reel + Photos]
    end

    %% Flow
    RAW --> ANALYZE
    ANALYZE --> MODELS
    MODELS --> CLASSIFY
    CLASSIFY --> JSON
    JSON --> EXTRACT
    EXTRACT --> FFMPEG
    FFMPEG --> CLIPS
    CLIPS --> TIMELINE
    TIMELINE --> BUILD
    BUILD --> FCPXML
    FCPXML --> IMPORT
    IMPORT --> LUT
    LUT --> RENDER
    RENDER --> MP4
    MP4 --> UPLOAD
    UPLOAD --> AUTH
    AUTH --> YT

    %% Reels flow
    RAW -.-> REELS_EXP
    REELS_EXP --> REELS_XML
    REELS_XML --> REELS_RENDER
    REELS_RENDER --> REELS_UP
    REELS_UP --> SHORTS

    %% Social media flow
    REELS_RENDER --> IG_REEL
    REELS_RENDER --> FB_REEL
    IG_REEL --> IG
    IG_PHOTO --> IG
    FB_REEL --> FB
    FB_PHOTO --> FB
    MP4 -.-> IG_PHOTO
    MP4 -.-> FB_PHOTO

    %% Styling
    classDef stageR fill:#e0f7fa,stroke:#00838f,stroke-width:2px
    classDef stage1 fill:#fff3e0,stroke:#f57c00,stroke-width:2px
    classDef stage2 fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
    classDef stage3 fill:#e8f5e9,stroke:#388e3c,stroke-width:2px
    classDef stage4 fill:#fce4ec,stroke:#c2185b,stroke-width:2px
    classDef stage5 fill:#ffebee,stroke:#d32f2f,stroke-width:2px
    classDef stageS fill:#e8eaf6,stroke:#283593,stroke-width:2px
    class ANALYZE,MODELS,CLASSIFY,JSON stage1
    class EXTRACT,FFMPEG,CLIPS stage2
    class TIMELINE,BUILD,FCPXML stage3
    class IMPORT,LUT,RENDER,MP4 stage4
    class UPLOAD,AUTH,YT stage5
    class REELS_EXP,REELS_XML,REELS_RENDER,REELS_UP,SHORTS stageR
    class IG_REEL,IG_PHOTO,FB_REEL,FB_PHOTO,IG,FB stageS
```
📊 For a detailed component breakdown and performance metrics, see PIPELINE_DIAGRAM.md
Once content is rendered, run_pipeline.py distributes it across all platforms automatically.
The main video and reels each follow their own publication path:
```mermaid
graph LR
    %% Rendered assets
    MP4["🎬 Main Video<br/>(4K H.265 MP4)"]
    REELS_MP4["📱 Reels Video<br/>(1080x1920 H.265)"]
    PHOTOS["🖼️ Photos<br/>(from config)"]

    %% ── Main video publication ──
    subgraph MAIN[" Main Video Publication "]
        direction TB
        YT_UP["[7/7] upload_youtube.py<br/>OAuth 2.0 + thumbnail"]
        YT["▶️ YouTube<br/>4K Video"]
        YT_UP --> YT
    end

    %% ── Reels / Shorts publication ──
    subgraph REELS[" Reels / Shorts Publication "]
        direction TB
        YT_SHORTS["[R4/8] upload_youtube.py --shorts<br/>YouTube Data API v3"]
        IG_REEL["[R5/8] upload_instagram.py --video<br/>Resumable Upload Protocol"]
        FB_REEL["[R6/8] upload_facebook.py --video<br/>Graph API /{page}/videos"]
        SHORTS["📱 YouTube Shorts"]
        IG_R["📸 Instagram Reel"]
        FB_R["📘 Facebook Reel"]
        YT_SHORTS --> SHORTS
        IG_REEL --> IG_R
        FB_REEL --> FB_R
    end

    %% ── Photo publication ──
    subgraph PHOTO[" Photo Publication "]
        direction TB
        FB_PHOTO["[R7/8] upload_facebook.py --all<br/>Multi-Photo Post"]
        IG_PHOTO["[R8/8] upload_instagram.py --photo<br/>Carousel via CDN Relay"]
        FB_P["📘 Facebook Photos"]
        IG_P["📸 Instagram Carousel"]
        FB_PHOTO --> FB_P
        IG_PHOTO --> IG_P
    end

    %% Connections
    MP4 --> MAIN
    REELS_MP4 --> REELS
    PHOTOS --> PHOTO

    %% Styling
    classDef asset fill:#fff9c4,stroke:#f9a825,stroke-width:2px
    classDef yt fill:#ffcdd2,stroke:#d32f2f,stroke-width:2px
    classDef ig fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
    classDef fb fill:#bbdefb,stroke:#1565c0,stroke-width:2px
    class MP4,REELS_MP4,PHOTOS asset
    class YT_UP,YT,YT_SHORTS,SHORTS yt
    class IG_REEL,IG_PHOTO,IG_R,IG_P ig
    class FB_REEL,FB_PHOTO,FB_R,FB_P fb
```
Publication stages in `run_pipeline.py`:

| Stage | Script | Platform | Content | API / Method |
|---|---|---|---|---|
| [7/7] | `upload_youtube.py` | YouTube | Main 4K video | YouTube Data API v3 (OAuth 2.0) |
| [R4/8] | `upload_youtube.py --shorts` | YouTube Shorts | Vertical reel | YouTube Data API v3 (OAuth 2.0) |
| [R5/8] | `upload_instagram.py --video` | Instagram | Reel | Meta Graph API (Resumable Upload) |
| [R6/8] | `upload_facebook.py --video` | Facebook | Reel | Meta Graph API (`/{page_id}/videos`) |
| [R7/8] | `upload_facebook.py --all` | Facebook | Photos | Meta Graph API (Multi-Photo Post) |
| [R8/8] | `upload_instagram.py --photo` | Instagram | Carousel | Meta Graph API (CDN Relay) |

All upload stages require valid credentials in their respective JSON files (see Credentials Setup). Use `--yes` to skip confirmation prompts for fully automated publishing.
The AI analyzes video content and assigns classifications that determine playback speed:
| Classification | Speed | Use Case | Description |
|---|---|---|---|
| Interesting | 1.0x | Key moments | High-action, critical content, showcase-worthy |
| Moderate | 2.0x | Standard content | Average interest, clear context needed |
| Low | 4.0x | Background activity | Minor details, setup, transitions |
| Boring | 6.0x | Filler content | Repetitive, minimal value (optional skip) |
| Skip | N/A | Excluded | Unusable footage (not exported) |
```
Original Video: 45.3 minutes
├─ Interesting:  1 scene  @ 1x → 0.6 min
├─ Moderate:    35 scenes @ 2x → 5.2 min
├─ Low:         31 scenes @ 4x → 2.4 min
└─ Boring:      12 scenes @ 6x → 0.5 min (excluded)
─────────
Final Timeline: 14.7 minutes (64% compression)
```
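The arithmetic is just division by the assigned speed. A small Python check follows; the per-class source minutes are assumptions chosen to reproduce the per-line figures above, and the final timeline additionally includes the teaser, intro, and outro sections:

```python
# Back-of-the-envelope check of the compression math; mapping mirrors the table.
SPEED = {"interesting": 1.0, "moderate": 2.0, "low": 4.0, "boring": 6.0}

def timeline_minutes(scenes, exclude_boring=True):
    """scenes: list of (classification, source_minutes) pairs."""
    total = 0.0
    for label, minutes in scenes:
        if label == "skip" or (exclude_boring and label == "boring"):
            continue  # excluded scenes contribute nothing
        total += minutes / SPEED[label]
    return total

scenes = [("interesting", 0.6), ("moderate", 10.4), ("low", 9.6), ("boring", 3.0)]
print(f"{timeline_minutes(scenes):.1f} min")  # 0.6 + 5.2 + 2.4 = 8.2 min
```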
```mermaid
graph TB
    QWEN["🧠 Qwen2.5-VL-7B<br/>(Vision-Language Model)<br/>────────────────────<br/>• Frame captions<br/>• Quality rating 1-10<br/>• Scene classification<br/>────────────────────<br/>📦 Cache: ~/.cache/huggingface/<br/>💾 Size: ~4.7GB Q4_K_M GGUF"]
    CLIP["🎨 CLIP ViT-B/32<br/>(Contrastive Learning)<br/>────────────────────<br/>• Semantic embeddings<br/>• Text-image matching<br/>• Cross-modal similarity<br/>────────────────────<br/>📦 Cache: ~/.cache/clip/"]
    RESNET["🔍 ResNet-50<br/>(Feature Extraction)<br/>────────────────────<br/>• Visual features 2048-dim<br/>• Perceptual similarity<br/>• Scene detection<br/>────────────────────<br/>📦 Cache: ~/.cache/torch/hub/"]
    QWEN --> CLIP
    CLIP --> RESNET
    style QWEN fill:#fff3e0,stroke:#f57c00,stroke-width:3px
    style CLIP fill:#e8f5e9,stroke:#388e3c,stroke-width:3px
    style RESNET fill:#e3f2fd,stroke:#1976d2,stroke-width:3px
```
- Frame Sampling: Extract frames at 2-second intervals
- Caption Generation: Qwen2.5-VL describes visual content
- Feature Extraction: ResNet-50 extracts 2048-dim features
- Semantic Encoding: CLIP generates embeddings
- Hash Computation: Perceptual hashing for scene detection
- Scene Segmentation: Group frames into logical scenes
- LLM Classification: Rate and classify each scene
- Speed Assignment: Map classification to playback speed
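A simplified sketch of the sampling, hashing, and segmentation steps above: grab a frame every 2 seconds, compute an average hash, and start a new scene when the hash jumps. The 12-bit cut threshold is an illustrative assumption; the real pipeline also folds in ResNet-50 features, CLIP embeddings, and LLM ratings:

```python
# Simplified sketch of frame sampling + perceptual-hash scene segmentation.
import cv2
import numpy as np

def ahash(frame, size=8):
    """Average hash: 64 booleans marking pixels brighter than the mean."""
    gray = cv2.cvtColor(cv2.resize(frame, (size, size)), cv2.COLOR_BGR2GRAY)
    return (gray > gray.mean()).flatten()

def segment(path, interval_s=2.0, cut_threshold=12):
    cap = cv2.VideoCapture(path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 24.0
    step = int(fps * interval_s)
    scenes, prev, idx = [[]], None, 0
    while True:
        cap.set(cv2.CAP_PROP_POS_FRAMES, idx)
        ok, frame = cap.read()
        if not ok:
            break
        h = ahash(frame)
        if prev is not None and int(np.count_nonzero(h != prev)) > cut_threshold:
            scenes.append([])  # hash jumped: a new scene starts here
        scenes[-1].append(idx / fps)
        prev, idx = h, idx + step
    cap.release()
    return scenes  # lists of sampled timestamps, one list per scene
```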
- Python 3.9+
- CUDA-capable GPU (recommended for analysis and encoding)
- FFmpeg with NVENC support
- DaVinci Resolve (for final editing)
- OS: Fedora 43 (Workstation)
- GPU: NVIDIA GPU with NVENC support
- RAM: 32 GB+ recommended (16 GB minimum)
- Storage: SSD recommended (20 GB+ free for cache and outputs)
- DaVinci Resolve: 20.x (automation verified on 20.0.1)
Downloads:
- DaVinci Resolve: https://www.blackmagicdesign.com/support/family/davinci-resolve-and-fusion
- Filmic LUT Pack (iPhone): https://www.filmicpro.com/products/luts/
- Filmic LUT Pack direct download: https://www.filmicpro.com/downloads/Filmic_Pro_deLOG_LUT_Pack_May_2022.zip
```bash
# Clone the repository and navigate to the project directory
cd ~/video

# Create a virtual environment
python3 -m venv .venv
source .venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Download AI models (automatic on first run)
# Models are cached to ~/.cache/huggingface/ and ~/.cache/clip/
```
### Python Requirements
All Python dependencies are pinned in [requirements.txt](requirements.txt).

```
~/video/
├── run_pipeline.py                  # Master orchestrator (all stages)
├── analyze_advanced5.py             # Stage 1: AI video analysis
├── extract_scenes.py                # Stage 2: Scene extraction
├── export_resolve.py                # Stage 3: Timeline export (16:9)
├── export_reels.py                  # Reels timeline export (9:16 vertical)
├── apply_lut_resolve.py             # LUT application via Resolve API
├── render_youtube.py                # Render 4K MP4 via Resolve API
├── render_reels.py                  # Render 1080x1920 Shorts MP4 via Resolve
├── upload_youtube.py                # Upload to YouTube (main + shorts)
├── upload_instagram.py              # Upload photos/reels to Instagram
├── upload_facebook.py               # Upload photos/reels to Facebook Page
├── instagram_credentials.json       # Meta API credentials (not in git)
├── project_config.json              # Project configuration
├── assets/
│   ├── Start-Intro-V3.mov           # Intro video (10-bit)
│   ├── Finish-Intro-V3.mov          # Outro video (10-bit)
│   ├── qr-code.jpg                  # Watermark image
│   ├── music-background/            # Background music (WAV)
│   ├── music-teaser/                # Teaser/reels music (WAV)
│   ├── photos/                      # Project photos
│   ├── photo-index/                 # Index/thumbnail photos
│   ├── videos-reels/                # Local reels source videos
│   ├── teaser-videos/               # Teaser source videos
│   └── watermark/                   # Watermark assets
├── ai_clips/                        # Extracted scene clips
│   └── {video_stem}/
│       ├── *_scene_*.mov
│       └── *_showcase_*.mov
├── tools/
│   ├── install_gcc12.sh             # Build GCC 12 for CUDA compatibility
│   ├── build_llama_cpp_with_gcc12.sh  # Build llama-cpp-python with CUDA
│   ├── patch_cuda_math.sh           # Patch CUDA math headers
│   └── test_video_gpu.py            # GPU video processing smoke test
├── timeline_davinci_resolve.fcpxml  # Main timeline (16:9)
└── timeline_reels.fcpxml            # Reels timeline (9:16)
```
```bash
# Run the full pipeline: analyze, extract, and generate the timeline
python run_pipeline.py

# Non-interactive mode (auto-confirm all prompts)
python run_pipeline.py --yes

# Unboxing mode with reels, fully automated
python run_pipeline.py --mode unboxing --reels-only --yes

# Output: timeline_davinci_resolve.fcpxml + ai_clips/ folder
```

This orchestrates all stages automatically:
- Stage 1: AI analysis of all videos in input directory
- Stage 2: Scene extraction with speed adjustments
- Stage 3: Timeline generation with music, transitions, and effects
- Stage 4: Import timeline into DaVinci Resolve (via Resolve API)
- Stage 5: Apply LUT in the Media Pool (via Resolve API)
- Stage 6: Render 4K MP4 (via Resolve API)
- Stage 7: Upload to YouTube (OAuth 2.0)
- Stages R1–R8 (with `--reels-only`): Reels/Shorts export → Resolve → render → YouTube Shorts → Instagram Reel → Facebook Reel → Facebook Photos → Instagram Photos
If running stages manually after `run_pipeline.py`:

```bash
# 1. Import timeline into DaVinci Resolve
# File → Import → Timeline → timeline_davinci_resolve.fcpxml
# 2. Apply LUTs (optional)
python apply_lut_resolve.py --config project_config.json
# 3. Render from DaVinci Resolve
python render_youtube.py --output ~/Videos/output.mp4 --config project_config.json
# 4. Upload to YouTube (uses project_config.json defaults)
python upload_youtube.py --video ~/Videos/output.mp4 --config project_config.json
```

The reels pipeline generates 9:16 vertical shorts from dedicated short clips:
```bash
# Run only the reels pipeline (skips main video stages)
python run_pipeline.py --reels-only --yes
# Or manually step by step:
# 1. Export vertical timeline
python export_reels.py --config project_config.json --output timeline_reels.fcpxml
# 2. Import to Resolve, apply LUT, then render
python render_reels.py --output my_shorts.mp4 --config project_config.json
# 3. Upload as YouTube Shorts (with related video link)
python upload_youtube.py --video ~/Videos/my_shorts.mp4 --config project_config.json --shorts --related-video VIDEO_ID
```

Reels pipeline stages (automated via `--reels-only`):
| Stage | Script | What happens |
|---|---|---|
| R1 | `export_reels.py` | Build 9:16 FCPXML (1080x1920), add music from `assets/music-teaser/` |
| R2 | Resolve API | Create project, import timeline, apply LUT |
| R3 | `render_reels.py` | Render H.265 NVIDIA @ 15 Mbps (1080x1920) |
| R4 | `upload_youtube.py --shorts` | Upload as YouTube Shorts with #shorts tag |
| R5 | `upload_instagram.py --video` | Upload as Instagram Reel (auto-transcodes HEVC → H.264) |
| R6 | `upload_facebook.py --video` | Upload as Facebook Reel on the Page |
| R7 | `upload_facebook.py --all` | Publish project photos to the Facebook Page |
| R8 | `upload_instagram.py --photo` | Publish project photos as an Instagram carousel |
If you prefer to run stages individually:

```bash
# Stage 1: Analyze video (generates scene_analysis_*.json)
python analyze_advanced5.py --video INPUT.MOV
# Stage 2: Extract clips (creates ai_clips/ directory)
python extract_scenes.py --analysis-dir . --output-dir ai_clips
# Stage 3: Export timeline (generates timeline_davinci_resolve.fcpxml)
python export_resolve.py --config project_config.json \
--analysis . \
--video-dir . \
--clips-dir ai_clips \
    --output timeline_davinci_resolve.fcpxml
```

`analyze_advanced5.py` options:

```text
--video PATH             # Input video file
--sample-interval SECS   # Frame sampling rate (default: 2)
--llm-batch-size N       # LLM processing batch size (default: 10)
--gpu                    # Enable GPU acceleration
```

`extract_scenes.py` options:

```text
--config PATH            # Project config file
--analysis-dir PATH      # Directory with scene_analysis_*.json
--video-dir PATH         # Source video directory
--output-dir PATH        # Output directory for clips
--exclude-boring         # Skip boring scenes during extraction
```

`run_pipeline.py` options:

```text
--input PATH             # Input video directory (overrides config)
--config PATH            # Project config file
--mode MODE              # Pipeline mode: build, unboxing, reels
--skip-analysis          # Skip Stage 1 (AI analysis)
--skip-extract           # Skip Stage 2 (scene extraction)
--skip-export            # Skip Stage 3 (timeline export)
--reels-only             # Run only the Reels/Shorts pipeline
--yes, -y                # Auto-confirm all interactive prompts
```

`export_resolve.py` options:

```text
--config PATH            # Project config file
--analysis PATH          # Analysis JSON or directory
--video-dir PATH         # Source video directory
--clips-dir PATH         # Extracted clips directory
--output PATH            # Output FCPXML file
--use-rendered           # Use pre-rendered clips (default)
--use-original           # Use original videos with speed changes
--exclude-boring         # Exclude boring scenes from timeline
--dedupe                 # Remove duplicate scenes across videos
--hash-threshold N       # Hamming distance for deduplication (default: 6)
```

`export_reels.py` options:

```text
--config PATH            # Project config file
--output PATH            # Output FCPXML file (default: timeline_reels.fcpxml)
```

`render_youtube.py` and `render_reels.py` options:

```text
--output PATH            # Output MP4 filename
--config PATH            # Project config file
```

`upload_youtube.py` options:

```text
--video PATH             # Video file to upload
--config PATH            # Project config file
--shorts                 # Upload as YouTube Shorts (adds #shorts to title)
--related-video ID       # Link to related main video (YouTube video ID)
--thumbnail PATH         # Custom thumbnail image
```

`upload_instagram.py` options:

```text
--photo [PATH]           # Upload photo(s) to Instagram (carousel if multiple)
                         # No path = all photos from config as carousel
--video PATH             # Upload video as Instagram Reel (MP4)
                         # Auto-transcodes HEVC to H.264 if needed
--all                    # Upload all photos from config paths.photos directory
--caption TEXT           # Custom caption (default: from project config)
--config PATH            # Project config file
--credentials PATH       # Credentials file (default: instagram_credentials.json)
```

`upload_facebook.py` options:

```text
--photo [PATH]           # Upload photo to Facebook Page
                         # No path = latest from config; no arg = multi-photo post
--video PATH             # Upload video as Facebook Reel (MP4)
--all                    # Upload all photos as multi-photo post
--caption TEXT           # Custom caption (default: from project config)
--config PATH            # Project config file
--credentials PATH       # Credentials file (default: instagram_credentials.json)
```

Example `project_config.json`:

```json
{
"paths": {
"input_dir": "./",
"output_dir": "./",
"clips_dir": "./ai_clips",
"timeline": "./timeline_davinci_resolve.fcpxml"
},
"analysis": {
"sample_interval": 2,
"target_output_ratio": 0.15,
"max_speed_multiplier": 8.0,
"captioning": {
"enabled": true,
"model": "Qwen/Qwen2.5-VL-3B-Instruct",
"device": "cuda"
}
},
"pipeline": {
"dedupe": false,
"hash_threshold": 6,
"use_rendered": true,
"exclude_boring": true
},
"timeline": {
"intro_clip": "./assets/Start-Intro-V3.mkv",
"outro_clip": "./assets/Finish-Intro-V3.mkv",
"teaser_enabled": true,
"teaser_max_duration": 45.0,
"exclude_boring": true,
"rotation_zoom": 1.78,
"transition_duration": 1.0,
"watermark": {
"path": "./qr-code.jpg",
"position": {"x": 3059.0, "y": -890.0},
"transparency": 0.3
},
"background_music": {
"folder": "./assets/music-background",
"audio_lane": 2,
"fade_duration": 3.0
},
"snippet_audio_volume_db": -96
},
"audio": {
"teaser_music": {
"folder": "./assets/music-teaser",
"audio_lane": 1,
"fade_duration": 1.0
}
},
"youtube": {
"channel_url": "https://www.youtube.com/@modernhackers",
"upload_title": "Scale Model Car Build",
"default_description": "...",
"category_id": "26",
"default_privacy": "unlisted",
"made_for_kids": false,
"altered_content": false,
"default_playlist_id": "PLxxxxxxxxxxxxxxxxx"
}
}
```

| Section | Key | Description | Default |
|---|---|---|---|
| `pipeline` | `exclude_boring` | Skip boring scenes globally | `true` |
| `pipeline` | `use_rendered` | Use pre-rendered clips | `true` |
| `pipeline` | `dedupe` | Remove duplicate scenes | `false` |
| `timeline` | `teaser_enabled` | Include teaser section | `true` |
| `timeline` | `teaser_max_duration` | Teaser length (seconds) | `45.0` |
| `timeline` | `rotation_zoom` | Zoom factor for rotated clips | `1.78` |
| `timeline` | `transition_duration` | Cross-dissolve duration (seconds) | `1.0` |
| `watermark` | `transparency` | Watermark transparency (0-1) | `0.3` |
| `background_music` | `fade_duration` | Music fade in/out (seconds) | `3.0` |
| `reels` | `max_duration` | Maximum Shorts duration (seconds) | `59` |
| `reels` | `resolution` | Shorts resolution | `1080x1920` |
| `reels` | `related_video_id` | YouTube ID of the main video | `""` |
| `paths` | `videos_reels` | Source folder for reels/shorts clips | `./assets/videos-reels` |
Three credential files are required for social media uploads. All are git-ignored.
Used by upload_instagram.py and upload_facebook.py. Must be created manually.
```json
{
"app_id": "YOUR_META_APP_ID",
"ig_user_id": "YOUR_INSTAGRAM_BUSINESS_ACCOUNT_ID",
"page_id": "YOUR_FACEBOOK_PAGE_ID",
"page_name": "YourPageName",
"page_access_token": "YOUR_NEVER_EXPIRING_PAGE_ACCESS_TOKEN"
}
```

| Field | Required | Description |
|---|---|---|
| `app_id` | Yes | Meta Developer App ID (from the Meta Developer Portal) |
| `ig_user_id` | Yes | Instagram Business Account ID (linked to the FB Page) |
| `page_id` | Yes | Facebook Page ID (used as a CDN relay for uploads) |
| `page_name` | No | Display name, for logging only |
| `page_access_token` | Yes | Never-expiring Page Access Token |
How to create:

1. Create a Meta Developer App at https://developers.facebook.com/
2. Add the Instagram Graph API and Facebook Login products
3. Link your Facebook Page to an Instagram Business Account
4. Generate a User Access Token with the permissions `instagram_basic`, `instagram_content_publish`, `pages_manage_posts`, `pages_read_engagement`, `pages_show_list`
5. Exchange it for a long-lived (60-day) token: `GET /oauth/access_token?grant_type=fb_exchange_token&client_id={app_id}&client_secret={app_secret}&fb_exchange_token={short_token}`
6. Exchange that for a permanent Page Access Token: `GET /{user_id}/accounts?access_token={long_lived_token}`, then use the `access_token` from the Page entry
7. Get the IG Business Account ID: `GET /{page_id}?fields=instagram_business_account&access_token={page_token}`
8. Save all values to `instagram_credentials.json` (steps 5-7 are sketched in Python below)
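A hedged sketch of steps 5-7 with requests, using the documented Graph API endpoints. The `v19.0` version prefix is an assumption; use the version your app targets:

```python
# Sketch of the Meta token-exchange steps; error handling omitted for brevity.
import requests

GRAPH = "https://graph.facebook.com/v19.0"

def long_lived_token(app_id, app_secret, short_token):
    r = requests.get(f"{GRAPH}/oauth/access_token", params={
        "grant_type": "fb_exchange_token",
        "client_id": app_id,
        "client_secret": app_secret,
        "fb_exchange_token": short_token,
    })
    return r.json()["access_token"]

def page_token(user_id, long_token, page_id):
    # /{user_id}/accounts lists managed Pages, each with its own token.
    pages = requests.get(f"{GRAPH}/{user_id}/accounts",
                         params={"access_token": long_token}).json()["data"]
    return next(p["access_token"] for p in pages if p["id"] == page_id)

def ig_user_id(page_id, page_tok):
    r = requests.get(f"{GRAPH}/{page_id}",
                     params={"fields": "instagram_business_account",
                             "access_token": page_tok})
    return r.json()["instagram_business_account"]["id"]
```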
See INSTAGRAM_SETUP.md for detailed step-by-step instructions.
Used by upload_youtube.py for the initial OAuth flow. Downloaded from Google Cloud Console.
```json
{
"installed": {
"client_id": "YOUR_CLIENT_ID.apps.googleusercontent.com",
"project_id": "your-project-id",
"auth_uri": "https://accounts.google.com/o/oauth2/auth",
"token_uri": "https://oauth2.googleapis.com/token",
"auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
"client_secret": "YOUR_CLIENT_SECRET",
"redirect_uris": ["http://localhost"]
}
}
```

How to create:

1. Go to Google Cloud Console → APIs & Services → Credentials
2. Create an OAuth 2.0 Client ID (Application type: Desktop app)
3. Download the JSON file and save it as `client_secrets.json` in the project root
4. Enable the YouTube Data API v3 in your project
See YOUTUBE_UPLOAD_SETUP.md for detailed instructions.
Auto-generated on first upload_youtube.py run via OAuth browser flow. Do not create manually.
```json
{
"token": "ya29.a0AfH6SM...",
"refresh_token": "1//03xxx...",
"token_uri": "https://oauth2.googleapis.com/token",
"client_id": "YOUR_CLIENT_ID.apps.googleusercontent.com",
"client_secret": "YOUR_CLIENT_SECRET",
"scopes": ["https://www.googleapis.com/auth/youtube.force-ssl"],
"universe_domain": "googleapis.com",
"account": "",
"expiry": "2026-04-05T18:07:57Z"
}
```

| Field | Description |
|---|---|
| `token` | OAuth access token (auto-refreshed when expired, ~1-hour lifetime) |
| `refresh_token` | Used to obtain new access tokens without re-authentication |
| `scopes` | `youtube.force-ssl`, required for Brand Account compatibility |
| `expiry` | Token expiration timestamp (auto-managed) |
First-time setup: Run `python upload_youtube.py --video <file>`; a browser window opens for Google OAuth consent. After granting access, youtube_credentials.json is created automatically and tokens auto-refresh on subsequent runs.
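Under the hood this is the standard google-auth-oauthlib + google-api-python-client flow. A minimal, self-contained sketch; the metadata values are examples drawn from project_config.json, and upload_youtube.py adds config-driven defaults, thumbnails, and playlists on top:

```python
# Sketch of the OAuth consent flow plus a resumable video upload.
from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build
from googleapiclient.http import MediaFileUpload

SCOPES = ["https://www.googleapis.com/auth/youtube.force-ssl"]

flow = InstalledAppFlow.from_client_secrets_file("client_secrets.json", SCOPES)
creds = flow.run_local_server(port=0)  # opens the browser consent screen

youtube = build("youtube", "v3", credentials=creds)
request = youtube.videos().insert(
    part="snippet,status",
    body={
        "snippet": {"title": "Scale Model Car Build", "categoryId": "26"},
        "status": {"privacyStatus": "unlisted", "selfDeclaredMadeForKids": False},
    },
    media_body=MediaFileUpload("output.mp4", chunksize=-1, resumable=True),
)
response = request.execute()  # blocks until the resumable upload finishes
print("Uploaded video id:", response["id"])
```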
```mermaid
graph TB
    TIMELINE["📹 timeline_davinci_resolve.fcpxml<br/>FCPXML 1.13 Format"]

    subgraph VIDEO[" 🎥 Video Tracks "]
        V1["V1 Lane 0: Main Video<br/>──────────────────────"]
        V1_1["1️⃣ Teaser clips<br/>Showcase moments"]
        V1_2["2️⃣ Intro video<br/>Start-Intro-V3.mov"]
        V1_3["3️⃣ Scene clips<br/>Classified & speed-adjusted"]
        V1_4["4️⃣ Outro video<br/>Finish-Intro-V3.mov"]
        V2["V2 Lane 1: Watermark<br/>qr-code.jpg @ 70% opacity"]
    end

    subgraph AUDIO[" 🔊 Audio Tracks "]
        A1["A1 Lane 1: Teaser Music<br/>One random track<br/>Fade: 1s in/out"]
        A2["A2 Lane 2: Background Music<br/>Shuffled & crossfaded<br/>Fade: 3s in/out"]
        A3["Video Audio<br/>-96dB (muted)"]
    end

    subgraph EFFECTS[" ✨ Effects "]
        E1["Cross-dissolve<br/>1s overlap"]
        E2["Rotation transform<br/>270° portrait"]
        E3["Zoom adjust<br/>1.78x for rotated"]
        E4["Audio fades<br/>1s/3s"]
    end

    TIMELINE --> VIDEO
    TIMELINE --> AUDIO
    TIMELINE --> EFFECTS
    V1 --> V1_1
    V1_1 --> V1_2
    V1_2 --> V1_3
    V1_3 --> V1_4

    style TIMELINE fill:#f3e5f5,stroke:#7b1fa2,stroke-width:3px
    style V1 fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
    style V2 fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
    style A1 fill:#fff9c4,stroke:#f9a825,stroke-width:2px
    style A2 fill:#fff9c4,stroke:#f9a825,stroke-width:2px
    style A3 fill:#efebe9,stroke:#5d4037,stroke-width:2px
```
1. Import Media First:
   - File → Import Media
   - Select all files in `ai_clips/*/` directories
   - Include Start-Intro-V3.mkv and Finish-Intro-V3.mkv
   - Add music files from `assets/music-*`
   - Add the watermark image (qr-code.jpg)
2. Import Timeline:
   - File → Import → Timeline → Import AAF/EDL/XML
   - Select `timeline_davinci_resolve.fcpxml`
   - Verify all media is linked (no red clips)
3. Verify Settings:
   - Timeline resolution: 3840x2160 (4K)
   - Frame rate: 24 fps
   - Audio channels: Stereo (48 kHz)
   - Color space: Rec. 709
The pipeline uses GPU acceleration at multiple stages:
- Analysis: CUDA for model inference (Qwen, CLIP, ResNet)
- Extraction: NVENC for hardware video encoding
- Speed: 3-5x faster than CPU-only processing
```
Input Videos:     ~10GB  (45 min @ 1080p)
Analysis Data:    ~50MB  (JSON + embeddings)
Extracted Clips:   ~3GB  (pre-rendered with speed)
AI Model Cache:    ~6GB  (one-time download)
─────
Total:            ~19GB per project
```
| Stage | Step | GPU | CPU-Only |
|---|---|---|---|
| Analysis (45 min video) | Pass 1 | 5 min | 20 min |
| Analysis (45 min video) | Pass 2 | 3 min | 8 min |
| Extraction (79 scenes) | GPU Encode | 8 min | 25 min |
| Timeline Export | XML Gen | 5 sec | 5 sec |
| Total | | 16 min | 53 min |
Issue: Missing AI models on first run

```bash
# Solution: Models download automatically on first run
# Check cache: ls -lh ~/.cache/huggingface/hub/
```

Issue: NVENC encoding fails

```bash
# Solution: The pipeline falls back to CPU encoding (libx265)
# Check GPU: nvidia-smi
# Verify NVENC: ffmpeg -encoders | grep nvenc
```

Issue: DaVinci Resolve shows red clips

```bash
# Solution: Import media before importing the timeline
# Verify that paths in the FCPXML match actual file locations
```

Issue: Watermark opacity incorrect

```bash
# Solution: Set transparency in the config (0.0-1.0)
# 0.3 transparency = 70% opaque
```

Issue: YouTube upload fails or shows 0% in Studio

```bash
# Solution: Use resumable upload (the default) and keep the terminal open
# Large files take time to process in Studio after the upload completes
```

Issue: Thumbnail rejected or stretched

```bash
# Solution: Use upload_youtube.py thumbnail support (auto-resizes to 1280x720)
# Provide --thumbnail or place images in assets/photos/
```

Issue: Timeline too long/short

```bash
# Solution: Adjust the exclude_boring setting
# Enable: 59% compression (excludes boring)
# Disable: 64% compression (includes all)
```

Automatically creates a 30-50 second teaser from:
- Top-rated showcase moments (rating 9-10)
- Interesting scene clips (rating 8+)
Sorted by quality score and limited to teaser_max_duration.
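A plausible sketch of that selection rule; the clip dictionary shape is an assumption, not the pipeline's actual data model:

```python
# Greedy teaser selection: best-rated clips first, within the duration budget.
def pick_teaser(clips, max_duration=45.0):
    """clips: list of dicts with 'rating' and 'duration' keys."""
    eligible = [c for c in clips if c["rating"] >= 8]
    eligible.sort(key=lambda c: c["rating"], reverse=True)  # best first
    teaser, total = [], 0.0
    for clip in eligible:
        if total + clip["duration"] > max_duration:
            continue  # skip clips that would overflow the teaser budget
        teaser.append(clip)
        total += clip["duration"]
    return teaser
```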
Cross-video deduplication using perceptual hashing:

```bash
python export_resolve.py --dedupe --hash-threshold 6
```

Hamming distance threshold:
- 0-5: Identical/near-identical scenes
- 6-10: Similar scenes (default)
- 11-15: Visually related
- 16+: Different scenes
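To make the threshold concrete, here is a minimal dedupe sketch over 64-bit perceptual hashes (libraries such as ImageHash produce these). Keeping the first scene of each near-duplicate group is an assumption about tie-breaking:

```python
# Pairwise Hamming-distance dedupe over 64-bit perceptual hashes.
def hamming(h1: int, h2: int) -> int:
    return bin(h1 ^ h2).count("1")

def dedupe(scenes, threshold=6):
    """scenes: list of (scene_id, 64-bit phash); keeps one per duplicate group."""
    kept = []
    for sid, h in scenes:
        if all(hamming(h, kh) > threshold for _, kh in kept):
            kept.append((sid, h))
    return kept

print(dedupe([("a", 0xF0F0F0F0F0F0F0F0),
              ("b", 0xF0F0F0F0F0F0F0F1),    # distance 1 from "a": dropped
              ("c", 0x0F0F0F0F0F0F0F0F)]))  # distance 64 from "a": kept
```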
Process multiple videos in one timeline:

```bash
# Analyze all videos
for video in *.MOV; do
python analyze_advanced5.py --video "$video"
done
# Extract all scenes
python extract_scenes.py --analysis-dir .
# Export combined timeline
python export_resolve.py --analysis . --dedupe
```

Extraction (HEVC NVENC):
```
Codec:     HEVC (H.265)
Encoder:   hevc_nvenc
Preset:    p4 (balanced)
Quality:   CQ 23
Container: Matroska (MKV)
Audio:     PCM 16-bit 48kHz stereo
```

Speed Adjustment:

```
Video: setpts=PTS/{speed},fps=24
Audio: atempo chain (max 2.0 per stage)
```
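Because ffmpeg's atempo filter only accepts factors from 0.5 to 2.0 per stage, higher speeds must be chained. The sketch below derives the filter strings and assembles an extraction command from the settings above; the helper names are illustrative, not extract_scenes.py's actual API:

```python
# Illustrative helpers showing how a speed maps to ffmpeg filters and how an
# extraction command could be assembled from the encoding settings above.
def speed_filters(speed: float) -> tuple[str, str]:
    video = f"setpts=PTS/{speed},fps=24"
    # Factor speeds > 2.0 into a chain of atempo stages whose product
    # equals the target speed (ffmpeg limits atempo to 0.5-2.0 per stage).
    stages, remaining = [], speed
    while remaining > 2.0:
        stages.append("atempo=2.0")
        remaining /= 2.0
    stages.append(f"atempo={remaining:g}")
    return video, ",".join(stages)

def extract_cmd(src: str, dst: str, start: float, end: float, speed: float):
    video, audio = speed_filters(speed)
    return ["ffmpeg", "-ss", str(start), "-to", str(end), "-i", src,
            "-vf", video, "-af", audio,
            "-c:v", "hevc_nvenc", "-preset", "p4", "-cq", "23",
            "-c:a", "pcm_s16le", "-ar", "48000", "-ac", "2", dst]

print(speed_filters(6.0))
# ('setpts=PTS/6.0,fps=24', 'atempo=2.0,atempo=2.0,atempo=1.5')
```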
DaVinci Resolve-compatible FCPXML 1.13 with:
- Asset references (file:// URIs)
- Ref-clip format for original videos
- Asset-clip format for rendered clips
- TimeMap elements for speed changes
- Adjust-transform for rotation/zoom
- Adjust-blend for opacity
- Audio automation for fades
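For orientation, the skeleton of such a file can be produced with the standard library alone. A minimal sketch follows; the asset path and durations are placeholders, and the real exporter emits the timemaps, transforms, and audio automation listed above:

```python
# Minimal FCPXML 1.13 skeleton built with xml.etree; values are placeholders.
import xml.etree.ElementTree as ET

fcpxml = ET.Element("fcpxml", version="1.13")
res = ET.SubElement(fcpxml, "resources")
ET.SubElement(res, "format", id="r0", name="FFVideoFormat2160p24",
              frameDuration="1/24s", width="3840", height="2160")
ET.SubElement(res, "asset", id="r1", name="clip_scene_001",
              src="file:///home/user/video/ai_clips/clip_scene_001.mov",
              start="0s", duration="240/24s", hasVideo="1", hasAudio="1")

library = ET.SubElement(fcpxml, "library")
event = ET.SubElement(library, "event", name="AI Edit")
project = ET.SubElement(event, "project", name="timeline_davinci_resolve")
sequence = ET.SubElement(project, "sequence", format="r0")
spine = ET.SubElement(sequence, "spine")
# One 10-second clip (240 frames at 24 fps) placed at the timeline start.
ET.SubElement(spine, "asset-clip", ref="r1", offset="0s", duration="240/24s")

ET.ElementTree(fcpxml).write("timeline_sketch.fcpxml",
                             encoding="utf-8", xml_declaration=True)
```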
Copyright 2026. All rights reserved.
For issues, questions, or contributions, please refer to the project documentation or contact the development team.
Version: 1.2.0
Last Updated: April 11, 2026
Platform: Linux (CUDA required for GPU acceleration)