This document describes the high-level architecture and design principles of this CLI archiver application.
The codebase is organized into distinct layers with clear responsibilities:
- Command Layer: CLI interface, scheduler loop, and user interaction
- Archive Flow Layer: Repository query expansion, deduplication, and archive execution
- Command Adapter Layer: Git and GitHub CLI process execution
- Utility Layer: Reusable helpers for retry, logging, placeholder expansion, and config parsing
- Runtime validation using Zod
- Strict TypeScript configuration for compile-time safety
- Project-specific error types for validation and repository failures
- Exponential backoff for transient Git/GitHub CLI failures
- Internal concurrency limiting for repository work
- Health signal files for external monitoring
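
The exponential backoff item above can be illustrated with a minimal sketch. The names `retryWithBackoff`, `backoffDelayMs`, and `RetryOptions` are hypothetical and are not taken from the codebase; the actual retry utility may use different parameters, jitter, or error classification.

```typescript
// Hypothetical helper names; the project's real retry utility may differ.
type Sleep = (ms: number) => Promise<void>;

interface RetryOptions {
  maxAttempts: number; // total attempts, including the first one
  baseDelayMs: number; // delay before the first retry
  factor: number; // multiplier applied to the delay after each failure
  sleep?: Sleep; // injectable so tests do not wait on real timers
}

// Delay before retry number `retryIndex` (0-based): base * factor^retryIndex.
function backoffDelayMs(retryIndex: number, baseDelayMs: number, factor: number): number {
  return baseDelayMs * Math.pow(factor, retryIndex);
}

const defaultSleep: Sleep = (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Runs `operation` until it succeeds or `maxAttempts` is exhausted,
// then rethrows the last error seen.
async function retryWithBackoff<T>(
  operation: () => Promise<T>,
  options: RetryOptions,
): Promise<T> {
  const sleep = options.sleep ?? defaultSleep;
  let lastError: unknown;
  for (let attempt = 0; attempt < options.maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      const isLastAttempt = attempt === options.maxAttempts - 1;
      if (isLastAttempt) break;
      await sleep(backoffDelayMs(attempt, options.baseDelayMs, options.factor));
    }
  }
  throw lastError;
}
```

In this sketch every failure is treated as transient. A real implementation would probably inspect the error before retrying, so that permanent failures (for example, a repository that does not exist) fail fast.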
graph TD
CLI[CLI Commands<br/>User-facing interface with argument parsing]
Scheduler[Scheduler<br/>Cron-based repeated execution]
GitHub[GitHub Query Expansion<br/>URL and gh api queries]
Archive[Archive Flow<br/>Deduplication, clone/fetch, progress/logging]
Git[Git Mirror Operations<br/>clone --mirror and fetch]
Health[Health Signals<br/>Heartbeat and completion status]
CLI --> Archive
Scheduler --> GitHub
GitHub --> Archive
Archive --> Git
Scheduler --> Health
- Input Parsing: CLI URLs or scheduled query definitions
- Query Expansion: Direct URLs and GitHub API query results become repository targets
- Deduplication: Repositories gathered from URLs and `gh api` responses are keyed by `owner/repo`
- Archive Coordination: Repositories are processed with an internal concurrency limit
- Mirror Operation: Clone missing archives and `fetch` existing mirror directories
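
The deduplication and archive coordination steps above could look roughly like the following. This is a sketch under assumptions: `RepoTarget`, `dedupeByOwnerRepo`, and `mapWithConcurrency` are illustrative names, and the key is built from the owner and repository name exactly as given, without case normalization.

```typescript
// Hypothetical shape for a repository target; the real type may carry more fields.
interface RepoTarget {
  owner: string;
  repo: string;
}

// Keeps the first occurrence of each owner/repo key and drops later duplicates.
function dedupeByOwnerRepo(targets: RepoTarget[]): RepoTarget[] {
  const seen = new Map<string, RepoTarget>();
  for (const target of targets) {
    const key = `${target.owner}/${target.repo}`;
    if (!seen.has(key)) seen.set(key, target);
  }
  return Array.from(seen.values());
}

// Runs `worker` over `items` with at most `limit` calls in flight at once.
// Results are returned in input order.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;

  // Each runner repeatedly claims the next unprocessed index until none remain.
  async function runner(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index]);
    }
  }

  const runnerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results;
}
```

Note that in this sketch a rejected `worker` call rejects the whole batch. A worker that should not abort the batch would need to catch its own errors and return them as values.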
graph LR
Input[Repository URLs] --> Parse[Parse GitHub URLs]
Parse --> Archive[Clone/Fetch]
Archive --> Output[Mirror Archives]
graph TB
Config[Config File] --> Cron[Cron Trigger]
Cron --> Expand[Query Expansion]
Expand --> Archive[Clone/Fetch]
Archive --> Health[Health Signals]
- Validation errors are wrapped in `GitHubArchiverZodParseError`.
- CLI execution collects repository failures and reports them together.
- Scheduled execution logs per-repository failures and continues through the remaining repositories.
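
The difference between the CLI and scheduled failure handling described above can be sketched as follows. All names here (`ArchiveFailuresError`, `archiveAll`, `runCli`, `runScheduled`) are hypothetical and chosen for illustration; the project's own error types and entry points may be shaped differently.

```typescript
interface RepoFailure {
  repo: string;
  error: unknown;
}

// Hypothetical aggregate error; not the project's actual error type.
class ArchiveFailuresError extends Error {
  readonly failures: RepoFailure[];
  constructor(failures: RepoFailure[]) {
    const names = failures.map((f) => f.repo).join(", ");
    super(`${failures.length} repository archive(s) failed: ${names}`);
    this.name = "ArchiveFailuresError";
    this.failures = failures;
    // Keeps instanceof working when compiled to older targets.
    Object.setPrototypeOf(this, ArchiveFailuresError.prototype);
  }
}

// Processes every repository, recording failures instead of stopping at the first.
async function archiveAll(
  repos: string[],
  archiveOne: (repo: string) => Promise<void>,
  onFailure?: (failure: RepoFailure) => void,
): Promise<RepoFailure[]> {
  const failures: RepoFailure[] = [];
  for (const repo of repos) {
    try {
      await archiveOne(repo);
    } catch (error) {
      const failure: RepoFailure = { repo, error };
      failures.push(failure);
      onFailure?.(failure);
    }
  }
  return failures;
}

// CLI style: collect all failures, then report them together as one error.
async function runCli(repos: string[], archiveOne: (repo: string) => Promise<void>): Promise<void> {
  const failures = await archiveAll(repos, archiveOne);
  if (failures.length > 0) throw new ArchiveFailuresError(failures);
}

// Scheduler style: log each failure as it happens and keep going.
async function runScheduled(
  repos: string[],
  archiveOne: (repo: string) => Promise<void>,
  log: (message: string) => void,
): Promise<void> {
  await archiveAll(repos, archiveOne, (f) => log(`archive failed for ${f.repo}: ${String(f.error)}`));
}
```

Both paths share one loop, so the only difference is what happens with the recorded failures: the CLI path surfaces them as a single error, while the scheduled path never throws for a per-repository failure.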
- Pure utilities and small behavior helpers
- Retry/backoff behavior
- Config parsing and scheduler helpers
- URL parsing and placeholder replacement
- CLI command execution
- Scheduler `--runOnce` flow
- Existing archive behavior
- Local/fake Git execution without network dependency
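
One way to exercise existing-archive behavior without network access is to inject the command runner, so tests record Git invocations instead of executing them. The sketch below is an assumption about how this could be structured: `CommandRunner`, `mirrorRepository`, and `createFakeRunner` are hypothetical names, and the exact Git arguments the project passes may differ.

```typescript
// Hypothetical abstraction over process execution; a real runner would spawn a process.
type CommandRunner = (command: string, args: string[]) => Promise<void>;

// Fetches into an existing mirror directory, or clones a new mirror if none exists.
async function mirrorRepository(
  url: string,
  archiveDir: string,
  archiveExists: (dir: string) => Promise<boolean>,
  run: CommandRunner,
): Promise<"cloned" | "fetched"> {
  if (await archiveExists(archiveDir)) {
    await run("git", ["-C", archiveDir, "fetch"]);
    return "fetched";
  }
  await run("git", ["clone", "--mirror", url, archiveDir]);
  return "cloned";
}

// Fake runner for tests: records each command line instead of running it.
function createFakeRunner(): { run: CommandRunner; calls: string[] } {
  const calls: string[] = [];
  const run: CommandRunner = async (command, args) => {
    calls.push([command, ...args].join(" "));
  };
  return { run, calls };
}
```

A test can then pass `createFakeRunner().run` together with an in-memory `archiveExists` check and assert on the recorded `calls`, covering both the clone and fetch paths without touching Git or the network.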