Capture Your Entire Digital Footprint: Lightweight & Vectorless & Powerful.
Features · How It Works · LLM Config · Get Started · Cost · Community
「 Just do your thing. CatchMe captures everything else — stored locally to ensure privacy and security. 」
🦞 Makes Your Agents Truly Personal. CatchMe ships as an agent-compatible skill for CLI agents (OpenClaw, NanoBot, Claude, Cursor, etc.). Run CatchMe independently. Your agents query memories via CLI commands only.
- Event-Driven Recording: No timers or polling delays; mouse actions are captured instantly with crosshair annotation.
- Comprehensive Context: Five recorders track windows, keyboard, clipboard, notifications, and files around mouse actions.
- Auto-Organization: Raw streams are organized into five tiers: Day → Session → App → Location → Action.
- Smart Summaries: LLM summaries at each level, transforming logs into searchable knowledge trees.
- No Vector Complexity: Skip embeddings and vector databases; navigation uses tree-based reasoning instead.
- Top-Down Search: LLM reads summaries, selects relevant branches, and drills down to evidence.
- One-File Setup: Drop a single skill file into any AI agent for instant integration.
- Immediate Access: CLI-based screen history queries with zero configuration required.
- Minimal Footprint: ~0.2GB runtime RAM with efficient SQLite + FTS5 storage.
- Local & Offline: All data stays on your machine with full offline mode via Ollama/vLLM/LM Studio.
- Visual Exploration: Interactive timelines, memory tree navigation, and real-time system monitoring.
- Natural Conversation: Chat with your complete digital footprint using natural language.
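The SQLite + FTS5 storage mentioned above is available directly in Python's standard library. A minimal sketch of embedding-free full-text lookup over activity events (the `events` schema below is illustrative, not CatchMe's actual one):

```python
import sqlite3

# In-memory DB standing in for the local event store (schema is illustrative)
con = sqlite3.connect(":memory:")
con.execute("CREATE VIRTUAL TABLE events USING fts5(ts, app, text)")
con.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("09:02", "Browser", "reading meeting notes for the sprint review"),
        ("09:15", "Editor", "refactoring the clustering pipeline"),
        ("10:40", "Mail", "drafting follow-up on meeting notes"),
    ],
)
# FTS5 MATCH gives ranked full-text lookup without any embeddings
rows = con.execute(
    "SELECT ts, app FROM events WHERE events MATCH ? ORDER BY rank",
    ("meeting notes",),
).fetchall()
print(rows)  # the two rows mentioning "meeting notes"
```

This is the kind of lightweight lookup that keeps the runtime footprint small: no vector index to build or hold in RAM.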
CatchMe transforms raw digital activity into structured, searchable memory through three concurrent stages:
Capture. Six background recorders silently track your activity. They monitor window focus, keystrokes, mouse movement, screenshots, clipboard, and notifications.
Index. Raw events auto-organize into a Hierarchical Activity Tree: Day → Session → App → Location → Action. Each node gets LLM-generated summaries. Fast, meaningful recall without vector embeddings.
Retrieve. You ask a question. The LLM traverses your memory tree top-down. It selects relevant nodes and inspects raw data like screenshots or keystrokes. Then synthesizes a precise answer.
The Activity Tree is CatchMe's memory core. It provides structured, multi-level views of your digital life. Browse high-level summaries or dive into granular details.
CatchMe skips traditional vector search. Instead, the LLM directly navigates your Activity Tree, enabling complex cross-day reasoning and precise evidence gathering from your raw activity history.
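The top-down traversal can be sketched in a few lines. Here a keyword scorer stands in for the LLM's node selection, and the toy tree mirrors the Day → Session → App hierarchy described above (all names and logic are illustrative, not CatchMe's implementation):

```python
# Toy activity tree: each node has a summary and children; leaves hold raw evidence
tree = {
    "summary": "Monday: coding and meetings",
    "children": [
        {
            "summary": "Morning session: sprint planning",
            "children": [
                {"summary": "Zoom: sprint review call", "evidence": "screenshot_0902.png"},
            ],
        },
        {
            "summary": "Afternoon session: refactoring work",
            "children": [
                {"summary": "Editor: clustering pipeline", "evidence": "keystrokes_1415.log"},
            ],
        },
    ],
}

def relevant(node, query):
    # Stand-in for the LLM judging a node's summary against the question
    return any(w in node["summary"].lower() for w in query.lower().split())

def drill_down(node, query):
    # Descend only into branches marked relevant; collect leaf evidence
    if "evidence" in node:
        return [node["evidence"]]
    hits = []
    for child in node.get("children", []):
        if relevant(child, query):
            hits += drill_down(child, query)
    return hits

print(drill_down(tree, "sprint review"))  # ['screenshot_0902.png']
```

Because only the summaries along the chosen path are read, the cost of answering a question grows with tree depth rather than with the total number of recorded events.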
📖 Learn More: Detailed design insights and technical deep-dive available in our blog.
• 100% Local Storage: All raw data (screenshots, keystrokes, activity trees) stays in ~/data/ and never leaves your machine.
• Offline-First Options: Local LLMs (Ollama, vLLM, LM Studio) enable fully offline operation without any cloud dependency.
• Multimodal support: Your model should be able to handle text + images.
• Context window: Make sure your model's context window exceeds the max_tokens limits in config.json.
• Cost control: To cap costs, set a limit via llm.max_calls or increase filter.mouse_cluster_gap to reduce summarization frequency.
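The effect of `mouse_cluster_gap` can be shown directly: events closer together than the gap merge into one cluster, and each cluster triggers one summarization call, so a larger gap means fewer LLM calls. The clustering function below is an illustration, not CatchMe's actual implementation:

```python
def cluster(timestamps, gap):
    # Merge consecutive events into one cluster while gaps stay below the threshold
    clusters = []
    for t in sorted(timestamps):
        if clusters and t - clusters[-1][-1] <= gap:
            clusters[-1].append(t)
        else:
            clusters.append([t])
    return clusters

events = [0.0, 1.0, 2.5, 9.0, 10.0, 30.0]  # seconds since start
print(len(cluster(events, gap=3.0)))   # 3 clusters -> 3 summary calls
print(len(cluster(events, gap=10.0)))  # 2 clusters -> fewer summary calls
```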
CatchMe requires an LLM for background summarization and intelligent retrieval. Use catchme init (see Get Started) for guided setup, or follow the manual configuration steps below.
For cloud API services:

```json
{
  "llm": {
    "provider": "openrouter",
    "api_key": "sk-or-...",
    "api_url": null,
    "model": "google/gemini-3-flash-preview"
  }
}
```

For local/offline operation:

```json
{
  "llm": {
    "provider": "ollama",
    "api_key": null,
    "api_url": null,
    "model": "gemma3:4b"
  }
}
```

Supported LLM Providers
| Provider | Config name | Default API URL | Get Key |
|---|---|---|---|
| OpenRouter (gateway) | `openrouter` | https://openrouter.ai/api/v1 | openrouter.ai/keys |
| AiHubMix (gateway) | `aihubmix` | https://aihubmix.com/v1 | aihubmix.com |
| SiliconFlow (gateway) | `siliconflow` | https://api.siliconflow.cn/v1 | cloud.siliconflow.cn |
| OpenAI | `openai` | https://api.openai.com/v1 | platform.openai.com |
| Anthropic | `anthropic` | https://api.anthropic.com/v1 | console.anthropic.com |
| DeepSeek | `deepseek` | https://api.deepseek.com/v1 | platform.deepseek.com |
| Gemini | `gemini` | https://generativelanguage.googleapis.com/v1beta | aistudio.google.com |
| Groq | `groq` | https://api.groq.com/openai/v1 | console.groq.com |
| Mistral | `mistral` | https://api.mistral.ai/v1 | console.mistral.ai |
| Moonshot / Kimi | `moonshot` | https://api.moonshot.ai/v1 | platform.moonshot.cn |
| MiniMax | `minimax` | https://api.minimax.io/v1 | platform.minimaxi.com |
| Zhipu AI (GLM) | `zhipu` | https://open.bigmodel.cn/api/paas/v4 | open.bigmodel.cn |
| DashScope (Qwen) | `dashscope` | https://dashscope.aliyuncs.com/compatible-mode/v1 | dashscope.console.aliyun.com |
| VolcEngine | `volcengine` | https://ark.cn-beijing.volces.com/api/v3 | console.volcengine.com |
| VolcEngine Coding | `volcengine_coding_plan` | https://ark.cn-beijing.volces.com/api/coding/v3 | console.volcengine.com |
| BytePlus | `byteplus` | https://ark.ap-southeast.bytepluses.com/api/v3 | console.byteplus.com |
| BytePlus Coding | `byteplus_coding_plan` | https://ark.ap-southeast.bytepluses.com/api/coding/v3 | console.byteplus.com |
| Ollama (local) | `ollama` | http://localhost:11434/v1 | — |
| vLLM (local) | `vllm` | http://localhost:8000/v1 | — |
| LM Studio (local) | `lmstudio` | http://localhost:1234/v1 | — |
Any OpenAI-compatible endpoint works: just set `api_url` and `api_key` directly.
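For instance, a self-hosted OpenAI-compatible server on another machine could be configured like this (the URL and model name below are placeholders, not defaults):

```json
{
  "llm": {
    "provider": "vllm",
    "api_key": null,
    "api_url": "http://192.168.1.50:8000/v1",
    "model": "your-model-name"
  }
}
```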
All Configuration Parameters
| Section | Parameter | Default | Description |
|---|---|---|---|
| web | `host` | `127.0.0.1` | Dashboard bind address |
| | `port` | `8765` | Dashboard port |
| llm | `provider` | — | LLM provider name (see table above) |
| | `api_key` | — | API key for the provider |
| | `api_url` | (auto) | Custom endpoint; auto-set per provider if omitted |
| | `model` | — | Model name (provider-specific) |
| | `max_calls` | `0` | Max LLM calls per cycle (0 = unlimited; set to limit costs) |
| | `max_images_per_cluster` | `5` | Max screenshots sent per event cluster |
| filter | `window_min_dwell` | `3.0` | Min window dwell time (sec) before recording |
| | `keyboard_cluster_gap` | `3.0` | Keyboard event clustering gap (sec) |
| | `mouse_cluster_gap` | `3.0` | Time gap (sec) to merge mouse events; larger values reduce LLM summaries |
| summarize | `language` | `en` | Summary output language (en, zh, etc.) |
| | `max_tokens_l0–l3` | `1200` | Max tokens per tree level (L0=Action … L3=Session) |
| | `temperature` | `0.4` | LLM temperature for summarization |
| | `max_workers` | `2` | Concurrent summarization workers |
| | `debounce_sec` | `3.0` | Debounce before triggering summary |
| | `save_interval_sec` | `5.0` | Tree auto-save interval |
| retrieve | `max_prompt_chars` | `42000` | Max chars in retrieval prompt |
| | `max_iterations` | `15` | Max tree traversal iterations |
| | `max_file_chars` | `8000` | Max chars from extracted files |
| | `max_select_nodes` | `7` | Max nodes selected per iteration |
| | `max_tokens_step` | `4096` | Max tokens per retrieval step |
| | `max_tokens_answer` | `8192` | Max tokens for final answer |
| | `temperature_select` | `0.3` | Temperature for node selection |
| | `temperature_answer` | `0.5` | Temperature for answer generation |
| | `temperature_time_resolve` | `0.1` | Temperature for time resolution |
| | `max_tokens_time_resolve` | `1000` | Max tokens for time resolution |
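When editing config.json, a handy pattern is to override only what you need and let everything else fall back to the defaults above. A merge sketch (the `DEFAULTS` dict restates a subset of the table; the merge helper is illustrative, not CatchMe's loader):

```python
import json

# Defaults restated from the table above (subset shown)
DEFAULTS = {
    "web": {"host": "127.0.0.1", "port": 8765},
    "llm": {"max_calls": 0, "max_images_per_cluster": 5},
    "filter": {"window_min_dwell": 3.0, "mouse_cluster_gap": 3.0},
}

def merge(defaults, overrides):
    # Recursively overlay user settings on top of the defaults
    out = dict(defaults)
    for k, v in overrides.items():
        if isinstance(v, dict) and isinstance(out.get(k), dict):
            out[k] = merge(out[k], v)
        else:
            out[k] = v
    return out

user = json.loads('{"web": {"port": 9000}, "llm": {"max_calls": 200}}')
cfg = merge(DEFAULTS, user)
print(cfg["web"])  # host kept from defaults, port overridden
```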
```shell
git clone https://github.com/HKUDS/catchme.git && cd catchme
conda create -n catchme python=3.11 -y && conda activate catchme
pip install -e .
```

macOS: grant Accessibility, Input Monitoring, and Screen Recording in System Settings → Privacy & Security. Windows: run as Administrator for global input monitoring.

```shell
catchme init   # interactive setup: provider, API key, LLM model
catchme awake  # start recording
catchme web    # visualize and chat
# or through the CLI
catchme ask -- "What am I doing today?"
```

Full CLI Reference
| Command | Description |
|---|---|
| `catchme awake` | Start the recording daemon |
| `catchme web [-p PORT]` | Launch web dashboard (default http://127.0.0.1:8765) |
| `catchme ask -- "question"` | Query your activity in natural language |
| `catchme cost` | Show LLM token usage (last 10 min / today / all time) |
| `catchme disk` | Show storage breakdown & event count |
| `catchme ram` | Show memory usage of running processes |
| `catchme init` | Interactive setup: LLM provider, API key & model |
CatchMe ships as an agent-compatible skill for CLI agents (OpenClaw, NanoBot, Claude, Cursor, etc.).
🪶 Agent Integration: Run CatchMe independently. Your agents query memories via CLI commands only.
```shell
# 1. Start CatchMe yourself
catchme awake
# 2. Give the light skill to your agent
cp CATCHME-light.md ~/.cursor/skills/catchme/SKILL.md
```

Option B, Full Skill (agent manages the full CatchMe lifecycle autonomously):

```shell
cp CATCHME-full.md ~/.cursor/skills/catchme/SKILL.md
```

```python
from catchme import CatchMe
from catchme.pipelines.retrieve import retrieve

# 1. One-line search: fast keyword lookup over all recorded activity
with CatchMe() as mem:
    for e in mem.search("meeting notes"):
        print(e.timestamp, e.data)

# 2. LLM-powered retrieval: natural language Q&A over your screen history
for step in retrieve("What was I working on this morning?"):
    if step["type"] == "answer":
        print(step["content"])
```

Benchmarked with 2 hours of intensive, continuous computer use on a MacBook Air M4.
| Metric | Value |
|---|---|
| Runtime RAM | ~0.2 GB |
| Disk Usage | ~200 MB |
| Token Throughput | input ~6 M, output ~0.7 M |
| LLM cost (qwen-3.5-plus) | ~$0.42 via Aliyun DashScope |
| LLM cost (gemini-3-flash-preview) | ~$5.00 via OpenRouter |
| Full Retrieval Speed (depends on question) | 5–20 s per query using gemini-3-flash-preview |
CatchMe evolves with community input. Upcoming features include:
Multi-Device Recording. Capture and unify GUI activities across all your machines via LAN synchronization.
Dynamic Clustering. Adaptive clustering algorithms that better reflect your actual work patterns and flows, reducing unnecessary costs.
Enhanced Data Utilization. Unlock deeper insights from screenshots and metadata beyond current processing pipelines.
🌟 Star this repo to follow our future updates — your interest keeps us motivated!
We welcome contributions of any kind: a comment, a bug report, a feature idea, or a pull request. See CONTRIBUTING.md to get started.
CatchMe is inspired by these excellent open-source projects:
| Project | Inspiration |
|---|---|
| ActivityWatch | Pioneering open-source activity tracking |
| Screenpipe | Screen recording infrastructure for AI agents |
| Windrecorder | Personal screen recording & search on Windows |
| OpenRecall | Open-source alternative to Windows Recall |
| Selfspy | Classic daemon-style activity logging |
| PageIndex | Tree-structured document retrieval without embeddings |
| MineContext | Proactive context-aware AI partner & screen capture |
CatchMe is part of the HKUDS agent ecosystem — building the infrastructure layer for personal AI agents:
| Project | Focus |
|---|---|
| NanoBot | Ultra-Lightweight Personal AI Assistant |
| CLI-Anything | Making All Software Agent-Native |
| ClawWork | AI Assistant → AI Coworker Evolution |
| ClawTeam | Agent Swarm Intelligence for Full Team Automation |
Thanks for visiting ✨ CatchMe








