An AI-powered personal assistant device built on Raspberry Pi 5, featuring voice interaction, computer vision, memory systems, and a knowledge graph. Think of it as Jarvis in a pocket-sized form factor.
This project is built upon and extends the excellent Whisplay AI Chatbot by PiSugar. The original project provides the foundation for the display, audio, and basic chatbot functionality.
Original Whisplay Resources:
- [Whisplay HAT Repository](https://github.com/PiSugar/whisplay) - Audio/display drivers
- Original Tutorial Video
- Offline RPi 5 Build Tutorial
- Raspberry Pi 5 (16GB RAM recommended for full features)
- PiSugar Whisplay HAT - LCD screen (240x280), speaker, microphone
- PiSugar 3 Battery - 1200mAh portable power
- Coral USB/Dual Edge TPU (optional) - EdgeTPU for fast ML inference
- LLM8850 AI Accelerator (optional) - NVMe card for offline ASR, TTS, LLM
- Pi Camera Module (optional) - For vision capabilities
- Meshtastic Radio (optional) - For mesh network communication
These features work entirely offline using local models and processing:
| Feature | Description | Technology |
|---|---|---|
| Local LLM | Conversational AI without internet | Ollama (Qwen3, Llama, etc.) or LLM8850 |
| Local ASR | Speech-to-text transcription | Whisper, Vosk, or LLM8850 Whisper |
| Local TTS | Text-to-speech synthesis | Piper TTS or LLM8850 MeloTTS |
| Object Detection | Real-time object detection with bounding boxes | YOLO + EdgeTPU |
| Person Segmentation | Semantic segmentation masks | EdgeTPU DeepLab |
| Pose Estimation | Human pose detection and tracking | MoveNet + EdgeTPU |
| Exercise Counting | Count push-ups, squats, pull-ups, crunches | Pose analysis |
| Species Classification | Identify birds, insects, plants | EdgeTPU iNaturalist models |
| Product Classification | Identify retail products | EdgeTPU product model |
| Knowledge Base | 6.9M Wikipedia articles searchable offline | SQLite + FTS5 |
| Memory System | Episodic memory with knowledge graph | NetworkX + JSON |
| Video Recording | Record and playback video clips | Picamera2 + FFmpeg |
| Camera Capture | Take photos and display on screen | Picamera2 |
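For the Knowledge Base entry above, an offline lookup reduces to a full-text `MATCH` query against an SQLite FTS5 virtual table. Here is a minimal sketch of the idea — the table and column names (`articles`, `title`, `body`) are hypothetical; see `python/knowledge_base.py` for the actual schema:

```python
import sqlite3

# Hypothetical schema: FTS5 virtual table "articles" with columns
# (title, body). The real schema lives in python/knowledge_base.py.
def search_wikipedia(db_path: str, query: str, limit: int = 5):
    con = sqlite3.connect(db_path)
    try:
        return con.execute(
            """
            SELECT title,
                   snippet(articles, 1, '[', ']', '...', 12) AS context
            FROM articles
            WHERE articles MATCH ?
            ORDER BY rank  -- FTS5's built-in BM25 relevance
            LIMIT ?
            """,
            (query, limit),
        ).fetchall()
    finally:
        con.close()

for title, context in search_wikipedia("wikipedia.db", "photosynthesis"):
    print(f"{title}: {context}")
```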
These features require external services or API keys:
| Feature | Description | Service Required |
|---|---|---|
| Cloud LLM | Advanced conversational AI | OpenAI GPT-4, Google Gemini, Grok |
| Cloud ASR | High-quality speech recognition | Google, OpenAI Whisper API |
| Cloud TTS | Natural voice synthesis | OpenAI, Google, Volcengine |
| Vision Analysis | Describe images, answer questions | GPT-4o, Gemini Vision |
| Image Generation | Generate images from text | DALL-E, Gemini Imagen |
| Web Search | Real-time web information | Serper API |
| Telegram Notifications | Send alerts and photos | Telegram Bot API |
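As a concrete illustration of the Telegram row, an alert is a single HTTP call to the standard Bot API. A sketch using `requests` — the token and chat-ID variable names are illustrative, not the project's actual configuration keys:

```python
import os
import requests

# Token and chat ID are assumed to come from the environment; these
# variable names are illustrative, not the project's .env keys.
TOKEN = os.environ["TELEGRAM_BOT_TOKEN"]
CHAT_ID = os.environ["TELEGRAM_CHAT_ID"]

def send_alert(text: str) -> None:
    # Bot API sendMessage endpoint.
    resp = requests.post(
        f"https://api.telegram.org/bot{TOKEN}/sendMessage",
        json={"chat_id": CHAT_ID, "text": text},
        timeout=10,
    )
    resp.raise_for_status()

def send_photo(path: str, caption: str = "") -> None:
    # sendPhoto accepts a multipart file upload.
    with open(path, "rb") as f:
        resp = requests.post(
            f"https://api.telegram.org/bot{TOKEN}/sendPhoto",
            data={"chat_id": CHAT_ID, "caption": caption},
            files={"photo": f},
            timeout=30,
        )
    resp.raise_for_status()
```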
These require specific additional hardware:
| Feature | Description | Hardware Required |
|---|---|---|
| TPU Acceleration | Fast ML inference | Coral USB Accelerator |
| LLM8850 Acceleration | Fast offline ASR, TTS, LLM | LLM8850 AI Accelerator via NVMe |
| Camera Vision | All camera-based features | Pi Camera Module |
| Mesh Network | Off-grid communication | Meshtastic radio device |
| VR Passthrough | Object detection in VR | VR headset with passthrough |
Voice interaction:
- Press the button to speak, get a spoken response
- Adjustable volume via voice command
- Conversation history with auto-reset after inactivity
Camera and vision:
- Live Detection: Real-time object detection with bounding boxes on the display
- Smart Observer: Monitor for specific objects, alert when found
- Semantic Sentry: Detect interactions between objects (e.g., "dog on couch")
- Object Search: Find specific items by scanning with camera
- Pose Detection: Track human poses, detect actions (waving, hands up, sitting)
- Exercise Counter: Count reps for workouts with live feedback
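The exercise counter boils down to a hysteresis state machine over a joint angle computed from pose keypoints: a rep counts only after the angle crosses a "down" threshold and then returns past an "up" threshold. A minimal sketch of the idea — thresholds and the sample keypoints are illustrative, and the project's `python/pose_estimation.py` may differ:

```python
import math

def joint_angle(a, b, c):
    """Interior angle at point b (degrees) formed by a-b-c, each (x, y)."""
    ang = math.degrees(
        math.atan2(c[1] - b[1], c[0] - b[0])
        - math.atan2(a[1] - b[1], a[0] - b[0])
    )
    return abs(ang) if abs(ang) <= 180 else 360 - abs(ang)

class RepCounter:
    """Counts reps with hysteresis so jitter near one threshold
    does not double-count."""
    def __init__(self, down_deg=90.0, up_deg=160.0):
        self.down_deg = down_deg  # e.g. elbow bent at the bottom of a push-up
        self.up_deg = up_deg      # e.g. arm extended at the top
        self.phase = "up"
        self.count = 0

    def update(self, angle: float) -> int:
        if self.phase == "up" and angle < self.down_deg:
            self.phase = "down"
        elif self.phase == "down" and angle > self.up_deg:
            self.phase = "up"
            self.count += 1  # full down->up cycle completed
        return self.count

# Per frame: feed shoulder/elbow/wrist keypoints from the pose model.
counter = RepCounter()
reps = counter.update(joint_angle((0.2, 0.3), (0.35, 0.5), (0.5, 0.3)))
```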
Memory and knowledge:
- Episodic Memory: Records observations with timestamps, objects detected, and transcriptions
- Knowledge Graph: Entities, concepts, and relationships stored in a graph structure (see the sketch after this list)
- Mission System: Create surveillance tasks, reminders, monitoring objectives
- Memory Recall: Query past events by date, time, or content
- Local Knowledge Base: 6.9M Wikipedia articles for offline factual queries
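A minimal sketch of how the knowledge graph layer can look with NetworkX plus JSON persistence — the node and edge attributes here are illustrative, not the actual schema in `python/memory.py`:

```python
import json
import networkx as nx

g = nx.DiGraph()

# Entities and concepts are nodes; relationships are typed edges.
g.add_node("dog", kind="entity")
g.add_node("couch", kind="entity")
g.add_edge("dog", "couch", relation="observed_on",
           timestamp="2024-06-01T15:02:00")

# Query: everything recorded about "dog".
for _, target, attrs in g.out_edges("dog", data=True):
    print(f"dog --{attrs['relation']}--> {target}")

# Persist to JSON (node-link format) and reload later.
with open("graph.json", "w") as f:
    json.dump(nx.node_link_data(g), f)

with open("graph.json") as f:
    g2 = nx.node_link_graph(json.load(f))
```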
Classification models:
- ImageNet: 1,000 general object classes
- Products: 100,000 US retail products
- Birds: 965 species (iNaturalist)
- Insects: 1,022 species (iNaturalist)
- Plants: 2,102 species (iNaturalist)
Mesh networking:
- List nodes on the mesh network
- Send/receive text messages
- View battery and signal strength
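These operations map directly onto the official `meshtastic` Python package; a sketch assuming a radio attached over USB serial (the project's integration may differ):

```python
import meshtastic.serial_interface

# Connects to the first Meshtastic radio found on a serial/USB port.
iface = meshtastic.serial_interface.SerialInterface()

# List the nodes currently known to the mesh.
for node_id, node in (iface.nodes or {}).items():
    user = node.get("user", {})
    print(node_id, user.get("longName", "?"))

# Broadcast a text message to the mesh.
iface.sendText("Hello from optidex")

iface.close()
```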
Installation:

1. Install the Whisplay HAT audio drivers:

   ```bash
   # Follow the instructions at https://github.com/PiSugar/whisplay
   ```

2. Install the PiSugar Power Manager (for battery display):

   ```bash
   wget https://cdn.pisugar.com/release/pisugar-power-manager.sh
   bash pisugar-power-manager.sh -c release
   ```

3. Clone and enter the repository:

   ```bash
   git clone <repository-url> optidex
   cd optidex
   ```

4. Install dependencies:

   ```bash
   bash install_dependencies.sh
   source ~/.bashrc
   ```

5. Create the environment configuration:

   ```bash
   cp .env.template .env
   # Edit .env with your API keys and preferences
   ```

6. Build the project:

   ```bash
   bash build.sh
   ```

7. Start the chatbot:

   ```bash
   bash run_chatbot.sh
   ```

8. (Optional) Enable auto-start on boot:

   ```bash
   sudo bash startup.sh
   ```
For larger-scale memory storage with semantic search, start the PostgreSQL + pgvector database and migrate existing memory data:

```bash
cd docker
./start-db.sh start
python3 ../python/migrate_to_postgres.py
```

To download the offline Wikipedia knowledge base:

```bash
python3 python/knowledge_base.py download
```

If you have an LLM8850 AI accelerator card, you can run ASR, TTS, and LLM entirely offline:
```bash
# Run the setup script
bash setup-llm8850.sh

# Download models from Hugging Face:
# - Whisper: https://huggingface.co/M5Stack/whisper-small-axmodel
# - MeloTTS: https://huggingface.co/M5Stack/MeloTTS-English-ax650

# Set up Qwen3 on the LLM8850 for chat (language model on the accelerator)
bash setup-qwen3-llm8850.sh

# Start all LLM8850 services
bash start-llm8850-services.sh
```

LLM8850 Integration Files:
| File | Description |
|---|---|
| `src/cloud-api/llm8850-asr.ts` | Whisper ASR client for port 8801 |
| `src/cloud-api/llm8850-tts.ts` | MeloTTS client for port 8802 |
| `src/cloud-api/llm8850-llm.ts` | Qwen3 LLM client for port 8000 (chat on accelerator) |
| `setup-llm8850.sh` | Full setup script for Whisper + MeloTTS |
| `setup-qwen3-llm8850.sh` | Sets up Qwen3:1.7B on the LLM8850 for chat (port 8000) |
| `start-llm8850-services.sh` | Quick script to start all services |
Set these in `.env` for chat on the LLM8850 (no internet required):

```
LLM_SERVER=LLM8850
LLM8850_LLM_HOST=http://localhost:8000
```

Optionally set `ASR_SERVER=LLM8850` and `TTS_SERVER=LLM8850`, or keep `TTS_SERVER=PIPER` for your Jarvis voice.
Key configuration options in `.env`:

| Variable | Description | Options |
|---|---|---|
| `LLM_SERVER` | Language model provider | `OLLAMA`, `OPENAI`, `GEMINI`, `LLM8850` |
| `ASR_SERVER` | Speech recognition | `WHISPER`, `VOSK`, `OPENAI`, `GEMINI`, `LLM8850` |
| `TTS_SERVER` | Text-to-speech | `PIPER`, `OPENAI`, `GEMINI`, `VOLCENGINE`, `LLM8850` |
| `IMAGE_GENERATION_SERVER` | Image generation | `OPENAI`, `GEMINI`, `VOLCENGINE` |
| `SERVE_OLLAMA` | Auto-start the Ollama server | `true`, `false` |
| `LLM8850_LLM_HOST` | Qwen3 chat endpoint on the LLM8850 | `http://localhost:8000` |
| `LLM8850_ASR_ENDPOINT` | LLM8850 Whisper endpoint | `http://localhost:8801` |
| `LLM8850_TTS_ENDPOINT` | LLM8850 MeloTTS endpoint | `http://localhost:8802` |
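Since `LLM8850_LLM_HOST` is a plain HTTP endpoint, the chat service can be smoke-tested directly. A sketch that assumes the Qwen3 server on port 8000 speaks the OpenAI-compatible `/v1/chat/completions` protocol — verify the actual request shape against `src/cloud-api/llm8850-llm.ts`:

```python
import requests

# Assumption: the LLM8850 Qwen3 server on port 8000 exposes an
# OpenAI-compatible chat completions endpoint; check
# src/cloud-api/llm8850-llm.ts for the real contract.
resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "qwen3-1.7b",  # illustrative model name
        "messages": [
            {"role": "user", "content": "Say hello in five words."}
        ],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```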
Project structure:

```
optidex/
├── src/                       # TypeScript source (Node.js backend)
│   ├── core/                  # ChatFlow, StreamResponsor
│   ├── cloud-api/             # LLM, TTS, ASR integrations
│   │   ├── llm8850-asr.ts     # LLM8850 Whisper ASR client
│   │   ├── llm8850-tts.ts     # LLM8850 MeloTTS client
│   │   ├── llm8850-llm.ts     # LLM8850 Qwen3 LLM client (chat)
│   │   └── ...
│   ├── config/
│   │   └── custom-tools/      # LLM tool definitions
│   ├── device/                # Audio, display, ESP32 voice
│   └── utils/                 # Helpers, image utils
├── python/                    # Python components
│   ├── chatbot-ui.py          # LCD display UI
│   ├── periodic_observer.py   # Autonomous observation
│   ├── live_detection.py      # Real-time object detection
│   ├── pose_estimation.py     # Pose tracking and exercise counting
│   ├── memory.py              # Unified memory interface
│   ├── knowledge_base.py      # Wikipedia search
│   └── ...
├── docker/                    # PostgreSQL + pgvector setup
├── data/                      # Runtime data (videos, images, memory)
├── docs/                      # Documentation
├── setup-llm8850.sh           # LLM8850 AI accelerator setup
└── start-llm8850-services.sh  # Start LLM8850 services
```
Voice Commands:
- "Take a picture"
- "What do you see?"
- "Start detecting people"
- "Count my push-ups"
- "What happened yesterday at 3pm?"
- "Watch for when a package arrives"
- "What do you know about photosynthesis?"
- "Search the web for today's weather"
- "Show me your memory"
- "List my missions"
After modifying TypeScript code:

```bash
bash build.sh
```

After adding Python dependencies:

```bash
cd python
pip install -r requirements.txt --break-system-packages
```

Restart the service:

```bash
# If running via systemd
systemctl restart whisplay-ai-chatbot.service

# If running manually
pkill -f run_chatbot.sh
bash run_chatbot.sh
```