An AI-powered personal assistant device built on Raspberry Pi 5, featuring voice interaction, computer vision, memory systems, and a knowledge graph. Think of it as Jarvis in a pocket-sized form factor.
This project is built upon and extends the excellent Whisplay AI Chatbot by PiSugar. The original project provides the foundation for the display, audio, and basic chatbot functionality.
Original Whisplay Resources:
- [Whisplay HAT Repository](https://github.com/PiSugar/whisplay) - Audio/display drivers
- Original Tutorial Video
- Offline RPi 5 Build Tutorial
- Raspberry Pi 5 (16GB RAM recommended for full features)
- PiSugar Whisplay HAT - LCD screen (240x280), speaker, microphone
- PiSugar 3 Battery - 1200mAh portable power
- Coral USB/Dual Edge TPU (optional) - EdgeTPU for fast ML inference
- LLM8850 AI Accelerator (optional) - NVMe card for offline ASR, TTS, LLM
- Pi Camera Module (optional) - For vision capabilities
- Meshtastic Radio (optional) - For mesh network communication
These features work entirely offline using local models and processing:
| Feature | Description | Technology |
|---|---|---|
| Local LLM | Conversational AI without internet | Ollama (Qwen3, Llama, etc.) or LLM8850 |
| Local ASR | Speech-to-text transcription | Whisper, Vosk, or LLM8850 Whisper |
| Local TTS | Text-to-speech synthesis | Piper TTS or LLM8850 MeloTTS |
| Object Detection | Real-time object detection with bounding boxes | YOLO + EdgeTPU |
| Person Segmentation | Semantic segmentation masks | EdgeTPU DeepLab |
| Pose Estimation | Human pose detection and tracking | MoveNet + EdgeTPU |
| Exercise Counting | Count push-ups, squats, pull-ups, crunches | Pose analysis |
| Species Classification | Identify birds, insects, plants | EdgeTPU iNaturalist models |
| Product Classification | Identify retail products | EdgeTPU product model |
| Knowledge Base | 6.9M Wikipedia articles searchable offline | SQLite + FTS5 |
| Memory System | Episodic memory with knowledge graph | NetworkX + JSON |
| Video Recording | Record and playback video clips | Picamera2 + FFmpeg |
| Camera Capture | Take photos and display on screen | Picamera2 |
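For the Knowledge Base entry above, an offline lookup reduces to a full-text `MATCH` query against an SQLite FTS5 virtual table. Here is a minimal sketch of the idea — the table and column names (`articles`, `title`, `body`) are hypothetical; see `python/knowledge_base.py` for the actual schema:

```python
import sqlite3

# Hypothetical schema: FTS5 virtual table "articles" with columns
# (title, body). The real schema lives in python/knowledge_base.py.
def search_wikipedia(db_path: str, query: str, limit: int = 5):
    con = sqlite3.connect(db_path)
    try:
        return con.execute(
            """
            SELECT title,
                   snippet(articles, 1, '[', ']', '...', 12) AS context
            FROM articles
            WHERE articles MATCH ?
            ORDER BY rank  -- FTS5's built-in BM25 relevance
            LIMIT ?
            """,
            (query, limit),
        ).fetchall()
    finally:
        con.close()

for title, context in search_wikipedia("wikipedia.db", "photosynthesis"):
    print(f"{title}: {context}")
```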
These features require external services or API keys:
| Feature | Description | Service Required |
|---|---|---|
| Cloud LLM | Advanced conversational AI | OpenAI GPT-4, Google Gemini, Grok |
| Cloud ASR | High-quality speech recognition | Google, OpenAI Whisper API |
| Cloud TTS | Natural voice synthesis | OpenAI, Google, Volcengine |
| Vision Analysis | Describe images, answer questions | GPT-4o, Gemini Vision |
| Image Generation | Generate images from text | DALL-E, Gemini Imagen |
| Web Search | Real-time web information | Serper API |
| Telegram Notifications | Send alerts and photos | Telegram Bot API |
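As a concrete illustration of the Telegram row, an alert is a single HTTP call to the standard Bot API. A sketch using `requests` — the token and chat-ID variable names are illustrative, not the project's actual configuration keys:

```python
import os
import requests

# Token and chat ID are assumed to come from the environment; these
# variable names are illustrative, not the project's .env keys.
TOKEN = os.environ["TELEGRAM_BOT_TOKEN"]
CHAT_ID = os.environ["TELEGRAM_CHAT_ID"]

def send_alert(text: str) -> None:
    # Bot API sendMessage endpoint.
    resp = requests.post(
        f"https://api.telegram.org/bot{TOKEN}/sendMessage",
        json={"chat_id": CHAT_ID, "text": text},
        timeout=10,
    )
    resp.raise_for_status()

def send_photo(path: str, caption: str = "") -> None:
    # sendPhoto accepts a multipart file upload.
    with open(path, "rb") as f:
        resp = requests.post(
            f"https://api.telegram.org/bot{TOKEN}/sendPhoto",
            data={"chat_id": CHAT_ID, "caption": caption},
            files={"photo": f},
            timeout=30,
        )
    resp.raise_for_status()
```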
These require specific additional hardware:
| Feature | Description | Hardware Required |
|---|---|---|
| TPU Acceleration | Fast ML inference | Coral USB Accelerator |
| LLM8850 Acceleration | Fast offline ASR, TTS, LLM | LLM8850 AI Accelerator via NVMe |
| Camera Vision | All camera-based features | Pi Camera Module |
| Mesh Network | Off-grid communication | Meshtastic radio device |
| VR Passthrough | Object detection in VR | VR headset with passthrough |
Voice interaction:
- Press the button to speak, get a spoken response
- Adjustable volume via voice command
- Conversation history with auto-reset after inactivity
Camera and vision:
- Live Detection: Real-time object detection with bounding boxes on the display
- Smart Observer: Monitor for specific objects, alert when found
- Semantic Sentry: Detect interactions between objects (e.g., "dog on couch")
- Object Search: Find specific items by scanning with camera
- Pose Detection: Track human poses, detect actions (waving, hands up, sitting)
- Exercise Counter: Count reps for workouts with live feedback
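The exercise counter boils down to a hysteresis state machine over a joint angle computed from pose keypoints: a rep counts only after the angle crosses a "down" threshold and then returns past an "up" threshold. A minimal sketch of the idea — thresholds and the sample keypoints are illustrative, and the project's `python/pose_estimation.py` may differ:

```python
import math

def joint_angle(a, b, c):
    """Interior angle at point b (degrees) formed by a-b-c, each (x, y)."""
    ang = math.degrees(
        math.atan2(c[1] - b[1], c[0] - b[0])
        - math.atan2(a[1] - b[1], a[0] - b[0])
    )
    return abs(ang) if abs(ang) <= 180 else 360 - abs(ang)

class RepCounter:
    """Counts reps with hysteresis so jitter near one threshold
    does not double-count."""
    def __init__(self, down_deg=90.0, up_deg=160.0):
        self.down_deg = down_deg  # e.g. elbow bent at the bottom of a push-up
        self.up_deg = up_deg      # e.g. arm extended at the top
        self.phase = "up"
        self.count = 0

    def update(self, angle: float) -> int:
        if self.phase == "up" and angle < self.down_deg:
            self.phase = "down"
        elif self.phase == "down" and angle > self.up_deg:
            self.phase = "up"
            self.count += 1  # full down->up cycle completed
        return self.count

# Per frame: feed shoulder/elbow/wrist keypoints from the pose model.
counter = RepCounter()
reps = counter.update(joint_angle((0.2, 0.3), (0.35, 0.5), (0.5, 0.3)))
```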
Memory and knowledge:
- Episodic Memory: Records observations with timestamps, objects detected, and transcriptions
- Knowledge Graph: Entities, concepts, and relationships stored in a graph structure (see the sketch after this list)
- Mission System: Create surveillance tasks, reminders, monitoring objectives
- Memory Recall: Query past events by date, time, or content
- Local Knowledge Base: 6.9M Wikipedia articles for offline factual queries
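A minimal sketch of how the knowledge graph layer can look with NetworkX plus JSON persistence — the node and edge attributes here are illustrative, not the actual schema in `python/memory.py`:

```python
import json
import networkx as nx

g = nx.DiGraph()

# Entities and concepts are nodes; relationships are typed edges.
g.add_node("dog", kind="entity")
g.add_node("couch", kind="entity")
g.add_edge("dog", "couch", relation="observed_on",
           timestamp="2024-06-01T15:02:00")

# Query: everything recorded about "dog".
for _, target, attrs in g.out_edges("dog", data=True):
    print(f"dog --{attrs['relation']}--> {target}")

# Persist to JSON (node-link format) and reload later.
with open("graph.json", "w") as f:
    json.dump(nx.node_link_data(g), f)

with open("graph.json") as f:
    g2 = nx.node_link_graph(json.load(f))
```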
Classification models:
- ImageNet: 1,000 general object classes
- Products: 100,000 US retail products
- Birds: 965 species (iNaturalist)
- Insects: 1,022 species (iNaturalist)
- Plants: 2,102 species (iNaturalist)
Mesh networking:
- List nodes on the mesh network
- Send/receive text messages
- View battery and signal strength
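These operations map directly onto the official `meshtastic` Python package; a sketch assuming a radio attached over USB serial (the project's integration may differ):

```python
import meshtastic.serial_interface

# Connects to the first Meshtastic radio found on a serial/USB port.
iface = meshtastic.serial_interface.SerialInterface()

# List the nodes currently known to the mesh.
for node_id, node in (iface.nodes or {}).items():
    user = node.get("user", {})
    print(node_id, user.get("longName", "?"))

# Broadcast a text message to the mesh.
iface.sendText("Hello from optidex")

iface.close()
```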
Installation:

1. Install the Whisplay HAT audio drivers:

   ```bash
   # Follow the instructions at https://github.com/PiSugar/whisplay
   ```

2. Install the PiSugar Power Manager (for battery display):

   ```bash
   wget https://cdn.pisugar.com/release/pisugar-power-manager.sh
   bash pisugar-power-manager.sh -c release
   ```

3. Clone and enter the repository:

   ```bash
   git clone <repository-url> optidex
   cd optidex
   ```

4. Install dependencies:

   ```bash
   bash install_dependencies.sh
   source ~/.bashrc
   ```

5. Create the environment configuration:

   ```bash
   cp .env.template .env
   # Edit .env with your API keys and preferences
   ```

6. Build the project:

   ```bash
   bash build.sh
   ```

7. Start the chatbot:

   ```bash
   bash run_chatbot.sh
   ```

8. (Optional) Enable auto-start on boot:

   ```bash
   sudo bash startup.sh
   ```
For larger-scale memory storage with semantic search, start the PostgreSQL + pgvector database and migrate existing memory data:

```bash
cd docker
./start-db.sh start
python3 ../python/migrate_to_postgres.py
```

To download the offline Wikipedia knowledge base:

```bash
python3 python/knowledge_base.py download
```

If you have an LLM8850 AI accelerator card, you can run ASR, TTS, and LLM entirely offline:
```bash
# Run the setup script
bash setup-llm8850.sh

# Download models from Hugging Face:
# - Whisper: https://huggingface.co/M5Stack/whisper-small-axmodel
# - MeloTTS: https://huggingface.co/M5Stack/MeloTTS-English-ax650

# Set up Qwen3 on the LLM8850 for chat (language model on the accelerator)
bash setup-qwen3-llm8850.sh

# Start all LLM8850 services
bash start-llm8850-services.sh
```

LLM8850 Integration Files:
| File | Description |
|---|---|
| `src/cloud-api/llm8850-asr.ts` | Whisper ASR client for port 8801 |
| `src/cloud-api/llm8850-tts.ts` | MeloTTS client for port 8802 |
| `src/cloud-api/llm8850-llm.ts` | Qwen3 LLM client for port 8000 (chat on accelerator) |
| `setup-llm8850.sh` | Full setup script for Whisper + MeloTTS |
| `setup-qwen3-llm8850.sh` | Sets up Qwen3:1.7B on the LLM8850 for chat (port 8000) |
| `start-llm8850-services.sh` | Quick script to start all services |
Set these in `.env` for chat on the LLM8850 (no internet required):

```
LLM_SERVER=LLM8850
LLM8850_LLM_HOST=http://localhost:8000
```

Optionally set `ASR_SERVER=LLM8850` and `TTS_SERVER=LLM8850`, or keep `TTS_SERVER=PIPER` for your Jarvis voice.
Key configuration options in `.env`:

| Variable | Description | Options |
|---|---|---|
| `LLM_SERVER` | Language model provider | `OLLAMA`, `OPENAI`, `GEMINI`, `LLM8850` |
| `ASR_SERVER` | Speech recognition | `WHISPER`, `VOSK`, `OPENAI`, `GEMINI`, `LLM8850` |
| `TTS_SERVER` | Text-to-speech | `PIPER`, `OPENAI`, `GEMINI`, `VOLCENGINE`, `LLM8850` |
| `IMAGE_GENERATION_SERVER` | Image generation | `OPENAI`, `GEMINI`, `VOLCENGINE` |
| `SERVE_OLLAMA` | Auto-start the Ollama server | `true`, `false` |
| `LLM8850_LLM_HOST` | Qwen3 chat endpoint on the LLM8850 | `http://localhost:8000` |
| `LLM8850_ASR_ENDPOINT` | LLM8850 Whisper endpoint | `http://localhost:8801` |
| `LLM8850_TTS_ENDPOINT` | LLM8850 MeloTTS endpoint | `http://localhost:8802` |
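Since `LLM8850_LLM_HOST` is a plain HTTP endpoint, the chat service can be smoke-tested directly. A sketch that assumes the Qwen3 server on port 8000 speaks the OpenAI-compatible `/v1/chat/completions` protocol — verify the actual request shape against `src/cloud-api/llm8850-llm.ts`:

```python
import requests

# Assumption: the LLM8850 Qwen3 server on port 8000 exposes an
# OpenAI-compatible chat completions endpoint; check
# src/cloud-api/llm8850-llm.ts for the real contract.
resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "qwen3-1.7b",  # illustrative model name
        "messages": [
            {"role": "user", "content": "Say hello in five words."}
        ],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```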
Project structure:

```
optidex/
├── src/                       # TypeScript source (Node.js backend)
│   ├── core/                  # ChatFlow, StreamResponsor
│   ├── cloud-api/             # LLM, TTS, ASR integrations
│   │   ├── llm8850-asr.ts     # LLM8850 Whisper ASR client
│   │   ├── llm8850-tts.ts     # LLM8850 MeloTTS client
│   │   ├── llm8850-llm.ts     # LLM8850 Qwen3 LLM client (chat)
│   │   └── ...
│   ├── config/
│   │   └── custom-tools/      # LLM tool definitions
│   ├── device/                # Audio, display, ESP32 voice
│   └── utils/                 # Helpers, image utils
├── python/                    # Python components
│   ├── chatbot-ui.py          # LCD display UI
│   ├── periodic_observer.py   # Autonomous observation
│   ├── live_detection.py      # Real-time object detection
│   ├── pose_estimation.py     # Pose tracking and exercise counting
│   ├── memory.py              # Unified memory interface
│   ├── knowledge_base.py      # Wikipedia search
│   └── ...
├── docker/                    # PostgreSQL + pgvector setup
├── data/                      # Runtime data (videos, images, memory)
├── docs/                      # Documentation
├── setup-llm8850.sh           # LLM8850 AI accelerator setup
└── start-llm8850-services.sh  # Start LLM8850 services
```
Voice Commands:
- "Take a picture"
- "What do you see?"
- "Start detecting people"
- "Count my push-ups"
- "What happened yesterday at 3pm?"
- "Watch for when a package arrives"
- "What do you know about photosynthesis?"
- "Search the web for today's weather"
- "Show me your memory"
- "List my missions"
After modifying TypeScript code:

```bash
bash build.sh
```

After adding Python dependencies:

```bash
cd python
pip install -r requirements.txt --break-system-packages
```

Restart the service:

```bash
# If running via systemd
systemctl restart whisplay-ai-chatbot.service

# If running manually
pkill -f run_chatbot.sh
bash run_chatbot.sh
```