
# Optidex

An AI-powered personal assistant device built on Raspberry Pi 5, featuring voice interaction, computer vision, memory systems, and a knowledge graph. Think of it as Jarvis in a pocket-sized form factor.

## Origins

This project is built upon and extends the excellent Whisplay AI Chatbot by PiSugar. The original project provides the foundation for the display, audio, and basic chatbot functionality.

**Original Whisplay resources:** https://github.com/PiSugar/whisplay

## Hardware

- Raspberry Pi 5 (16GB RAM recommended for full features)
- PiSugar Whisplay HAT - LCD screen (240x280), speaker, microphone
- PiSugar 3 Battery - 1200mAh portable power
- Coral USB/Dual Edge TPU (optional) - EdgeTPU for fast ML inference
- LLM8850 AI Accelerator (optional) - NVMe card for offline ASR, TTS, LLM
- Pi Camera Module (optional) - for vision capabilities
- Meshtastic Radio (optional) - for mesh network communication

## Capabilities Overview

### Self-Contained (No External APIs Required)

These features work entirely offline using local models and processing:

| Feature | Description | Technology |
| --- | --- | --- |
| Local LLM | Conversational AI without internet | Ollama (Qwen3, Llama, etc.) or LLM8850 |
| Local ASR | Speech-to-text transcription | Whisper, Vosk, or LLM8850 Whisper |
| Local TTS | Text-to-speech synthesis | Piper TTS or LLM8850 MeloTTS |
| Object Detection | Real-time object detection with bounding boxes | YOLO + EdgeTPU |
| Person Segmentation | Semantic segmentation masks | EdgeTPU DeepLab |
| Pose Estimation | Human pose detection and tracking | MoveNet + EdgeTPU |
| Exercise Counting | Count push-ups, squats, pull-ups, crunches | Pose analysis |
| Species Classification | Identify birds, insects, plants | EdgeTPU iNaturalist models |
| Product Classification | Identify retail products | EdgeTPU product model |
| Knowledge Base | 6.9M Wikipedia articles searchable offline | SQLite + FTS5 |
| Memory System | Episodic memory with knowledge graph | NetworkX + JSON |
| Video Recording | Record and play back video clips | Picamera2 + FFmpeg |
| Camera Capture | Take photos and display on screen | Picamera2 |

### External Services (Require API Keys/Internet)

These features require external services or API keys:

| Feature | Description | Service Required |
| --- | --- | --- |
| Cloud LLM | Advanced conversational AI | OpenAI GPT-4, Google Gemini, Grok |
| Cloud ASR | High-quality speech recognition | Google, OpenAI Whisper API |
| Cloud TTS | Natural voice synthesis | OpenAI, Google, Volcengine |
| Vision Analysis | Describe images, answer questions | GPT-4o, Gemini Vision |
| Image Generation | Generate images from text | DALL-E, Gemini Imagen |
| Web Search | Real-time web information | Serper API |
| Telegram Notifications | Send alerts and photos | Telegram Bot API |

### Hardware-Dependent Features

These require specific additional hardware:

| Feature | Description | Hardware Required |
| --- | --- | --- |
| TPU Acceleration | Fast ML inference | Coral USB Accelerator |
| LLM8850 Acceleration | Fast offline ASR, TTS, LLM | LLM8850 AI Accelerator via NVMe |
| Camera Vision | All camera-based features | Pi Camera Module |
| Mesh Network | Off-grid communication | Meshtastic radio device |
| VR Passthrough | Object detection in VR | VR headset with passthrough |

## Feature Details

### Voice Interaction

- Press the button to speak and get a spoken response
- Adjust the volume by voice command
- Conversation history with auto-reset after inactivity

### Computer Vision

- **Live Detection:** real-time object detection with bounding boxes on the display
- **Smart Observer:** monitor for specific objects and alert when found
- **Semantic Sentry:** detect interactions between objects, e.g. "dog on couch" (see the overlap sketch after this list)
- **Object Search:** find specific items by scanning with the camera
- **Pose Detection:** track human poses and detect actions (waving, hands up, sitting)
- **Exercise Counter:** count reps for workouts with live feedback
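
The Semantic Sentry check reduces to geometry over detector output: an interaction like "dog on couch" holds when a `dog` box sufficiently overlaps a `couch` box. A minimal sketch of that idea; the function names and the 0.3 threshold are illustrative, not the repository's actual code:

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixel coordinates

def overlap_ratio(a: Box, b: Box) -> float:
    """Fraction of box a's area covered by box b (asymmetric on purpose)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = max(1, (a[2] - a[0]) * (a[3] - a[1]))
    return inter / area_a

def interacting(detections: List[Tuple[str, Box]],
                subject: str, target: str, thresh: float = 0.3) -> bool:
    """True if any subject box overlaps a target box, e.g. a dog on a couch."""
    subjects = [box for label, box in detections if label == subject]
    targets = [box for label, box in detections if label == target]
    return any(overlap_ratio(s, t) >= thresh for s in subjects for t in targets)

# Per frame: interacting(frame_detections, "dog", "couch") -> raise an alert
```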

### Memory & Intelligence

- **Episodic Memory:** records observations with timestamps, detected objects, and transcriptions
- **Knowledge Graph:** entities, concepts, and relationships stored in a graph structure (a minimal sketch follows this list)
- **Mission System:** create surveillance tasks, reminders, and monitoring objectives
- **Memory Recall:** query past events by date, time, or content
- **Local Knowledge Base:** 6.9M Wikipedia articles for offline factual queries
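
The capabilities table names NetworkX + JSON as the graph backend, so a knowledge-graph entry boils down to typed nodes and edges serialized in node-link form. A minimal sketch, with attribute names chosen for illustration rather than taken from memory.py:

```python
import json
import networkx as nx

# Directed graph: nodes are entities/concepts, edges are typed relationships.
kg = nx.DiGraph()
kg.add_node("dog", kind="entity")
kg.add_node("couch", kind="entity")
kg.add_edge("dog", "couch", relation="observed_on", when="2024-06-01T15:02:00")

# Recall: list everything related to an entity.
for _, target, attrs in kg.edges("dog", data=True):
    print(f"dog --{attrs['relation']}--> {target}")

# Persist using NetworkX's built-in node-link JSON format.
with open("knowledge_graph.json", "w") as f:
    json.dump(nx.node_link_data(kg), f)
```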

### Classification (TPU-Accelerated)

- **ImageNet:** 1,000 general object classes
- **Products:** 100,000 US retail products
- **Birds:** 965 species (iNaturalist)
- **Insects:** 1,022 species (iNaturalist)
- **Plants:** 2,102 species (iNaturalist)
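
Classifiers like these are typically driven through pycoral on a Coral TPU; a minimal sketch with placeholder model and label file names (the repository's actual pipeline may differ):

```python
from PIL import Image
from pycoral.adapters import classify, common
from pycoral.utils.dataset import read_label_file
from pycoral.utils.edgetpu import make_interpreter

# Placeholder paths for an iNaturalist birds model.
interpreter = make_interpreter("inat_bird_edgetpu.tflite")
interpreter.allocate_tensors()
labels = read_label_file("inat_bird_labels.txt")

# Resize the photo to the model's expected input and run inference on the TPU.
image = Image.open("bird.jpg").convert("RGB").resize(
    common.input_size(interpreter), Image.LANCZOS)
common.set_input(interpreter, image)
interpreter.invoke()

for c in classify.get_classes(interpreter, top_k=3, score_threshold=0.1):
    print(f"{labels.get(c.id, c.id)}: {c.score:.2f}")
```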

### Mesh Networking (Meshtastic)

- List nodes on the mesh network
- Send/receive text messages
- View battery and signal strength
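
These operations map directly onto the official `meshtastic` Python package; a minimal sketch, assuming the radio is attached over USB serial (whether Optidex drives the radio this way or through its own client is not shown here):

```python
import meshtastic.serial_interface

# Auto-detects the first Meshtastic radio on a serial/USB port.
iface = meshtastic.serial_interface.SerialInterface()

# List the nodes currently known to the mesh, with battery info where reported.
for node_id, node in (iface.nodes or {}).items():
    user = node.get("user", {})
    metrics = node.get("deviceMetrics", {})
    print(node_id, user.get("longName"), metrics.get("batteryLevel"))

# Broadcast a text message to the mesh.
iface.sendText("Optidex online")
iface.close()
```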

## Installation

### Prerequisites

1. Install the Whisplay HAT audio drivers:

   ```bash
   # Follow the instructions at https://github.com/PiSugar/whisplay
   ```

2. Install the PiSugar Power Manager (for battery display):

   ```bash
   wget https://cdn.pisugar.com/release/pisugar-power-manager.sh
   bash pisugar-power-manager.sh -c release
   ```
### Setup

1. Clone and enter the repository:

   ```bash
   git clone <repository-url> optidex
   cd optidex
   ```

2. Install dependencies:

   ```bash
   bash install_dependencies.sh
   source ~/.bashrc
   ```

3. Create the environment configuration:

   ```bash
   cp .env.template .env
   # Edit .env with your API keys and preferences
   ```

4. Build the project:

   ```bash
   bash build.sh
   ```

5. Start the chatbot:

   ```bash
   bash run_chatbot.sh
   ```

6. (Optional) Enable auto-start on boot:

   ```bash
   sudo bash startup.sh
   ```

### Optional: PostgreSQL Memory Backend

For larger-scale memory storage with semantic search:

```bash
cd docker
./start-db.sh start
python3 ../python/migrate_to_postgres.py
```
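
With pgvector in place, semantic recall becomes a nearest-neighbor SQL query. A minimal sketch, assuming a hypothetical `memories` table with `content` and `embedding` columns (the real schema lives in python/migrate_to_postgres.py) and psycopg2 installed:

```python
import psycopg2

# Connection settings are placeholders; match them to docker/start-db.sh.
conn = psycopg2.connect("dbname=optidex user=optidex host=localhost")

# Stand-in query vector; in practice this comes from your embedding model.
query_vec = "[" + ",".join(["0.01"] * 384) + "]"

with conn, conn.cursor() as cur:
    # pgvector's <-> operator is L2 distance; smallest distance = best match.
    cur.execute(
        "SELECT content FROM memories ORDER BY embedding <-> %s::vector LIMIT 5",
        (query_vec,),
    )
    for (content,) in cur.fetchall():
        print(content)
```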

### Optional: Download Wikipedia Knowledge Base

```bash
python3 python/knowledge_base.py download
```
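
Once downloaded, the article index is a plain SQLite database, so it can be queried with the standard library. A minimal sketch, assuming an FTS5 table named `articles` with `title` and `body` columns (placeholder names; python/knowledge_base.py defines the real schema):

```python
import sqlite3

conn = sqlite3.connect("wikipedia.db")  # placeholder path to the downloaded DB
rows = conn.execute(
    # snippet() highlights the match in column 1 (body), capped at 12 tokens.
    "SELECT title, snippet(articles, 1, '[', ']', '...', 12) "
    "FROM articles WHERE articles MATCH ? ORDER BY rank LIMIT 5",
    ("photosynthesis",),
).fetchall()
for title, excerpt in rows:
    print(title, "-", excerpt)
```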

### Optional: LLM8850 AI Accelerator (Fully Offline AI)

If you have an LLM8850 AI accelerator card, you can run ASR, TTS, and the LLM entirely offline:

```bash
# Run the setup script
bash setup-llm8850.sh

# Download models from Hugging Face:
# - Whisper: https://huggingface.co/M5Stack/whisper-small-axmodel
# - MeloTTS: https://huggingface.co/M5Stack/MeloTTS-English-ax650

# Set up Qwen3 on the LLM8850 for chat (language model on the accelerator)
bash setup-qwen3-llm8850.sh

# Start all LLM8850 services
bash start-llm8850-services.sh
```

**LLM8850 integration files:**

| File | Description |
| --- | --- |
| src/cloud-api/llm8850-asr.ts | Whisper ASR client (port 8801) |
| src/cloud-api/llm8850-tts.ts | MeloTTS client (port 8802) |
| src/cloud-api/llm8850-llm.ts | Qwen3 LLM client (port 8000, chat on the accelerator) |
| setup-llm8850.sh | Full setup script for Whisper + MeloTTS |
| setup-qwen3-llm8850.sh | Sets up Qwen3:1.7B on the LLM8850 for chat (port 8000) |
| start-llm8850-services.sh | Starts all LLM8850 services |

Set these in .env for chat on the LLM8850 (no internet required):

```
LLM_SERVER=LLM8850
LLM8850_LLM_HOST=http://localhost:8000
```

Optionally set `ASR_SERVER=LLM8850` and `TTS_SERVER=LLM8850` as well, or keep `TTS_SERVER=PIPER` for the Jarvis voice.
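
If the Qwen3 service on port 8000 exposes an OpenAI-compatible chat route (an assumption here; src/cloud-api/llm8850-llm.ts is the authoritative client), a quick smoke test looks like:

```python
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",  # assumed OpenAI-compatible path
    json={
        "model": "qwen3-1.7b",  # placeholder model name
        "messages": [{"role": "user", "content": "Say hello in five words."}],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```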

**Resources:**

- Whisper model: https://huggingface.co/M5Stack/whisper-small-axmodel
- MeloTTS model: https://huggingface.co/M5Stack/MeloTTS-English-ax650

## Environment Variables

Key configuration options in .env:

| Variable | Description | Options |
| --- | --- | --- |
| LLM_SERVER | Language model provider | OLLAMA, OPENAI, GEMINI, LLM8850 |
| ASR_SERVER | Speech recognition | WHISPER, VOSK, OPENAI, GEMINI, LLM8850 |
| TTS_SERVER | Text-to-speech | PIPER, OPENAI, GEMINI, VOLCENGINE, LLM8850 |
| IMAGE_GENERATION_SERVER | Image generation | OPENAI, GEMINI, VOLCENGINE |
| SERVE_OLLAMA | Auto-start the Ollama server | true, false |
| LLM8850_LLM_HOST | Qwen3 chat endpoint on the LLM8850 | http://localhost:8000 |
| LLM8850_ASR_ENDPOINT | LLM8850 Whisper endpoint | http://localhost:8801 |
| LLM8850_TTS_ENDPOINT | LLM8850 MeloTTS endpoint | http://localhost:8802 |
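
For a fully offline build, the documented variables combine into a .env like this (illustrative; Piper kept for TTS per the note above):

```
LLM_SERVER=LLM8850
ASR_SERVER=LLM8850
TTS_SERVER=PIPER
SERVE_OLLAMA=false
LLM8850_LLM_HOST=http://localhost:8000
LLM8850_ASR_ENDPOINT=http://localhost:8801
LLM8850_TTS_ENDPOINT=http://localhost:8802
```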

## Project Structure

```
optidex/
├── src/                    # TypeScript source (Node.js backend)
│   ├── core/               # ChatFlow, StreamResponsor
│   ├── cloud-api/          # LLM, TTS, ASR integrations
│   │   ├── llm8850-asr.ts  # LLM8850 Whisper ASR client
│   │   ├── llm8850-tts.ts  # LLM8850 MeloTTS client
│   │   ├── llm8850-llm.ts  # LLM8850 Qwen3 LLM client (chat)
│   │   └── ...
│   ├── config/
│   │   └── custom-tools/   # LLM tool definitions
│   ├── device/             # Audio, display, ESP32 voice
│   └── utils/              # Helpers, image utils
├── python/                 # Python components
│   ├── chatbot-ui.py       # LCD display UI
│   ├── periodic_observer.py # Autonomous observation
│   ├── live_detection.py   # Real-time object detection
│   ├── pose_estimation.py  # Pose tracking and exercise counting
│   ├── memory.py           # Unified memory interface
│   ├── knowledge_base.py   # Wikipedia search
│   └── ...
├── docker/                 # PostgreSQL + pgvector setup
├── data/                   # Runtime data (videos, images, memory)
├── docs/                   # Documentation
├── setup-llm8850.sh        # LLM8850 AI accelerator setup
└── start-llm8850-services.sh # Start LLM8850 services
```

## Usage Examples

**Voice commands:**

- "Take a picture"
- "What do you see?"
- "Start detecting people"
- "Count my push-ups"
- "What happened yesterday at 3pm?"
- "Watch for when a package arrives"
- "What do you know about photosynthesis?"
- "Search the web for today's weather"
- "Show me your memory"
- "List my missions"

## Building After Changes

After modifying TypeScript code:

```bash
bash build.sh
```

After adding Python dependencies:

```bash
cd python
pip install -r requirements.txt --break-system-packages
```

Then restart the service:

```bash
# If running via systemd
systemctl restart whisplay-ai-chatbot.service

# If running manually
pkill -f run_chatbot.sh
bash run_chatbot.sh
```

## License

GPL-3.0

## Acknowledgments