# Jibo Workshop

Code examples from the Jibo Workshop. Each numbered script in `scripts/` matches a step in the workshop, and `pipelines/` contains the advanced GPT examples. Both Node.js and Python versions are provided.

> **Note:** On macOS (and some Linux systems), `python` and `pip` may not be recognized. If that happens, use `python3` and `pip3` instead (e.g. `python3 scripts/python/01_hello.py`).

## Setup

Make sure Jibo is in rosbridge mode: open `http://your-jibo-hostname.local:9090` in your browser and press the **Enter Rosbridge** button in the top right. You should see a robot image appear on Jibo's screen.

```sh
git clone https://github.com/mitmedialab/jibo-workshop.git
cd jibo-workshop
cp .env.example .env
```

Edit `.env` and set `JIBO_HOST` to your robot's hostname (printed on the bottom of the base) or IP address. For the GPT pipelines, also set `OPENAI_API_KEY`.
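The Python scripts load these values with python-dotenv. If you're curious what that involves, here is a minimal stdlib-only sketch of the same idea; the parsing below is an illustration, not python-dotenv's actual implementation:

```python
def load_env(path=".env"):
    """Parse simple KEY=VALUE lines from a .env file into a dict."""
    env = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            # Skip blank lines and comments
            if not line or line.startswith("#"):
                continue
            key, _, value = line.partition("=")
            env[key.strip()] = value.strip()
    return env

# Usage: env = load_env(); host = env.get("JIBO_HOST")
```

In the real scripts, `load_dotenv()` puts the values into `os.environ` instead of returning a dict.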

### JavaScript dependencies

```sh
npm install
```

### Python dependencies

```sh
pip install roslibpy python-dotenv websockets
```

## Scripts

Every example is available in both JS and Python, and each pair does the same thing; pick whichever language you prefer.

| # | What it does | JS | Python |
|----|------------------------------|-------------------------------------|-----------------------------------------|
| 01 | Make Jibo speak | `node scripts/js/01_hello.js` | `python scripts/python/01_hello.py` |
| 02 | Play an animation | `node scripts/js/02_animate.js` | `python scripts/python/02_animate.py` |
| 03 | LED ring control | `node scripts/js/03_led.js` | `python scripts/python/03_led.py` |
| 04 | Play a sound effect | `node scripts/js/04_sound.js` | `python scripts/python/04_sound.py` |
| 05 | Stop current action | `node scripts/js/05_stop.js` | `python scripts/python/05_stop.py` |
| 06 | Listen to Jibo's state | `node scripts/js/06_state.js` | `python scripts/python/06_state.py` |
| 07 | Speak + animate simultaneously | `node scripts/js/07_speak_animate.js` | `python scripts/python/07_speak_animate.py` |
| 08 | Chain actions in sequence | `node scripts/js/08_chain.js` | `python scripts/python/08_chain.py` |
| 09 | Cycle LED colors | `node scripts/js/09_led_cycle.js` | `python scripts/python/09_led_cycle.py` |
| 10 | Capture microphone audio | `node scripts/js/10_mic.js` | `python scripts/python/10_mic.py` |
| 11 | Capture a photo | `node scripts/js/11_camera.js` | `python scripts/python/11_camera.py` |
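Under the hood, each script talks to Jibo by sending JSON messages over the rosbridge WebSocket. A minimal sketch of how such a message is built, following the rosbridge v2 protocol; the `/jibo` topic and the `do_tts`/`tts_text` fields below are illustrative guesses, and the real names live in the scripts themselves:

```python
import json

def make_publish(topic, msg, msg_id=None):
    """Build a rosbridge-protocol 'publish' operation as a JSON string."""
    op = {"op": "publish", "topic": topic, "msg": msg}
    if msg_id is not None:
        op["id"] = msg_id
    return json.dumps(op)

# Hypothetical speech command; see scripts/js/01_hello.js for the real message
payload = make_publish("/jibo", {"do_tts": True, "tts_text": "Hello!"})
```

Libraries like `roslibpy` wrap exactly this kind of message construction behind `Topic.publish()`.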

## Pipelines

These are standalone scripts that connect Jibo to OpenAI for bilingual Arabic/English AI conversations. They build on the basics from `scripts/` (mic streaming, camera capture, TTS) and add GPT on top. Requires `OPENAI_API_KEY` in `.env`.

| File | What it does |
|------------------------------|--------------------------------------------------|
| `jibo_realtime.js` / `.py` | Live voice conversation via OpenAI Realtime API |
| `jibo_whisper_gpt.js` / `.py` | Step-by-step Whisper STT + GPT-4o pipeline |
| `jibo_vision.js` / `.py` | Camera photo + GPT-4o Vision analysis |

### Live Voice Conversation (Realtime API)

```sh
node pipelines/js/jibo_realtime.js        # JS
python pipelines/python/jibo_realtime.py  # Python
```

Streams Jibo's microphone directly to OpenAI's Realtime API over WebSocket for real-time voice conversation. Jibo naturally mixes Arabic and English in its responses. The mic is muted while Jibo speaks to prevent it from hearing itself.

**Flow:** Jibo mic (port 3838) -> OpenAI Realtime API -> GPT generates text -> Jibo speaks via TTS -> mic unmutes -> repeat
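The mute-while-speaking step can be sketched as a tiny half-duplex gate. This is an illustration of the idea, not the pipeline's actual code:

```python
class MicGate:
    """Drops mic frames while the robot is speaking so it can't hear itself."""

    def __init__(self):
        self.speaking = False

    def start_speaking(self):
        self.speaking = True   # TTS began: mute the mic

    def stop_speaking(self):
        self.speaking = False  # TTS finished: unmute

    def filter(self, frame):
        """Return the frame if the mic is live, else None (frame dropped)."""
        return None if self.speaking else frame
```

In the real pipeline the gate sits between the mic WebSocket and the Realtime API, toggled by TTS start/stop events.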

### Whisper + GPT-4o

```sh
node pipelines/js/jibo_whisper_gpt.js        # JS
python pipelines/python/jibo_whisper_gpt.py  # Python
```

A step-by-step pipeline that gives you more control than the Realtime API. Each stage runs independently, so you can swap out Whisper for another STT or change GPT models without touching the rest.
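That modularity can be sketched as a single conversation turn with each stage injected as a function. The names here are illustrative, not the pipeline's actual API:

```python
def run_turn(record_audio, transcribe, generate, speak):
    """One conversation turn; each stage is an injected function, so any can be swapped."""
    audio = record_audio()   # e.g. mic capture with VAD auto-stop
    text = transcribe(audio) # e.g. Whisper, or any other STT
    reply = generate(text)   # e.g. GPT-4o, or another model
    speak(reply)             # Jibo TTS
    return reply
```

Swapping Whisper for another STT then means passing a different `transcribe` function; nothing else changes.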

**Flow:** Jibo mic records (VAD auto-stop) -> Whisper transcribes -> GPT-4o generates bilingual response -> Jibo speaks -> loop

VAD (Voice Activity Detection) runs locally to detect when you start and stop speaking. The defaults work well for most rooms, but you can tune them directly in the file if needed.
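A minimal sketch of the energy-based idea behind such a VAD; the threshold, frame handling, and function names below are illustrative stand-ins, not the pipeline's actual defaults:

```python
import math

def rms(frame):
    """Root-mean-square energy of a frame of PCM samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def detect_speech_end(frames, threshold=500, max_silent=3):
    """Return the index just past `max_silent` consecutive quiet frames,
    or None if the speaker hasn't stopped yet."""
    silent = 0
    for i, frame in enumerate(frames):
        silent = silent + 1 if rms(frame) < threshold else 0
        if silent >= max_silent:
            return i + 1  # enough trailing silence: speech ended here
    return None  # still speaking
```

Raising `threshold` makes the detector less sensitive to background noise; raising `max_silent` tolerates longer pauses mid-sentence. Those are the kinds of knobs the pipeline file exposes.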

### Camera + GPT-4o Vision

```sh
node pipelines/js/jibo_vision.js                              # JS
python pipelines/python/jibo_vision.py                        # Python

# Options (same for both):
node pipelines/js/jibo_vision.js --loop                       # continuous mode
node pipelines/js/jibo_vision.js --prompt "what color is this?"
python pipelines/python/jibo_vision.py --loop
python pipelines/python/jibo_vision.py --prompt "what color is this?"
```

Captures a photo from Jibo's camera (port 8486) and sends it to GPT-4o Vision for analysis. Jibo describes what it sees in mixed Arabic/English. In loop mode it takes a new photo every 8 seconds.
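The request sent to the API follows OpenAI's image-input message shape: the JPEG is base64-encoded into a data URL alongside the text prompt. A minimal sketch of building that payload (the helper name and default prompt are illustrative):

```python
import base64

def vision_request(jpeg_bytes, prompt, model="gpt-4o"):
    """Build a chat-completions payload combining a text prompt and a JPEG image."""
    b64 = base64.b64encode(jpeg_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    }
```

In loop mode the pipeline simply rebuilds this payload with a fresh capture every cycle.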
