Code examples from the Jibo Workshop. Each numbered script in scripts/ matches a step in the workshop, and pipelines/ has the advanced GPT examples. Both Node.js and Python versions are available.
Note: On macOS (and some Linux systems), python and pip may not be recognized. If that happens, use python3 and pip3 instead (e.g. python3 scripts/python/01_hello.py).
Make sure Jibo is in rosbridge mode: open http://your-jibo-hostname.local:9090 in your browser and press the "Enter Rosbridge" button in the top right. You should see a robot image appear on Jibo's screen.
git clone https://github.com/mitmedialab/jibo-workshop.git
cd jibo-workshop
cp .env.example .envEdit .env and set JIBO_HOST to your robot's hostname (printed on the bottom of the base) or IP address. For the GPT pipelines, also set OPENAI_API_KEY.
npm installpip install roslibpy python-dotenv websocketsEvery example is available in both JS and Python. They do the same thing — pick whichever language you prefer.
| # | What it does | JS | Python |
|---|---|---|---|
| 01 | Make Jibo speak | node scripts/js/01_hello.js |
python scripts/python/01_hello.py |
| 02 | Play an animation | node scripts/js/02_animate.js |
python scripts/python/02_animate.py |
| 03 | LED ring control | node scripts/js/03_led.js |
python scripts/python/03_led.py |
| 04 | Play a sound effect | node scripts/js/04_sound.js |
python scripts/python/04_sound.py |
| 05 | Stop current action | node scripts/js/05_stop.js |
python scripts/python/05_stop.py |
| 06 | Listen to Jibo's state | node scripts/js/06_state.js |
python scripts/python/06_state.py |
| 07 | Speak + animate simultaneously | node scripts/js/07_speak_animate.js |
python3 scripts/python/07_speak_animate.py |
| 08 | Chain actions in sequence | node scripts/js/08_chain.js |
python scripts/python/08_chain.py |
| 09 | Cycle LED colors | node scripts/js/09_led_cycle.js |
python scripts/python/09_led_cycle.py |
| 10 | Capture microphone audio | node scripts/js/10_mic.js |
python scripts/python/10_mic.py |
| 11 | Capture a photo | node scripts/js/11_camera.js |
python scripts/python/11_camera.py |
These are standalone scripts that connect Jibo to OpenAI for bilingual Arabic/English AI conversations. They build on the basics from scripts/ (mic streaming, camera capture, TTS) and add GPT on top. Requires OPENAI_API_KEY in .env.
| File | What it does |
|---|---|
jibo_realtime.js / .py |
Live voice conversation via OpenAI Realtime API |
jibo_whisper_gpt.js / .py |
Step-by-step Whisper STT + GPT-4o pipeline |
jibo_vision.js / .py |
Camera photo + GPT-4o Vision analysis |
node pipelines/js/jibo_realtime.js # JS
python pipelines/python/jibo_realtime.py # PythonStreams Jibo's microphone directly to OpenAI's Realtime API over WebSocket for real-time voice conversation. Jibo naturally mixes Arabic and English in its responses. The mic is muted while Jibo speaks to prevent it from hearing itself.
Flow: Jibo mic (port 3838) -> OpenAI Realtime API -> GPT generates text -> Jibo speaks via TTS -> mic unmutes -> repeat
node pipelines/js/jibo_whisper_gpt.js # JS
python pipelines/python/jibo_whisper_gpt.py # PythonA step-by-step pipeline that gives you more control than the Realtime API. Each stage runs independently, so you can swap out Whisper for another STT or change GPT models without touching the rest.
Flow: Jibo mic records (VAD auto-stop) -> Whisper transcribes -> GPT-4o generates bilingual response -> Jibo speaks -> loop
VAD (Voice Activity Detection) runs locally to detect when you start and stop speaking. The defaults work well for most rooms, but you can tune them directly in the file if needed.
node pipelines/js/jibo_vision.js # JS
python pipelines/python/jibo_vision.py # Python
# Options (same for both):
node pipelines/js/jibo_vision.js --loop # continuous mode
node pipelines/js/jibo_vision.js --prompt "what color is this?"
python pipelines/python/jibo_vision.py --loop
python pipelines/python/jibo_vision.py --prompt "what color is this?"Captures a photo from Jibo's camera (port 8486) and sends it to GPT-4o Vision for analysis. Jibo describes what it sees in mixed Arabic/English. In loop mode it takes a new photo every 8 seconds.