
feat: add NVIDIA Jetson GPU support #364

Open
toolboc wants to merge 5 commits into ahmetoner:main from toolboc:feat/jetson-gpu-support

Conversation


@toolboc toolboc commented Feb 27, 2026

Summary

Add Dockerfile.jetson and docker-compose.jetson.yml for building and running the whisper-asr-webservice on NVIDIA Jetson devices with full GPU acceleration.

Closes #359
Relates to #54, #133

Target Platform

  • NVIDIA Jetson Orin (Nano / NX / AGX) — JetPack 6.x, L4T R36.x, CUDA 12.6, aarch64
  • Configurable via build args for other Jetson generations

What's Included

| File | Purpose |
| --- | --- |
| `Dockerfile.jetson` | Multi-stage build with CUDA support |
| `docker-compose.jetson.yml` | Compose file for Jetson with GPU passthrough |
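
A compose file with Jetson GPU passthrough generally follows this shape. This is an illustrative sketch, not the PR's actual file; the image tag, default engine, and HF_TOKEN usage are taken from elsewhere in this PR, while the service name and layout are assumptions:

```yaml
services:
  whisper-asr-webservice:
    image: toolboc/whisper-asr-webservice-jetson:jp6.1-cu12.6-py3.10
    runtime: nvidia            # NVIDIA container runtime exposes the Jetson GPU
    environment:
      - ASR_ENGINE=whisperx    # default engine set in this PR's compose file
      - HF_TOKEN=${HF_TOKEN}   # optional: access to gated diarization models
    ports:
      - "9000:9000"
```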

Key Technical Decisions

  1. CTranslate2 compiled from source — PyPI ships CPU-only aarch64 wheels. Built with -DWITH_CUDA=ON -DCUDA_ARCH_LIST="8.7" against JetPack's CUDA toolkit.
  2. Jetson AI Lab pip index — PyTorch, torchaudio, and onnxruntime-gpu installed from https://pypi.jetson-ai-lab.io/jp6/cu126/+simple/ using --index-url (not --extra-index-url) because pip prefers manylinux_2_28 (CPU) over linux_aarch64 (CUDA) wheels when both are available.
  3. Bypasses Poetry resolver — poetry-core's PEP 517 metadata generation merges [tool.poetry.dependencies] source mappings with [project.optional-dependencies], producing incorrect version constraints (e.g. torch==2.7.1+cu126 instead of the actual version). Dependencies are installed explicitly via pip with a constraints file protecting CUDA packages.
  4. torchaudio compatibility shim — Jetson AI Lab torchaudio builds strip AudioMetaData, info(), and list_audio_backends(). A soundfile-based .pth monkey-patch restores them for pyannote.audio 3.x compatibility.
  5. torch.load compatibility shim — PyTorch >=2.6 defaults weights_only=True, but pyannote VAD checkpoints contain omegaconf.ListConfig globals. The shim defaults weights_only=False when None is passed (as lightning_fabric does).
  6. huggingface_hub use_auth_token → token shim — huggingface_hub 1.5.0 removed the deprecated use_auth_token parameter. pyannote.audio and whisperx still pass use_auth_token=. The shim translates it to token= for hf_hub_download, model_info, and hf_hub_url across all submodules.
  7. Guard step — Force-reinstalls torch, torchaudio, and onnxruntime-gpu from the Jetson index after dependency resolution, ensuring CPU-only PyPI wheels haven't overwritten them.
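
The torch.load shim in (5) amounts to wrapping the loader so a None default becomes False. The sketch below uses a stand-in loader instead of the real torch.load to stay self-contained; the function names are illustrative, not the PR's actual code:

```python
import functools

def _load(path, weights_only=True):
    """Stand-in for torch.load: PyTorch >= 2.6 defaults weights_only=True."""
    return {"path": path, "weights_only": weights_only}

def patch_weights_only_default(load_fn):
    """Wrap a load function so weights_only=None (or unset) becomes False,
    matching what lightning_fabric passes for pyannote VAD checkpoints."""
    @functools.wraps(load_fn)
    def wrapper(*args, weights_only=None, **kwargs):
        if weights_only is None:
            weights_only = False  # restore pre-2.6 behaviour for legacy checkpoints
        return load_fn(*args, weights_only=weights_only, **kwargs)
    return wrapper

# In the real shim this would replace torch.load at startup
load = patch_weights_only_default(_load)
```

Callers that explicitly request weights_only=True are unaffected; only the None/unset case falls back to the old behaviour.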

Verified On

  • Jetson Orin, JetPack 6.2.2, L4T R36.5.0, CUDA 12.6
  • torch.cuda.is_available() = True, device = Orin (8, 7)
  • CTranslate2 4.4.0 CUDA compute types: float16, bfloat16, int8, float32
  • All three ASR engines tested and passing:
    • faster_whisper — 200 OK, GPU transcription ✓
    • openai_whisper — 200 OK, GPU transcription ✓
    • whisperx — 200 OK, word-level timestamps ✓
  • HF_TOKEN support for gated model access (diarization) ✓

Build & Run

```shell
# Build
docker compose -f docker-compose.jetson.yml build

# Run
docker compose -f docker-compose.jetson.yml up
```

Docker Hub

Pre-built image available:

```shell
docker pull toolboc/whisper-asr-webservice-jetson:jp6.1-cu12.6-py3.10
```

Add Dockerfile.jetson and docker-compose.jetson.yml for building and
running the whisper-asr-webservice on NVIDIA Jetson devices (JetPack 6.x,
L4T R36.x, aarch64, CUDA 12.6).

Key features:
- Multi-stage build: CTranslate2 compiled from source with CUDA for Orin
- PyTorch, torchaudio, and onnxruntime-gpu from Jetson AI Lab pip index
- nvidia-cudss-cu12 for libcudss.so.0 required by Jetson torch wheels
- torchaudio compatibility shim for pyannote.audio 3.x (Jetson builds
  strip AudioMetaData/info()/list_audio_backends())
- Pip constraints file to protect pre-installed CUDA packages from being
  overwritten by CPU-only PyPI wheels during dependency resolution
- Guard step to force-reinstall CUDA packages from Jetson index
- Bypasses Poetry resolver (poetry-core PEP 517 metadata bug produces
  incorrect torch version constraints on aarch64)
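
The constraints-file and guard-step combination could be sketched roughly as follows. The `pip freeze | grep` pattern is an illustrative way to build the constraints file; the actual Dockerfile may pin the packages explicitly:

```shell
# Pin the already-installed CUDA builds so dependency resolution cannot replace them
pip freeze | grep -E '^(torch|torchaudio|onnxruntime-gpu)==' > /tmp/constraints.txt
pip install -c /tmp/constraints.txt .

# Guard step: force the Jetson wheels back in case anything slipped through
pip install --force-reinstall --no-deps \
    --index-url https://pypi.jetson-ai-lab.io/jp6/cu126/+simple/ \
    torch torchaudio onnxruntime-gpu
```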

Tested on Jetson Orin with JetPack 6.2.2:
- torch.cuda.is_available() = True (Orin, compute 8.7)
- CTranslate2 CUDA compute types: float16, bfloat16, int8, float32
- faster-whisper model loads on CUDA with float16
- All three ASR engines import successfully
- Webservice starts and serves on port 9000

nvidia-cudss-cu12 pulls in nvidia-cublas-cu12 (v12.9.1.4) as a transitive
dependency. When its lib path was included in LD_LIBRARY_PATH alongside
JetPack's system cuBLAS (v12.6.1.4), both versions loaded into the same
process, causing CUBLAS_STATUS_ALLOC_FAILED at runtime.

Fix:
- Remove nvidia/cublas/lib from LD_LIBRARY_PATH (system cuBLAS is correct)
- Uninstall nvidia-cublas-cu12 pip package after nvidia-cudss-cu12 install
  (we only need libcudss.so.0 from that package)
- Update torchaudio compat shim to also monkey-patch torch.load,
  defaulting weights_only=False when None is passed (PyTorch >=2.6
  changed the default to True, breaking pyannote VAD checkpoints
  that contain omegaconf globals)
- Update image tag to whisper-asr-webservice-jetson:jp6.1-cu12.6-py3.10
- Set default ASR_ENGINE to whisperx in compose file
- All three engines tested and verified: faster_whisper, openai_whisper, whisperx
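
In Dockerfile form, the cuBLAS fix described above could look roughly like this; the exact paths and ordering are assumptions, and the key point is keeping libcudss.so.0 while dropping the conflicting pip-installed cuBLAS:

```dockerfile
# nvidia-cudss-cu12 is needed only for libcudss.so.0; its transitive
# nvidia-cublas-cu12 (v12.9) conflicts with JetPack's system cuBLAS (v12.6)
RUN pip install nvidia-cudss-cu12 \
 && pip uninstall -y nvidia-cublas-cu12

# Do NOT add the pip nvidia/cublas/lib directory to LD_LIBRARY_PATH;
# JetPack's system cuBLAS is the correct one on Jetson
```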

huggingface_hub >= 1.0 removed the deprecated use_auth_token parameter
from hf_hub_download() and related functions, but pyannote.audio 3.x
and whisperx still pass it. The compatibility shim now translates
use_auth_token -> token at startup, before pyannote imports the
function, so HF_TOKEN works correctly for diarization model access.
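
The translation shim boils down to a keyword-rewriting wrapper. The sketch below uses a stand-in for hf_hub_download so it stays self-contained; in the real shim the same wrapper would be applied to hf_hub_download, model_info, and hf_hub_url at startup, before pyannote imports them:

```python
import functools

def _hf_hub_download(repo_id, filename, token=None):
    """Stand-in for huggingface_hub.hf_hub_download, which no longer
    accepts the deprecated use_auth_token parameter."""
    return {"repo_id": repo_id, "filename": filename, "token": token}

def translate_use_auth_token(fn):
    """Accept the legacy use_auth_token kwarg and forward it as token."""
    @functools.wraps(fn)
    def wrapper(*args, use_auth_token=None, **kwargs):
        if use_auth_token is not None and "token" not in kwargs:
            kwargs["token"] = use_auth_token  # rewrite legacy kwarg
        return fn(*args, **kwargs)
    return wrapper

hf_hub_download = translate_use_auth_token(_hf_hub_download)
```

Callers already using token= pass through untouched; only the legacy spelling is rewritten.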


Development

Successfully merging this pull request may close these issues.

Add Arm support for GPU container
