feat: add NVIDIA Jetson GPU support #364
toolboc wants to merge 5 commits into ahmetoner:main from
Conversation
Add Dockerfile.jetson and docker-compose.jetson.yml for building and running the whisper-asr-webservice on NVIDIA Jetson devices (JetPack 6.x, L4T R36.x, aarch64, CUDA 12.6).

Key features:
- Multi-stage build: CTranslate2 compiled from source with CUDA for Orin
- PyTorch, torchaudio, and onnxruntime-gpu from the Jetson AI Lab pip index
- nvidia-cudss-cu12 for the libcudss.so.0 required by Jetson torch wheels
- torchaudio compatibility shim for pyannote.audio 3.x (Jetson builds strip AudioMetaData/info()/list_audio_backends())
- Pip constraints file to protect pre-installed CUDA packages from being overwritten by CPU-only PyPI wheels during dependency resolution
- Guard step to force-reinstall CUDA packages from the Jetson index
- Bypasses the Poetry resolver (a poetry-core PEP 517 metadata bug produces incorrect torch version constraints on aarch64)

Tested on Jetson Orin with JetPack 6.2.2:
- torch.cuda.is_available() = True (Orin, compute 8.7)
- CTranslate2 CUDA compute types: float16, bfloat16, int8, float32
- faster-whisper model loads on CUDA with float16
- All three ASR engines import successfully
- Webservice starts and serves on port 9000
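The constraints-file and guard steps above can be sketched as a Dockerfile fragment. This is an illustrative sketch, not the exact contents of Dockerfile.jetson: the file path /tmp/cuda-constraints.txt and the use of a requirements.txt are assumptions; the package list and index URL come from this PR.

```dockerfile
# 1) Record the exact CUDA wheel versions already installed from the Jetson index.
RUN pip freeze | grep -E '^(torch|torchaudio|onnxruntime-gpu)==' > /tmp/cuda-constraints.txt

# 2) Resolve application dependencies under those constraints so pip cannot
#    replace a pre-installed CUDA wheel with a CPU-only PyPI build.
RUN pip install --no-cache-dir -c /tmp/cuda-constraints.txt -r requirements.txt

# 3) Guard step: force-reinstall the CUDA wheels from the Jetson index anyway,
#    without touching their dependencies.
RUN pip install --no-cache-dir --force-reinstall --no-deps \
      --index-url https://pypi.jetson-ai-lab.io/jp6/cu126/+simple/ \
      torch torchaudio onnxruntime-gpu
```

The constraints file makes pip fail loudly rather than silently downgrade, and the guard step repairs any wheel that slipped through.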
nvidia-cudss-cu12 pulls in nvidia-cublas-cu12 (v12.9.1.4) as a transitive dependency. When its lib path was included in LD_LIBRARY_PATH alongside JetPack's system cuBLAS (v12.6.1.4), both versions loaded into the same process, causing CUBLAS_STATUS_ALLOC_FAILED at runtime.

Fix:
- Remove nvidia/cublas/lib from LD_LIBRARY_PATH (the system cuBLAS is the correct one)
- Uninstall the nvidia-cublas-cu12 pip package after installing nvidia-cudss-cu12 (only libcudss.so.0 is needed from that package)
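A minimal sketch of that fix as a Dockerfile fragment. The dist-packages path below is an assumption for illustration; the exact path in Dockerfile.jetson may differ.

```dockerfile
# Install nvidia-cudss-cu12 only for libcudss.so.0, then drop the cuBLAS copy
# it drags in, so JetPack's system cuBLAS (v12.6.1.4) is the only one loaded.
RUN pip install --no-cache-dir nvidia-cudss-cu12 \
 && pip uninstall -y nvidia-cublas-cu12

# nvidia/cublas/lib is deliberately absent here; only the cudss lib dir is exposed.
ENV LD_LIBRARY_PATH=/usr/local/lib/python3.10/dist-packages/nvidia/cudss/lib:${LD_LIBRARY_PATH}
```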
- Update the torchaudio compat shim to also monkey-patch torch.load, defaulting weights_only=False when None is passed (PyTorch >= 2.6 changed the default to True, breaking pyannote VAD checkpoints that contain omegaconf globals)
- Update the image tag to whisper-asr-webservice-jetson:jp6.1-cu12.6-py3.10
- Set the default ASR_ENGINE to whisperx in the compose file
- All three engines tested and verified: faster_whisper, openai_whisper, whisperx
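The torch.load patch can be sketched as a generic wrapper. This is shown library-independently (default_weights_only_false is a name invented for this sketch, and the fake torch.load in the usage comment stands in for the real one); the real shim applies it at interpreter startup via a .pth file.

```python
import functools


def default_weights_only_false(load_fn):
    """Wrap a torch.load-style callable so weights_only=None becomes False.

    PyTorch >= 2.6 resolves weights_only=None to True; pyannote VAD checkpoints
    contain omegaconf globals and need the old full-unpickling behaviour.
    An explicit weights_only=True from the caller is passed through untouched.
    """
    @functools.wraps(load_fn)
    def wrapper(*args, weights_only=None, **kwargs):
        if weights_only is None:
            weights_only = False
        return load_fn(*args, weights_only=weights_only, **kwargs)
    return wrapper


# In the real shim this would run at startup: torch.load = default_weights_only_false(torch.load)
```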
huggingface_hub >= 1.0 removed the deprecated use_auth_token parameter from hf_hub_download() and related functions, but pyannote.audio 3.x and whisperx still pass it. The compatibility shim now translates use_auth_token -> token at startup, before pyannote imports the function, so HF_TOKEN works correctly for diarization model access.
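That translation can be sketched as a keyword-rewriting decorator. This is a generic sketch: translate_use_auth_token is a name invented here, and the real shim wraps hf_hub_download and related functions across huggingface_hub submodules at startup.

```python
import functools


def translate_use_auth_token(fn):
    """Rewrite the removed use_auth_token= keyword to token= before calling fn.

    huggingface_hub >= 1.0 dropped use_auth_token, but pyannote.audio 3.x and
    whisperx still pass it. An explicit token= from the caller takes priority.
    """
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        auth = kwargs.pop("use_auth_token", None)
        if auth is not None and "token" not in kwargs:
            kwargs["token"] = auth
        return fn(*args, **kwargs)
    return wrapper
```

Because the patch runs before pyannote imports the function, every downstream caller sees the wrapped version and HF_TOKEN-based diarization model access keeps working.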
Summary
Add Dockerfile.jetson and docker-compose.jetson.yml for building and running the whisper-asr-webservice on NVIDIA Jetson devices with full GPU acceleration.

Closes #359
Relates to #54, #133
Target Platform

NVIDIA Jetson (JetPack 6.x, L4T R36.x, aarch64, CUDA 12.6)
What's Included
- Dockerfile.jetson
- docker-compose.jetson.yml

Key Technical Decisions
- CTranslate2 is compiled from source with -DWITH_CUDA=ON -DCUDA_ARCH_LIST="8.7" against JetPack's CUDA toolkit.
- PyTorch, torchaudio, and onnxruntime-gpu are installed from https://pypi.jetson-ai-lab.io/jp6/cu126/+simple/ using --index-url (not --extra-index-url), because pip prefers manylinux_2_28 (CPU) over linux_aarch64 (CUDA) wheels when both are available.
- Poetry is bypassed: poetry-core's PEP 517 metadata mishandles [tool.poetry.dependencies] source mappings with [project.optional-dependencies], producing incorrect version constraints (e.g. torch==2.7.1+cu126 instead of the actual version). Dependencies are installed explicitly via pip with a constraints file protecting CUDA packages.
- Jetson torchaudio builds strip AudioMetaData, info(), and list_audio_backends(). A soundfile-based .pth monkey-patch restores them for pyannote.audio 3.x compatibility.
- PyTorch >= 2.6 defaults torch.load to weights_only=True, but pyannote VAD checkpoints contain omegaconf.ListConfig globals. The shim defaults weights_only=False when None is passed (as lightning_fabric does).
- huggingface_hub 1.5.0 removed the deprecated use_auth_token parameter, but pyannote.audio and whisperx still pass use_auth_token=. The shim translates it to token= for hf_hub_download, model_info, and hf_hub_url across all submodules.

Verified On
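The soundfile-based restoration mentioned above can be sketched as follows. This is a simplified stand-in for the actual .pth patch: the field mapping from soundfile's info object is an assumption, and bits_per_sample is not exposed by soundfile, so it is filled with a placeholder.

```python
from dataclasses import dataclass


@dataclass
class AudioMetaData:
    """Stand-in for the class stripped from Jetson torchaudio builds."""
    sample_rate: int
    num_frames: int
    num_channels: int
    bits_per_sample: int
    encoding: str


def list_audio_backends():
    # The shim provides exactly one backend.
    return ["soundfile"]


def info(filepath, format=None):
    import soundfile as sf  # lazy: only needed when metadata is actually read
    i = sf.info(filepath)
    return AudioMetaData(
        sample_rate=int(i.samplerate),
        num_frames=int(i.frames),
        num_channels=int(i.channels),
        bits_per_sample=0,  # not exposed by soundfile
        encoding=str(i.subtype or ""),
    )


# The real .pth patch would attach these to torchaudio at interpreter startup:
# torchaudio.AudioMetaData = AudioMetaData
# torchaudio.info = info
# torchaudio.list_audio_backends = list_audio_backends
```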
- torch.cuda.is_available() = True, device = Orin (8, 7)

Build & Run
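Assuming the compose file from this PR is used as-is, a typical build-and-run sequence on a Jetson host would look like the following sketch (the flags and the smoke-test endpoint are assumptions, not commands from the PR):

```shell
# Build the image and start the service on a Jetson host (JetPack 6.x)
docker compose -f docker-compose.jetson.yml up -d --build

# Smoke test against the webservice port noted above
curl -s http://localhost:9000/ > /dev/null && echo "webservice up"
```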
Docker Hub
Pre-built image available: