Important
This project is licensed under the MIT License.
Here you can find an explainer video: Explainer Video on YouTube
After countless hours of testing various packages, libraries, and frameworks, we realized there was no remote, robust, language-agnostic solution for real-time audio transcription using OpenAI’s Whisper. Existing solutions were usually error-prone, restricted to local use, or not easy to install or integrate.
Inspired by the stream socket (in a nutshell, direct communication over TCP) server implementation of whisper-streaming, we decided to develop our own WebSocket server for Whisper-based streaming transcription.
Main characteristics of our implementation:
- 🔰 Simple: Built by an undergraduate student with simplicity in mind.
- 🚀 Fast: Thanks to FastAPI.
- 🌐 WebSocket-based: Broader client support (can also be integrated into web apps without native socket support).
- 🔀 Parallel server: Capable of handling multiple clients simultaneously.
We all know the struggle of naming a project—it’s almost as hard as the project itself. But every creation deserves a name, and this one is no exception. The name reswhis is a blend of Remote Streaming Whisper.
Important: It is worth scrolling down to the end of this page if you run into trouble installing these two requirements.
General and independent requirements:
- uv for managing the project, packages, and dependencies
- FFmpeg (2024-12-19-git-494c961379-full_build-www.gyan.dev was tested)
Requirements for the faster-whisper backend (recommended for systems engaging Nvidia GPUs):
- NVIDIA CUDA Toolkit (version 12.6 Update 3 was tested)
- NVIDIA cuDNN Library (version 9.6.0 was tested)
Requirements for the whisper-timestamped backend: Nothing! We took care of everything for you.
Requirements for the openai-whisper backend:
- An API key from OpenAI
Requirements for using our web client for testing (can be skipped if you develop your own client):
- Browser of your choice
- A working microphone
Requirements for using our test client on a machine running Microsoft Windows (can be skipped if you use the web client or prefer your own client):
- websocat (v1.14.0 was tested)
- A working microphone
- Clone the repository
git clone https://github.com/Masihtabaei/reswhis.git
- Change the directory
cd reswhis
- Sync the dependencies with uv
uv sync
- Open the following file in the code editor of your choice:
run.bat
- Change the configurations as needed and save the file (more info in the configuration subsection).
- Double-click the batch file to run it.
Important:
You can also run the server on a machine running Linux or macOS without the batch file. You first need to set the following environment variables (the exact commands depend on the operating system, and the exact values depend on your use case [for more info please refer to the configuration subsection]):
BACKEND=<value>
MODEL_SIZE=<value>
LANGUAGE=<value>
SAMPLING_RATE=16000 # Fix value (DO NOT CHANGE)
MINIMUM_CHUNK_SIZE=<value>
USE_VOICE_ACTIVITY_CONTROLLER=<value>
USE_VOICE_ACTIVITY_DETECTION=<value>
Then you can run the server directly as follows:
uv run uvicorn main:app
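Equivalently, the environment variables can be set from Python before launching the server. A minimal sketch, assuming you run it from the repository root; the chosen values are purely illustrative, not defaults:

```python
import os
import subprocess

# Illustrative values -- adapt them to your use case
# (see the configuration subsection for the allowed values).
env = dict(os.environ)
env.update({
    "BACKEND": "faster-whisper",
    "MODEL_SIZE": "medium",
    "LANGUAGE": "en",
    "SAMPLING_RATE": "16000",  # fixed value, do not change
    "MINIMUM_CHUNK_SIZE": "1",
    "USE_VOICE_ACTIVITY_CONTROLLER": "False",
    "USE_VOICE_ACTIVITY_DETECTION": "True",
})

# Launch the server exactly as the batch file would:
# subprocess.run(["uv", "run", "uvicorn", "main:app"], env=env, check=True)
```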
- Open the browser of your choice and head to the following address, or send an HTTP GET request to this endpoint using e.g. curl, Wget, or Postman:
protocol://ip:port/info
Important: This REST endpoint can be used for pinging the server and checking that the configurations in use match those specified. If you run the server locally without changing the default configurations and port 8000 is not otherwise bound, you can use this address:
http://localhost:8000/info
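The same check can be done from Python. A minimal sketch using only the standard library, assuming a server running locally on the default port (the exact response body depends on your configuration):

```python
from urllib.request import urlopen


def info_url(host: str = "localhost", port: int = 8000) -> str:
    """Build the address of the /info REST endpoint."""
    return f"http://{host}:{port}/info"


def fetch_info(host: str = "localhost", port: int = 8000) -> str:
    """GET /info from a running reswhis server and return the raw body."""
    with urlopen(info_url(host, port)) as response:
        return response.read().decode("utf-8")


# With a server running locally on the default port:
# print(fetch_info())
```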
Hurray 🔥! You are now officially done! You have three options for using this server:
- Web-based client
- Console-based client
- Custom client
For the web-based client:
- Change the directory
cd clients
- Head to the webpage by opening the following HTML file:
web_client.html
For the console-based client:
- Run the following command to find out the name of the microphone you want to use:
ffmpeg -list_devices true -f dshow -i dummy
- Use the following command with the microphone's name replaced to start the transcription:
ffmpeg -loglevel debug -f dshow -i audio="<microphone-name>" -ac 1 -ar 16000 -f s16le - | websocat.x86_64-pc-windows-gnu --binary -n ws://localhost:8000/transcribe
For your custom client:
Feel free to use the language, framework, or library of your choice. However, the following points must be considered:
- Default sampling rate is 16000 (16 kHz).
- Audio should be mono channel.
- Data must be transferred as signed 16-bit integer low endian.
/info is a REST endpoint and /transcribe is a WebSocket one.
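To illustrate the points above, here is a sketch of the audio format a custom client must produce. The conversion follows directly from the requirements (mono, 16 kHz, signed 16-bit little-endian); the commented WebSocket part assumes the third-party `websockets` package and the default local server address:

```python
import struct


def to_s16le(samples: list[float]) -> bytes:
    """Convert mono float samples in [-1.0, 1.0] to signed 16-bit little-endian PCM."""
    ints = [max(-32768, min(32767, round(s * 32767))) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)


# Sketch of streaming those bytes to the server (assumes `pip install websockets`):
#
# import asyncio, websockets
#
# async def stream(pcm: bytes):
#     async with websockets.connect("ws://localhost:8000/transcribe") as ws:
#         await ws.send(pcm)      # binary frame with raw s16le audio
#         print(await ws.recv())  # transcription result from the server
#
# asyncio.run(stream(to_s16le([0.0] * 16000)))  # one second of silence at 16 kHz
```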
You can find and modify the following configurations inside the batch file:
BACKEND
faster-whisper, whisper-timestamped, openai-whisper
MODEL_SIZE
tiny.en, tiny, base.en, base, small.en, small, medium.en, medium, large-v1, large-v2, large-v3, large, large-v3-turbo
LANGUAGE
af, am, ar, as, az, ba, be, bg, bn, bo, br, bs, ca, cs, cy, da, de, el, en, es, et, eu, fa, fi, fo, fr, gl, gu, ha, haw, he, hi, hr, ht, hu, hy, id, is, it, ja, jw, ka, kk, km, kn, ko, la, lb, ln, lo, lt, lv, mg, mi, mk, ml, mn, mr, ms, mt, my, ne, nl, nn, no, oc, pa, pl, ps, pt, ro, ru, sa, sd, si, sk, sl, sn, so, sq, sr, su, sv, sw, ta, te, tg, th, tk, tl, tr, tt, uk, ur, uz, vi, yi, yo, zh
- SAMPLING_RATE (can NOT be modified currently)
- MINIMUM_CHUNK_SIZE $\in \mathbb{N}$ (excluding zero)
- USE_VOICE_ACTIVITY_CONTROLLER $\in \{True, False\}$
- USE_VOICE_ACTIVITY_DETECTION $\in \{True, False\}$
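The constraints above can be checked before starting the server. A minimal validation sketch, assuming the configuration is read from the environment variables listed earlier; the helper name is our own, not part of reswhis:

```python
import os

BACKENDS = {"faster-whisper", "whisper-timestamped", "openai-whisper"}


def validate_config(env: dict[str, str]) -> dict[str, object]:
    """Validate the reswhis environment variables and return parsed values."""
    backend = env["BACKEND"]
    if backend not in BACKENDS:
        raise ValueError(f"unknown backend: {backend}")
    if int(env["SAMPLING_RATE"]) != 16000:
        raise ValueError("SAMPLING_RATE is fixed at 16000")
    chunk = int(env["MINIMUM_CHUNK_SIZE"])
    if chunk <= 0:
        raise ValueError("MINIMUM_CHUNK_SIZE must be a positive integer")
    return {
        "backend": backend,
        "chunk": chunk,
        "vac": env["USE_VOICE_ACTIVITY_CONTROLLER"] == "True",
        "vad": env["USE_VOICE_ACTIVITY_DETECTION"] == "True",
    }


# Example: validate the current process environment
# validate_config(dict(os.environ))
```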
Important: We recommend MODEL_SIZE=medium for transcribing audio spoken in German.
Here you can find a list of known errors that we experienced, with solutions to fix them. Please note that these are issues that are out of our control (e.g. caused by some third-party proprietary dependencies), and we came up with some custom workarounds.
Could not locate cudnn_ops64_9.dll. Please make sure it is in your library path!
Invalid handle. Cannot load symbol cudnnCreateTensorDescriptor
We experienced this problem on machines running Microsoft Windows. First stop the server (for example by pressing CTRL + C). Then run copy_cuda_dlls.bat as administrator. It will prompt you about copying the required DLLs so that the problem gets fixed. After that, you can go back to step number 5 and continue from there. If you installed the NVIDIA CUDA Toolkit and the NVIDIA cuDNN Library correctly and in a supported version, this should fix the problem.
This project was inspired by:
And employed code from: