Batch transcription? #52

translationsb · 2024-04-15T11:21:05Z

translationsb
Apr 15, 2024

I am a fan of noScribe. I do a lot of Chinese transcription, and it has the best transcription quality of all the Whisper interfaces I have used so far. Is it possible to add a batch conversion function to convert multiple audio files one after another?

kaixxx · 2024-04-15T12:42:20Z

kaixxx
Apr 15, 2024
Maintainer

Is it possible to add a batch convert function

Yes, definitely, this feature will come. But it may take a few weeks, see: #50 (reply in thread)

1 reply

translationsb Apr 15, 2024
Author

So great to hear! Sorry, I overlooked that post. Greetings from Lausanne

mkasztelnik · 2024-04-26T08:20:48Z

mkasztelnik
Apr 26, 2024

Many thanks for noScribe! I have a similar use case, but running noScribe on an HPC cluster. Is it possible to run the transcription process from the command line?

1 reply

kaixxx Apr 26, 2024
Maintainer

No CLI version yet. But a client-server version is in the making, which will also include a command-line interface. However, it may take a few weeks until this becomes reality.
Do you need the full feature list of noScribe, like speaker differentiation, etc.? If you only want the transcribed text, there are other implementations of whisper out there which offer a nice CLI support, like: https://github.com/ggerganov/whisper.cpp

m-aa2wq · 2025-09-08T11:49:33Z

m-aa2wq
Sep 8, 2025

Is the batch feature already implemented?

1 reply

kaixxx Sep 8, 2025
Maintainer

Yes, it's almost finished: https://github.com/kaixxx/noScribe/tree/cli (command line interface and batch processing integrated in the UI).

Are you able to run noScribe from source? If so, you can test it out. Otherwise, I could also compile a binary for beta testing (Windows or macOS).

m-aa2wq · 2025-09-08T15:01:57Z

m-aa2wq
Sep 8, 2025

I am not able to run noScribe from source.

1 reply

kaixxx Sep 8, 2025
Maintainer

Are you on Windows, macOS or Linux?

m-aa2wq · 2025-09-08T20:20:49Z

m-aa2wq
Sep 8, 2025

On Windows, sorry.

0 replies

kaixxx · 2025-09-10T11:23:37Z

kaixxx
Sep 10, 2025
Maintainer

Beta versions of 0.7 are available here: https://drive.switch.ch/index.php/s/EIVup04qkSHb54j?path=%2FnoScribe%20vers.%200.7

Batch transcription is integrated in the UI. Just start several jobs, and they will be sent to a queue and processed one after the other. You can also select more than one audio file in the file dialog to create multiple jobs at once.

To see the CLI options, just run `noScribe -h` from the terminal.
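For scripted, headless runs (for example on an HPC cluster, as asked earlier in the thread), the CLI could be driven from a small Python loop. This is a minimal sketch, not part of noScribe: it assumes the executable is called `noScribe` and is on your PATH, that each call takes the options first, then an input file, then an output file (as in PaulEdouardG's command further down), and it passes any extra flags through unchanged. `build_command`, `transcribe_folder` and the list of audio extensions are hypothetical; check `noScribe -h` for the real option names.

```python
import subprocess
from pathlib import Path
from typing import Sequence

# Assumption: which file extensions to pick up; extend as needed.
AUDIO_SUFFIXES = {".mp3", ".wav", ".m4a", ".flac"}


def build_command(audio: Path, out_dir: Path, extra_args: Sequence[str] = (),
                  suffix: str = ".txt", exe: str = "noScribe") -> list[str]:
    """Build one CLI call: options first, then input file, then output file."""
    output = out_dir / (audio.stem + suffix)
    return [exe, *extra_args, str(audio), str(output)]


def transcribe_folder(in_dir: Path, out_dir: Path,
                      extra_args: Sequence[str] = ()) -> dict[str, int]:
    """Transcribe every audio file in in_dir one after the other.

    Returns a mapping of file name -> exit code (0 means success).
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    results: dict[str, int] = {}
    for audio in sorted(in_dir.iterdir()):
        if not audio.is_file() or audio.suffix.lower() not in AUDIO_SUFFIXES:
            continue
        cmd = build_command(audio, out_dir, extra_args)
        # Sequential on purpose: one job at a time, like the UI queue.
        results[audio.name] = subprocess.run(cmd).returncode
    return results
```

Usage, with flags copied from the command later in this thread: `transcribe_folder(Path("recordings"), Path("transcripts"), ["--language", "fr", "--model", "precise"])`. In the UI itself, the built-in queue already does this, so the script only matters when no GUI is available.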

8 replies

kaixxx Oct 26, 2025
Maintainer

Hi, is this still the latest beta version of NoScribe 7?

Yes, at least if you don't want to run noScribe from source.

b3eb0o Nov 5, 2025

I always get this error when using the nocuda 0.7 beta Windows version:

```
Transcription worker exited unexpectedly (code 3221226505).
Saved to: C:\Ft\001!.html
Subprocess terminated unexpectedly
```

kaixxx Nov 5, 2025
Maintainer

Hm, strange. Is the transcript under C:\Ft\001!.html still OK?
It would be nice if you could run noScribe from the command line to get more info on the error:

1. Assuming you are on Windows, type `cmd` into the search bar and hit Enter. This should start a terminal ("Eingabeaufforderung", i.e. Command Prompt).
2. Type in `cd C:\Program Files (x86)\noScribe` (or wherever you have installed noScribe).
3. Type in `noScribe` to start the app. Now you should get some information in the terminal during the transcription, hopefully also regarding the error.
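If the terminal output scrolls away too fast, it can also be saved to a file. A minimal Python sketch, assuming Python is installed on the machine; `run_and_log` is a hypothetical helper, not part of noScribe. It merges stderr into stdout so that tracebacks end up in the same log.

```python
import subprocess
import sys


def run_and_log(cmd: list[str], log_path: str) -> int:
    """Run cmd, echo its combined stdout/stderr to the console and to log_path.

    Returns the process exit code.
    """
    with open(log_path, "w", encoding="utf-8") as log, subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so tracebacks land in the log too
        text=True,
        encoding="utf-8",
        errors="replace",  # don't crash on bytes from another console code page
    ) as proc:
        assert proc.stdout is not None
        for line in proc.stdout:
            sys.stdout.write(line)  # live output in the terminal
            log.write(line)         # permanent copy for the bug report
        return proc.wait()
```

Usage (the executable name and path are assumptions based on the default install folder above): `run_and_log([r"C:\Program Files (x86)\noScribe\noScribe.exe"], "noScribe_run.log")`.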

gernophil Nov 5, 2025
Collaborator

This seems to be a GPU issue. Could you try the solution suggested here: https://stackoverflow.com/questions/73940731/raw-kernel-process-exited-code-3221226505

kaixxx Nov 6, 2025
Maintainer

This seems to be a GPU issue.

Interesting find. So it might be that noScribe tries to access CUDA although the user @b3eb0o installed the non-CUDA version. Anyhow, it would be helpful to get more information about the actual error first.

PaulEdouardG · 2025-11-12T14:06:02Z

PaulEdouardG
Nov 12, 2025

Hi,

Using the beta version available here (https://drive.switch.ch/index.php/s/EIVup04qkSHb54j?path=%2FnoScribe%20vers.%200.7), I'm getting an error.

Here is what I'm running in cmd, launched from the folder where noScribe 0.7 beta is installed:

```
noScribe --language fr --model precise --speaker-detection 2 --overlapping --timestamps --disfluencies "Path\to\recording\2025-11-12 - recording.mp3" "Path\to\recording\2025-11-12 - recording.txt"
```

And here is the output on the command line:

```
Traceback (most recent call last):
  File "C:\Users\kai\Documents\Programmierung\2023_WhisperTranscribe\noScribe\noScribe.py", line 3310, in <module>
  File "C:\Users\kai\Documents\Programmierung\2023_WhisperTranscribe\noScribe\noScribe.py", line 1029, in __init__
  File "i18n\translator.py", line 26, in t
  File "i18n\resource_loader.py", line 103, in search_translation
  File "i18n\resource_loader.py", line 93, in load_directory
  File "i18n\resource_loader.py", line 72, in load_translation_file
  File "i18n\resource_loader.py", line 24, in load_resource
  File "i18n\loaders\loader.py", line 46, in load_resource
  File "i18n\loaders\yaml_loader.py", line 13, in parse_file
  File "yaml\__init__.py", line 81, in load
  File "yaml\constructor.py", line 49, in get_single_data
  File "yaml\composer.py", line 36, in get_single_node
  File "yaml\composer.py", line 55, in compose_document
  File "yaml\composer.py", line 84, in compose_node
  File "yaml\composer.py", line 133, in compose_mapping_node
  File "yaml\composer.py", line 84, in compose_node
  File "yaml\composer.py", line 127, in compose_mapping_node
  File "yaml\parser.py", line 98, in check_event
  File "yaml\parser.py", line 438, in parse_block_mapping_key
yaml.parser.ParserError: while parsing a block mapping
  in "<unicode string>", line 2, column 3:
      app_header: "Transcription audio ...
      ^
expected <block end>, but found '<scalar>'
  in "<unicode string>", line 77, column 44:
    ... rt: '=== Démarrage de la file d'attente ===
                                        ^
[9764] Failed to execute script 'noScribe' due to unhandled exception!
```

In a similar way, when I try to run the GUI, here is what I get:

```
Traceback (most recent call last):
  File "C:\Users\kai\Documents\Programmierung\2023_WhisperTranscribe\noScribe\noScribe.py", line 3310, in <module>
  File "C:\Users\kai\Documents\Programmierung\2023_WhisperTranscribe\noScribe\noScribe.py", line 1029, in __init__
  File "i18n\translator.py", line 26, in t
  File "i18n\resource_loader.py", line 103, in search_translation
  File "i18n\resource_loader.py", line 93, in load_directory
  File "i18n\resource_loader.py", line 72, in load_translation_file
  File "i18n\resource_loader.py", line 24, in load_resource
  File "i18n\loaders\loader.py", line 46, in load_resource
  File "i18n\loaders\yaml_loader.py", line 13, in parse_file
  File "yaml\__init__.py", line 81, in load
  File "yaml\constructor.py", line 49, in get_single_data
  File "yaml\composer.py", line 36, in get_single_node
  File "yaml\composer.py", line 55, in compose_document
  File "yaml\composer.py", line 84, in compose_node
  File "yaml\composer.py", line 133, in compose_mapping_node
  File "yaml\composer.py", line 84, in compose_node
  File "yaml\composer.py", line 127, in compose_mapping_node
  File "yaml\parser.py", line 98, in check_event
  File "yaml\parser.py", line 438, in parse_block_mapping_key
yaml.parser.ParserError: while parsing a block mapping
  in "<unicode string>", line 2, column 3:
      app_header: "Transcription audio ...
      ^
expected <block end>, but found '<scalar>'
  in "<unicode string>", line 77, column 44:
    ... rt: '=== Démarrage de la file d'attente ==='
                                        ^
[24208] Failed to execute script 'noScribe' due to unhandled exception!
```

Details concerning my configuration:

- I'm using the normal version (not the CUDA one)
- on Windows
- on a PC that is set to French => this last point might be related to what is occurring

The patch I found:

- within noScribe\config.yml, set locale to en => then it works.

2 replies

kaixxx Nov 12, 2025
Maintainer

Hi

Thank you, there seems to be an error in the French translation file, as you've already suspected. I have corrected it since then.
To get it running, open the file `_internal/trans/noScribe.fr.yml` in the installation folder and edit the following line:

```
queue_start: '=== Démarrage de la file d'attente ==='
```

into

```
queue_start: '=== Démarrage de la file d''attente ==='
```

changing the single quotation mark (') into two single ones ('').
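Inside a YAML single-quoted string, the only escape is doubling the quote, so `d'attente` must be written `d''attente`. The sketch below (hypothetical helper names, plain Python without a YAML library) quotes a string that way and flags `key: '...'` lines whose quoting is broken, which catches both the original line and a stray trailing comma. It is a line-based heuristic, not a YAML parser: it skips double-quoted and multi-line values, and it reports lines with a trailing `# comment` as suspicious.

```python
def yaml_single_quote(text: str) -> str:
    """Wrap text as a YAML single-quoted scalar; a literal ' becomes ''."""
    return "'" + text.replace("'", "''") + "'"


def single_quoted_ok(value: str) -> bool:
    """True if value is one complete single-quoted scalar and nothing else."""
    value = value.rstrip()
    if len(value) < 2 or value[0] != "'" or value[-1] != "'":
        return False
    inner = value[1:-1]
    # Every literal quote inside must be doubled, so once all '' pairs
    # are removed, no lone ' may remain.
    return "'" not in inner.replace("''", "")


def find_bad_lines(yaml_text: str) -> list[int]:
    """Return 1-based numbers of `key: '...'` lines with malformed quoting."""
    bad = []
    for lineno, line in enumerate(yaml_text.splitlines(), start=1):
        _key, sep, value = line.partition(": ")
        value = value.strip()
        if sep and value.startswith("'") and not single_quoted_ok(value):
            bad.append(lineno)
    return bad
```

Usage: `find_bad_lines(Path("_internal/trans/noScribe.fr.yml").read_text(encoding="utf-8"))` (with `from pathlib import Path`) lists the lines to fix.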

PaulEdouardG Nov 13, 2025

Thanks a lot for your quick answer! Indeed, it solves the problem!

m-aa2wq · 2025-12-01T14:54:43Z

m-aa2wq
Dec 1, 2025

So far, I have not been able to successfully complete a transcription with this version. With an mp3 file in German, the transcription aborts at some point and no HTML transcript is available.

Here is the output (translated from the German UI):

```
=== Starting queue ===
Processing 1 transcription job(s)

Starting job: xxx.mp3

Audio conversion...
Conversion finished.

Identifying speakers...
Loading Pyannote
discrete_diarization: 100%

Transcription...
Loading Whisper
Voice activity detection...
Transcription worker exited unexpectedly (code 3221226505).
Saved to: C:/xxx.html
Subprocess terminated unexpectedly

=== Queue processing finished ===
Total jobs: 1
Completed: 0
Failed: 1
Cancelled: 0
Total processing time: 51:33
```

0 replies

m-aa2wq · 2025-12-02T11:32:12Z

m-aa2wq
Dec 2, 2025

I discovered that version 0.7 beta is okay on another PC. The "bad PC" has very restricted admin rights. Perhaps that caused the problems.

10 replies

hupzo Dec 7, 2025

There was, however, also an error in the final version 0.7 (maybe in the beta as well) that prevented the disfluencies option from working in conjunction with the "auto" language setting. This has now been corrected, the new version is compiling already.

Thank you. I see the new updated version. I will install it and transcribe a new video with this version... yes, I will uncheck disfluencies just to check... :)

hupzo Dec 7, 2025

Since releasing the beta version of 0.7, I've switched to a new version of pyannote (the AI for speaker identification) which was released a few weeks ago. This is also one reason for the delay in releasing noScribe 0.7. The switch to pyannote 4 explains the differences you see in speaker detection

PS: a small question... in the Google search results I now see a popup: https://noscribe.ai/ ... Are you involved with this paid version of "noScribe", or did they just "hijack" the name? The .ai version does offer .SRT export, which your noScribe does not... please let it be a completely different "project".

kaixxx Dec 7, 2025
Maintainer

Indeed, somebody has hijacked the name. I have nothing to do with this domain. The software is also not mine, it is an online platform, nothing to use locally. Sad to see such scamming attempts.
However, in my Google results, the domain appears only on page 3 (even in a new private window). All the links before point to the real thing. In which country are you? Does it appear higher up in the ranking for you?
Thank you for making me aware of this; I'll put a warning in the readme.

hupzo Dec 7, 2025

All the links before point to the real thing. In which country are you? Does it appear higher up in the ranking for you?

:( The Netherlands ... Google gives:

kaixxx Dec 8, 2025
Maintainer

Annoying. I've added a warning to the readme. Since I don't have a registered trade name, there is probably not much more I can do.

mmffkiel · 2025-12-18T15:31:24Z

mmffkiel
Dec 18, 2025

I was able to test the batch processing of version 0.7 in more detail. It is very helpful and works well overall. However, with a large number of files (400 in my case), there is a significant number of aborted processes, approximately 30 percent. ("subprocess aborted unexpectedly") Is there an error log in the background that I could provide?

1 reply

kaixxx Dec 18, 2025
Maintainer

Interesting, I've never tested such a large batch. See here how to access the log files for the (failed) jobs: https://github.com/kaixxx/noScribe#advanced-options I'm curious to know what went wrong.

mmffkiel · 2025-12-19T08:25:58Z

mmffkiel
Dec 19, 2025

Here is an example error message from a job that was aborted: `ERROR: Transcription worker exited unexpectedly (code 3221226505).` I noticed that it is the same error as in discussion #257.
However, I am using a different PC: Windows 11 (latest), i9 Ultra processor, 64 GB RAM, RTX 5060 card.

1 reply

kaixxx Dec 19, 2025
Maintainer

From what I understand, return code 3221226505 (hex C0000409) indicates that Windows has forcefully terminated the process for some reason, usually because of a fairly low-level error in the libraries or drivers. I have seen this error before (here is a recent issue: #259), but I don't know the exact reason. It seems to be related to CUDA.
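The decimal-to-hex conversion is easy to check in Python. Windows names 0xC0000409 `STATUS_STACK_BUFFER_OVERRUN`, which is also the code a process gets when it terminates itself through a fail-fast request (`__fastfail`), so it does not necessarily mean an actual buffer overrun. The helper and its two-entry lookup table are hypothetical, not part of noScribe.

```python
# Only two well-known NTSTATUS values; anything else is reported as "unknown".
KNOWN_STATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC0000409: "STATUS_STACK_BUFFER_OVERRUN",
}


def describe_exit_code(code: int) -> str:
    """Format a Windows exit code as decimal, hex and (if known) NTSTATUS name."""
    # Exit codes are 32-bit; masking also folds negative (signed) values
    # such as -1073740791 onto the same unsigned code.
    unsigned = code & 0xFFFFFFFF
    name = KNOWN_STATUS.get(unsigned, "unknown")
    return f"{code} = 0x{unsigned:08X} ({name})"
```

`describe_exit_code(3221226505)` returns `"3221226505 = 0xC0000409 (STATUS_STACK_BUFFER_OVERRUN)"`.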

Your RTX 5060 has 8GB of VRAM, right? That should be enough.

Can you try some of the failed files again, transcribing them one by one without the batch-mode? Do they fail again? If so: Is there anything special about these files? Are they large? Small? Do they fail right away or in the middle of the transcription?
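With many failed files, this retest could be scripted. A minimal sketch, assuming you already have the CLI command for a single file (like the `noScribe ... input output` call shown earlier in this thread); `run_with_retries`, the number of attempts and the pause are hypothetical choices, not noScribe behaviour. Recording how many tries each file needed helps tell file-specific failures from intermittent ones.

```python
import subprocess
import time
from typing import Sequence


def run_with_retries(cmd: Sequence[str], attempts: int = 3,
                     delay_s: float = 5.0) -> tuple[int, int]:
    """Run cmd up to `attempts` times, stopping at the first exit code 0.

    Returns (exit code of the last try, number of tries used).
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    code = -1
    for attempt in range(1, attempts + 1):
        code = subprocess.run(list(cmd)).returncode
        if code == 0:
            return code, attempt
        if attempt < attempts:
            time.sleep(delay_s)  # arbitrary pause between tries
    return code, attempts
```

A file that succeeds only on a later try points to an intermittent problem rather than to the audio itself.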

logitacher · 2025-12-28T21:41:48Z

logitacher
Dec 28, 2025

This is what I get (German UI messages translated):

```
Transcription...
Loading Whisper
[2025-12-28 22:32:06.871] [ctranslate2] [thread 6924] [warning] The compute type inferred from the saved model is float16, but the target device or backend do not support efficient float16 computation. The model weights have been automatically converted to use the float32 compute type instead.
Voice activity detection...
Transcription...

Could not locate cudnn_ops64_9.dll. Please make sure it is in your library path!
Invalid handle. Cannot load symbol cudnnCreateTensorDescriptor
Transcription worker exited unexpectedly (code 3221226505).
Subprocess terminated unexpectedly
Job error details: Traceback (most recent call last):
  File "C:\Users\kai\Documents\Programmierung\2023_WhisperTranscribe\noScribe\noScribe.py", line 3110, in _run_whisper_subprocess_stream
  File "multiprocessing\queues.py", line 114, in get
_queue.Empty

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\kai\Documents\Programmierung\2023_WhisperTranscribe\noScribe\noScribe.py", line 2336, in transcription_worker
  File "C:\Users\kai\Documents\Programmierung\2023_WhisperTranscribe\noScribe\noScribe.py", line 2939, in _process_single_job
  File "C:\Users\kai\Documents\Programmierung\2023_WhisperTranscribe\noScribe\noScribe.py", line 3123, in _run_whisper_subprocess_stream
Exception: Subprocess terminated unexpectedly

=== Queue processing finished ===
Total jobs: 1
Completed: 0
Failed: 1
Cancelled: 0
Total processing time: 9:58
```
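The first error line says cuDNN could not be found on the library path. A quick way to see whether that DLL is reachable via `PATH` is sketched below. It checks only the `PATH` directories (Windows also searches the application folder and system folders, so a miss here is not conclusive), and `find_on_path` is a hypothetical helper.

```python
import os
from pathlib import Path
from typing import Iterable, Optional


def find_on_path(filename: str,
                 dirs: Optional[Iterable[str]] = None) -> Optional[Path]:
    """Return the first file called filename in PATH (or in dirs), else None."""
    if dirs is None:
        dirs = os.environ.get("PATH", "").split(os.pathsep)
    for d in dirs:
        if not d:
            continue  # skip empty entries, e.g. from a trailing ';'
        candidate = Path(d.strip('"')) / filename  # PATH entries may be quoted
        if candidate.is_file():
            return candidate
    return None
```

Usage: `print(find_on_path("cudnn_ops64_9.dll"))` prints the DLL's location, or `None` if no `PATH` directory contains it.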

3 replies

kaixxx Dec 28, 2025
Maintainer

Seems to be the same issue as this one: #259 (comment)
Please follow my instructions there and report back if it helps.

logitacher Dec 29, 2025

Yes, editing the config.yml fixed the issue, thanks!

kaixxx Dec 29, 2025
Maintainer

Thank you. I've added a note to the Readme (Windows section) and will try to improve the automatic CUDA detection for the next version.

mmffkiel · 2025-12-29T11:45:26Z

mmffkiel
Dec 29, 2025

I continued experimenting:

1. With large batch processing, the number of erroneous interruptions increased after a certain amount of time.
2. Individual repetitions were partially successful, so it is probably not solely due to the transcribed audio file.
3. I also installed the latest Nvidia drivers. That did not change anything.

If I understand correctly, the change in the yml file is only a temporary workaround that dispenses with CUDA and GPU, right?

Translated with DeepL.com (free version)

1 reply

kaixxx Dec 29, 2025
Maintainer

If I understand correctly, the change in the yml file is only a temporary workaround that dispenses with CUDA and GPU, right?

Yes, the change in the YAML file would switch to CPU processing for all jobs, which is probably not what you want.
Your RTX 5060 has 8GB VRAM, right? That should be plenty. I will try processing a larger batch on my PC with an RTX 3060.

mmffkiel · 2026-01-14T11:38:32Z

mmffkiel
Jan 14, 2026

Could the CUDA version be a factor? I have installed the latest CUDA version, 13.1. And I have seen that 12.8 is apparently better suited for faster-whisper. Is 13.1 possibly not fully compatible?

1 reply

kaixxx Jan 14, 2026
Maintainer

Could the CUDA version be a factor?

Yes, that's a possibility. I am using CUDA 12.8. See here for the other requirements:
https://github.com/kaixxx/noScribe/blob/main/environments/requirements_win_cuda.txt
