
whisper.cpp_windows

Just an .exe for those unable to build the excellent whisper.cpp on Windows. It will probably work only on x64 PCs.

Usage

  • Get the zip file from the Releases page.

  • Unzip it into a folder.

  • Download a model from https://ggml.ggerganov.com/ (for example, ggml-model-whisper-small.bin) and save it into the unzipped folder.

  • Go into the unzipped folder and launch a command line in it. (Type cmd in the address bar of the Explorer window once you're in the folder.)

  • Make sure your audio file is a mono WAV file with a 16000 Hz sample rate. Use ffmpeg to convert it if needed: ffmpeg -i file.mp3 -ar 16000 -ac 1 -b:a 96K -acodec pcm_s16le file.wav

  • Then paste this into the command line (replace file.wav with the actual filename): main -m ggml-model-whisper-small.bin -t 4 -otxt -f file.wav

More Options

The -t option sets how many threads to use for computation. The default is 4. Going beyond 8 can actually lower performance!

The -l option specifies the spoken language. The full list of languages is here.

If you specify the right language, it should transcribe by default. If you do not specify a language and the audio is not in English, it will translate into English rather than transcribe.

The -otxt option saves a .txt file in the same folder as the input WAV file once transcription is done. The other output options are -osrt and -ovtt.

So, an advanced command would be something like this: main -m ggml-model-whisper-small.bin -l Japanese -t 4 -otxt -f file.wav

All usage options are below. For details, check out OpenAI's Whisper or Whisper.cpp's repos.

Live Transcription

cd into the stream folder at the command line and run stream -t 8 -m ..\ggml-model-whisper-tiny.bin

This will pick up the audio from the default microphone on your system, and will use 8 threads if your CPU has them (fewer if it doesn't).

Note that this process is very CPU-intensive. There is also a delay of 10 seconds between the audio and its transcription. You can change this with the --length option: stream.exe -m ..\ggml-model-whisper-tiny.en.bin -t 8 --length 30

Setting length to 30 gives the lowest CPU load and lets you use the larger, more accurate models. The price is a longer delay in transcription. An 8-thread i5-8300H CPU can manage a length of 3 seconds with the tiny model and 6 seconds with the base model.
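As a small scripting aid for the trade-off above, the stream command can be assembled from a model, thread count, and length. This is a sketch only: the function name is invented, and stream(.exe) is assumed to be in your PATH or current folder.

```python
def stream_cmd(model, threads=8, length=None):
    # Build the stream command shown above; a larger --length value
    # lowers CPU load but increases the transcription delay.
    cmd = ["stream", "-m", model, "-t", str(threads)]
    if length is not None:
        cmd += ["--length", str(length)]
    return cmd

# The command from the example above:
print(" ".join(stream_cmd(r"..\ggml-model-whisper-tiny.en.bin", 8, 30)))
# → stream -m ..\ggml-model-whisper-tiny.en.bin -t 8 --length 30
```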

Full List of Options

usage: ./main [options] file0.wav file1.wav ...

options:
  -h,       --help           show this help message and exit
  -s SEED,  --seed SEED      RNG seed (default: -1)
  -t N,     --threads N      number of threads to use during computation (default: 4)
  -p N,     --processors N   number of processors to use during computation (default: 1)
  -ot N,    --offset-t N     time offset in milliseconds (default: 0)
  -on N,    --offset-n N     segment index offset (default: 0)
  -mc N,    --max-context N  maximum number of text context tokens to store (default: max)
  -ml N,    --max-len N      maximum segment length in characters (default: 0)
  -wt N,    --word-thold N   word timestamp probability threshold (default: 0.010000)
  -v,       --verbose        verbose output
            --translate      translate from source language to english
  -otxt,    --output-txt     output result in a text file
  -ovtt,    --output-vtt     output result in a vtt file
  -osrt,    --output-srt     output result in a srt file
  -owts,    --output-words   output script for generating karaoke video
  -ps,      --print_special  print special tokens
  -pc,      --print_colors   print colors
  -nt,      --no_timestamps  do not print timestamps
  -l LANG,  --language LANG  spoken language (default: en)
  -m FNAME, --model FNAME    model path (default: models/ggml-base.en.bin)
  -f FNAME, --file FNAME     input WAV file path
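Note that the usage line accepts several WAV files in one invocation (file0.wav file1.wav ...), so a whole folder can be queued in one run. A hedged Python sketch of building such a command (the helper name is invented, and main is assumed to be on the PATH):

```python
def batch_cmd(model, wav_files, threads=4):
    # main accepts multiple WAV files as positional arguments,
    # per the usage line above; -otxt writes one .txt per input.
    return ["main", "-m", model, "-t", str(threads), "-otxt"] + [str(p) for p in wav_files]

# e.g. all WAVs in the current folder:
# from pathlib import Path
# import subprocess
# subprocess.run(batch_cmd("ggml-model-whisper-small.bin",
#                          sorted(Path(".").glob("*.wav"))), check=True)
```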

More Info

OpenAI's Whisper is a state-of-the-art auto-transcription model. Unfortunately for some, it requires a GPU to be effective.

Whisper.cpp is an excellent C++ port of Whisper that works quite well on a CPU, eliminating the need for a GPU.

Whisper.cpp is quite easy to compile on Linux and macOS. Non-technical Windows users may struggle a bit because Windows lacks a make command. Compiling with MinGW or Visual Studio solves this, but if that sounds too complicated, this exe might be useful.

The .exe is compiled from this commit.

You may need to install vcredist_x64. Get it here.

This exe is provided as is and is not guaranteed to work on all systems. I've tested it on 5 Windows 10/11 systems, and it worked on 4 of them. The fifth, where it didn't work, was an old system with a lot of issues, which was possibly the reason for the failure.

I'm not sure if I will be able to release an exe every time Whisper.cpp is updated. This one was compiled by a member of my team, and I thought I'd share it.
