Transcribe audio and video, on your computer.

Drop in a recording, pick a language, get clean text or subtitles. Nothing is uploaded — the model runs locally.

What makes it different

100% local processing

Your files never leave your machine. No upload, no cloud, no waiting for a queue. Works offline after the first model download.

99 languages

OpenAI Whisper supports 99 languages, with accuracy that varies by language and audio quality. Auto-detect is the default, but you can pick a specific language to force it.

Optional GPU acceleration

A single checkbox switches inference to your GPU through Vulkan (NVIDIA, AMD, or Intel) and falls back silently to the CPU if the GPU or its driver cannot be used.

TXT and SRT output

Save a clean text transcript, ready-to-use subtitles, or both from a single run — the heavy inference happens only once.
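As a rough illustration of how one set of timed segments can feed both outputs, here is a minimal Python sketch. The `(start, end, text)` segment shape and the function names are assumptions made for this example and are not the app's actual code. Only the SRT layout itself (index, `HH:MM:SS,mmm --> HH:MM:SS,mmm`, text, blank line) is standard.

```python
from typing import List, Tuple

# Hypothetical segment shape: (start seconds, end seconds, text).
Segment = Tuple[float, float, str]


def srt_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS,mmm, the timestamp layout SRT uses."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: List[Segment]) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, (start, end, text) in enumerate(segments, start=1):
        cues.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n"
        )
    return "\n".join(cues)


def to_txt(segments: List[Segment]) -> str:
    """Render the same segments as plain text without timestamps."""
    return " ".join(text.strip() for _, _, text in segments) + "\n"


if __name__ == "__main__":
    # One inference pass produces the segments; both files reuse them.
    demo = [(0.0, 1.25, " Hello "), (1.25, 2.0, "world")]
    print(to_srt(demo))
    print(to_txt(demo))
```

Because both renderers read the same list, writing a second format costs only string formatting, not a second transcription pass.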

Every common audio/video format

WhatsApp voice notes (.opus, .ogg), OBS recordings (.mkv), YouTube downloads (.webm), MP3, MP4, MOV, WAV, FLAC, M4A — ffmpeg handles them all.

Minimal, anonymous telemetry

Your audio and transcripts never leave your machine. The app sends a single ping per installation containing a random ID, the app version, and the system language, which shows which countries and languages to focus on. Nothing else is sent. The underlying components (FFmpeg, whisper.cpp, Whisper.net, .NET) are open source and independently auditable.

How it works

  1. Pick a file

    Choose any audio or video file on disk and the folder where the transcript should be saved.

  2. Choose language and model

    Keep auto-detect or lock a specific language. Pick the Whisper model size (Tiny for speed, Medium or Large-v3 for accuracy on long recordings).

  3. Transcribe

    The app extracts a clean audio track, runs Whisper, and writes the .txt and/or .srt to your chosen folder.
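The extraction step can be pictured with a small Python sketch. It builds an ffmpeg command that drops any video stream and converts the audio to 16 kHz mono 16-bit PCM WAV, the input format whisper.cpp expects. The function names and the rule of naming outputs after the source file are assumptions for illustration, and the app's real invocation may differ.

```python
from pathlib import Path
from typing import List


def build_extract_command(source: Path, wav_out: Path) -> List[str]:
    """Build an ffmpeg command producing 16 kHz mono 16-bit PCM WAV."""
    return [
        "ffmpeg",
        "-y",                 # overwrite the output file if it exists
        "-i", str(source),    # any container ffmpeg can decode
        "-vn",                # ignore video streams
        "-ac", "1",           # downmix to mono
        "-ar", "16000",       # resample to 16 kHz
        "-c:a", "pcm_s16le",  # 16-bit little-endian PCM
        str(wav_out),
    ]


def output_paths(source: Path, out_dir: Path,
                 want_txt: bool = True, want_srt: bool = True) -> List[Path]:
    """Assumed naming rule: reuse the source file's stem in the chosen folder."""
    paths = []
    if want_txt:
        paths.append(out_dir / f"{source.stem}.txt")
    if want_srt:
        paths.append(out_dir / f"{source.stem}.srt")
    return paths


if __name__ == "__main__":
    cmd = build_extract_command(Path("talk.mkv"), Path("talk.wav"))
    print(" ".join(cmd))
    print(output_paths(Path("talk.mkv"), Path("transcripts")))
```

The command is returned as a list so it can be passed to `subprocess.run(cmd, check=True)` without shell quoting problems.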

Under the hood

Built with .NET 8 on Windows. Uses ffmpeg (LGPL) for format decoding and Whisper.net (MIT), which wraps whisper.cpp (MIT), for inference. GPU inference uses the Vulkan runtime, and the CPU runtime with AVX is the fallback for everything else. Model files are downloaded once from Hugging Face (ggerganov/whisper.cpp) and cached under %LOCALAPPDATA%.
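The download-once behaviour can be sketched in Python as follows. The `ggml-<name>.bin` file names and the `resolve/main` URL pattern match the public ggerganov/whisper.cpp repository on Hugging Face. The `ExampleTranscriber` folder name, the function names, and the injectable `download` parameter are assumptions made for this sketch, not the app's implementation.

```python
import os
import urllib.request
from pathlib import Path
from typing import Callable

BASE_URL = "https://huggingface.co/ggerganov/whisper.cpp/resolve/main"


def default_cache_dir() -> Path:
    """Hypothetical cache location under %LOCALAPPDATA% (home dir elsewhere)."""
    root = os.environ.get("LOCALAPPDATA") or str(Path.home())
    return Path(root) / "ExampleTranscriber" / "models"


def ensure_model(
    name: str,
    cache_dir: Path,
    download: Callable[[str, str], object] = urllib.request.urlretrieve,
) -> Path:
    """Return the cached model path, downloading only if it is missing."""
    target = cache_dir / f"ggml-{name}.bin"
    if target.exists():
        return target  # cache hit: no network access needed
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Download to a temporary name so an interrupted transfer is never
    # mistaken for a complete model on the next run.
    partial = target.with_suffix(".part")
    download(f"{BASE_URL}/{target.name}", str(partial))
    partial.replace(target)
    return target


if __name__ == "__main__":
    # The first call downloads about 77 MB; later calls return immediately.
    print(ensure_model("tiny", default_cache_dir()))
```

Passing the downloader as a parameter keeps the caching logic testable without touching the network.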

Model sizes

  • Tiny · 77 MB · fastest · quick drafts
  • Base · 142 MB · small step up
  • Small · 466 MB · recommended balance
  • Medium · 1.5 GB · strong accuracy on long files
  • Large-v3 · 3.1 GB · best quality, slowest

Frequently asked

Is it free?

Yes. The app is free. The Microsoft Store may charge a small one-time fee in some regions — that covers distribution, not the software.

Does it work offline?

Yes, after the first run. The first time you pick a Whisper model, the app downloads it from Hugging Face. After that, everything is local.

How accurate is it?

That depends on the model size and audio quality. For clean speech in a supported language, Medium and Large-v3 are close to professional transcription services. For noisy phone recordings in mixed languages, expect rough drafts.

Do you see my files?

No. Files are decoded, transcribed, and saved entirely on your machine, and nothing is sent anywhere for processing. The only external connections are the download of a Whisper model the first time you use a given size and the single anonymous ping per installation.

Get the app

Available on the Microsoft Store for Windows 10 and 11. The Store build is self-contained — no .NET runtime needed.

Get it on Microsoft Store