Whisper

text
voice
Whisper is an open‑source automatic speech recognition (ASR) system developed by OpenAI, built on an encoder‑decoder Transformer architecture that takes audio input and outputs corresponding transcriptions.

It was trained on 680,000 hours of multilingual and multitask supervised data collected from the web. Training on data of this scale and diversity makes it robust to accents, background noise, and technical language, and lets it either transcribe speech in many languages or translate that speech into English.
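
The two modes mentioned above (transcribing in the spoken language, or translating into English) could be exercised roughly as follows. This is a minimal sketch, assuming the open-source `openai-whisper` Python package and `ffmpeg` are installed. The helper names `build_options` and `run_whisper`, the `"base"` model size, and the audio file name are illustrative choices, not part of Whisper itself.

```python
from typing import Optional


def build_options(task: str = "transcribe", language: Optional[str] = None) -> dict:
    """Build keyword arguments for Whisper's transcribe() call (hypothetical helper).

    task: "transcribe" keeps the spoken language, "translate" outputs English.
    language: optional language code such as "ja"; if omitted, Whisper
    tries to detect the language from the audio.
    """
    if task not in ("transcribe", "translate"):
        raise ValueError(f"unsupported task: {task!r}")
    options = {"task": task}
    if language is not None:
        options["language"] = language
    return options


def run_whisper(audio_path: str, model_name: str = "base", **options) -> str:
    """Load a Whisper model and return the recognized text (hypothetical helper)."""
    # Imported here so the module can be loaded without the package installed.
    import whisper

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path, **options)
    return result["text"]


if __name__ == "__main__":
    # "interview.mp3" is a placeholder path; replace it with a real audio file.
    print(run_whisper("interview.mp3", **build_options("translate")))
```

Larger model sizes generally trade speed for accuracy, so `"base"` is only a convenient starting point for experimentation.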