Microsoft’s new MAI-Transcribe-1 claims the top spot in speech recognition accuracy

Microsoft launched MAI-Transcribe-1, a new speech-to-text model that tops the FLEURS benchmark across 25 languages and is now available in public preview on Microsoft Foundry.

Microsoft AI today launched MAI-Transcribe-1, a multilingual speech-to-text model that the company says achieves the lowest word error rate on the FLEURS benchmark across 25 languages. By Microsoft’s account, it outperforms Scribe v2, Whisper-large-V3, GPT-Transcribe, and Gemini 3.1 Flash-Lite on that benchmark.
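For readers unfamiliar with the metric: word error rate is generally the number of word-level substitutions, deletions, and insertions needed to turn a transcript into the reference text, divided by the number of words in the reference. Lower is better. The sketch below is a minimal illustration of that general definition, not Microsoft’s or FLEURS’s evaluation code. The function name and the simple lowercase-and-split normalization are assumptions made for the example; real benchmark pipelines typically apply more careful text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference word count.

    Assumption: text is normalized only by lowercasing and splitting on
    whitespace. Benchmark pipelines usually normalize more aggressively.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # matching i reference words against nothing costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from transcript
            insertion = curr[j - 1] + 1  # extra word in transcript
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One wrong word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

A transcript with one wrong word against a six-word reference scores one sixth, or roughly 17 percent, which is why small differences in this number can matter at scale.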

The model was built with messy real-world audio in mind. Background noise, low-quality recordings, overlapping speech, and heavy accents are all scenarios it’s designed to handle reliably.

By the numbers:

  • Batch transcription runs 2.5x faster than Microsoft’s current Azure Fast offering
  • Priced at $0.36 per audio hour
  • Supports 25 languages with consistent accuracy across accents and speaking styles
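To put the price in context, here is a rough back-of-the-envelope calculator. It is a sketch under stated assumptions: it takes the $0.36 per audio hour figure above and assumes billing scales linearly with audio duration, with no minimum charge, rounding increment, or volume discount. Microsoft’s actual billing rules are not described in the announcement, and the function name is made up for this example.

```python
PRICE_PER_AUDIO_HOUR_USD = 0.36  # figure quoted in the announcement


def estimate_transcription_cost(
    audio_seconds: float,
    price_per_hour: float = PRICE_PER_AUDIO_HOUR_USD,
) -> float:
    """Estimate cost in USD, assuming strictly linear per-second billing.

    Assumption: no minimum charge, rounding, or tiered pricing applies.
    """
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    return audio_seconds / 3600 * price_per_hour


if __name__ == "__main__":
    # A 45-minute meeting recording.
    print(f"${estimate_transcription_cost(45 * 60):.2f}")
    # 1,000 hours of call-center audio.
    print(f"${estimate_transcription_cost(1000 * 3600):.2f}")
```

Under those assumptions, a 45-minute meeting would cost about 27 cents to transcribe, and 1,000 hours of audio would come to about $360.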

Where it’s going: MAI-Transcribe-1 is already in phased rollout for Copilot Voice mode and Microsoft Teams transcription. For developers building voice agents, Microsoft positions it as the foundational speech layer, meant to be combined with MAI-Voice-1 for text-to-speech and a separate LLM for reasoning.

Where to try it: The model is in public preview on Microsoft Foundry and the Microsoft AI Playground.

Source: Microsoft
