ElevenLabs launches Scribe v2 Realtime, its most advanced low-latency Speech to Text model

November 12, 2025: ElevenLabs, a global leader in AI-driven voice and audio innovation, today launched Scribe v2 Realtime, its most advanced Speech to Text model. It delivers live transcription in under 150 milliseconds with leading accuracy across more than 90 languages, including 11 Indian languages: Hindi, Tamil, Malayalam, Telugu, Gujarati, Kannada, Odia, Bengali, Marathi, Punjabi and Sindhi.

The model sets a new benchmark for real-time multilingual communication. It helps developers and enterprises build faster, more natural, and inclusive voice experiences across industries, from customer engagement and healthcare to media, education, and live streaming.

On the FLEURS benchmark, Scribe v2 Realtime achieves 93.5% accuracy across 30 European and Asian languages.

A leap forward in real-time transcription

Built for developers and enterprises creating voice assistants, meeting tools, and live captioning applications, Scribe v2 Realtime enables human-level understanding and immediate response in live environments. 

The model supports negative latency prediction, text conditioning, voice activity detection (VAD), and manual commit controls for fine-tuned streaming performance.

Enterprise use cases include real-time transcription of customer calls for voice AI agents and compliance monitoring, live medical dictation and clinical documentation, real-time meeting transcription, captions for media and streaming, and accessibility support in education, among others.

Built for India and global scale

ElevenLabs has prioritized data localization with India data residency options, enabling organizations to deploy Speech to Text solutions in compliance with India’s data regulations. 

Scribe v2 Realtime integrates seamlessly with ElevenLabs Agents, allowing developers to power natural, human-like conversational systems for support, sales, or in-product experiences.

Key capabilities include:

  • Ultra-low latency (<150 ms) live transcription
  • Negative latency: Next word and punctuation prediction
  • Streaming support: Send audio in chunks while receiving transcripts in real time
  • Voice Activity Detection (VAD): Automatic speech segmentation based on silence detection
  • Support for 90+ languages, including Indian regional languages
  • Custom vocabulary for domain-specific accuracy (currently available in Scribe v2)
  • Smart speaker diarization and precise timestamps (currently available in Scribe v2)
  • Zero retention mode for sensitive workloads
  • Full enterprise compliance with Indian and global standards
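The streaming and VAD items in the list above describe a common pattern. A client splits audio into short chunks and sends each one as it is captured. Meanwhile, silence detection decides where one utterance ends and the next begins. The Python sketch below illustrates only that generic pattern. It is not the ElevenLabs API and not the detector Scribe v2 Realtime actually uses. All of the following are hypothetical choices made for illustration:

  • the audio format (16 kHz, 16-bit mono PCM)
  • the 100 ms chunk size
  • the RMS energy threshold
  • the function names chunk_pcm, rms and segment_by_silence

```python
import math
import struct

# Hypothetical audio format for this sketch (not taken from ElevenLabs docs):
# 16 kHz sample rate, 16-bit signed little-endian, mono PCM.
SAMPLE_RATE = 16_000
BYTES_PER_SAMPLE = 2


def chunk_pcm(pcm: bytes, chunk_ms: int = 100) -> list[bytes]:
    """Split raw PCM into fixed-duration chunks, as a streaming client might send them."""
    chunk_bytes = SAMPLE_RATE * chunk_ms // 1000 * BYTES_PER_SAMPLE
    return [pcm[i:i + chunk_bytes] for i in range(0, len(pcm), chunk_bytes)]


def rms(chunk: bytes) -> float:
    """Root-mean-square amplitude of a chunk of 16-bit samples."""
    n = len(chunk) // BYTES_PER_SAMPLE
    if n == 0:
        return 0.0
    samples = struct.unpack(f"<{n}h", chunk[:n * BYTES_PER_SAMPLE])
    return math.sqrt(sum(s * s for s in samples) / n)


def segment_by_silence(chunks: list[bytes], threshold: float = 500.0,
                       min_silent_chunks: int = 3) -> list[tuple[int, int]]:
    """Group chunks into speech segments (hypothetical energy-based VAD).

    A segment opens at the first chunk whose RMS reaches `threshold` and closes
    once `min_silent_chunks` consecutive chunks fall below it. Returns
    (start_index, end_index) pairs with end_index exclusive.
    """
    segments: list[tuple[int, int]] = []
    start = None      # index of the first loud chunk in the open segment
    silent_run = 0    # consecutive quiet chunks since the last loud one
    for i, chunk in enumerate(chunks):
        if rms(chunk) >= threshold:
            if start is None:
                start = i
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_silent_chunks:
                # The last loud chunk was at i - silent_run.
                segments.append((start, i - silent_run + 1))
                start = None
                silent_run = 0
    if start is not None:
        # Close a segment still open when the audio ends.
        segments.append((start, len(chunks) - silent_run))
    return segments
```

A fixed energy threshold is the simplest possible detector and is sensitive to background noise. It is used here only to make silence-based segmentation concrete. The following example builds two 100 ms chunks of loud samples, four of silence, one more loud chunk, and one final silent chunk. Segmentation of that input yields [(0, 2), (6, 7)].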

Available today through the ElevenLabs API

Scribe v2 Realtime is available now via the ElevenLabs API at https://elevenlabs.io/docs/capabilities/speech-to-text. Developers can also deploy Scribe v2 Realtime directly in ElevenLabs Agents, bringing instant, human-quality transcription to real-world applications.

Author: News Bureau