
Speech-to-text (Automatic Speech Recognition)

Definition

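Speech-to-text (STT) systems convert audio waveforms containing spoken language into text transcriptions. Modern end-to-end neural models such as OpenAI's Whisper use an encoder-decoder transformer: the encoder processes a log-Mel spectrogram of the audio, and the decoder generates the transcript token by token. Trained on large multilingual audio datasets, such models approach human-level accuracy on English speech recognition benchmarks.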
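Transcription accuracy is usually reported as word error rate (WER): the minimum number of word substitutions S, deletions D, and insertions I needed to turn a system's output (the hypothesis) into the reference transcript, divided by the number of reference words N, so WER = (S + D + I) / N. The Python sketch below computes it as a word-level edit distance. The function name `word_error_rate` is illustrative, and lowercasing plus whitespace splitting is a simplifying assumption; published benchmarks apply more thorough text normalization before scoring.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words.

    Assumption: normalization is just lowercasing and splitting on whitespace.
    """
    ref: List[str] = reference.lower().split()
    hyp: List[str] = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed so far
    # and the first j hypothesis words (one row of the Levenshtein table).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the") over 6 reference words.
    print(round(word_error_rate("the cat sat on the mat", "the cat sit on mat"), 3))  # 0.333
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis contains many extra words.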

Speech-to-text is a foundational component of voice interfaces, meeting transcription tools, and audio content indexing systems.
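As an illustration of the indexing use case, the Python sketch below builds an inverted index over timestamped transcript segments, so a keyword query returns the moments in a recording where the words were spoken. It assumes the speech-to-text system emits segments with start and end times in seconds; the `Segment` class, `tokenize`, `build_index`, and `search` are hypothetical names, not the API of any particular library.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Segment:
    """Hypothetical timestamped transcript segment (times in seconds)."""
    start: float
    end: float
    text: str


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into word tokens (letters, digits, apostrophes)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_index(segments: List[Segment]) -> Dict[str, Set[int]]:
    """Inverted index: each word maps to the positions of the segments containing it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for position, segment in enumerate(segments):
        for word in tokenize(segment.text):
            index[word].add(position)
    return dict(index)


def search(index: Dict[str, Set[int]], segments: List[Segment], query: str) -> List[Segment]:
    """Return the segments that contain every query word, in transcript order."""
    words = tokenize(query)
    if not words:
        return []
    hits = set.intersection(*(index.get(word, set()) for word in words))
    return [segments[position] for position in sorted(hits)]


if __name__ == "__main__":
    segments = [
        Segment(0.0, 4.2, "Welcome to the quarterly review."),
        Segment(4.2, 9.8, "Revenue grew in the third quarter."),
        Segment(9.8, 15.0, "Next, the review of hiring plans."),
    ]
    index = build_index(segments)
    for hit in search(index, segments, "the review"):
        print(f"{hit.start:.1f}s-{hit.end:.1f}s: {hit.text}")
```

Matching here is exact on lowercased word tokens, so "quarter" does not match "quarterly"; a production search system would typically add stemming, fuzzy matching, and relevance ranking.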


Ship secure code faster

Crash Override integrates security into the developer workflow. No context switching, no waiting on reviews.