Audio & Speech Models

Run and fine-tune open-source audio models on Mixtrain — from industry-leading speech recognition to multilingual transcription.

Example Models

| Model | Provider | Parameters | Description |
| --- | --- | --- | --- |
| Parakeet TDT v3 | NVIDIA | 600M | Multilingual ASR supporting 25 languages with automatic language detection |
| Parakeet TDT v2 | NVIDIA | 600M | English ASR with industry-leading 6.05% WER, 50x faster than alternatives |
| Canary Qwen | NVIDIA | 2.5B | Top of the Hugging Face Open ASR Leaderboard (5.63% WER) |
| Whisper Large v3 | OpenAI | 1.5B | Multilingual speech recognition and translation |

Quick Start

from mixtrain import Model

# Load a speech recognition model
model = Model("parakeet-tdt-v3")

# Transcribe audio
result = model.run({
    "audio": "my-workspace/recordings/meeting.wav"
})

NVIDIA Parakeet

Parakeet is NVIDIA's family of FastConformer-based ASR models, built to pair high transcription accuracy with very fast inference.

Parakeet TDT v3 extends v2 with multilingual support for 25 European languages, automatic language detection, and transcription of audio up to 24 minutes in a single pass (or up to 3 hours with local attention).
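For recordings longer than the 24-minute single-pass limit, one option is to split the audio into windows before transcription. The sketch below only plans the window boundaries. `plan_chunks` is a hypothetical helper, not part of Mixtrain, and whether Mixtrain splits long audio on its own is not covered here.

```python
def plan_chunks(duration_s: float, max_chunk_s: float = 24 * 60) -> list[tuple[float, float]]:
    """Split [0, duration_s) into consecutive (start, end) windows of at most max_chunk_s seconds."""
    duration_s = float(duration_s)
    if duration_s <= 0:
        return []
    chunks = []
    start = 0.0
    while start < duration_s:
        # Each window ends at the limit or at the end of the recording, whichever comes first
        end = min(start + max_chunk_s, duration_s)
        chunks.append((start, end))
        start = end
    return chunks


# A 60-minute recording becomes three windows: 24 + 24 + 12 minutes
windows = plan_chunks(60 * 60)
```

Cutting at fixed offsets can split a word in half, so in practice you may prefer to cut at silences or to overlap neighbouring windows slightly.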

Parakeet TDT v2 delivers a 6.05% word error rate (WER) on English with an RTFx of 3,380, meaning it processes roughly 3,380 seconds of audio per second of compute. It was trained on NVIDIA's 120,000-hour Granary dataset.
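To make the two metrics concrete, here is a minimal sketch of how WER and RTFx are commonly computed. The function names are illustrative, and published benchmarks usually normalize text (casing, punctuation) before scoring, which this sketch does not do.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference prefix seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def rtfx(audio_seconds: float, processing_seconds: float) -> float:
    """Inverse real-time factor: seconds of audio processed per second of compute."""
    return audio_seconds / processing_seconds


# One substitution ("has" -> "had") and one deletion ("a") over 5 reference words
wer = word_error_rate("the patient has a fever", "the patient had fever")
speed = rtfx(audio_seconds=3380.0, processing_seconds=1.0)
```

Running the same function over your own reference transcripts before and after fine-tuning gives a simple way to check whether domain adaptation actually helped.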

from mixtrain import Model

model = Model("parakeet-tdt-v3")

# Fine-tune for domain-specific vocabulary
model.train(
    dataset="my-workspace/medical-transcripts",
    steps=10000
)

NVIDIA Canary Qwen

Canary Qwen 2.5B currently tops the Hugging Face Open ASR Leaderboard with 5.63% WER. It is an English-language model and is well suited to high-accuracy batch transcription workloads.

from mixtrain import Model

model = Model("canary-qwen-2.5b")

result = model.run({
    "audio": "my-workspace/recordings/interview.wav"
})
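For batch workloads, a thin wrapper around the single-file call may be enough. The sketch below assumes only the `{"audio": path}` payload shape shown in the examples on this page. `transcribe_batch` is a hypothetical helper, and a stub stands in for `model.run` so the example runs without a workspace.

```python
from typing import Any, Callable, Iterable


def transcribe_batch(
    run: Callable[[dict], Any],
    audio_paths: Iterable[str],
) -> dict[str, Any]:
    """Call run({"audio": path}) for each path and record the output or error per path."""
    results = {}
    for path in audio_paths:
        try:
            results[path] = {"ok": True, "output": run({"audio": path})}
        except Exception as exc:  # keep going so one bad file does not stop the batch
            results[path] = {"ok": False, "error": str(exc)}
    return results


# Stand-in for model.run; with Mixtrain you would pass model.run instead (assumption)
def fake_run(payload: dict) -> str:
    if payload["audio"].endswith(".txt"):
        raise ValueError("unsupported file type")
    return f"transcript of {payload['audio']}"


report = transcribe_batch(fake_run, ["a.wav", "notes.txt"])
```

Collecting errors per file rather than raising on the first failure makes it easier to retry only the recordings that failed.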

OpenAI Whisper

Whisper Large v3 remains a popular choice for multilingual transcription and speech-to-English translation, supporting about 100 languages with solid accuracy across diverse audio conditions.

from mixtrain import Model

model = Model("whisper-large-v3")

result = model.run({
    "audio": "my-workspace/recordings/podcast.wav",
    "language": "auto"
})
