Audio & Speech Models
Run and fine-tune open-source audio models on Mixtrain — from industry-leading speech recognition to multilingual transcription.
Example Models
| Model | Provider | Parameters | Description |
|---|---|---|---|
| Parakeet TDT v3 | NVIDIA | 600M | Multilingual ASR supporting 25 European languages with automatic language detection |
| Parakeet TDT v2 | NVIDIA | 600M | English ASR with 6.05% WER at an RTFx of 3,380 |
| Canary Qwen | NVIDIA | 2.5B | English ASR; top of the Hugging Face Open ASR Leaderboard at the time of writing (5.63% WER) |
| Whisper Large v3 | OpenAI | 1.5B | Multilingual speech recognition and speech-to-English translation |
Quick Start
from mixtrain import Model
# Load a speech recognition model
model = Model("parakeet-tdt-v3")
# Transcribe audio
result = model.run({
    "audio": "my-workspace/recordings/meeting.wav"
})

NVIDIA Parakeet
Parakeet is NVIDIA's family of FastConformer-based ASR models, designed to combine high accuracy with very high inference throughput.
Parakeet TDT v3 extends v2 with multilingual support for 25 European languages, automatic language detection, and transcription of audio up to 24 minutes in a single pass with full attention (or up to 3 hours with local attention).
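For recordings longer than the single-pass limit, one option is to split the audio into windows before transcription. The sketch below only plans the window boundaries; the function name `plan_chunks`, the optional overlap, and the idea of chunking client-side are assumptions for illustration, not part of the Mixtrain API. The 24-minute default is taken from the limit quoted above.

```python
from typing import List, Tuple


def plan_chunks(
    total_seconds: float,
    max_chunk_seconds: float = 24 * 60,  # single-pass limit quoted in the text
    overlap_seconds: float = 0.0,        # hypothetical overlap between windows
) -> List[Tuple[float, float]]:
    """Return (start, end) windows in seconds that cover the whole recording."""
    if total_seconds <= 0:
        return []
    if not 0 <= overlap_seconds < max_chunk_seconds:
        raise ValueError("overlap_seconds must be in [0, max_chunk_seconds)")

    step = max_chunk_seconds - overlap_seconds
    windows = []
    start = 0.0
    while True:
        end = min(start + max_chunk_seconds, total_seconds)
        windows.append((start, end))
        if end >= total_seconds:
            break
        start += step
    return windows
```

Each window could then be cut from the source file with an audio tool of your choice and transcribed separately. A small overlap gives some context at the seams, at the cost of having to de-duplicate words when stitching the transcripts together.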
Parakeet TDT v2 delivers a 6.05% word error rate (WER) on English at an RTFx (inverse real-time factor) of 3,380. It was trained on NVIDIA's Granary dataset, roughly 120,000 hours of English speech.
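To make the two metrics concrete, here is a minimal sketch of how they are commonly computed. WER is the word-level edit distance between a reference transcript and the model output, divided by the number of reference words; RTFx is seconds of audio processed per second of compute. The function names are illustrative, and this sketch skips the text normalization (casing, punctuation) that benchmark scoring typically applies first, so its numbers will not match leaderboard figures exactly.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def rtfx(audio_seconds: float, processing_seconds: float) -> float:
    """Inverse real-time factor: audio duration divided by processing time."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds
```

Under this definition, an RTFx of 3,380 means that 3,380 seconds of audio (a little over 56 minutes) are transcribed per second of compute on the benchmark hardware.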
model = Model("parakeet-tdt-v3")
# Fine-tune for domain-specific vocabulary
model.train(
    dataset="my-workspace/medical-transcripts",
    steps=10000
)

NVIDIA Canary Qwen
Canary Qwen 2.5B tops the Hugging Face Open ASR Leaderboard at the time of writing, with 5.63% WER. It is an English-only model and is well suited to high-accuracy batch transcription workloads.
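A batch workload can be sketched as a loop over `model.run`, the call used throughout this page. The helper name `transcribe_batch` is hypothetical, and the sketch assumes that `run` raises an exception when a file fails; neither is confirmed by the Mixtrain documentation. The helper accepts any object with a compatible `run` method, so it can be tried out without a live model.

```python
from typing import Any, Dict, Iterable, Tuple


def transcribe_batch(
    model: Any, audio_paths: Iterable[str]
) -> Tuple[Dict[str, Any], Dict[str, Exception]]:
    """Run the model once per file; return (results, failures) keyed by path."""
    results: Dict[str, Any] = {}
    failures: Dict[str, Exception] = {}
    for path in audio_paths:
        try:
            results[path] = model.run({"audio": path})
        except Exception as exc:  # assumption: run() raises on failure
            # Record the error and keep going so one bad file
            # does not abort the whole batch.
            failures[path] = exc
    return results, failures


# Intended usage with Mixtrain (not executed here):
#   from mixtrain import Model
#   model = Model("canary-qwen-2.5b")
#   results, failures = transcribe_batch(model, [
#       "my-workspace/recordings/interview.wav",
#   ])
```

Collecting failures separately makes it easy to retry only the files that did not succeed.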
model = Model("canary-qwen-2.5b")
result = model.run({
    "audio": "my-workspace/recordings/interview.wav"
})

OpenAI Whisper
Whisper Large v3 remains a popular choice for multilingual transcription and speech-to-English translation, supporting roughly 100 languages with solid accuracy across diverse audio conditions.
model = Model("whisper-large-v3")
result = model.run({
    "audio": "my-workspace/recordings/podcast.wav",
    "language": "auto"
})