
Where Are the Special Models?

Dharmesh Kakadia · 6 min read

Models are the new software. And just like software, the biggest impact will come from domain-specific models. So why aren't we seeing them? Since starting Mixtrain, this is the question I hear most. It's like Fermi's paradox for AI:

If specialized AI models are so obviously useful, where are they?

I think the question misreads the moment.

Industries are still climbing the adoption curve

Most teams start with the best available model on a single use case before expanding to more cases and more models. As the number of use cases grows, teams drown in the noise of new model releases. With scale, they worry about cost. As more use cases reach production, the focus shifts to capability, cost, and control. This is the natural progression: from a single API call to orchestration, to eval, to eventually training your own models. Most teams are still early on this curve. For many domains, AI capabilities have only recently crossed the useful threshold.

Capability means different things at different stages. Early on, it's binary: "Can the model do this at all?" As you move toward production, it becomes: "Can it do this consistently, at the quality bar my users expect, across the full distribution of inputs? Can it do this under cost and latency constraints? Can we patch it quickly when it fails?" Current AI models have what Ethan Mollick calls a "jagged frontier": surprisingly good at some hard things, surprisingly bad at some easy ones. That jaggedness is exactly what makes the capability-cost-control test so hard to pass for most AI projects.
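The production-stage questions above can be read as a single gate a model must clear before it ships. Here is a minimal sketch of that capability-cost-control gate; the thresholds and the `EvalResult` fields are illustrative assumptions, not measurements from any real system:

```python
from dataclasses import dataclass

@dataclass
class EvalResult:
    quality: float            # fraction of eval cases meeting the quality bar
    p95_latency_ms: float     # tail latency across the input distribution
    cost_per_call_usd: float  # blended inference cost per request

def passes_production_gate(r: EvalResult,
                           min_quality: float = 0.95,
                           max_latency_ms: float = 800.0,
                           max_cost_usd: float = 0.002) -> bool:
    """A model ships only if it clears all three bars at once."""
    return (r.quality >= min_quality
            and r.p95_latency_ms <= max_latency_ms
            and r.cost_per_call_usd <= max_cost_usd)

# A jagged model can be brilliant on quality yet still fail the gate.
frontier = EvalResult(quality=0.97, p95_latency_ms=2100.0, cost_per_call_usd=0.01)
specialized = EvalResult(quality=0.96, p95_latency_ms=350.0, cost_per_call_usd=0.0008)
```

Under these toy numbers, `frontier` fails on latency and cost even though it wins on raw quality, which is exactly the shape of failure the jagged frontier produces.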

If your use case sits where frontier labs compete, ride the capability curve. Wait for the next release. If it doesn't, improvements are incidental, not intentional. And incidental capability might not survive the next training run. General capability improvements don't guarantee domain alignment.

Specialization requires frontier infrastructure

Frontier labs have been post-training models into specialists since the ChatGPT/RLHF days, and that practice is now starting to diffuse. The most successful AI products at scale use specialized models. Cursor trains custom models with RL. Cognition built their own coding model. Specialized audio models are outperforming general-purpose ones. Foundation models in robotics are moving from research to deployment. World models are going from labs to real-world applications.

Specialization doesn't emerge from scaling alone. It emerges from owning the training loop: continuous eval, fast iteration, proprietary data. The research is catching up. The infrastructure isn't.

The long tail of models

I love frontier models. They helped create intelligence from sand. But three models behind an API won't capture the full complexity of the real world. And limited fine-tuning on a single model isn't a platform.

Reality has a surprising amount of detail, far more than in-context learning can capture. The visual difference between a healthy cell and a cancerous one, the precise geometry a robotic arm needs to grasp an object, the acoustic signature of a failing turbine: frontier models capture only a fraction of this in their training data.

Think about it: Can you run a trillion-parameter model in your self-driving car before making a decision? Does a model predicting protein folding need to generate the next Billboard song? Will the robot folding your laundry need to write you a French poem? These will be purpose-built models, bootstrapped from the knowledge in foundation models but tailored for the task.

While most inference compute runs through a handful of generalist models today, as more teams build for their domains, the long tail of specialized models will collectively dwarf the head. Robotics, life sciences, audio, video, and thousands of use cases we haven't imagined yet. That shift is already underway. Epoch.ai tracks around 1,500 notable AI models. HuggingFace alone hosts over 2.6 million public models. For every headline model, there are thousands of specialized variants: fine-tuned, adapted, merged for specific tasks. The long tail is already here. But most of it never makes it to production.

[Figure: the long tail of models]

This isn't a bet against frontier models getting better. They will. I expect two modes: many use cases served well by inference APIs of frontier models, and many served far better by custom models built for a vertical. The better they get at general intelligence, the more they reveal what's possible, and the more teams realize they need something optimized to their specific constraints. General capabilities are the foundation that unlocks the long tail.

Missing multimodal platform

Multimodal models fail the capability-cost-control test harder than LLMs. They're slower, more expensive to run, and critically, less capable for domain-specific tasks. This is where specialized models are needed most, and where the conditions to build them are strongest.

The multimodal model ecosystem already looks like the long tail. Image generation, video understanding, robotics, and speech all have thriving communities of new players and open-source models. Multimodal data is available in abundance. Teams building in these domains have rich foundations to adapt from, making the "tailor from a strong base" approach far more practical. And the most valuable data is already inside these organizations: the near-misses in self-driving, the odd objects in robotic manipulation, the rare anomalies in medical scans. This is data that foundation models have not seen in pre-training, and it's exactly what makes a specialized model worth building.

[Figure: multimodal models growth]

Building specialized models requires more than research breakthroughs and data. Teams need a platform that supports the full adaptation lifecycle and lets them iterate fast as new models drop and requirements shift. Architecting that platform looks very different when you need to process a mix of video and lidar data rather than chunks of text: your training, eval, and data infrastructure all need to be multimodal-aware. Robotics companies, scientific imaging teams, and media companies working with video, 3D, and domain-specific formats are building the next wave of AI applications. They need infrastructure built for their reality. Today.
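"Multimodal-aware" starts at the lowest layer: every modality needs its own decoding, chunking, and storage strategy, where a text-only pipeline assumes one. A minimal sketch of a modality-dispatching loader; the extensions and the stand-in decoder stubs are hypothetical placeholders, not real codecs:

```python
from pathlib import Path

# Hypothetical dispatch table. Real decoders (video frame extraction, DICOM
# series parsing, PDB structure reading) would replace the string stubs.
LOADERS = {
    ".txt": lambda p: p.read_text(),
    ".mp4": lambda p: f"<video frames from {p.name}>",
    ".dcm": lambda p: f"<DICOM series from {p.name}>",
    ".pdb": lambda p: f"<protein structure from {p.name}>",
}

def load_sample(path: Path):
    """Route each file to a modality-specific decoder by extension."""
    try:
        loader = LOADERS[path.suffix.lower()]
    except KeyError:
        raise ValueError(f"no loader registered for {path.suffix!r}")
    return loader(path)
```

The design point is the registry itself: adding a new domain format means registering one decoder, not rebuilding the pipeline around text assumptions.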

Time to model

The next wave of AI won't be defined by general models, but by faster specialization loops. The metric that matters is time to model: how fast you can go from "which model works for my use case?" to running the right model in production. Sometimes that means finding an existing model that fits. Often it means building your own. And then doing it again when the next model drops, your data shifts, or your requirements change. The adaptation cycle needs to be fast. Just as "deploys per day" became the metric for modern software teams, "models deployed per month" is becoming the metric for AI-native ones. That cadence will accelerate from months to days. At that point, it becomes a continual learning system.
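One turn of that adaptation cycle can be sketched as a loop: try the off-the-shelf candidates first, adapt your own only when none fit, and clock the whole thing. The function names and callables here are illustrative, not an API from any real platform:

```python
import time

def adaptation_cycle(candidates, evaluate, adapt, deploy):
    """One turn of the loop; returns elapsed seconds (the 'time to model')."""
    start = time.monotonic()
    for model in candidates:       # "which model works for my use case?"
        if evaluate(model):        # an existing model fits: ship it
            deploy(model)
            break
    else:                          # none fit: build your own from a base
        deploy(adapt(candidates[0]))
    return time.monotonic() - start

# Toy run: the second off-the-shelf candidate passes eval and gets deployed.
deployed = []
elapsed = adaptation_cycle(
    candidates=["base-a", "base-b"],
    evaluate=lambda m: m == "base-b",
    adapt=lambda m: m + "-tuned",
    deploy=deployed.append,
)
```

The loop reruns whenever a new model drops, the data shifts, or requirements change; compressing `elapsed` across those reruns is the "models deployed per month" cadence.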

Teams that compress time to model will win.

This is why we built Mixtrain: a platform for the full model lifecycle, focused on mixmodality. Curate datasets across video, 3D, DICOM, PDB, or any domain-specific format. Train and adapt models with any method: fine-tuning, RL, distillation. Evaluate against both private and public models. All through a unified platform that works for both humans and AI agents.

The future will be mixed.

So, where are the special models? Let's go build them.