World Models
Post-training for world models
World models learn to simulate environments — predicting future states, understanding physics, and enabling agents to plan. Mixtrain provides the data and evaluation infrastructure to train them at scale.
World models need physics-aware evaluation
Image metrics don't capture what matters for world models. You need to measure long-horizon consistency, action-conditioned prediction accuracy, and physical plausibility — not just perceptual similarity between frames.
Most teams evaluate world models with repurposed video metrics and manual spot-checks. Mixtrain replaces that with structured evaluation built for environment simulation — from single-step prediction through multi-step rollouts to sim-to-real transfer.
Evaluation built for simulation
World models serve two purposes: predicting what happens next and enabling agents to plan. Mixtrain evaluates both with metrics that go beyond reconstruction error.
Prediction
- Future state accuracy across single-step and multi-step horizons
- Physics plausibility scoring: conservation laws, collision dynamics, gravity
- Rollout stability metrics over extended time horizons
- Action-conditioned prediction accuracy across diverse scenarios
Planning
- Trajectory quality evaluation for model-based planning agents
- Reward prediction accuracy and value function calibration
- Sim-to-real transfer benchmarks with domain gap analysis
- Counterfactual reasoning and branching scenario evaluation
What you get
Multi-modal environment datasets
Version and manage environment datasets spanning vision, proprioception, actions, and rewards. Slice by environment, episode, or transition.
Physics-aware evaluation
Evaluate predictions against physical constraints — conservation of energy, rigid body dynamics, contact forces, and object permanence.
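As a rough illustration of what a conservation-of-energy check can look like, the sketch below scores a predicted trajectory of a point mass under gravity by how far its total mechanical energy drifts from the initial value. It is a generic NumPy example, not Mixtrain's API: the function name `energy_drift`, its parameters, and the point-mass assumption are all hypothetical choices made for this example.

```python
import numpy as np

def energy_drift(heights, velocities, mass=1.0, g=9.81):
    """Max relative deviation of total mechanical energy from its initial value.

    heights:    shape (T,)   predicted height of the object at each step
    velocities: shape (T, D) predicted velocity vector at each step
    Assumes a single point mass in a uniform gravity field (illustrative only).
    """
    heights = np.asarray(heights, dtype=float)
    velocities = np.asarray(velocities, dtype=float)
    kinetic = 0.5 * mass * np.sum(velocities ** 2, axis=-1)
    potential = mass * g * heights
    energy = kinetic + potential
    # Guard against division by zero when the initial energy is zero.
    scale = max(abs(energy[0]), 1e-12)
    return float(np.max(np.abs(energy - energy[0])) / scale)
```

A physically consistent free-fall prediction should score close to zero, while a prediction in which the object speeds up without losing height should score noticeably higher. Real scenes with friction, contacts, or actuation would need an energy budget that accounts for those terms.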
Long-horizon benchmarks
Test rollout stability over hundreds of steps. Track error accumulation, drift, and compounding prediction failures.
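To make error accumulation concrete, here is a minimal sketch of an open-loop rollout evaluation: the model's own prediction is fed back in at every step, and the per-step distance to the ground-truth state is recorded. This is a generic NumPy example under stated assumptions, not Mixtrain's API: `rollout_errors` and `step_fn` are hypothetical names, and `step_fn(state, action)` stands in for whatever one-step predictor is being evaluated.

```python
import numpy as np

def rollout_errors(step_fn, initial_state, actions, true_states):
    """Per-step L2 error of an autoregressive rollout.

    step_fn:       callable (state, action) -> next state (hypothetical model interface)
    initial_state: starting state, shape (D,)
    actions:       sequence of T actions
    true_states:   ground-truth states after each action, length T
    """
    state = np.asarray(initial_state, dtype=float)
    errors = []
    for action, target in zip(actions, true_states):
        # Feed the model's own prediction back in, never the ground truth.
        state = np.asarray(step_fn(state, action), dtype=float)
        errors.append(np.linalg.norm(state - np.asarray(target, dtype=float)))
    return np.array(errors)
```

Plotting the returned array against the step index shows whether error stays bounded or compounds. A small constant bias in a one-step predictor, for example, grows roughly linearly with horizon in a simple additive system, which single-step accuracy alone would not reveal.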
Action-conditioned training
Build training pipelines that condition on action sequences. Support for discrete, continuous, and hierarchical action spaces.
Distributed training
Launch multi-node training jobs with your own scripts. Automatic checkpointing, configurable compute, and full experiment tracking.
Latent space analysis
Visualize and analyze learned representations. Track latent space structure, disentanglement metrics, and representation quality across training.
Sim-to-real evaluation
Benchmark transfer performance with structured domain gap analysis. Compare simulation predictions against real-world trajectories.
Production export
Export models with optimized inference configs, latency baselines, and regression monitoring. ONNX and TensorRT support.
Start building
Train world models that understand physics. Free to start, no credit card required.