Evaluations
Compare public and private model outputs side-by-side on a dataset.
Text-to-Image Evaluation
Compare image generation models on the same prompts:
import pandas as pd

from mixtrain import MixFlow, Model, Dataset, Eval, Image


class T2IEvaluation(MixFlow):
    """Compare text-to-image models side-by-side."""

    def run(
        self,
        input_dataset: Dataset,
        models: list[Model] | None = None,
        limit: int = -1,
    ):
        """Run text-to-image evaluation.

        Args:
            input_dataset: Dataset containing prompts
            models: Models to compare (default: flux-pro, stable-diffusion-xl)
            limit: Number of prompts to evaluate (-1 for all)
        """
        if models is None:
            models = [Model("flux-pro"), Model("stable-diffusion-xl")]

        # Load prompts from the dataset
        prompts = input_dataset.to_pandas()
        if limit > 0:
            prompts = prompts.head(limit)

        results = []
        for _, row in prompts.iterrows():
            prompt = row["prompt"]
            result = {"prompt": prompt}
            # Run each model on the same prompt
            for model in models:
                output = model.run({"prompt": prompt})
                result[f"{model.name}_image"] = output.image.url
            results.append(result)

        # Create the output dataset and the evaluation
        output_dataset = Dataset.create_from_dataframe(
            pd.DataFrame(results),
            name="t2i-comparison",
        )
        return {
            "evaluation": Eval.create(
                name="t2i-eval",
                dataset=output_dataset,
            ),
            "dataset": output_dataset,
        }
Video Generation Evaluation
Compare video generation models:
class VideoEvaluation(MixFlow):
    def run(
        self,
        prompts: list[str],
        models: list[Model] | None = None,
    ):
        """Run video generation evaluation.

        Args:
            prompts: List of prompts to evaluate
            models: Models to compare (default: hunyuan-video, runway-gen3)
        """
        if models is None:
            models = [Model("hunyuan-video"), Model("runway-gen3")]

        # Batch the generation requests, with at most 10 in flight at a time
        results = Model.batch(
            models=models,
            inputs_list=[{"prompt": p} for p in prompts],
            max_in_flight=10,
        )

        # Create comparison view
        return {"evaluation": Eval.create(...)}
Running Evaluations
# Create the evaluation workflow
mixtrain workflow create eval_t2i.py \
  --name t2i-evaluation

# Run it with specific models and a dataset
mixtrain workflow run t2i-evaluation \
  --input '{"models": ["flux-pro", "dalle-3"], "input_dataset": "my-prompts", "limit": 100}'

Running the workflow creates an evaluation and returns a new dataset containing the model outputs. The above command prints a link to the workflow run, where you can find the link to the evaluation in the Mixtrain UI.
Viewing Results
After running the workflow, you can view the evaluation results in the Mixtrain UI:
- Go to the workflow run page. This is also the link printed by the mixtrain workflow run command.
- Find the link to the evaluation in the "Outputs" section.
- Click the link to view the evaluation results.
Alternatively, you can find the evaluation directly from your workspace:
- Go to the Evaluations tab in your workspace.
- Find the evaluation by name.
- Click the evaluation to view the results.
Next Steps
- Evaluations Guide - Full documentation
- Models Guide - Working with models