Evaluations let you compare outputs from different models side by side. The comparison view supports images, videos, 3D models, audio, and text, making it easy to visually assess quality across models.
## Overview
An evaluation references columns from your datasets and displays them in a comparison grid. Typically, each column represents a different model's output, or metadata such as latency or cost, for the same inputs.
## Creating an Evaluation
The easiest way to create an evaluation is from a dataset with column types. `Eval.from_dataset()` automatically reads the dataset's column types and builds the comparison config:
```python
from mixtrain import Eval

eval = Eval.from_dataset("image-gen-results")
```

This picks up all typed columns (`image`, `video`, `audio`, `text`, etc.) from the dataset and creates a side-by-side comparison view.
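Conceptually, each typed column becomes one entry of the comparison config. A minimal sketch of that mapping in plain Python (the column names and types here are illustrative assumptions, not read from a real dataset, and this is not mixtrain's internal code):

```python
# Hypothetical column-type metadata, as a dataset might store it.
column_types = {
    "prompt": "text",
    "flux_output": "image",
    "sdxl_output": "image",
}

# Each typed column maps to one entry in the "datasets" array.
config = {
    "datasets": [
        {"tableName": "image-gen-results", "columnName": name, "dataType": dtype}
        for name, dtype in column_types.items()
    ]
}
```

The resulting `config` has the same shape as the explicit config shown under "Manual configuration" below.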
### Selecting specific columns
Use the `columns` parameter to choose which columns to include and their order:
```python
eval = Eval.from_dataset(
    "image-gen-results",
    name="flux-vs-sdxl",
    columns=["prompt", "flux_output", "sdxl_output"],
)
```

### Manual configuration
For full control, use `Eval.create()` with an explicit config:
```python
eval = Eval.create(
    name="flux-vs-sdxl",
    config={
        "datasets": [
            {"tableName": "image-gen-results", "columnName": "prompt", "dataType": "text"},
            {"tableName": "image-gen-results", "columnName": "flux_output", "dataType": "image"},
            {"tableName": "image-gen-results", "columnName": "sdxl_output", "dataType": "image"},
        ]
    },
    description="Compare Flux Pro vs SDXL image outputs",
)
```

The `datasets` array defines which columns to show in the comparison view:

- `tableName` - The dataset containing the data
- `columnName` - The column to display
- `dataType` - How to render: `text`, `image`, `video`, `audio`, or `3d`
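Since every entry needs all three fields and a recognized `dataType`, it can help to check a hand-written config before passing it in. A small illustrative helper (not part of the mixtrain API):

```python
# Illustrative validator for a "datasets" entry; not part of mixtrain.
SUPPORTED_TYPES = {"text", "image", "video", "audio", "3d"}


def validate_entry(entry: dict) -> None:
    """Raise ValueError if a datasets entry is missing a field or has a bad dataType."""
    for key in ("tableName", "columnName", "dataType"):
        if key not in entry:
            raise ValueError(f"missing required field: {key}")
    if entry["dataType"] not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported dataType: {entry['dataType']!r}")


# A well-formed entry passes silently.
validate_entry(
    {"tableName": "image-gen-results", "columnName": "prompt", "dataType": "text"}
)
```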
## Workflow: Generate and Compare
A typical workflow is to run multiple models on the same inputs, store results in a dataset, then create an evaluation.
```python
import pandas as pd

from mixtrain import Dataset, Model

# Load test prompts
dataset = Dataset("test-prompts")
prompts = dataset.to_pandas()

# Run both models
flux = Model("flux-pro")
flex = Model("flux2-flex")

results = []
for _, row in prompts.iterrows():
    prompt = row["prompt"]
    results.append({
        "prompt": prompt,
        "fluxpro_output": flux.run(prompt=prompt).image.url,
        "fluxflex_output": flex.run(prompt=prompt).image.url,
    })

# Save results; column types are auto-detected from URLs
ds = Dataset.from_pandas(pd.DataFrame(results))
ds.save("image-gen-results")
```

Then create the evaluation to view results:
```python
from mixtrain import Eval

# Automatically uses column types from the dataset
eval = Eval.from_dataset(
    "image-gen-results",
    name="flux-pro-vs-flex",
    columns=["prompt", "fluxpro_output", "fluxflex_output"],
)
```

## Managing Evaluations
### Get an Evaluation
```python
from mixtrain import Eval

eval = Eval("flux-vs-sdxl")
print(eval.config)
print(eval.description)
```

### Update an Evaluation
```python
eval.update(
    description="Updated comparison",
    config={
        "datasets": [
            {"tableName": "image-gen-results", "columnName": "prompt", "dataType": "text"},
            {"tableName": "image-gen-results", "columnName": "fluxpro_output", "dataType": "image"},
            {"tableName": "image-gen-results", "columnName": "fluxflex_output", "dataType": "image"},
            {"tableName": "image-gen-results", "columnName": "dalle_output", "dataType": "image"},
        ]
    },
)
```

### List All Evaluations
```python
from mixtrain import list_evals

for eval in list_evals():
    print(f"{eval.name}: {eval.description}")
```

### Delete an Evaluation
```python
eval.delete()
```

## Supported Data Types
| Type | Description |
|---|---|
| `text` | Plain text, markdown, or code |
| `image` | Images (PNG, JPEG, WebP, etc.) |
| `video` | Videos (MP4, WebM, etc.) |
| `audio` | Audio files (MP3, WAV, etc.) |
| `3d` | 3D models (GLB, GLTF) |
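The workflow above relies on column types being auto-detected from URLs. One plausible way such detection could work is by mapping a URL's file extension to a data type; the sketch below is an illustration under that assumption, not mixtrain's actual implementation:

```python
# Illustrative sketch only: infer a dataType from a URL's file extension.
# This is NOT mixtrain's implementation; the mapping and fallback are assumptions.
from pathlib import PurePosixPath
from urllib.parse import urlparse

EXT_TO_TYPE = {
    ".png": "image", ".jpg": "image", ".jpeg": "image", ".webp": "image",
    ".mp4": "video", ".webm": "video",
    ".mp3": "audio", ".wav": "audio",
    ".glb": "3d", ".gltf": "3d",
}


def infer_data_type(url: str) -> str:
    """Guess a dataType from the URL path's extension; fall back to text."""
    ext = PurePosixPath(urlparse(url).path).suffix.lower()
    return EXT_TO_TYPE.get(ext, "text")


print(infer_data_type("https://cdn.example.com/outputs/img_001.png"))  # image
```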