Evaluations let you compare outputs from different models side-by-side. The comparison view supports images, videos, 3D models, audio, and text - making it easy to visually assess quality across models.
## Overview
An evaluation references columns from your datasets and displays them in a comparison grid. Typically, each column holds a different model's output for the same inputs, or related metadata such as latency or cost.
## Creating an Evaluation
```python
from mixtrain import Eval

eval = Eval.create(
    name="flux-vs-sdxl",
    config={
        "datasets": [
            {"tableName": "image-gen-results", "columnName": "prompt", "dataType": "text"},
            {"tableName": "image-gen-results", "columnName": "flux_output", "dataType": "image"},
            {"tableName": "image-gen-results", "columnName": "sdxl_output", "dataType": "image"},
        ]
    },
    description="Compare Flux Pro vs SDXL image outputs"
)
```

The `datasets` array defines which columns to show in the comparison view:
- `tableName` - The dataset containing the data
- `columnName` - The column to display
- `dataType` - How to render the column: `text`, `image`, `video`, `audio`, or `3d`
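Columns are not limited to model outputs. If your results table also stores per-request metadata, you can surface it next to the images. The sketch below is an assumption-based example: the `flux_latency_ms` and `sdxl_latency_ms` columns are hypothetical, so adjust the names to match your own table.

```python
from mixtrain import Eval

# Sketch: compare images side-by-side with latency metadata.
# Assumes hypothetical "flux_latency_ms" and "sdxl_latency_ms" columns exist
# in the "image-gen-results" table.
eval = Eval.create(
    name="flux-vs-sdxl-with-latency",
    config={
        "datasets": [
            {"tableName": "image-gen-results", "columnName": "prompt", "dataType": "text"},
            {"tableName": "image-gen-results", "columnName": "flux_output", "dataType": "image"},
            {"tableName": "image-gen-results", "columnName": "flux_latency_ms", "dataType": "text"},
            {"tableName": "image-gen-results", "columnName": "sdxl_output", "dataType": "image"},
            {"tableName": "image-gen-results", "columnName": "sdxl_latency_ms", "dataType": "text"},
        ]
    },
    description="Flux vs SDXL outputs with per-request latency"
)
```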
## Workflow: Generate and Compare
A typical workflow is to run multiple models on the same inputs, store results in a table, then create an evaluation.
```python
import pandas as pd

from mixtrain import Model, Dataset

# Load test prompts
dataset = Dataset("test-prompts")
prompts = dataset.to_pandas()

# Run both models
flux = Model("flux-pro")
flex = Model("flux2-flex")

results = []
for _, row in prompts.iterrows():
    prompt = row["prompt"]
    results.append({
        "prompt": prompt,
        "fluxpro_output": flux.run(prompt=prompt).image.url,
        "fluxflex_output": flex.run(prompt=prompt).image.url,
    })

# Store results in the lakehouse
df = pd.DataFrame(results)
Dataset.from_df("image-gen-results", df)
```

Then create the evaluation to view the results:
```python
from mixtrain import Eval

eval = Eval.create(
    name="flux-pro-vs-flex",
    config={
        "datasets": [
            {"tableName": "image-gen-results", "columnName": "prompt", "dataType": "text"},
            {"tableName": "image-gen-results", "columnName": "fluxpro_output", "dataType": "image"},
            {"tableName": "image-gen-results", "columnName": "fluxflex_output", "dataType": "image"},
        ]
    }
)
```

## Managing Evaluations
### Get an Evaluation
```python
from mixtrain import Eval

eval = Eval("flux-vs-sdxl")
print(eval.config)
print(eval.description)
```
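To see exactly which columns an evaluation compares, you can walk the `datasets` list in its config. This is a minimal sketch that assumes `eval.config` returns the same dictionary structure passed to `Eval.create()`.

```python
# Sketch: list the configured columns, assuming config matches the create() payload.
for column in eval.config["datasets"]:
    print(f'{column["tableName"]}.{column["columnName"]} ({column["dataType"]})')
```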
### Update an Evaluation

```python
eval.update(
    description="Updated comparison",
    config={
        "datasets": [
            {"tableName": "image-gen-results", "columnName": "prompt", "dataType": "text"},
            {"tableName": "image-gen-results", "columnName": "fluxpro_output", "dataType": "image"},
            {"tableName": "image-gen-results", "columnName": "fluxflex_output", "dataType": "image"},
            {"tableName": "image-gen-results", "columnName": "dalle_output", "dataType": "image"},
        ]
    }
)
```

### List All Evaluations
```python
from mixtrain import list_evals

for eval in list_evals():
    print(f"{eval.name}: {eval.description}")
```

### Delete an Evaluation
```python
eval.delete()
```

## Supported Data Types
| Type | Description |
|---|---|
| `text` | Plain text, markdown, or code |
| `image` | Images (PNG, JPEG, WebP, etc.) |
| `video` | Videos (MP4, WebM, etc.) |
| `audio` | Audio files (MP3, WAV, etc.) |
| `3d` | 3D models (GLB, GLTF) |
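The same config shape applies to every type. As a hedged sketch, here is a video comparison that assumes a hypothetical `video-gen-results` table with a `prompt` column and one output column per model:

```python
from mixtrain import Eval

# Sketch: compare two video models.
# The "video-gen-results" table and its column names are hypothetical.
eval = Eval.create(
    name="video-model-comparison",
    config={
        "datasets": [
            {"tableName": "video-gen-results", "columnName": "prompt", "dataType": "text"},
            {"tableName": "video-gen-results", "columnName": "model_a_output", "dataType": "video"},
            {"tableName": "video-gen-results", "columnName": "model_b_output", "dataType": "video"},
        ]
    },
    description="Side-by-side comparison of two video generation models"
)
```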