Evaluations

Evaluations let you compare outputs from different models side by side. The comparison view supports images, videos, 3D models, audio, and text, making it easy to visually assess quality across models.

Overview

An evaluation references columns from your datasets and displays them in a comparison grid. Each column typically holds a different model's output for the same inputs, or metadata such as latency or cost.

Creating an Evaluation

from mixtrain import Eval

eval = Eval.create(
    name="flux-vs-sdxl",
    config={
        "datasets": [
            {"tableName": "image-gen-results", "columnName": "prompt", "dataType": "text"},
            {"tableName": "image-gen-results", "columnName": "flux_output", "dataType": "image"},
            {"tableName": "image-gen-results", "columnName": "sdxl_output", "dataType": "image"},
        ]
    },
    description="Compare Flux Pro vs SDXL image outputs"
)

The datasets array defines which columns to show in the comparison view:

  • tableName - The dataset containing the data
  • columnName - The column to display
  • dataType - How to render: text, image, video, audio, or 3d
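
For example, a metadata column such as latency (mentioned in the Overview) can sit alongside the media columns by rendering it as text. The snippet below is an illustrative sketch; the flux_latency_ms column is hypothetical:

from mixtrain import Eval

eval = Eval.create(
    name="flux-vs-sdxl-with-latency",
    config={
        "datasets": [
            {"tableName": "image-gen-results", "columnName": "prompt", "dataType": "text"},
            {"tableName": "image-gen-results", "columnName": "flux_output", "dataType": "image"},
            {"tableName": "image-gen-results", "columnName": "sdxl_output", "dataType": "image"},
            # Hypothetical metadata column, rendered as plain text
            {"tableName": "image-gen-results", "columnName": "flux_latency_ms", "dataType": "text"},
        ]
    },
    description="Flux vs SDXL outputs with per-row latency shown as text"
)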

Workflow: Generate and Compare

A typical workflow is to run multiple models on the same inputs, store results in a table, then create an evaluation.

from mixtrain import Model, Dataset
import pandas as pd

# Load test prompts
dataset = Dataset("test-prompts")
prompts = dataset.to_pandas()

# Run both models
flux = Model("flux-pro")
flex = Model("flux2-flex")

results = []
for _, row in prompts.iterrows():
    prompt = row["prompt"]
    results.append({
        "prompt": prompt,
        "fluxpro_output": flux.run(prompt=prompt).image.url,
        "fluxflex_output": flex.run(prompt=prompt).image.url,
    })

# Store results in lakehouse
df = pd.DataFrame(results)
Dataset.from_df("image-gen-results", df)

Then create the evaluation to view results:

from mixtrain import Eval

eval = Eval.create(
    name="flux-pro-vs-flex",
    config={
        "datasets": [
            {"tableName": "image-gen-results", "columnName": "prompt", "dataType": "text"},
            {"tableName": "image-gen-results", "columnName": "fluxpro_output", "dataType": "image"},
            {"tableName": "image-gen-results", "columnName": "fluxflex_output", "dataType": "image"},
        ]
    }
)

Managing Evaluations

Get an Evaluation

from mixtrain import Eval

eval = Eval("flux-vs-sdxl")
print(eval.config)
print(eval.description)

Update an Evaluation

eval.update(
    description="Updated comparison",
    config={
        "datasets": [
            {"tableName": "image-gen-results", "columnName": "prompt", "dataType": "text"},
            {"tableName": "image-gen-results", "columnName": "fluxpro_output", "dataType": "image"},
            {"tableName": "image-gen-results", "columnName": "fluxflex_output", "dataType": "image"},
            {"tableName": "image-gen-results", "columnName": "dalle_output", "dataType": "image"},
        ]
    }
)

List All Evaluations

from mixtrain import list_evals

for eval in list_evals():
    print(f"{eval.name}: {eval.description}")

Delete an Evaluation

eval.delete()

Supported Data Types

  • text - Plain text, markdown, or code
  • image - Images (PNG, JPEG, WebP, etc.)
  • video - Videos (MP4, WebM, etc.)
  • audio - Audio files (MP3, WAV, etc.)
  • 3d - 3D models (GLB, GLTF)
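
The same pattern extends to the other data types. The snippet below is an illustrative sketch assuming a hypothetical "video-gen-results" table with video and audio columns:

from mixtrain import Eval

eval = Eval.create(
    name="video-model-comparison",
    config={
        "datasets": [
            {"tableName": "video-gen-results", "columnName": "prompt", "dataType": "text"},
            # Hypothetical columns holding each model's video output and an audio track
            {"tableName": "video-gen-results", "columnName": "model_a_video", "dataType": "video"},
            {"tableName": "video-gen-results", "columnName": "model_b_video", "dataType": "video"},
            {"tableName": "video-gen-results", "columnName": "soundtrack", "dataType": "audio"},
        ]
    },
    description="Compare video outputs alongside their audio tracks"
)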

Next Steps

  • Datasets - Store model outputs as datasets
  • Models - Run models to generate outputs for comparison
