Create Dataset

mixtrain dataset create <name> <file>

Create a dataset from a file.

Supported formats: .parquet, .csv, .tsv

Options:

| Option            | Description         |
|-------------------|---------------------|
| --description, -d | Dataset description |

mixtrain dataset create training-data data.parquet
mixtrain dataset create eval-set data.csv --description "Evaluation data"
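
Because only three file formats are listed as supported, it can help to check the extension before calling the command. The following is a minimal sketch, not part of Mixtrain itself: `is_supported_format` and `create_dataset` are hypothetical helper names, and the check assumes extensions are matched in lowercase only, which the documentation does not confirm.

```shell
# Hypothetical helper: succeeds only for the extensions listed above.
# Assumption: lowercase extensions only; the CLI may accept more.
is_supported_format() {
  case "$1" in
    *.parquet|*.csv|*.tsv) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical wrapper: validate the file, then call the documented command.
create_dataset() {
  name="$1"
  file="$2"
  if ! is_supported_format "$file"; then
    echo "unsupported format: $file" >&2
    return 1
  fi
  mixtrain dataset create "$name" "$file"
}
```

With these functions defined, `create_dataset eval-set data.csv` would forward to `mixtrain dataset create`, while `create_dataset eval-set data.json` would stop with an error before the CLI is invoked.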

List Datasets

mixtrain dataset list

List all datasets in the workspace.

mixtrain dataset list

Output:

| Name          | Rows    | Created    |
|---------------|---------|------------|
| training-data | 10,000  | 2024-01-15 |
| eval-set      | 500     | 2024-01-16 |
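
If the names are needed in a script, the Name column can be pulled out of a table shaped like the one above. This is a sketch under a stated assumption: it treats the output as a pipe-delimited table exactly as shown here, and the real CLI output may be formatted differently. `dataset_names` is a hypothetical helper, and the sample text is copied from the output above.

```shell
# Sample copied from the output shown above; the real CLI output may differ.
sample_output='| Name          | Rows    | Created    |
|---------------|---------|------------|
| training-data | 10,000  | 2024-01-15 |
| eval-set      | 500     | 2024-01-16 |'

# Hypothetical helper: print the Name column, skipping the header
# and separator rows, and trimming the padding around each name.
dataset_names() {
  awk -F'|' 'NR > 2 { gsub(/^[ \t]+|[ \t]+$/, "", $2); if ($2 != "") print $2 }'
}

names=$(printf '%s\n' "$sample_output" | dataset_names)
printf '%s\n' "$names"
```

If the live output does match this layout, `mixtrain dataset list | dataset_names` would give one dataset name per line.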

Query Dataset

mixtrain dataset query <name> [query]

Open an interactive TUI browser for the dataset, with search. Omit the optional [query] argument to browse all rows.

Controls:

  • Ctrl+F - Search
  • q - Quit

mixtrain dataset query training-data                           # Browse all
mixtrain dataset query training-data "SELECT * WHERE score > 0.8"
mixtrain dataset query training-data "SELECT id, text LIMIT 50"
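
The query strings above contain `*` and `>`, which the shell would otherwise expand or treat as a redirection, so they need to stay quoted. A small sketch of building such a query from a variable follows; `score_query` is a hypothetical helper, and the query text simply mirrors the first example above.

```shell
# Hypothetical helper: build a filter query in the style of the examples above.
score_query() {
  printf 'SELECT * WHERE score > %s' "$1"
}

query=$(score_query 0.8)
printf '%s\n' "$query"
```

The result can then be passed as a single quoted argument: `mixtrain dataset query training-data "$query"`.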

View Metadata

mixtrain dataset metadata <name>

Display dataset schema and metadata.

mixtrain dataset metadata training-data

Output:

Dataset: training-data

Schema:
├── id: long
├── text: string
└── score: double
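
To check for a column from a script, the schema lines can be reduced to plain "column type" pairs. This sketch assumes the metadata is printed as the tree shown above, which may not hold for every version of the CLI. `schema_columns` is a hypothetical helper, and the sample text is copied from the output above.

```shell
# Sample copied from the output shown above; the real CLI output may differ.
sample_metadata='Dataset: training-data

Schema:
├── id: long
├── text: string
└── score: double'

# Hypothetical helper: keep only the tree lines, drop the tree prefix,
# and turn "name: type" into "name type".
schema_columns() {
  grep -E '^(├|└)── ' | sed -e 's/^[^ ]* //' -e 's/: / /'
}

columns=$(printf '%s\n' "$sample_metadata" | schema_columns)
printf '%s\n' "$columns"
```

If the live output follows the same layout, `mixtrain dataset metadata training-data | schema_columns` would list one column per line.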

Delete Dataset

mixtrain dataset delete <name>

Delete a dataset.

Options:

| Option    | Description       |
|-----------|-------------------|
| --yes, -y | Skip confirmation |

mixtrain dataset delete old-dataset
mixtrain dataset delete old-dataset --yes
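
Since `--yes` skips the confirmation, a script that deletes several datasets may be safer if it previews the commands first. The following is a sketch only: `delete_datasets` and the `RUN` variable are hypothetical and not part of Mixtrain, and `scratch-data` is an illustrative dataset name.

```shell
# Hypothetical helper: print the delete command for each name.
# The command is actually executed only when RUN=1 is set.
delete_datasets() {
  for name in "$@"; do
    if [ "${RUN:-0}" = "1" ]; then
      mixtrain dataset delete "$name" --yes
    else
      echo "mixtrain dataset delete $name --yes"
    fi
  done
}

preview=$(delete_datasets old-dataset scratch-data)
printf '%s\n' "$preview"
```

Run this way, the function only prints the commands. Calling `RUN=1 delete_datasets old-dataset scratch-data` would execute them without prompting.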
