
Store and query datasets in the Delta Lake table format, with ACID transactions.

Setup

1. Add Provider

mixtrain provider add delta

Or via the SDK:

from mixtrain import MixClient

client = MixClient()
client.create_dataset_provider(
    provider_type="delta",
    secrets={
        # Configure your cloud storage credentials
    }
)
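
One way to fill in `secrets` without hard-coding credentials is to read them from environment variables. The sketch below is only an illustration and not part of the Mixtrain SDK: the `MIXTRAIN_DELTA_` prefix, the helper names `load_secrets` and `register_delta_provider`, and the resulting key names are all hypothetical, so substitute whatever keys your storage backend actually requires.

```python
import os
from typing import Dict, Mapping, Optional


def load_secrets(prefix: str = "MIXTRAIN_DELTA_",
                 environ: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Collect environment variables that start with `prefix` into a dict.

    The prefix is stripped and the remainder lower-cased, so a variable named
    MIXTRAIN_DELTA_ACCESS_KEY becomes the key "access_key". Both the prefix
    and the resulting key names are hypothetical placeholders.
    """
    env = os.environ if environ is None else environ
    return {
        name[len(prefix):].lower(): value
        for name, value in env.items()
        if name.startswith(prefix) and len(name) > len(prefix)
    }


def register_delta_provider(secrets: Dict[str, str]) -> None:
    """Pass the collected secrets to the SDK call shown above."""
    # Imported lazily so load_secrets can be used and tested without the SDK.
    from mixtrain import MixClient

    client = MixClient()
    client.create_dataset_provider(provider_type="delta", secrets=secrets)
```

Calling `register_delta_provider(load_secrets())` then registers the provider with whatever matching variables are set in the current shell.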

Creating Datasets

from mixtrain import Dataset

dataset = Dataset.create_from_file(
    name="training-data",
    file_path="data.parquet",
    description="Training dataset"
)
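
Before uploading, it can help to confirm that the file really is Parquet. The sketch below is an optional pre-check, assuming an unencrypted local file: it looks only for the `PAR1` magic bytes that open and close a Parquet file and does not validate the contents. The helper names `looks_like_parquet` and `create_dataset_checked` are hypothetical; only `Dataset.create_from_file` comes from the example above.

```python
import os
from pathlib import Path

PARQUET_MAGIC = b"PAR1"  # Parquet files begin and end with these four bytes


def looks_like_parquet(file_path: str) -> bool:
    """Return True if the file exists and carries the Parquet magic bytes."""
    path = Path(file_path)
    # Lower bound on size: leading magic + 4-byte footer length + trailing magic.
    if not path.is_file() or path.stat().st_size < 12:
        return False
    with path.open("rb") as f:
        head = f.read(4)
        f.seek(-4, os.SEEK_END)  # jump to the last four bytes
        tail = f.read(4)
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC


def create_dataset_checked(name: str, file_path: str, description: str):
    """Create the dataset only if the file passes the cheap Parquet check."""
    if not looks_like_parquet(file_path):
        raise ValueError(f"{file_path} does not look like a Parquet file")
    # Imported lazily so the check itself works without the SDK installed.
    from mixtrain import Dataset

    return Dataset.create_from_file(
        name=name, file_path=file_path, description=description
    )
```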

Querying Datasets

from mixtrain import Dataset

dataset = Dataset("training-data")

# Convert to pandas DataFrame
df = dataset.to_pandas()

Features

  • ACID Transactions - Each write commits atomically, so readers never see a partially written table
  • Time Travel - Query the table as it existed at an earlier retained version
  • Schema Evolution - Add or modify columns without rewriting data
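
The features listed above all follow from how Delta Lake stores a table: data files plus an ordered log of commits. The toy model below sketches that idea in plain Python, assuming an in-memory log. `ToyDeltaTable` and its methods are hypothetical names for illustration; they are not part of Mixtrain or of any Delta Lake library, and a real Delta table keeps its log as files alongside Parquet data rather than in a Python list.

```python
from typing import Dict, List, Optional

Row = Dict[str, object]


class ToyDeltaTable:
    """In-memory imitation of a versioned table backed by an append-only log."""

    def __init__(self) -> None:
        # Each entry is the batch of rows one commit added; its index is the version.
        self._log: List[List[Row]] = []

    def commit(self, rows: List[Row]) -> int:
        """Append a batch of rows as one commit and return its version number.

        The whole batch is copied before the log is touched, so a bad row
        raises first and the table stays unchanged (all-or-nothing).
        """
        batch = [dict(row) for row in rows]
        self._log.append(batch)
        return len(self._log) - 1

    def read(self, version: Optional[int] = None) -> List[Row]:
        """Replay the log up to `version` (latest if None): time travel."""
        latest = len(self._log) - 1
        if version is None:
            version = latest
        elif not 0 <= version <= latest:
            raise ValueError(f"unknown version {version}")
        rows = [row for batch in self._log[: version + 1] for row in batch]
        # Schema evolution: older rows lack newer columns, so they are filled
        # with None at read time instead of rewriting the stored batches.
        columns: List[str] = []
        for row in rows:
            for col in row:
                if col not in columns:
                    columns.append(col)
        return [{col: row.get(col) for col in columns} for row in rows]


table = ToyDeltaTable()
v0 = table.commit([{"id": 1, "label": "cat"}])
v1 = table.commit([{"id": 2, "label": "dog", "score": 0.9}])  # adds a column
first_version = table.read(version=v0)  # only the row from the first commit
latest = table.read()                   # both rows; "score" is None for id 1
```

Reading an old version never modifies the log, which is why time travel and concurrent readers come almost for free in this design.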

CLI

# Create dataset
mixtrain dataset create my-data data.parquet --provider delta

# Query dataset
mixtrain dataset query my-data "SELECT * LIMIT 100"

# View metadata
mixtrain dataset metadata my-data
