Connect to SQL databases for dataset storage and querying.
Supported Databases
| Database | Provider Type |
|---|---|
| PostgreSQL | postgresql |
| MySQL | mysql |
| Snowflake | snowflake |
| BigQuery | bigquery |
| Databricks | databricks |
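
The provider type strings above are passed together with a `secrets` dictionary when a provider is registered. As a rough sketch, a small pre-flight check can catch a missing key before any connection is attempted. The helper `missing_secret_keys` and the `EXAMPLE_SECRET_KEYS` table are hypothetical and not part of mixtrain. The key names are copied from the setup examples on this page, and treating all of them as required is an assumption. MySQL and Databricks are left out because this page does not list their keys.

```python
# Keys copied from this page's setup examples. Treating them all as required
# is an assumption; MySQL and Databricks are omitted because their keys are
# not listed here.
EXAMPLE_SECRET_KEYS = {
    "postgresql": {"host", "port", "database", "user", "password"},
    "snowflake": {"account", "user", "password", "warehouse", "database"},
    "bigquery": {"project_id", "credentials_json"},
}


def missing_secret_keys(provider_type, secrets):
    """Return the example keys absent from `secrets`, sorted (hypothetical helper)."""
    if provider_type not in EXAMPLE_SECRET_KEYS:
        raise ValueError(f"no example keys recorded for {provider_type!r}")
    return sorted(EXAMPLE_SECRET_KEYS[provider_type] - set(secrets))


# Usage: an incomplete PostgreSQL secrets dictionary.
missing = missing_secret_keys("postgresql", {"host": "localhost", "port": "5432"})
```

Running the check before registering a provider keeps configuration mistakes separate from network or authentication failures.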
Setup
PostgreSQL
```shell
mixtrain provider add postgresql
```

```python
client.create_dataset_provider(
    provider_type="postgresql",
    secrets={
        "host": "localhost",
        "port": "5432",
        "database": "mydb",
        "user": "user",
        "password": "..."
    }
)
```

Snowflake
```shell
mixtrain provider add snowflake
```

```python
client.create_dataset_provider(
    provider_type="snowflake",
    secrets={
        "account": "xyz123.us-east-1",
        "user": "user",
        "password": "...",
        "warehouse": "COMPUTE_WH",
        "database": "MYDB"
    }
)
```

BigQuery
```shell
mixtrain provider add bigquery
```

```python
client.create_dataset_provider(
    provider_type="bigquery",
    secrets={
        "project_id": "my-project",
        "credentials_json": "{...}"
    }
)
```

Querying
```python
table = client.get_dataset("my-table")

# Run SQL query
df = table.query("SELECT * FROM users WHERE created_at > '2024-01-01'")
pandas_df = df.to_pandas()
```

CLI
```shell
# Query dataset
mixtrain dataset query my-table "SELECT * LIMIT 100"
mixtrain dataset metadata my-table
```

Best Practices
- Use connection pooling for production workloads
- Create read replicas for query-heavy operations
- Use parameterized queries to prevent SQL injection
- Set appropriate timeouts for long-running queries
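
To illustrate the parameterized-query advice, here is a minimal sketch using Python's standard-library `sqlite3` module as a stand-in database. This page does not say whether `table.query` accepts bound parameters, so the example shows the general technique rather than the mixtrain API. The `users` table, its rows, and the helper `created_at_for` are made up for the demonstration.

```python
import sqlite3

# In-memory stand-in database with a small, made-up `users` table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, created_at TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("ada", "2023-12-31"), ("grace", "2024-02-15")],
)


def created_at_for(conn, name):
    """Look up creation dates by name using a bound parameter."""
    # The ? placeholder sends `name` to the driver as data, not as SQL text,
    # so quotes inside the value cannot change the structure of the query.
    cur = conn.execute("SELECT created_at FROM users WHERE name = ?", (name,))
    return [row[0] for row in cur.fetchall()]


normal = created_at_for(conn, "ada")
# A classic injection attempt is treated as a literal name and matches no row.
hostile = created_at_for(conn, "ada' OR '1'='1")
```

Had the value been spliced into the SQL string with an f-string, the second call would have matched every row. With a bound parameter it matches none.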