Vision Models
Run and fine-tune open-source vision models on Mixtrain, from vision-language models to segmentation and object detection models.
Example Models
| Model | Provider | Parameters | Description |
|---|---|---|---|
| SmolVLM 2 | Hugging Face | 2.2B | Lightweight multimodal model for image and video understanding |
| SigLIP 2 | Google | — | Visual encoder for image-text matching and classification |
| SAM 2.1 | Meta | — | Segment anything in images and video |
| PaliGemma 2 | Google | 3B–28B | Visual question answering, captioning, and OCR |
Quick Start
from mixtrain import Model

# Reference the SmolVLM 2 model
model = Model("smolvlm-2")

# Run the model on an image with a text prompt
result = model.run({
    "image": "my-workspace/images/sample.jpg",
    "prompt": "Describe what you see in this image"
})
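To apply the same prompt to several images, one option is a small wrapper around the `run` call shown in Quick Start. The sketch below is illustrative only: `describe_images` is a hypothetical helper, not part of Mixtrain. It assumes only that the model object exposes `run()` accepting a dict with `"image"` and `"prompt"` keys, as in Quick Start, and it treats the return value as opaque because this page does not specify its shape.

```python
from typing import Any, Dict, List


def describe_images(model: Any, image_paths: List[str], prompt: str) -> Dict[str, Any]:
    """Call model.run once per image and collect the results by image path.

    Hypothetical helper: relies only on the run({"image": ..., "prompt": ...})
    call shape from Quick Start. The result type is an assumption-free
    pass-through, since the return value of run() is not documented here.
    """
    results: Dict[str, Any] = {}
    for path in image_paths:
        # One request per image; inputs mirror the Quick Start dict
        results[path] = model.run({"image": path, "prompt": prompt})
    return results


# Usage with Mixtrain (assumes the Quick Start import and model name):
#   from mixtrain import Model
#   results = describe_images(
#       Model("smolvlm-2"),
#       ["my-workspace/images/sample.jpg"],
#       "Describe what you see in this image",
#   )
```

Because the helper accepts any object with a `run` method, it can be exercised with a stub before spending time on real inference.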