
Vision Models

Run and fine-tune open-source vision models on Mixtrain — from visual language models to segmentation and object detection.

Example Models

| Model | Provider | Parameters | Description |
| SmolVLM 2 | Hugging Face | 2.2B | Lightweight multimodal model for image and video understanding |
| SigLIP 2 | Google | (not listed) | Visual encoder for image-text matching and classification |
| SAM 2.1 | Meta | (not listed) | Segment anything in images and video |
| PaliGemma 2 | Google | 3B–28B | Visual question answering, captioning, and OCR |

Quick Start

from mixtrain import Model

# Load the model by its identifier.
model = Model("smolvlm-2")

# Run inference on a single image with a text prompt.
result = model.run({
    "image": "my-workspace/images/sample.jpg",
    "prompt": "Describe what you see in this image"
})
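
The Quick Start call can be wrapped for use over several images. The following is a minimal sketch, not part of the Mixtrain API: `build_inputs` and `describe_images` are hypothetical helper names, and the only assumption about the model is that it exposes `run()` taking a dict with `image` and `prompt` keys, as in the Quick Start. The format of the value returned by `run()` is not specified here, so the results are collected as-is.

```python
from typing import Any, Dict, Iterable, List, Protocol


class SupportsRun(Protocol):
    """Anything with a run(inputs) method, such as the Quick Start's model object."""

    def run(self, inputs: Dict[str, Any]) -> Any: ...


def build_inputs(image: str, prompt: str) -> Dict[str, str]:
    """Build an input dict in the same shape the Quick Start passes to run()."""
    # Hypothetical client-side checks; the service may apply its own validation.
    if not image:
        raise ValueError("image path must be non-empty")
    if not prompt.strip():
        raise ValueError("prompt must be non-empty")
    return {"image": image, "prompt": prompt}


def describe_images(model: SupportsRun, images: Iterable[str], prompt: str) -> List[Any]:
    """Call model.run once per image and return the results in input order."""
    return [model.run(build_inputs(image, prompt)) for image in images]
```

With the Quick Start's `model` object in scope, this could be called as `describe_images(model, ["my-workspace/images/sample.jpg"], "Describe what you see in this image")`.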
