```python
from mixtrain import Embedding
```

Overview

Embedding represents a vector embedding for ML features or semantic search. Unlike the file-based types, it does not reference a file; it holds the vector data directly.

Use it to return embeddings from models or mark dataset columns that contain vector values.
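For semantic search, embeddings are usually compared with cosine similarity. The sketch below is plain Python over `values` lists and is not part of the mixtrain API: `cosine_similarity` and `rank_by_similarity` are hypothetical helper names, and the three-dimensional vectors are toy data standing in for real `Embedding.values`.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} != {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query: list[float], candidates: dict[str, list[float]]
) -> list[tuple[str, float]]:
    """Return (key, score) pairs sorted from most to least similar to query."""
    scored = [(key, cosine_similarity(query, vec)) for key, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors (hypothetical); in practice these come from Embedding.values.
docs = {
    "cats": [0.9, 0.1, 0.0],
    "dogs": [0.8, 0.2, 0.1],
    "taxes": [0.0, 0.1, 0.9],
}
query = [1.0, 0.0, 0.0]
print(rank_by_similarity(query, docs))
```

Here `cats` ranks first and `taxes` last, because the query vector points in almost the same direction as the `cats` vector and is orthogonal to `taxes`.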

Constructor

```python
Embedding(
    values: list[float],
    *,
    dimension: int | None = None,
    model: str | None = None,
)
```
| Parameter | Type | Description |
|---|---|---|
| `values` | `list[float]` | The embedding vector |
| `dimension` | `int \| None` | Optional vector dimension hint. If you set it, it should equal `len(values)`. |
| `model` | `str \| None` | Name of the model that generated this embedding |
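```python
# text-embedding-3-small returns 1536-dimensional vectors.
vector = [0.1, 0.2, 0.3] + [0.0] * 1533  # placeholder: use the model's actual output

embedding = Embedding(
    values=vector,
    dimension=len(vector),  # 1536
    model="text-embedding-3-small",
)
```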

Properties

| Property | Type | Description |
|---|---|---|
| `values` | `list[float]` | The embedding vector |
| `dimension` | `int \| None` | Optional vector dimension hint |
| `model` | `str \| None` | Source model name |
```python
embedding = Embedding(values=[0.1, 0.2, 0.3])

print(embedding.values)       # [0.1, 0.2, 0.3]
print(len(embedding.values))  # 3
```

Using Embedding

You can use Embedding in your models, datasets, workflows, and routines.

As output

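```python
from mixtrain import MixModel, Embedding

class TextEmbedder(MixModel):
    def run(self, inputs=None):
        # _embed stands in for your own embedding logic; it should
        # return the vector as a list[float].
        vector = self._embed(inputs["text"])
        return {
            "embedding": Embedding(
                values=vector,
                dimension=len(vector),
                model="my-embedder-v1",
            )
        }
```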
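The example leaves `self._embed` to you. For local testing without a real embedding model, a toy stand-in such as the signed feature-hashing sketch below returns a fixed-length `list[float]` suitable for `Embedding(values=...)`. `hash_embed` and its `dim` parameter are hypothetical; hashed bag-of-words vectors capture word overlap, not meaning, so swap in a real model for semantic search.

```python
import hashlib
import math


def hash_embed(text: str, dim: int = 64) -> list[float]:
    """Toy embedding: hash each lowercase token into one of `dim` buckets.

    Uses md5 (not Python's per-process salted hash()) so output is
    deterministic across runs, then L2-normalizes the result.
    """
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "little") % dim
        # Signed hashing trick: a per-token sign makes bucket
        # collisions cancel out on average instead of piling up.
        sign = 1.0 if digest[4] & 1 else -1.0
        vec[bucket] += sign
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec


vector = hash_embed("Hello world")
print(len(vector))  # 64
```

Inside `TextEmbedder.run`, `vector = hash_embed(inputs["text"])` would take the place of the `self._embed` call.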

In datasets

Use Embedding as a dataset column type when a column contains vectors:

```python
from mixtrain import Dataset, Embedding

dataset = Dataset.from_file("data.parquet")
dataset.save(
    "search-data",
    column_types={
        "text_embedding": Embedding
    }
)
```

From model result

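```python
# `model` is a model handle whose output includes an "embedding" field.
result = model.run({"text": "Hello world"})

embedding = result.embedding
# dimension is an optional hint and may be None; len(values) is always available.
print(f"Dimension: {len(embedding.values)}")
print(f"First 3 values: {embedding.values[:3]}")
```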
