Chroma Data Model

Collections

A collection is the fundamental unit of storage and querying in Chroma. Each collection contains a set of items, where each item consists of:

An ID uniquely identifying the item
An embedding vector
Optional metadata (key-value pairs)
A document associated with the embedding (typically the text it was computed from)

Collections are independently indexed and are optimized for fast retrieval using vector similarity, full-text search, and metadata filtering. In distributed deployments, collections can be sharded or migrated across nodes as needed; the system transparently manages paging them in and out of memory based on access patterns.
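To make the shape of an item concrete, here is a minimal sketch in plain Python. It is not Chroma's API or implementation: the names Item, ToyCollection, and cosine_similarity are hypothetical, and the search is a brute-force scan rather than an index. It only illustrates how an ID, an embedding, optional metadata, and a document fit together, and how a similarity query can be combined with a metadata filter.

```python
from dataclasses import dataclass, field
from math import sqrt
from typing import Optional


@dataclass
class Item:
    """Hypothetical record mirroring the four parts of an item."""
    id: str                                        # unique within the collection
    embedding: list[float]                         # vector used for similarity search
    metadata: dict = field(default_factory=dict)   # optional key-value pairs
    document: Optional[str] = None                 # text associated with the embedding


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ToyCollection:
    """Illustrative in-memory stand-in for a collection (assumption, not Chroma code)."""

    def __init__(self, name):
        self.name = name
        self._items = {}

    def add(self, item):
        # IDs must uniquely identify an item inside the collection.
        if item.id in self._items:
            raise ValueError(f"duplicate id: {item.id}")
        self._items[item.id] = item

    def query(self, embedding, n_results=1, where=None):
        # Metadata filtering: keep items matching every key/value in `where`.
        wanted = where or {}
        candidates = [
            it for it in self._items.values()
            if all(it.metadata.get(k) == v for k, v in wanted.items())
        ]
        # Vector similarity: rank the remaining items, most similar first.
        candidates.sort(
            key=lambda it: cosine_similarity(embedding, it.embedding),
            reverse=True,
        )
        return [it.id for it in candidates[:n_results]]


docs = ToyCollection("docs")
docs.add(Item("a", [1.0, 0.0], {"lang": "en"}, "hello"))
docs.add(Item("b", [0.0, 1.0], {"lang": "de"}, "hallo"))
docs.add(Item("c", [0.9, 0.1], {"lang": "de"}, "guten tag"))

nearest = docs.query([1.0, 0.0], n_results=2)
nearest_german = docs.query([1.0, 0.0], n_results=1, where={"lang": "de"})
```

In this sketch the unfiltered query ranks "a" ahead of "c", while the filtered query skips "a" because its metadata does not match. A real deployment would rely on Chroma's own indexes instead of scanning every item.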

Databases

Collections are grouped into databases, which serve as a logical namespace. This is useful for organizing collections by purpose—for example, separating environments like “staging” and “production”, or grouping applications under a common schema. Each database contains multiple collections, and each collection name must be unique within a database.
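The namespace rule can be sketched in a few lines of plain Python. This is an illustration under assumed names (ToyDatabase, create_collection, and collection_names are hypothetical), not Chroma's client API: a collection name may be reused across databases, but not twice within the same one.

```python
class ToyDatabase:
    """Illustrative namespace for collections (assumption, not Chroma code)."""

    def __init__(self, name):
        self.name = name
        self._collections = {}  # collection name -> list of items

    def create_collection(self, name):
        # Collection names must be unique within a single database.
        if name in self._collections:
            raise ValueError(
                f"collection {name!r} already exists in database {self.name!r}"
            )
        self._collections[name] = []
        return self._collections[name]

    def collection_names(self):
        return sorted(self._collections)


staging = ToyDatabase("staging")
production = ToyDatabase("production")

# The same collection name is fine in two different databases.
staging.create_collection("articles")
production.create_collection("articles")

# Reusing the name inside one database is rejected.
try:
    staging.create_collection("articles")
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
```

Keeping "staging" and "production" as separate databases lets both environments use identical collection names without touching each other's data.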

Tenants

At the top level of the model is the tenant, which represents a single user, team, or account. Tenants provide complete isolation: no data or metadata is shared across tenants. All access control, quota enforcement, and billing are scoped to the tenant level.
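One way to picture tenant isolation is a store in which every lookup starts from the tenant, so identical database and collection names under two tenants never refer to the same data. The sketch below is plain Python with hypothetical names (ToyStore, add_item, list_items); it does not show how Chroma enforces isolation, access control, or quotas internally.

```python
class ToyStore:
    """Illustrative tenant -> database -> collection hierarchy (assumption, not Chroma code)."""

    def __init__(self):
        # tenant name -> database name -> collection name -> list of item IDs
        self._data = {}

    def add_item(self, tenant, database, collection, item_id):
        databases = self._data.setdefault(tenant, {})
        collections = databases.setdefault(database, {})
        collections.setdefault(collection, []).append(item_id)

    def list_items(self, tenant, database, collection):
        # Resolution always begins at the tenant, so one tenant's path
        # can never reach data stored under another tenant.
        return list(
            self._data.get(tenant, {}).get(database, {}).get(collection, [])
        )


store = ToyStore()
store.add_item("acme", "production", "articles", "doc-1")
store.add_item("globex", "production", "articles", "doc-9")

acme_items = store.list_items("acme", "production", "articles")
globex_items = store.list_items("globex", "production", "articles")
unknown_items = store.list_items("initech", "production", "articles")
```

Both tenants here use a "production" database with an "articles" collection, yet each sees only its own item, and a tenant with no data sees nothing.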
