Schema enables fine-grained control over index configuration on collections. Control which indexes are created, optimize for your workload, and enable advanced capabilities like hybrid search.
What is Schema?
Schema allows you to configure which indexes are created for different data types in your Chroma collections. You can enable or disable indexes globally or per-field, configure vector index parameters, and set up sparse vector indexes for keyword-based search.
Why Use Schema?
- Enable Hybrid Search: Combine dense and sparse embeddings for better retrieval quality
- Optimize Performance: Disable unused indexes to speed up writes and reduce index build time
- Fine-Tune Configuration: Adjust vector index parameters for your workload
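To give a feel for what "combining dense and sparse" can mean, here is a minimal sketch of reciprocal rank fusion (RRF), one common way to merge two ranked result lists. It is a plain-Python illustration, not Chroma's API: the function name, the `k=60` constant, and the sample IDs are all assumptions made for the example, and Chroma may combine results differently.

```python
# Illustrative only: a generic reciprocal rank fusion, not a Chroma call.
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked ID lists (each ordered best-first) into one list."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) to the document's score.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from a dense query and a sparse (keyword) query.
dense_hits = ["id2", "id1", "id3"]
sparse_hits = ["id1", "id4", "id2"]

fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused)
```

Documents that rank well in both lists rise to the top, which is why hybrid retrieval can surface results that either method alone would rank lower.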
Quick Start
The following example creates a collection with a custom schema:
import chromadb
from chromadb import Schema, StringInvertedIndexConfig

# Connect to Chroma Cloud
client = chromadb.CloudClient(
    tenant="your-tenant",
    database="your-database",
    api_key="your-api-key"
)

# Create a schema and disable string indexing globally
schema = Schema()
schema.delete_index(config=StringInvertedIndexConfig())

# Create collection with the schema
collection = client.create_collection(
    name="my_collection",
    schema=schema
)

# Add data - string metadata won't be indexed
collection.add(
    ids=["id1", "id2"],
    documents=["Document 1", "Document 2"],
    metadatas=[
        {"category": "science", "year": 2024},
        {"category": "tech", "year": 2023}
    ]
)

# Filtering on a field whose index is disabled raises an error
try:
    collection.query(
        query_texts=["query"],
        where={"category": "science"}  # Error: string index is disabled
    )
except Exception as e:
    print(f"Error: {e}")
Important: A schema can only be set when the collection is created with create_collection. Support for updating a schema through collection modify is in progress.
Feature Highlights
- Default Indexes: Collections start with sensible defaults: inverted indexes for scalar types, a vector index for embeddings, and a full-text search index for documents
- Global Configuration: Set index defaults that apply to all metadata keys of a given type during collection creation
- Per-Key Configuration: Override defaults for specific metadata fields
- Sparse Vector Support: Enable sparse embeddings for hybrid search with BM25-style retrieval
- Index Deletion: Disable indexes you don’t need to improve write performance
- Dynamic Schema Evolution: New metadata keys added during writes automatically inherit from global defaults
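The relationship between global defaults, per-key overrides, and keys that first appear at write time can be sketched with a small toy model. This is not Chroma's implementation or API: `ToySchema`, `set_index`, and `is_indexed` are hypothetical names, and the model only tracks whether an index is on or off for a given metadata key and value type.

```python
from dataclasses import dataclass, field


# Hypothetical toy model of index resolution; not Chroma's Schema class.
@dataclass
class ToySchema:
    # Global defaults: value type name -> is that type's index enabled?
    defaults: dict = field(
        default_factory=lambda: {"str": True, "int": True, "float": True, "bool": True}
    )
    # Per-key overrides: (metadata key, value type name) -> enabled flag.
    overrides: dict = field(default_factory=dict)

    def set_index(self, value_type, enabled, key=None):
        """Enable or disable an index globally (key=None) or for one key."""
        if key is None:
            self.defaults[value_type] = enabled
        else:
            self.overrides[(key, value_type)] = enabled

    def is_indexed(self, key, value):
        """A per-key override wins; otherwise the key inherits the global default."""
        value_type = type(value).__name__
        return self.overrides.get((key, value_type), self.defaults.get(value_type, False))


schema = ToySchema()
schema.set_index("str", enabled=False)              # global: no string indexes
schema.set_index("int", enabled=False, key="year")  # per-key: skip "year" only

# "pages" has no override, so it inherits the global default for integers.
record = {"category": "science", "year": 2024, "pages": 12}
indexed = {key: schema.is_indexed(key, value) for key, value in record.items()}
print(indexed)
```

In this model, a key seen for the first time needs no configuration step: its lookup simply falls through to the global default for its value type, which mirrors the inheritance behavior described in the list above.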
Next Steps