Skip to main content

Search Basics

Learn how to construct and use the Search class for querying your Chroma collections. This page covers the basics of Search construction. For detailed usage of specific components, see:

The Search Class

from chromadb import Search

# Create an empty search
search = Search()

# Direct construction with parameters
search = Search(
    where={"status": "active"},
    rank={"$knn": {"query": [0.1, 0.2]}},
    limit=10,
    select=["#document", "#score"]
)

Constructor Parameters

The Search class accepts four optional parameters:
  • where: Filter expressions to narrow down results
    • Types: Where expression, dict, or None
    • Default: None (no filtering)
  • rank: Ranking expressions to score and order results
    • Types: Rank expression, dict, or None
    • Default: None (no ranking, natural order)
  • limit: Pagination control
    • Types: Limit object, dict, int, or None
    • Default: None (no limit)
  • select: Fields to include in results
    • Types: Select object, dict, list, set, or None
    • Default: None (returns IDs only)
    • Available fields: #id, #document, #embedding, #metadata, #score, or any custom metadata field
    • See field selection for details

Builder Pattern

The Search class provides a fluent interface with method chaining. Each method returns a new Search instance, making queries immutable and safe to reuse. For detailed usage of each builder method, see the respective sections:
from chromadb import Search, K, Knn

# Basic method chaining
search = (Search()
    .where(K("status") == "published")
    .rank(Knn(query="machine learning applications"))
    .limit(10)
    .select(K.DOCUMENT, K.SCORE))

# Each method returns a new instance
base_search = Search().where(K("category") == "science")
search_v1 = base_search.limit(5)  # New instance
search_v2 = base_search.limit(10) # Different instance

# Progressive building
search = Search()
search = search.where(K("status") == "active")
search = search.rank(Knn(query="recent advances in quantum computing"))
search = search.limit(20)
search = search.select(K.DOCUMENT, K.METADATA)
Benefits of immutability:
  • Base queries can be reused safely
  • No unexpected side effects from modifications
  • Easy to create query variations

Direct Construction

You can create Search objects directly with various parameter types:
from chromadb import Search, K, Knn
from chromadb.execution.expression.operator import Limit, Select

# With expression objects
search = Search(
    where=K("status") == "active",
    rank=Knn(query="latest research papers"),
    limit=Limit(limit=10, offset=0),
    select=Select(keys={K.DOCUMENT, K.SCORE})
)

# With dictionaries (MongoDB-style)
search = Search(
    where={"status": "active"},
    rank={"$knn": {"query": "latest research papers"}},
    limit={"limit": 10, "offset": 0},
    select={"keys": ["#document", "#score"]}
)

# Mixed types
search = Search(
    where=K("category") == "science",           # Expression
    rank={"$knn": {"query": "quantum mechanics"}},  # Dictionary
    limit=10,                                   # Integer
    select=[K.DOCUMENT, K.SCORE, "author"]      # List
)

# Minimal search (IDs only)
search = Search()

# Just filtering
search = Search(where=K("status") == "published")

# Just ranking
search = Search(rank=Knn(query="artificial intelligence"))

Dictionary Format Specification

When using dictionaries to construct Search objects, follow this format. For complete operator schemas:
# Where dictionary (MongoDB-style operators)
# Note: Each dict can only have one field or one logical operator

# Simple equality
where_dict = {"status": "active"}

# Comparison operator
where_dict = {"score": {"$gt": 0.5}}

# Logical AND combination
where_dict = {
    "$and": [
        {"status": "active"},
        {"category": "science"},
        {"year": {"$gte": 2020}}
    ]
}

# Logical OR combination
where_dict = {
    "$or": [
        {"category": "science"},
        {"category": "technology"}
    ]
}

# Rank dictionary
rank_dict = {
    "$knn": {
        "query": "machine learning research",  # Query text or embedding
        "key": "#embedding",              # Optional: field to search
        "limit": 128                      # Optional: max candidates
    }
}

# Limit dictionary
limit_dict = {
    "limit": 10,                          # Number of results
    "offset": 20                          # Skip first N results
}

# Select dictionary
# Keys can be predefined fields (with # prefix) or custom metadata fields
select_dict = {
    "keys": [
        "#id",          # Document ID (always returned)
        "#document",    # Document content
        "#embedding",   # Embedding vectors
        "#metadata",    # All metadata (includes all custom fields)
        "#score",       # Search score (when ranking is used)
    ]
}

# Or select specific metadata fields only (without #metadata)
select_dict = {
    "keys": [
        "#document",
        "#score",
        "title",        # Specific metadata field
        "author"        # Specific metadata field
    ]
}
# Note: Using #metadata returns ALL metadata fields, so no need to list individual fields
# For more details on field selection, see: ./pagination-selection#field-selection

# Complete search with dictionaries
search = Search(
    where=where_dict,
    rank=rank_dict,
    limit=limit_dict,
    select=select_dict
)

Empty Search Behavior

An empty Search object has specific default behaviors:
# Empty search
search = Search()

# Equivalent to:
# - where: None (returns all documents)
# - rank: None (natural storage order)
# - limit: None (no limit on results)
# - select: None (returns IDs only)

result = collection.search(search)
# Result contains only IDs, no documents/embeddings/metadata/scores

# Add selection to get more fields
search = Search().select(K.DOCUMENT, K.METADATA)
result = collection.search(search)
# Now includes documents and metadata
When no limit is specified, Chroma Cloud will apply a default limit based on your quota to prevent returning excessive results. For production use, it’s recommended to always specify an explicit limit.

Common Initialization Patterns

Here are common patterns for building Search queries:
from chromadb import Search, K, Knn

# Pattern 1: Baseline - no filter, no rank (natural storage order)
def get_documents():
    return Search().select(K.DOCUMENT, K.METADATA)

# Pattern 2: Filter only - no ranking
def filter_recent_science():
    return (Search()
        .where((K("category") == "science") & (K("year") >= 2023))
        .limit(10)
        .select(K.DOCUMENT, K.METADATA))

# Pattern 3: Rank only - no filtering
def search_similar(query):
    return (Search()
        .rank(Knn(query=query))
        .limit(10)
        .select(K.DOCUMENT, K.SCORE))

# Pattern 4: Both filter and rank
def search_recent_science(query):
    return (Search()
        .where((K("category") == "science") & (K("year") >= 2023))
        .rank(Knn(query=query))
        .limit(10)
        .select(K.DOCUMENT, K.SCORE))

Next Steps