Filtering with Where

Learn how to filter search results using Where expressions and the Key/K class to narrow down your search to specific documents, IDs, or metadata values.

The Key/K Class

The Key class (aliased as K for brevity) provides a fluent interface for building filter expressions. Use K to reference document fields, IDs, and metadata properties.

from chromadb import K

# K is an alias for Key - use K for more concise code
# Filter by metadata field
K("status") == "active"

# Filter by document content
K.DOCUMENT.contains("machine learning")

# Filter by document IDs
K.ID.is_in(["doc1", "doc2", "doc3"])

Filterable Fields

Field	Usage	Description
`K.ID`	`K.ID.is_in(["id1", "id2"])`	Filter by document IDs
`K.DOCUMENT`	`K.DOCUMENT.contains("text")`	Filter by document content
`K("field_name")`	`K("status") == "active"`	Filter by any metadata field

Comparison Operators

Supported operators:

== - Equality (all types: string, numeric, boolean)
!= - Inequality (all types: string, numeric, boolean)
> - Greater than (numeric only)
>= - Greater than or equal (numeric only)
< - Less than (numeric only)
<= - Less than or equal (numeric only)

# Equality and inequality (all types)
K("status") == "published"     # String equality
K("views") != 0                # Numeric inequality
K("featured") == True          # Boolean equality

# Numeric comparisons (numbers only)
K("price") > 100               # Greater than
K("rating") >= 4.5             # Greater than or equal
K("stock") < 10                # Less than
K("discount") <= 0.25          # Less than or equal

Chroma supports three data types for metadata: strings, numbers (int/float), and booleans. Order comparison operators (>, <, >=, <=) currently only work with numeric types.
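
The type rules above can be checked before a filter is sent. The following is a minimal sketch in plain Python, not part of chromadb: `check_operand` is a hypothetical helper, and it uses the dictionary operator names (`$gt`, `$eq`, and so on) from the Dictionary Syntax section. Treating booleans as non-numeric for ordering comparisons is an assumption made here.

```python
ORDER_OPS = {"$gt", "$gte", "$lt", "$lte"}
EQUALITY_OPS = {"$eq", "$ne"}


def check_operand(op: str, value: object) -> None:
    """Raise if `value` is not a valid operand for comparison operator `op`."""
    if op in ORDER_OPS:
        # bool is a subclass of int in Python, so it is excluded explicitly
        # (assumption: booleans do not count as numeric for ordering).
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError(f"{op} needs a numeric operand, got {type(value).__name__}")
    elif op in EQUALITY_OPS:
        if not isinstance(value, (str, int, float, bool)):
            raise TypeError(
                f"{op} needs a string, number, or boolean, got {type(value).__name__}"
            )
    else:
        raise ValueError(f"unknown comparison operator: {op}")


check_operand("$gte", 4.5)          # numeric ordering comparison: accepted
check_operand("$eq", "published")   # string equality: accepted
```

Calling `check_operand("$gt", "100")` raises `TypeError`, which surfaces the mistake before the query runs.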

Set and String Operators

Supported operators:

is_in() - Value matches any in the list
not_in() - Value doesn’t match any in the list
contains() - String contains substring (case-sensitive, currently K.DOCUMENT only)
not_contains() - String doesn’t contain substring (currently K.DOCUMENT only)
regex() - String matches regex pattern (currently K.DOCUMENT only)
not_regex() - String doesn’t match regex pattern (currently K.DOCUMENT only)

# Set membership operators (work on all fields)
K.ID.is_in(["doc1", "doc2", "doc3"])           # Match any ID in list
K("category").is_in(["tech", "science"])       # Match any category
K("status").not_in(["draft", "deleted"])       # Exclude specific values

# String content operators (currently K.DOCUMENT only)
K.DOCUMENT.contains("machine learning")        # Substring search in document
K.DOCUMENT.not_contains("deprecated")          # Exclude documents with text
K.DOCUMENT.regex(r"\bAPI\b")                   # Match whole word "API" in document

# Note: String pattern matching on metadata fields not yet supported
# K("title").contains("Python")                # NOT YET SUPPORTED
# K("email").regex(r".*@company\.com$")        # NOT YET SUPPORTED

String operations like contains() and regex() are case-sensitive by default. The is_in() operator is efficient even with large lists.

Logical Operators

Supported operators:

& - Logical AND (all conditions must match)
| - Logical OR (any condition can match)

Combine multiple conditions using these operators. Always use parentheses to ensure correct precedence.

# AND operator (&) - all conditions must match
(K("status") == "published") & (K("year") >= 2020)

# OR operator (|) - any condition can match
(K("category") == "tech") | (K("category") == "science")

# Combining with document and ID filters
(K.DOCUMENT.contains("AI")) & (K("author") == "Smith")
(K.ID.is_in(["id1", "id2"])) | (K("featured") == True)

# Complex nesting - use parentheses for clarity
(
    (K("status") == "published") &
    ((K("category") == "tech") | (K("category") == "science")) &
    (K("rating") >= 4.0)
)

Always use parentheses around each condition when using logical operators. Python’s operator precedence may not work as expected without them.
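
The reason is that Python's `&` and `|` bind more tightly than comparison operators such as `==` and `>=`. The sketch below uses only the standard-library `ast` module to show how Python groups an expression with and without parentheses. The names `status` and `year` are placeholders that are parsed but never evaluated.

```python
import ast


def describe(expr: str) -> str:
    """Return the type of the top-level node Python builds for `expr`."""
    return type(ast.parse(expr, mode="eval").body).__name__


# Without parentheses, & is applied first, so Python reads this as the
# chained comparison  status == ("published" & year) == 2020.
without_parens = describe('status == "published" & year == 2020')

# With parentheses, the top-level operation is the intended &.
with_parens = describe('(status == "published") & (year == 2020)')

print(without_parens)  # Compare
print(with_parens)     # BinOp
```

The same grouping applies to K expressions: without parentheses, the value on the right of `==` is combined with the next K object before any comparison happens.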

Dictionary Syntax (MongoDB-style)

You can also use dictionary syntax instead of K expressions. This is useful when building filters programmatically. Supported dictionary operators:

Direct value - Shorthand for equality
$eq - Equality
$ne - Not equal
$gt - Greater than (numeric only)
$gte - Greater than or equal (numeric only)
$lt - Less than (numeric only)
$lte - Less than or equal (numeric only)
$in - Value in list
$nin - Value not in list
$contains - String contains
$not_contains - String doesn’t contain
$regex - Regex match
$not_regex - Regex doesn’t match
$and - Logical AND
$or - Logical OR

# Direct equality (shorthand)
{"status": "active"}                        # Same as K("status") == "active"

# Comparison operators
{"status": {"$eq": "published"}}            # Same as K("status") == "published"
{"count": {"$ne": 0}}                       # Same as K("count") != 0
{"price": {"$gt": 100}}                     # Same as K("price") > 100 (numbers only)
{"rating": {"$gte": 4.5}}                   # Same as K("rating") >= 4.5 (numbers only)
{"stock": {"$lt": 10}}                      # Same as K("stock") < 10 (numbers only)
{"discount": {"$lte": 0.25}}                # Same as K("discount") <= 0.25 (numbers only)

# Set membership operators
{"#id": {"$in": ["id1", "id2"]}}            # Same as K.ID.is_in(["id1", "id2"])
{"category": {"$in": ["tech", "ai"]}}       # Same as K("category").is_in(["tech", "ai"])
{"status": {"$nin": ["draft", "deleted"]}}  # Same as K("status").not_in(["draft", "deleted"])

# String operators (currently K.DOCUMENT only)
{"#document": {"$contains": "API"}}         # Same as K.DOCUMENT.contains("API")
# {"title": {"$not_contains": "draft"}}     # Not yet supported - metadata fields
# {"email": {"$regex": ".*@example\\.com"}} # Not yet supported - metadata fields
# {"version": {"$not_regex": "^beta"}}      # Not yet supported - metadata fields

# Logical operators
{"$and": [
    {"status": "published"},
    {"year": {"$gte": 2020}},
    {"#document": {"$contains": "AI"}}
]}                                          # Combines multiple conditions with AND

{"$or": [
    {"category": "tech"},
    {"category": "science"},
    {"featured": True}
]}                                          # Combines multiple conditions with OR

# Complex nested example
{
    "$and": [
        {"$or": [
            {"category": "tech"},
            {"category": "science"}
        ]},
        {"status": "published"},
        {"quality_score": {"$gte": 0.8}}
    ]
}

Each dictionary can only contain one field or one logical operator ($and/$or). For field dictionaries, only one operator is allowed per field.
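
As an illustration of building filters programmatically under these rules, here is a minimal sketch in plain Python. `combine` and `numeric_range` are hypothetical helpers, not chromadb functions. Because a field dictionary allows only one operator, a range is written as two clauses joined by `$and`. The text above does not say whether a one-element `$and` list is accepted, so the helper avoids producing one.

```python
from typing import Any


def combine(operator: str, clauses: list[dict[str, Any]]) -> dict[str, Any]:
    """Join clauses with $and or $or, skipping the wrapper for a single clause."""
    if operator not in ("$and", "$or"):
        raise ValueError(f"unsupported logical operator: {operator}")
    if not clauses:
        raise ValueError("at least one clause is required")
    return clauses[0] if len(clauses) == 1 else {operator: clauses}


def numeric_range(field: str, low: float, high: float) -> dict[str, Any]:
    """Inclusive range as two single-operator clauses joined by $and."""
    return combine("$and", [{field: {"$gte": low}}, {field: {"$lte": high}}])


# Optional conditions can be collected in a list and combined at the end.
clauses = [{"status": "published"}, numeric_range("price", 100, 500)]
where = combine("$and", clauses)
print(where)
```

Every dictionary in the result holds exactly one field or one logical operator, which matches the constraint stated above.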

Common Filtering Patterns

from chromadb import Search, K

# Filter by specific document IDs
search = Search().where(K.ID.is_in(["doc_001", "doc_002", "doc_003"]))

# Exclude already processed documents
processed_ids = ["doc_100", "doc_101"]
search = Search().where(K.ID.not_in(processed_ids))

# Full-text search in documents
search = Search().where(K.DOCUMENT.contains("quantum computing"))

# Combine document search with metadata
search = Search().where(
    K.DOCUMENT.contains("machine learning") &
    (K("language") == "en")
)

# Price range filtering
search = Search().where(
    (K("price") >= 100) &
    (K("price") <= 500)
)

# Multi-field filtering
search = Search().where(
    (K("status") == "active") &
    (K("category").is_in(["tech", "ai", "ml"])) &
    (K("score") >= 0.8)
)

Edge Cases and Important Behavior

Missing Keys

When filtering on a metadata field that doesn’t exist for a document:

Most operators (==, >, <, >=, <=, is_in()) evaluate to false - the document won’t match
!= evaluates to true - documents without the field are considered “not equal” to any value
not_in() evaluates to true - documents without the field are not in any list

# If a document doesn't have a "category" field:
K("category") == "tech"         # false - won't match
K("category") != "tech"         # true - will match
K("category").is_in(["tech"])   # false - won't match
K("category").not_in(["tech"])  # true - will match
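
The missing-key rules can be modeled in a few lines of plain Python. This is a sketch of the behavior described above, not Chroma's implementation. `matches` is a hypothetical function that evaluates one dictionary-style clause against one document's metadata.

```python
import operator
from typing import Any

_MISSING = object()
_COMPARE = {
    "$eq": operator.eq, "$ne": operator.ne,
    "$gt": operator.gt, "$gte": operator.ge,
    "$lt": operator.lt, "$lte": operator.le,
}


def matches(metadata: dict[str, Any], field: str, op: str, operand: Any) -> bool:
    """Evaluate a single clause, applying the missing-key rules."""
    value = metadata.get(field, _MISSING)
    if value is _MISSING:
        # Only the negated operators match documents that lack the field.
        return op in ("$ne", "$nin")
    if op == "$in":
        return value in operand
    if op == "$nin":
        return value not in operand
    return _COMPARE[op](value, operand)


no_category = {"title": "Intro"}  # document without a "category" field
print(matches(no_category, "category", "$eq", "tech"))     # False
print(matches(no_category, "category", "$ne", "tech"))     # True
print(matches(no_category, "category", "$in", ["tech"]))   # False
print(matches(no_category, "category", "$nin", ["tech"]))  # True
```

A practical consequence is that a `!=` or `not_in()` filter also returns documents that never had the field.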

Mixed Types

Avoid storing different data types under the same metadata key across documents. Query behavior is undefined when comparing values of different types.

# DON'T DO THIS - undefined behavior
# Document 1: {"score": 95}      (numeric)
# Document 2: {"score": "95"}    (string)
# Document 3: {"score": True}    (boolean)

K("score") > 90  # Undefined results when mixed types exist

# DO THIS - consistent types
# All documents: {"score": <numeric>} or all {"score": <string>}
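
One way to catch this before documents are added is to scan the metadata for keys that carry more than one type. The sketch below is plain Python with hypothetical helper names (`type_label`, `find_mixed_type_keys`), and it assumes the metadata is available as a list of dictionaries.

```python
from collections import defaultdict
from typing import Any


def type_label(value: Any) -> str:
    """Classify a metadata value as boolean, numeric, or string."""
    # Check bool before int: bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "numeric"
    if isinstance(value, str):
        return "string"
    return type(value).__name__


def find_mixed_type_keys(metadatas: list[dict[str, Any]]) -> dict[str, set[str]]:
    """Return each key that appears with more than one type, with the types seen."""
    seen: dict[str, set[str]] = defaultdict(set)
    for metadata in metadatas:
        for key, value in metadata.items():
            seen[key].add(type_label(value))
    return {key: labels for key, labels in seen.items() if len(labels) > 1}


metadatas = [
    {"score": 95, "lang": "en"},
    {"score": "95", "lang": "de"},
    {"score": True, "lang": "fr"},
]
mixed = find_mixed_type_keys(metadatas)
print(sorted(mixed))  # ['score']
```

Here `score` is flagged because it appears as a number, a string, and a boolean, while `lang` is consistently a string.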

String Pattern Matching Limitations

Currently, contains(), not_contains(), regex(), and not_regex() operators only work on K.DOCUMENT. These operators do not yet support metadata fields. Additionally, the pattern must contain at least 3 literal characters to ensure accurate results.

# Currently supported - K.DOCUMENT only
K.DOCUMENT.contains("API")              # Works
K.DOCUMENT.regex(r"v\d\.\d\.\d")       # Works
K.DOCUMENT.contains("machine learning") # Works

# NOT YET SUPPORTED - metadata fields
K("title").contains("Python")           # Not supported yet
K("description").regex(r"API.*")        # Not supported yet

# Pattern length requirements (for K.DOCUMENT)
K.DOCUMENT.contains("API")              # 3 characters - good
K.DOCUMENT.contains("AI")               # Only 2 characters - may give incorrect results
K.DOCUMENT.regex(r"\d+")                # No literal characters - may give incorrect results
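
A small guard can reject substring patterns that are too short before they reach a query. This is a sketch in plain Python with a hypothetical helper, `document_contains`. It covers `$contains` only, because counting the literal characters in a regex would need a proper parser.

```python
MIN_LITERAL_CHARS = 3  # threshold stated in this section


def document_contains(text: str) -> dict:
    """Build a #document $contains clause, rejecting patterns that are too short."""
    if len(text) < MIN_LITERAL_CHARS:
        raise ValueError(
            f"pattern {text!r} has fewer than {MIN_LITERAL_CHARS} characters "
            "and may give incorrect results"
        )
    return {"#document": {"$contains": text}}


clause = document_contains("API")
print(clause)
```

Calling `document_contains("AI")` raises `ValueError` instead of silently producing a filter that may return incorrect results.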

Full support for string pattern matching on metadata fields is coming in a future release, which will allow users to opt in to additional indexes for specific metadata fields.

Complete Example

Here’s a practical example combining different filter types:

from chromadb import Search, K, Knn

# Complex filter combining IDs, document content, and metadata
search = (Search()
    .where(
        # Exclude specific documents
        K.ID.not_in(["excluded_001", "excluded_002"]) &

        # Must contain specific content
        K.DOCUMENT.contains("artificial intelligence") &

        # Metadata conditions
        (K("status") == "published") &
        (K("quality_score") >= 0.75) &
        (
            (K("category") == "research") |
            (K("category") == "tutorial")
        ) &
        (K("year") >= 2023)
    )
    .rank(Knn(query="latest AI research developments"))
    .limit(10)
    .select(K.DOCUMENT, "title", "author", "year")
)

# `collection` is an existing Chroma collection
results = collection.search(search)

Tips and Best Practices

Use parentheses liberally when combining conditions with & and | to avoid precedence issues
Filter before ranking when possible to reduce the number of vectors to score
Be specific with ID filters - using K.ID.is_in() with a small list is very efficient
String matching is case-sensitive - normalize your data if case-insensitive matching is needed
Use the right operator - is_in() for multiple exact matches, contains() for substring search
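
For the case-sensitivity tip, one possible approach is to normalize selected metadata fields when documents are written and to normalize query values the same way. The sketch below is plain Python. `normalize_metadata` is a hypothetical helper, and the choice of which fields to normalize is an assumption left to the application.

```python
from typing import Any


def normalize_metadata(metadata: dict[str, Any], fields: set[str]) -> dict[str, Any]:
    """Return a copy with the chosen string fields case-folded."""
    return {
        key: value.casefold() if key in fields and isinstance(value, str) else value
        for key, value in metadata.items()
    }


# Normalize at write time...
stored = normalize_metadata(
    {"category": "Tech", "title": "Intro to AI", "year": 2024},
    fields={"category"},
)

# ...and apply the same normalization to the query value.
where = {"category": "TECH".casefold()}

print(stored)
print(where)
```

Only the listed fields are changed, so display fields such as `title` keep their original casing.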

Next Steps

Learn about ranking and scoring to order your filtered results
See practical examples of filtering in real-world scenarios
Explore batch operations for running multiple filtered searches
