Sparse Vector Search Setup
Learn how to configure and use sparse vectors for keyword-based search, and how to combine them with dense embeddings for hybrid search.
What are Sparse Vectors?
Sparse vectors are high-dimensional vectors with mostly zero values, designed for keyword-based retrieval. Unlike dense embeddings, which capture semantic meaning, sparse vectors excel at:
- Exact keyword matching: Finding documents containing specific terms
- Domain-specific terminology: Better at matching technical terms, proper nouns, and rare words
- Lexical retrieval: BM25-style retrieval patterns
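To make "mostly zero" concrete, here is a minimal, library-free Python sketch. It is not Chroma code. It stores a sparse vector as a mapping from dimension index to weight, and it scores two vectors with a dot product that only visits dimensions that are nonzero in both. The vocabulary indices and weights are invented for illustration. A document that shares no terms with the query scores zero, which is the lexical behavior described above.

```python
# A sparse vector stores only its nonzero entries: {dimension_index: weight}.
# Every dimension not present in the dict is implicitly zero.
SparseVector = dict[int, float]


def sparse_dot(a: SparseVector, b: SparseVector) -> float:
    """Dot product that only visits dimensions that are nonzero in both vectors."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(weight * b[i] for i, weight in a.items() if i in b)


# Hypothetical vocabulary: each term owns one dimension of a very large space
# (learned sparse models such as SPLADE use tens of thousands of dimensions).
vocab = {"kubernetes": 10_482, "pod": 28_013, "eviction": 21_557, "cluster": 3_310}

doc_a = {vocab["kubernetes"]: 1.25, vocab["pod"]: 0.5, vocab["eviction"]: 1.5}
doc_b = {vocab["kubernetes"]: 1.0, vocab["cluster"]: 0.75}
query = {vocab["pod"]: 1.0, vocab["eviction"]: 1.0}

print(sparse_dot(query, doc_a))  # 2.0: "pod" and "eviction" both match
print(sparse_dot(query, doc_b))  # 0: no shared terms
```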
Enabling Sparse Vector Index
To use sparse vectors, add a sparse vector index to your schema. The key parameter is the name of the metadata field where the sparse embeddings will be stored. You can choose any name.
The source_key parameter specifies which field to generate sparse embeddings from (typically K.DOCUMENT, the document text). The embedding_function parameter specifies the function that generates them. One option is ChromaCloudSpladeEmbeddingFunction, but you can also use other sparse embedding functions such as HuggingFaceSparseEmbeddingFunction or FastembedSparseEmbeddingFunction. The sparse embeddings are generated automatically and stored in the metadata field you specify as the key.
Create Collection and Add Data
Create Collection with Schema
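Here is a library-free sketch of the create-then-add flow, assuming a toy in-memory store. ToyCollection and toy_sparse_embed are hypothetical names, not Chroma's Collection class or a real embedding function. The collection is created with the sparse key and the embedding function. Its add() method then fills that key from each document's text, so callers never pass sparse embeddings themselves.

```python
import re
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable

SparseVector = dict[int, float]
_VOCAB: dict[str, int] = {}  # hypothetical token -> dimension index table


def toy_sparse_embed(text: str) -> SparseVector:
    """Hypothetical term-count embedding standing in for a real sparse model."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    return {_VOCAB.setdefault(tok, len(_VOCAB)): float(n) for tok, n in counts.items()}


@dataclass
class ToyCollection:
    """Hypothetical in-memory collection; not Chroma's Collection class."""

    sparse_key: str  # plays the role of the index's key
    embedding_function: Callable[[str], SparseVector]
    documents: dict[str, str] = field(default_factory=dict)
    metadatas: dict[str, dict] = field(default_factory=dict)

    def add(self, ids: list[str], documents: list[str]) -> None:
        if len(ids) != len(documents):
            raise ValueError("ids and documents must have the same length")
        for doc_id, text in zip(ids, documents):
            self.documents[doc_id] = text
            # The document text is the source; the embedding is computed here,
            # not supplied by the caller.
            self.metadatas[doc_id] = {self.sparse_key: self.embedding_function(text)}


# "Create the collection with a schema": here, just the key and the function.
collection = ToyCollection(sparse_key="sparse_embedding", embedding_function=toy_sparse_embed)
collection.add(
    ids=["doc1", "doc2"],
    documents=["Pod eviction under memory pressure", "Cluster autoscaling basics"],
)
print(len(collection.metadatas["doc1"]["sparse_embedding"]))  # 5 distinct terms
```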
Add Data
When you add documents, sparse embeddings are automatically generated from the source key and stored under the key you configured.
Using Sparse Vectors for Search
Once configured, you can search using sparse vectors alone or combine them with dense embeddings for hybrid search.
Sparse Vector Search
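Here is a minimal, library-free sketch of sparse-only retrieval, assuming dot-product scoring over term-count vectors. toy_sparse_embed, sparse_dot, and sparse_search are hypothetical helpers, not Chroma functions, and real sparse models produce learned weights. The sketch embeds the query the same way as the documents, scores each stored vector, and returns the top matches. Documents with no shared terms are dropped, which is a choice made for this sketch.

```python
import re
from collections import Counter

SparseVector = dict[int, float]
_VOCAB: dict[str, int] = {}  # hypothetical token -> dimension index table


def toy_sparse_embed(text: str) -> SparseVector:
    """Hypothetical term-count embedding standing in for a real sparse model."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    return {_VOCAB.setdefault(tok, len(_VOCAB)): float(n) for tok, n in counts.items()}


def sparse_dot(a: SparseVector, b: SparseVector) -> float:
    """Sum of weight products over dimensions present in both vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b[i] for i, w in a.items() if i in b)


def sparse_search(query: str, index: dict[str, SparseVector], limit: int) -> list[tuple[str, float]]:
    """Rank stored sparse embeddings by dot product with the query's embedding."""
    q = toy_sparse_embed(query)
    scored = [(doc_id, sparse_dot(q, vec)) for doc_id, vec in index.items()]
    # Documents sharing no terms with the query score 0; this sketch drops them.
    hits = [(doc_id, s) for doc_id, s in scored if s > 0]
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits[:limit]


docs = {
    "doc1": "Fixing error E1234 during pod eviction",
    "doc2": "How Kubernetes schedules pods",
    "doc3": "Error budgets and SLOs",
}
index = {doc_id: toy_sparse_embed(text) for doc_id, text in docs.items()}
print(sparse_search("error E1234", index, limit=2))  # [('doc1', 2.0), ('doc3', 1.0)]
```

The rare identifier E1234 is what lifts doc1 above doc3, which matches only on "error". Note also that "pods" in doc2 would not match a query for "pod". Purely lexical matching has no notion of word forms.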
Use sparse vectors on their own for keyword-based retrieval.
Hybrid Search
Hybrid search combines dense semantic embeddings with sparse keyword embeddings for improved retrieval quality. By merging results from both approaches using Reciprocal Rank Fusion (RRF), you often achieve better results than either approach alone.
Benefits of Hybrid Search
- Semantic + Lexical: Dense embeddings capture meaning while sparse vectors catch exact keywords
- Improved recall: Finds relevant documents that semantic search or keyword search on its own might miss
- Balanced results: Combines the strengths of both retrieval methods
Combining Dense and Sparse with RRF
Use RRF to merge the dense and sparse rankings into a single result list. For comprehensive details on RRF parameters, weight tuning, and advanced hybrid search strategies, see the Search API Hybrid Search documentation.
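Here is a minimal, library-free sketch of standard RRF, assuming 1-based ranks and the commonly used constant k = 60. Each document's fused score is the sum over rankings of w / (k + rank). Documents ranked highly by either retriever rise, and documents found by only one retriever still appear. The per-ranking weights mirror the weight tuning mentioned above. The function name rrf and the sample rankings are hypothetical, and Chroma's own RRF parameters and score conventions may differ.

```python
from typing import Optional


def rrf(
    rankings: list[list[str]],
    k: int = 60,
    weights: Optional[list[float]] = None,
) -> list[tuple[str, float]]:
    """Reciprocal Rank Fusion: score(d) = sum over rankings r of w_r / (k + rank_r(d)).

    Each ranking lists document ids best-first; ranks start at 1. A document
    missing from a ranking contributes nothing for that ranking.
    """
    if weights is None:
        weights = [1.0] * len(rankings)
    if len(weights) != len(rankings):
        raise ValueError("expected one weight per ranking")
    scores: dict[str, float] = {}
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical best-first results from a dense search and a sparse search.
dense_ranking = ["doc2", "doc1", "doc4"]
sparse_ranking = ["doc1", "doc3"]

for doc_id, score in rrf([dense_ranking, sparse_ranking]):
    print(f"{doc_id}: {score:.5f}")
# doc1: 0.03252  (found by both retrievers)
# doc2: 0.01639
# doc3: 0.01613  (sparse only)
# doc4: 0.01587  (dense only)
```

doc3 and doc4 each come from only one retriever but still make the fused list, which is the recall benefit listed earlier. Passing unequal weights, such as weights=[0.7, 0.3], tilts the fusion toward the first (dense) ranking.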
Next Steps
- Search API Hybrid Search with RRF - Learn RRF parameters, weight tuning, and advanced strategies
- Index Configuration Reference - Detailed parameters for all index types
- Schema Basics - General Schema usage and patterns