JinaAI
Chroma provides a convenient wrapper around JinaAI's embedding API. This embedding function runs remotely on JinaAI's servers and requires an API key, which you can get by signing up for an account at JinaAI. The wrapper takes an optional model_name argument, which lets you choose which Jina model to use. By default, Chroma uses jina-embeddings-v2-base-en.
Late Chunking Example
jina-embeddings-v3 supports Late Chunking, a technique that leverages the model's long-context capabilities to generate contextual chunk embeddings. Include late_chunking=True in your request to enable it. When enabled, the Jina AI API concatenates all strings in the input field and feeds them to the model as a single string. Internally, the model embeds this long concatenated string and then performs late chunking, returning a list of embeddings that matches the size of the input list.
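The request shape can be sketched by calling the Jina embeddings endpoint directly; the payload field names follow Jina's API, while the environment-variable name is our own choice:

```python
import os
import requests

# Request payload for late chunking: the API concatenates the input
# strings, embeds them as one long context, then splits afterwards.
payload = {
    "model": "jina-embeddings-v3",
    "late_chunking": True,
    "input": [
        "Berlin is the capital of Germany.",
        "It has a population of about 3.7 million.",
    ],
}

# Only send the request if an API key is configured.
if "JINAI_API_KEY" in os.environ:
    response = requests.post(
        "https://api.jina.ai/v1/embeddings",
        headers={"Authorization": f"Bearer {os.environ['JINAI_API_KEY']}"},
        json=payload,
    )
    # One embedding comes back per input string, each contextualized
    # by the full concatenated text.
    embeddings = [item["embedding"] for item in response.json()["data"]]
```

Without late chunking, each sentence would be embedded in isolation; with it, the second sentence's embedding still reflects that "It" refers to Berlin.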
Task parameter
jina-embeddings-v3 has been trained with 5 task-specific adapters for different embedding uses. Include the task parameter in your request to optimize your downstream application:
- retrieval.query: Used to encode user queries or questions in retrieval tasks.
- retrieval.passage: Used to encode large documents in retrieval tasks at indexing time.
- classification: Used to encode text for text classification tasks.
- text-matching: Used to encode text for similarity matching, such as measuring similarity between two sentences.
- separation: Used for clustering or reranking tasks.
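The asymmetric retrieval adapters above can be sketched with a small helper that calls the Jina API directly; `embed` is our own illustrative function, not part of Chroma, and the environment-variable name is an assumption:

```python
import os
import requests

API_URL = "https://api.jina.ai/v1/embeddings"


def embed(texts, task):
    """Embed texts with jina-embeddings-v3 using a task-specific adapter.

    `task` is one of: retrieval.query, retrieval.passage,
    classification, text-matching, separation.
    """
    response = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {os.environ['JINAI_API_KEY']}"},
        json={"model": "jina-embeddings-v3", "task": task, "input": texts},
    )
    response.raise_for_status()
    return [item["embedding"] for item in response.json()["data"]]


# Encode passages with one adapter at indexing time and queries with
# its counterpart at search time (requires a valid API key):
# passages = embed(["Chroma is an open-source vector database."], "retrieval.passage")
# query = embed(["what is Chroma?"], "retrieval.query")
```

Using the matching query/passage adapter pair at search and indexing time is what makes the retrieval tasks asymmetric: the two adapters are trained jointly so queries land near relevant passages.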