Roadmap

What is the core Chroma team working on right now?

Extending the search and retrieval capabilities for Chroma Cloud. Email us your feedback and ideas.

What did the Chroma team just complete?

Features like:

Chroma 1.0.0 - a complete rewrite of Chroma in Rust, giving users up to a 4x performance boost.
A rewrite of our JS/TS client, with a better developer experience (DX) and many quality-of-life improvements.
Persistent collection configuration on the server, unlocking many new features. For example, you no longer need to provide your embedding function on every get_collection call.
The new Chroma CLI that lets you browse your collections locally, manage your Chroma Cloud DBs, and more!
Chroma Cloud!
Package Search MCP - Lets your coding agent search and understand the source code of thousands of dependencies from npm, PyPI, and crates.io.
GitHub Sync - Chroma can handle ingesting, parsing, chunking, and embedding your codebase.

What will Chroma prioritize over the next 6 months?

Areas we will invest in

This is not an exhaustive list, but these are some of the core team’s biggest priorities over the coming few months. Use caution when contributing in these areas, and please check in with the core team first.

Workflow: Building tools to answer questions like: What embedding model should I use? How should I chunk up my documents?
Visualization: Building visualization tools to give developers greater intuition about embedding spaces
Query Planner: Building tools to enable pre-query and post-query transforms
Developer experience: Adding more features to our CLI
Easier Data Sharing: Working on formats for serializing embedding collections and sharing them more easily
Improving recall: Fine-tuning embedding transforms through human feedback
Analytical horsepower: Clustering, deduplication, classification, and more

What areas are great for community contributions?

This is where you have a lot more free rein to contribute (without having to sync with us first)! If you’re unsure about your contribution idea, feel free to chat with us (@chroma) in the #general channel on our Discord! We’d love to support you however we can.

Example Templates

We can always use more integrations with the rest of the AI ecosystem. Please let us know if you’re working on one and need help! Other great starting points for Chroma:

Google Colab

For the integrations we already have, like LangChain and LlamaIndex, we always want more tutorials, demos, workshops, videos, and podcasts (we’ve done some podcasts, which you can find on our blog).

Example Datasets

It doesn’t make sense for developers to embed the same information over and over again with the same embedding model. We’d like suggestions for:

“small” (<100 rows)
“medium” (<5MB)
“large” (>1GB)

datasets for people to stress test Chroma in a variety of scenarios.

Embeddings Comparison

Chroma ships with Sentence Transformers by default for embeddings, but is otherwise unopinionated about which embeddings you use. A library of data embedded with many models, alongside example query sets, would make it much easier to study empirically how effective various models are across different domains.
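As a rough illustration of the kind of empirical comparison such a library would enable, here is a minimal sketch that scores embedding models by recall@k on a shared query set. The model names, vectors, and relevance judgments are made-up toy values, and the helpers (`cosine`, `top_k`, `recall_at_k`) are hypothetical illustrations, not part of Chroma’s API. In practice the vectors would come from real embedding models and the search would run against a Chroma collection.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: Vector, docs: Dict[str, Vector], k: int) -> List[str]:
    """Exhaustive search: the k doc ids most similar to the query, best first."""
    return sorted(docs, key=lambda doc_id: cosine(query, docs[doc_id]), reverse=True)[:k]


def recall_at_k(
    queries: Dict[str, Vector],
    docs: Dict[str, Vector],
    relevant: Dict[str, Set[str]],
    k: int,
) -> float:
    """Mean fraction of each query's relevant doc ids that appear in its top-k results."""
    per_query = [
        len(set(top_k(vec, docs, k)) & relevant[qid]) / len(relevant[qid])
        for qid, vec in queries.items()
    ]
    return sum(per_query) / len(per_query)


# Hypothetical toy vectors standing in for two embedding models.
# Queries and documents must be embedded by the same model to be comparable.
EMBEDDINGS: Dict[str, Tuple[Dict[str, Vector], Dict[str, Vector]]] = {
    "model-a": ({"q1": [0.9, 0.1]}, {"d1": [1.0, 0.0], "d2": [0.0, 1.0], "d3": [0.7, 0.7]}),
    "model-b": ({"q1": [1.0, 0.1]}, {"d1": [0.0, 1.0], "d2": [1.0, 0.0], "d3": [0.6, 0.8]}),
}
# Relevance judgments are shared across models: for q1, only d1 is relevant.
RELEVANT = {"q1": {"d1"}}

for model, (queries, docs) in EMBEDDINGS.items():
    print(model, recall_at_k(queries, docs, RELEVANT, k=1))
```

With these toy values the loop prints 1.0 for model-a and 0.0 for model-b. Keeping the relevance judgments independent of any one model is what makes the per-model scores comparable.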

Preliminary reading on Embeddings

Hugging Face benchmark of many embedding models

Notable issues with GPT-3 embeddings and alternatives to consider

Experimental Algorithms

If you have a research background, we welcome contributions in the following areas:

Projections (t-SNE, UMAP, the new hotness, the one you just wrote) and lightweight visualization
Clustering (HDBSCAN) and dimensionality reduction (PCA)
Deduplication
Multimodal (CLIP)
Fine-tuning the manifold with human feedback
Expanded vector search (MMR, Polytope)
Your research

Please reach out and talk to us before you get too far into your projects so that we can offer technical guidance and align on the roadmap.
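Of the areas listed above, MMR (Maximal Marginal Relevance) is compact enough to sketch here. The pure-Python version below is a minimal illustration of the standard greedy MMR formulation, not code that ships with Chroma; the function name `mmr`, the `lambda_mult` parameter, and the toy vectors are hypothetical. At each step it picks the candidate that best balances similarity to the query against similarity to results already picked.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr(query: Vector, candidates: List[Vector], k: int, lambda_mult: float = 0.5) -> List[int]:
    """Greedily pick up to k candidate indices by Maximal Marginal Relevance.

    Each step maximizes
        lambda_mult * sim(query, c) - (1 - lambda_mult) * max(sim(c, s) for s already picked),
    so lambda_mult=1.0 is pure relevance and smaller values favor diversity.
    """
    selected: List[int] = []
    remaining = list(range(len(candidates)))

    def marginal_score(i: int) -> float:
        relevance = cosine(query, candidates[i])
        # Redundancy is 0.0 on the first pick, when nothing has been selected yet.
        redundancy = max((cosine(candidates[i], candidates[j]) for j in selected), default=0.0)
        return lambda_mult * relevance - (1 - lambda_mult) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=marginal_score)
        selected.append(best)
        remaining.remove(best)
    return selected


query = [1.0, 0.0]
candidates = [
    [1.0, 0.1],   # most relevant
    [1.0, 0.15],  # near-duplicate of candidate 0
    [0.7, -0.7],  # less relevant, but points in a different direction
]
print(mmr(query, candidates, k=2))                   # [0, 2]: skips the near-duplicate
print(mmr(query, candidates, k=2, lambda_mult=1.0))  # [0, 1]: same as plain top-k
```

The greedy loop recomputes redundancy against every earlier pick, so it costs O(k × n) similarity evaluations per query; that is fine for re-ranking a short candidate list returned by a nearest-neighbor search, which is where MMR is usually applied.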
