Agentic Search
"What were the key factors behind our Q3 sales growth, and how do they compare to industry trends?"
Suppose you have Chroma collections storing quarterly reports, sales data, and industry research papers, and a user asks the question above. A simple retrieval approach might query the sales-data collection (or even all collections at once), retrieve the top results, and pass them to an LLM for answer generation. However, this single-step retrieval strategy has critical limitations:
- It can’t decompose complex questions - This query contains multiple sub-questions: internal growth factors, external industry trends, and comparative analysis. The information needed may be scattered across different collections and semantically dissimilar documents.
- It can’t adapt its search strategy - If the first retrieval returns insufficient context about industry trends, there’s no mechanism to refine the query and search again with a different approach.
- It can’t handle ambiguous terms - “Q3” could refer to different years across your collections, while “sales growth” might mean unit sales, revenue, or profit margins. A single query has no way to disambiguate and search accordingly.
An agentic search system addresses these limitations by putting an LLM in charge of the retrieval process. Rather than performing a single retrieval step, the agent works in a loop (sketched in code after this list):
- Plans - Breaks down complex queries into a sequence of retrieval steps
- Executes - Performs targeted searches across Chroma collections or using other tools
- Evaluates - Assesses whether the retrieved information answers the question or identifies gaps
- Iterates - Refines the plan and repeats steps 2-3 based on what it has learned so far
- Synthesizes - Combines information from multiple retrievals to form a comprehensive answer
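To make this loop concrete, here is a minimal TypeScript sketch of the plan-execute-evaluate-synthesize cycle. All helper functions (planQuery, executeStep, evaluateOutcomes, synthesizeAnswer) are hypothetical placeholders standing in for LLM and retrieval calls, not code from the project.

```typescript
interface StepResult {
  step: string;
  findings: string[];
}

// Hypothetical helpers standing in for LLM and retrieval calls.
declare function planQuery(query: string): Promise<string[]>;
declare function executeStep(step: string): Promise<string[]>;
declare function evaluateOutcomes(
  query: string,
  remainingPlan: string[],
  results: StepResult[]
): Promise<{ done: boolean; revisedPlan: string[] }>;
declare function synthesizeAnswer(query: string, results: StepResult[]): Promise<string>;

async function agenticSearch(userQuery: string): Promise<string> {
  // 1. Plan: break the query into retrieval sub-steps.
  let plan = await planQuery(userQuery);
  const results: StepResult[] = [];

  while (plan.length > 0) {
    const step = plan.shift()!;

    // 2. Execute: run targeted searches (Chroma collections, other tools).
    const findings = await executeStep(step);
    results.push({ step, findings });

    // 3. Evaluate: do we have enough, or should the remaining plan change?
    const evaluation = await evaluateOutcomes(userQuery, plan, results);
    if (evaluation.done) break;

    // 4. Iterate: continue with the (possibly revised) plan.
    plan = evaluation.revisedPlan;
  }

  // 5. Synthesize: combine everything retrieved into a final answer.
  return synthesizeAnswer(userQuery, results);
}
```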
Example Agent Execution
- Legal assistants search across case law databases, statutes, regulatory documents, and internal firm precedents.
- Medical AI systems query across clinical guides, research papers, patient records, and drug databases to support medical reasoning.
- Customer support AI agents navigate product documentation, past ticket resolutions, and company knowledge bases, while dynamically adjusting their search based on specific use cases.
- Coding assistants search across documentation, code repositories, and issue trackers to help developers solve problems.
Building such an agent involves a few core components:
- Query Planning - using the LLM to analyze the user's question and generate a structured plan, breaking the input query down into sub-queries that can be addressed step by step (see the sketch after this list).
- Tool Use - the agent has access to a suite of tools, such as querying Chroma collections, searching the internet, and calling other APIs. For each step of the query plan, we ask an LLM to repeatedly call tools to gather information for the current step.
- Reflection and Evaluation - at each step, we use an LLM to evaluate the retrieved results, determining whether they are sufficient and relevant, or whether we need to revise the rest of our plan.
- State Management and Memory - the agent maintains context across all steps, tracking retrieved information, remaining sub-queries, and intermediate findings that inform subsequent retrieval decisions.
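As a sketch of the query planning step, the common pattern is to ask the LLM for a structured output that matches a plan schema. The example below uses the OpenAI SDK's zod helper; the project's actual prompts and plan schema differ, so treat the field names as illustrative.

```typescript
import OpenAI from "openai";
import { z } from "zod";
import { zodResponseFormat } from "openai/helpers/zod";

// Schema for the structured plan we ask the model to produce.
const PlanStepSchema = z.object({
  id: z.number(),
  description: z.string(),        // the sub-query this step should answer
  dependsOn: z.array(z.number()), // ids of steps whose results this step needs
});
const QueryPlanSchema = z.object({ steps: z.array(PlanStepSchema) });

const openai = new OpenAI();

async function generatePlan(userQuery: string) {
  const completion = await openai.beta.chat.completions.parse({
    model: "gpt-4o-mini",
    messages: [
      {
        role: "system",
        content: "Break the user's question into ordered retrieval sub-queries.",
      },
      { role: "user", content: userQuery },
    ],
    response_format: zodResponseFormat(QueryPlanSchema, "query_plan"),
  });
  return completion.choices[0].message.parsed; // { steps: [...] }, or null if the model refused
}
```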
BrowseComp-Plus
In this guide we will build a Search Agent from scratch. Our agent will be able to answer queries from the BrowseComp-Plus dataset, which is based on OpenAI's BrowseComp benchmark. The dataset contains challenging questions that require multiple rounds of searching and reasoning to answer correctly. This makes it ideal for demonstrating how to build an agentic search system and how tuning each of its components (retrieval, reasoning, model selection, and more) affects overall performance. Every query in the BrowseComp-Plus dataset has:
- Gold docs - the documents needed to compile the final correct answer for the query.
- Evidence docs - documents needed to answer the query that may not directly contain the final answer themselves. They provide supporting information required for reasoning through the problem. The gold docs are a subset of the evidence docs.
- Negative docs - documents included deliberately to make answering the query more difficult. They distract the agent and force it to distinguish between relevant and irrelevant information.
For example, query 770 is associated with doc IDs 6753, 68484, 1735, and 60284.
The collection we will use stores two types of records. Query records carry the following metadata:
- query_id: The BrowseComp-Plus query ID.
- query: Set to true, indicating this is a query record.
- gold_docs: The list of gold doc IDs needed to answer this query.

Document chunk records carry:
- doc_id: The original BrowseComp-Plus document ID this record was chunked from.
- index: The order in which this chunk appears in the original document. This is useful if we want to reconstruct the original documents.
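To inspect these records yourself, you can fetch a few from the collection with the Chroma JS client. This is a sketch: it assumes CloudClient picks up your credentials from the environment, the collection name is a placeholder for whatever the sample dataset created in your database, and the doc_id value is just the example ID from above.

```typescript
import { CloudClient } from "chromadb";

// Reads CHROMA_API_KEY, CHROMA_TENANT, and CHROMA_DATABASE from the environment.
const client = new CloudClient();

async function inspectCollection(collectionName: string) {
  const collection = await client.getCollection({ name: collectionName });

  // Query records are marked with `query: true` in their metadata.
  const queryRecords = await collection.get({ where: { query: true }, limit: 3 });

  // Document chunks carry `doc_id` and `index`, so an original document can be
  // reassembled by fetching all of its chunks and sorting them by `index`.
  const chunks = await collection.get({
    where: { doc_id: "6753" }, // example ID from above; adjust the type if stored as a number
  });

  console.log(queryRecords.metadatas, chunks.metadatas);
}
```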
Running the Agent
Before we start walking through the implementation, let's run the agent to get a sense of what we're going to build.

1. Sign up for a Chroma Cloud account if you don't already have one, and open the dashboard.
2. Use the "Create Database" button on the top right of the Chroma Cloud dashboard, and name your DB agentic-search (or any name of your choice). If you're a first-time user, you will be greeted with the "Create Database" modal after creating your account.
3. Choose the "Load sample dataset" option, and then choose the BrowseCompPlus dataset. This will copy the data into a collection in your own Chroma DB.
4. Once your collection loads, choose the "Settings" tab. At the bottom of the page, choose the .env tab. Create an API key, and copy the environment variables you will need for running the project: CHROMA_API_KEY, CHROMA_TENANT, and CHROMA_DATABASE.
5. Clone the Chroma Cookbooks repo.
6. Navigate to the agentic-search directory, and create a .env file at its root with the values you obtained in the previous step.
7. To run this project, you will also need an OpenAI API key. Set it in your .env file as well.
8. This project uses pnpm workspaces. In the root directory, install the dependencies (steps 5-8 are summarized in the snippet below).
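For reference, steps 5 through 8 come down to a handful of shell commands. The repository URL is left as a placeholder since it lives in the Chroma docs, and the .env values are the ones you copied from the dashboard plus your OpenAI key:

```bash
# Step 5: clone the cookbooks repo (substitute the actual URL from the Chroma docs)
git clone <chroma-cookbooks-repo-url>
cd <repo-root>

# Steps 6-7: create agentic-search/.env with your Chroma Cloud and OpenAI credentials
cat > agentic-search/.env <<'EOF'
CHROMA_API_KEY=<your-chroma-api-key>
CHROMA_TENANT=<your-tenant-id>
CHROMA_DATABASE=agentic-search
OPENAI_API_KEY=<your-openai-api-key>
EOF

# Step 8: install dependencies from the repo root (pnpm workspaces)
pnpm install
```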
The run command accepts the following arguments:
- --provider: The LLM provider you want to use. Defaults to OpenAI (currently only OpenAI is supported).
- --model: The model you want the agent to use. Defaults to gpt-4o-mini.
- --max-plan-size: The maximum number of query plan steps the agent will go through to solve the query. Defaults to 10. When set to 1, the query planning step is skipped.
- --max-step-iterations: The maximum number of tool-call interactions the agent will issue when solving each step. Defaults to 5.
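The exact run script isn't reproduced here, so treat the following as a hypothetical invocation that only illustrates how these flags are passed; check the project's package.json for the real script and package names:

```bash
# Hypothetical invocation -- the script and package names are assumptions, the flags are the ones above
pnpm --filter agentic-search start -- \
  --provider openai \
  --model gpt-4o-mini \
  --max-plan-size 10 \
  --max-step-iterations 5
```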
Building the Agent
We built a simple agent in this project to demonstrate the core concepts in this guide. The BaseAgent class orchestrates the agentic workflow described above. It holds a reference to:
- An LLMService - a simple abstraction for interacting with an LLM provider, used for getting structured outputs and for tool calling.
- A prompts object, defining the prompts used for the different LLM interactions this workflow needs (for example, generating the query plan, evaluating it, etc.).
- A list of Tools that will be used to solve a user's query.
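Here is a rough sketch of that wiring. The interface shapes (Tool, Prompts, LLMService) are simplified guesses at the real code's structure, shown only to make the relationships concrete:

```typescript
// Simplified sketch of the BaseAgent wiring; the real classes have more structure.
interface Tool {
  name: string;
  description: string;
  run(args: Record<string, unknown>): Promise<string>;
}

interface Prompts {
  planQuery: string;   // prompt for generating the query plan
  solveStep: string;   // prompt for the per-step tool-calling loop
  evaluate: string;    // prompt for reflecting on retrieved results
  synthesize: string;  // prompt for composing the final answer
}

interface LLMService {
  structuredOutput<T>(prompt: string, schema: unknown): Promise<T>;
  callWithTools(prompt: string, tools: Tool[]): Promise<string>;
}

class BaseAgent {
  constructor(
    protected llm: LLMService,
    protected prompts: Prompts,
    protected tools: Tool[]
  ) {}

  async solve(query: string): Promise<string> {
    // Orchestrates the plan -> execute -> evaluate -> synthesize loop shown earlier.
    throw new Error("sketch only");
  }
}
```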
The QueryPlanner generates a query plan for a given user query. This is a list of PlanStep objects, each keeping track of its status (Pending, Success, Failure, Cancelled, etc.) and its dependencies on other steps in the plan. The planner is an iterator that emits the next batch of Pending steps ready for execution. It also exposes methods that let other components override the plan and update the status of completed steps.
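A sketch of these data structures and the planner's iteration logic, again with illustrative names rather than the project's exact API:

```typescript
// Sketch of the plan data structures and planner behavior described above.
type StepStatus = "Pending" | "Success" | "Failure" | "Cancelled";

interface PlanStep {
  id: number;
  description: string; // the sub-query this step should answer
  status: StepStatus;
  dependsOn: number[]; // ids of steps that must succeed first
}

class QueryPlanner {
  constructor(private steps: PlanStep[]) {}

  // Emits the next batch of Pending steps whose dependencies have all succeeded.
  nextBatch(): PlanStep[] {
    const done = new Set(
      this.steps.filter((s) => s.status === "Success").map((s) => s.id)
    );
    return this.steps.filter(
      (s) => s.status === "Pending" && s.dependsOn.every((d) => done.has(d))
    );
  }

  // Lets other components (e.g. the Evaluator) update the status of a step.
  markStep(id: number, status: StepStatus): void {
    const step = this.steps.find((s) => s.id === id);
    if (step) step.status = status;
  }

  // Lets other components override the remaining (Pending) part of the plan.
  replaceRemaining(newSteps: PlanStep[]): void {
    this.steps = [...this.steps.filter((s) => s.status !== "Pending"), ...newSteps];
  }
}
```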
The Executor solves a single PlanStep. It implements a simple tool-calling loop with the LLMService until the step is solved. Finally, it produces a StepOutcome object summarizing the execution and identifying candidate answers and supporting evidence.
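Continuing the sketch (reusing the PlanStep, LLMService, and Tool shapes from above), the executor's loop looks roughly like this; the stop condition and outcome extraction are simplified stand-ins for what the real implementation does through the provider's tool-use API:

```typescript
// Sketch of the per-step tool-calling loop and its outcome.
interface StepOutcome {
  stepId: number;
  summary: string;            // what was learned while solving this step
  candidateAnswers: string[]; // possible answers surfaced by retrieval
  evidence: string[];         // supporting passages or document IDs
}

async function runStep(
  step: PlanStep,
  llm: LLMService,
  tools: Tool[],
  maxStepIterations = 5
): Promise<StepOutcome> {
  const gathered: string[] = [];

  for (let i = 0; i < maxStepIterations; i++) {
    // The LLM either requests tool calls or declares the step solved.
    const reply = await llm.callWithTools(
      `Solve this step: ${step.description}\nGathered so far:\n${gathered.join("\n")}`,
      tools
    );
    gathered.push(reply);
    if (reply.includes("STEP_SOLVED")) break; // illustrative stop condition
  }

  // Produce a structured outcome summarizing the step (simplified here).
  return {
    stepId: step.id,
    summary: gathered.join("\n"),
    candidateAnswers: [],
    evidence: gathered,
  };
}
```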
The Evaluator considers the plan and the history of step outcomes to decide how to proceed with the rest of the query plan.
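Its decision can be modeled as a small discriminated union mirroring the options described earlier (keep going, revise the remaining steps, or stop once the question can be answered); the real implementation's types may differ:

```typescript
// Sketch of the Evaluator's decision after reviewing the plan and step outcomes.
type EvaluationDecision =
  | { action: "continue" }                     // keep executing the remaining plan
  | { action: "revise"; newSteps: PlanStep[] } // replace the remaining Pending steps
  | { action: "finish"; answer: string };      // enough evidence has been gathered

declare function evaluate(
  plan: PlanStep[],
  outcomes: StepOutcome[]
): Promise<EvaluationDecision>;
```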
The SearchAgent class extends BaseAgent and provides it with tools for searching over the BrowseComp-Plus collection using Chroma's Search API. It also supplies the prompts needed for this particular search task.
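Finally, here is a sketch of what a Chroma-backed search tool handed to the agent could look like. The real project uses Chroma's Search API; collection.query is used here as a simpler stand-in, and the collection name is a placeholder:

```typescript
import { CloudClient } from "chromadb";

const client = new CloudClient(); // credentials picked up from the environment

// A Chroma-backed search tool (using the Tool shape sketched earlier).
function makeSearchTool(collectionName: string): Tool {
  return {
    name: "search_corpus",
    description: "Semantic search over the BrowseComp-Plus document chunks.",
    async run(args) {
      const collection = await client.getCollection({ name: collectionName });
      const results = await collection.query({
        queryTexts: [String(args.query)],
        nResults: 10,
      });

      // Flatten the top results into a single string the LLM can read.
      const docs = results.documents[0] ?? [];
      const metas = results.metadatas[0] ?? [];
      return docs
        .map((doc, i) => `[doc ${metas[i]?.doc_id ?? "?"}] ${doc ?? ""}`)
        .join("\n\n");
    },
  };
}
```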