Agentic AI uses Reranking to improve the quality of answers delivered to your customers by adding a filtering step after the initial document retrieval phase. Instead of sending the first batch of retrieved documents straight to the LLM, Reranking evaluates each candidate document against the query and reorders the candidates by genuine relevance, so only the most relevant content reaches the generation stage. The result is more precise, targeted, and contextually relevant responses across all knowledge base content types.
How it works
Reranking operates as a two-phase pipeline that sits between the initial retrieval step and the LLM.
The first phase is expanded retrieval. When Reranking is enabled, the system retrieves twice as many documents as it would with ordinary retrieval. Because the initial retrieval phase relies only on embedding similarity, this enlarged pool reduces the chance that highly relevant documents are overlooked.
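The first phase can be sketched as follows. This is a minimal illustration, not the product's actual implementation: the function names (`retrieve_expanded`, `similarity`) are hypothetical, and a simple term-overlap score stands in for real embedding similarity.

```python
# Phase 1 sketch: with reranking enabled, retrieve 2x the configured
# document count so strong candidates are less likely to be missed.

def similarity(query_terms, doc):
    # Stand-in for embedding similarity: fraction of query terms in the doc.
    words = set(doc.lower().split())
    return sum(term in words for term in query_terms) / len(query_terms)

def retrieve_expanded(query, corpus, k, reranking_enabled=True):
    # Double the candidate pool only when reranking will filter it later.
    pool_size = 2 * k if reranking_enabled else k
    terms = query.lower().split()
    ranked = sorted(corpus, key=lambda d: similarity(terms, d), reverse=True)
    return ranked[:pool_size]

corpus = [
    "resetting your password via email",
    "billing and invoice history",
    "password strength requirements",
    "shipping times for new orders",
    "two-factor authentication setup",
    "account password recovery steps",
]
pool = retrieve_expanded("password reset", corpus, k=2)
print(len(pool))  # 4 candidates retrieved instead of the final 2
```

With a configured count of k=2, the expanded pool carries four candidates forward, giving the second phase more material to re-score.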
The second phase is re-scoring and filtering. A specialized reranking model receives the retrieved candidates and evaluates each document individually against the original query. Unlike embedding-based retrieval, the reranking model assesses the relevance between the query and each document directly, producing a more accurate relevance score. The top k documents, where k matches the originally configured count, are then selected and passed to the language model for response generation.
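The second phase can be sketched in the same spirit. The `rerank_score` function below is a crude stand-in for a real cross-encoder reranking model, and all names are illustrative assumptions rather than a documented API.

```python
# Phase 2 sketch: score each (query, document) pair directly, then keep
# only the top k documents for the LLM.

def rerank_score(query, doc):
    # Toy cross-scoring: exact word overlap plus a bonus when the query
    # appears as a phrase in the document. A real reranker would use a
    # model that reads the query and document jointly.
    q_words = set(query.lower().split())
    d_words = set(doc.lower().split())
    overlap = len(q_words & d_words) / len(q_words)
    phrase_bonus = 1.0 if query.lower() in doc.lower() else 0.0
    return overlap + phrase_bonus

def rerank(query, candidates, k):
    scored = sorted(candidates, key=lambda d: rerank_score(query, d), reverse=True)
    return scored[:k]  # only the originally configured count reaches the LLM

candidates = [
    "resetting your password via email",
    "password strength requirements",
    "account password recovery steps",
    "billing and invoice history",
]
top = rerank("password recovery", candidates, k=2)
print(top[0])  # "account password recovery steps"
```

Note how the direct query-document comparison promotes the document that merely ranked third in the candidate list: joint scoring can surface relevance that embedding similarity alone underweights.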
Reranking Architecture
