Sestek's Advanced Conversation Routing (ACR) seamlessly combines traditional Natural Language Understanding (NLU) with modern agentic AI in a true hybrid architecture, giving customers the best of both worlds without forcing a paradigm choice. This intelligent routing ensures deterministic accuracy and control for transactional or compliance-critical scenarios, while harnessing generative AI for richer, more natural, and empathetic conversations when flexibility is required. The result is an adaptive framework that delivers enterprise-grade reliability at scale, without compromising the quality of the user experience.
How the Routing Works
Every user utterance first passes through Sestek’s proprietary NLU engine. What happens next depends on the NLU’s confidence score:
- Confidence = 1 (exact match): The system routes directly to the mapped intent action. This path is fast, deterministic, and predictable — exactly what traditional NLU customers expect.
- Confidence < 1 (no exact match): The NLU engine’s top N candidate intents (each with 5–10 sample utterances) are forwarded to an LLM, along with an explicit “out-of-scope” option, so the LLM can make an informed routing decision among the candidates the NLU has already identified.
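The confidence gate can be sketched as follows. This is an illustrative Python sketch, not Sestek's implementation; `Candidate` and `route` are hypothetical names, and the payload shape is an assumption about what gets forwarded to the LLM.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    intent: str
    confidence: float
    sample_utterances: list  # a few training examples for this intent

def route(nlu_top, utterance, top_n=5):
    """Return either a direct intent or a payload for LLM-assisted routing."""
    best = max(nlu_top, key=lambda c: c.confidence)
    if best.confidence == 1.0:
        # Exact match: deterministic path, no LLM call.
        return {"decision": "direct", "intent": best.intent}
    # No exact match: forward the top-N candidates plus an explicit
    # out-of-scope option so the LLM can choose among them.
    candidates = sorted(nlu_top, key=lambda c: c.confidence, reverse=True)[:top_n]
    return {
        "decision": "llm",
        "utterance": utterance,
        "options": [c.intent for c in candidates] + ["out_of_scope"],
        "examples": {c.intent: c.sample_utterances for c in candidates},
    }
```

Note that the out-of-scope option is always included in the candidate list, which is what allows the LLM to decline a match instead of force-fitting one.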
The LLM then makes one of three decisions:
- Confident match to a candidate intent → execute that intent’s action.
- Determines the request is truly out of scope → trigger fallback action.
- Cannot match but the request seems valid → ask a clarifying question.
After a clarifying question, the user’s response is concatenated with the original utterance and sent back to NLU. The cycle continues, but a “no-match counter” caps clarification attempts at two to prevent infinite loops and user frustration.
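The full cycle, including the no-match counter, can be sketched like this. All function names (`nlu_classify`, `llm_route`, `ask_user`) are hypothetical stand-ins injected as callables; the cap of two clarification attempts is the documented default.

```python
MAX_CLARIFICATIONS = 2  # documented default for the "no-match counter" cap

def handle_utterance(utterance, nlu_classify, llm_route, ask_user):
    no_match_count = 0
    text = utterance
    while True:
        intent, confidence = nlu_classify(text)
        if confidence == 1.0:
            return ("execute", intent)            # deterministic path
        decision = llm_route(text)                # LLM sees top-N candidates
        if decision["kind"] == "match":
            return ("execute", decision["intent"])
        if decision["kind"] == "out_of_scope":
            return ("fallback", None)
        # kind == "clarify": request seems valid, but no confident match yet
        if no_match_count >= MAX_CLARIFICATIONS:
            return ("fallback", None)             # cap prevents infinite loops
        no_match_count += 1
        reply = ask_user(decision["question"])
        # Concatenate the reply with the prior utterance and re-run NLU.
        text = f"{text} {reply}"
```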
Why This Matters
Traditional NLU classifiers tend to force-match every input to an existing intent — even when the input is ambiguous or completely out of scope. This leads to incorrect routing and a poor user experience, as the system confidently executes the wrong action instead of acknowledging uncertainty.
ACR's hybrid approach addresses this by leveraging the LLM's reasoning ability as a second layer. Rather than a single "Sorry, I didn't understand" after one failed attempt, the system genuinely tries to disambiguate — and when the input truly doesn't match any intent, it correctly identifies it as out of scope instead of forcing a wrong match.
Benchmark Results
| Model | Total Accuracy | In Scope Accuracy | Out of Scope Recall |
|---|---|---|---|
| MLP Only (threshold=0.9) | 0.862 | 0.954 | 0.716 |
| Hybrid + GPT-4.1 | 0.954 | 0.990 | 0.903 |
| Hybrid + GPT-4.1-Mini | 0.932 | 0.995 | 0.839 |
| Hybrid + Gemini-2.5-Flash | 0.960 | 0.995 | 0.907 |
| Hybrid + GPT-OSS-120B | 0.955 | 0.994 | 0.898 |
Benchmark details: "MLP Only" refers to Sestek's proprietary Multi-Layer Perceptron (MLP)-based NLU classifier used as the baseline. The test was conducted on a production dataset. The most notable improvement is in Out-of-Scope Recall — the system's ability to correctly identify inputs that don't belong to any defined intent — which jumps from 0.716 to 0.907 with the hybrid approach.
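For clarity on what the three columns measure, here is a sketch of how such metrics are typically computed from labeled predictions. The exact definitions used in the benchmark are an assumption: out-of-scope is treated as its own label, and out-of-scope recall is the fraction of truly out-of-scope inputs the system flags as such.

```python
OOS = "out_of_scope"

def metrics(y_true, y_pred):
    """Total accuracy, in-scope accuracy, and out-of-scope recall."""
    total_acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    # In-scope accuracy: only inputs whose true label is a real intent.
    in_pairs = [(t, p) for t, p in zip(y_true, y_pred) if t != OOS]
    in_acc = sum(t == p for t, p in in_pairs) / len(in_pairs)
    # Out-of-scope recall: of the truly out-of-scope inputs,
    # how many were predicted as out of scope (not force-matched)?
    oos_pairs = [(t, p) for t, p in zip(y_true, y_pred) if t == OOS]
    oos_recall = sum(p == OOS for _, p in oos_pairs) / len(oos_pairs)
    return total_acc, in_acc, oos_recall
```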
Latency Considerations
In voice scenarios, response time is critical. ACR is designed to add minimal overhead:
- Exact match (confidence = 1): No LLM call is made. Latency is identical to a pure NLU system — typically under 50ms.
- LLM-assisted routing (confidence < 1): An additional LLM round-trip is introduced. Typical ACR response times range from 300–800ms depending on the selected model and provider. A configurable timeout (default: 3 seconds) ensures the system never blocks indefinitely — if the LLM doesn't respond in time, the system falls back to NLU-only behavior.
- Clarification turns: Each clarification adds one additional NLU + LLM cycle. With the default cap of 2 clarification attempts, the worst-case scenario is 3 total LLM calls per interaction.
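Putting the figures above together gives a rough worst-case budget. This is back-of-envelope arithmetic using the documented typical values, not measured latencies, and it ignores the timeout-fallback path.

```python
NLU_MS = 50          # exact-match path: "typically under 50ms"
LLM_MS = (300, 800)  # typical LLM-assisted routing round-trip
MAX_CLARIFICATIONS = 2

def worst_case_ms():
    # Worst case: the initial turn plus two clarification turns,
    # i.e. 3 NLU passes and 3 LLM calls at the slow end of the range.
    turns = 1 + MAX_CLARIFICATIONS
    return turns * (NLU_MS + LLM_MS[1])
```

Under these assumptions the worst case stays within a few seconds, which is why the 3-second per-call timeout matters mainly as a safety net rather than a commonly hit limit.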
In production deployments, the LLM-assisted path is triggered for only a subset of utterances (those where NLU is not fully confident), so the average latency impact across all interactions remains low.
AI Agent Escalation Within Hybrid Flows
Within this hybrid architecture, an intent action can trigger a full AI Agent. When ACR detects a complex intent — such as "pay my bill" or "cancel my service" — that cannot be handled by deterministic flows, it routes to a specialized AI Agent with tool-calling capabilities.
The escalation flow works as follows:
- ACR routes to an AI Agent intent — either via direct NLU match or LLM-assisted classification.
- The AI Agent takes over the conversation — managing multi-turn dialogue, calling backend APIs (e.g., payment processing, account lookup), and completing the task autonomously.
- Upon task completion, the AI Agent hands control back to the main NLU routing loop — the user's next utterance re-enters the ACR pipeline from the beginning, ensuring seamless transition between agent-handled tasks and NLU-driven flows.
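A minimal dispatch sketch for the escalation step might look like the following. `dispatch`, the action registry, and the agent interface are all hypothetical illustrations of the configured intent-to-action mapping, not Sestek APIs.

```python
def dispatch(intent, actions, agents):
    """Run a deterministic flow action, or hand off to an AI Agent."""
    if intent in agents:
        agent = agents[intent]
        result = agent.run()  # multi-turn dialogue, tool calls, backend APIs
        # On completion, control returns to the main routing loop: the
        # user's next utterance re-enters the ACR pipeline from the start.
        return {"handled_by": "agent", "result": result}
    # Simple, high-volume intents stay on the fast deterministic path.
    return {"handled_by": "flow", "result": actions[intent]()}
```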
This architecture allows customers to gradually introduce AI Agents for complex use cases while keeping simple, high-volume intents on fast deterministic paths.
Customer Configuration
Customers influence routing behavior through several levers:
- Number of top intents to use for ACR: The number of candidate intents forwarded to the LLM (default: top 5). Increasing this improves classification accuracy for ambiguous inputs but increases token consumption and latency.
- Utterances per intent: The number of sample utterances per intent forwarded to the LLM (default: top 10). More utterances give the LLM better context but increase prompt size.
- LLM request timeout: Maximum wait time for the ACR LLM call (default: 3 seconds). If exceeded, the system falls back to NLU-only classification.
- Clarification attempt limit: Maximum number of clarifying questions before triggering fallback (default: 2).
- Intent-to-action mapping: Customers choose which intents map to deterministic flow actions versus AI Agent escalations.
- LLM provider and model selection: The LLM provider and model used for ACR can be configured per project.
- Fallback behavior: Fallback actions and escalation paths are fully configurable per project.
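The levers above can be collected into a single configuration object, sketched here. Field names and the provider/model identifiers are illustrative assumptions; the default values are the ones documented in this section.

```python
from dataclasses import dataclass

@dataclass
class ACRConfig:
    top_n_intents: int = 5            # candidates forwarded to the LLM
    utterances_per_intent: int = 10   # sample utterances per candidate
    llm_timeout_s: float = 3.0        # fall back to NLU-only if exceeded
    clarification_limit: int = 2      # max clarifying questions
    llm_provider: str = "openai"      # configurable per project (assumed id)
    llm_model: str = "gpt-4.1"        # any of the benchmarked models
    fallback_action: str = "default_fallback"  # configurable per project
```

A per-project configuration then only needs to override the fields that differ from these defaults, e.g. `ACRConfig(llm_model="gemini-2.5-flash", top_n_intents=7)`.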

