Agentic Evaluation

Document Number: KN.GU.57.EN
Revision Number: Rev7
Revision Date: 10.04.2026

The Agentic Evaluation feature enables automated quality evaluation of agent conversations using an AI Agent Evaluator (AE) powered by large language model (LLM) capabilities. By integrating a pre-configured Virtual Agent (VA) project with the CA evaluation pipeline, the system automatically assesses conversations against defined QA form criteria, without requiring manual evaluator intervention.
Unlike rule-based automatic evaluations, this approach leverages LLM reasoning to interpret conversation transcripts in context, providing answers, confidence scores, and explanatory comments for each evaluation question. This makes the AI Evaluator particularly effective for open-ended or nuanced quality criteria.

Note: AI Evaluator is available exclusively in Premium AQM editions.

Step-by-Step Configuration

1. Configure VA Project for Agentic Evaluation

Create a Virtual Agent (VA) project to be used for evaluation, or use an existing project. The VA project must be configured with:

  • Instructions that define how the agent should evaluate conversations (e.g., scoring criteria, tone, expected behaviors).
  • (Optional) Knowledge Base documents containing the QA criteria and guidelines the agent will use as reference during evaluation.

The VA project receives conversation transcripts and form questions, reasons over them using its configured instructions and knowledge base, and returns structured evaluation responses.

Sample Flow

  1. Navigate to Virtual Agent → Projects and create a new project or open an existing one (e.g. CAEvaluator).
    Image
  2. When creating the project, configure the following:
    Image
  3. Go to Knowledge Bases → New Knowledge Base and configure the following sections:
    Image
    Image
    Image
    Image
    Image
  4. Upload the QA evaluation documents under Documents:
    Image
  5. Go to AI Agents and open or create the agent for the project.
    Image
    Image
    Image
  6. Provide the instructions for the AI Agent.
    Example:
    # EVALUATION AGENT
    You are an objective evaluator that scores contact center conversations against KB standards.

    ## WORKFLOW
    1. Parse Input: Extract form_id, conversation_id, transcript, and questions
    2. For Each Question:
       - Call KB1 with: [question_text] + [hint] + "criteria"
       - Find evidence in transcript
       - Assign answer based on KB criteria
       - Write brief comment
    3. Return ONLY valid, minified JSON. Do not include markdown code blocks, backticks, explanations (any additional text), newlines within strings, or escaped quotes unless they are part of the actual data value.

    ## ANSWER RULES
    Format by Type:
    - YesNo: "order1" or "order2" | "answer": "order1"
    - Multiple_Choice: "order1", "order2", or "order3" | "answer": "order2"
    - CheckBoxes: ["order1", "order2"] | "answer": ["order1", "order3"]
    Selection:
    - KB criteria met → order1 (Yes/highest)
    - KB criteria not met → order2 (No/lowest)
    - No evidence → null, confidence "low", flag for review

    ## CONFIDENCE
    - high: Clear evidence + clear KB criteria
    - medium: Evidence needs interpretation OR KB has gray areas
    - low: Missing/unclear evidence → FLAG FOR REVIEW

    ## COMMENT (200-250 chars)
    Comment Structure:
    [What was perfect] + [Evidence] + [KB reference]
    [What was missing/incorrect] + [Evidence] + [provide recommendation] + [KB reference]
    Examples:
    - "✓ Proper greeting with name. KB: Greeting Standards"
    - "✗ No identity verification. Required per KB: Auth Procedures"
    - "Name used 2x (need 3x). KB: Personalization"

    ## OUTPUT (EXACT FORMAT)
    {
      "request_id": "[conversation_id]",
      "evaluation_status": "completed",
      "processing_metadata": {
        "agent_id": "CA_Evaluator",
        "agent_version": "2.0",
        "processing_time_ms": 0,
        "evaluation_timestamp": "[ISO8601]",
        "kb_documents_accessed": ["[KB sections]"]
      },
      "evaluation_results": [{
        "questions": [{
          "question_id": "Q1.1",
          "question_text": "[text]",
          "question_type": "YesNo",
          "answer": "order1",
          "confidence": "high",
          "comment": "[comment]"
        }]
      }],
      "flagged_for_human_review": false
    }

    ## CRITICAL RULES
    - Same evidence = same answer (be deterministic)
    - No assumptions - only transcript evidence
    - If confidence="low" → flag for review
    - Return ONLY JSON, no extra text
  7. Under Tools → + (Add Tool), add a Knowledge Base tool with the following settings:
    Image
  8. Add an AI Agent node to the workflow, then click Save and Close.
    Image
  9. Finally, publish the project.
    Image

2. Enter AI Evaluator Settings in CA Tenant

To connect the Agentic AI project with the CA evaluation pipeline, navigate to:

Administration → Settings → Generative AI → AI Evaluator
Fill in the following fields:
Image
The Project Name field is used to match the CA assignment with the correct VA project at runtime.

3. Create an AI Evaluator Assignment

Once the configuration is complete, navigate to AQM → Assignments and create a new assignment.

At the top of the assignment creation page, select the evaluator type:

  • Human — uses the existing assignment flow with human evaluators
  • AI Evaluator — activates the AI-powered evaluation pipeline

When AI Evaluator is selected, the configured project is pre-selected automatically.
Configure the following fields:

Image

Note: Only manual forms are supported for AI Evaluator assignments. Rule-based form logic is applied on the CA side after receiving the VA response.

Work & Data Flow

Once an assignment is saved, the system operates fully automatically:

  1. A scheduled background job runs daily within a configured time window.
  2. The system selects eligible conversations based on assignment criteria (target users, category filter, max daily limit).
  3. For each conversation, an access token is obtained from the configured Identity server (refreshed every 24 hours).
  4. The conversation transcript and form questions are sent to the VA Sync API.
  5. The VA processes the transcript against its instructions and knowledge base, and returns evaluation responses.
  6. Responses are converted into CA evaluation results and saved to the database.
  7. Results appear in Evaluation Results and are reflected in the Coaching Dashboard → Quality tab.
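
Steps 2 and 3 above can be pictured with a short Python sketch. It is illustrative only and does not reflect the actual CA implementation: the `Conversation` record, `select_eligible`, `TokenCache`, and the `fetch_token` callback are hypothetical names introduced here, and the order in which the filters are applied is an assumption.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Set


@dataclass
class Conversation:
    # Hypothetical record; the real CA conversation model is not shown in this guide.
    id: str
    agent: str
    category: str


def select_eligible(conversations: Iterable[Conversation],
                    target_users: Set[str],
                    category: Optional[str],
                    max_daily_limit: int) -> List[Conversation]:
    """Step 2: keep conversations of target users in the category, capped at the daily limit."""
    picked: List[Conversation] = []
    for conv in conversations:
        if len(picked) >= max_daily_limit:
            break
        if conv.agent not in target_users:
            continue
        if category is not None and conv.category != category:
            continue
        picked.append(conv)
    return picked


class TokenCache:
    """Step 3: reuse one access token and fetch a new one once it is 24 hours old."""

    def __init__(self, fetch_token: Callable[[], str],
                 max_age_seconds: float = 24 * 60 * 60,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch_token = fetch_token  # hypothetical call to the Identity server
        self._max_age = max_age_seconds
        self._clock = clock
        self._token: Optional[str] = None
        self._fetched_at = 0.0

    def get(self) -> str:
        now = self._clock()
        if self._token is None or now - self._fetched_at >= self._max_age:
            self._token = self._fetch_token()
            self._fetched_at = now
        return self._token
```

Injecting the clock and the token fetcher keeps the sketch independent of any particular Identity server client and makes the 24-hour refresh easy to exercise in a test.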

Request & Response Format

Example request to the VA Sync API:

curl --location --request POST 'https://sandbox.web.cai.demo.sestek.com/magpie/ext-api/messages/synchronized' \
--header 'Content-Type: application/json' \
--header 'Project: CAEvaluator' \
--header 'Authorization: Bearer <YOUR_TOKEN>' \
--header 'X-Conversation-Id: conv_20251120_cs_00212334567' \
--data-raw '{
  "text": "
# FORM INFO
form_id: QA_Form_CS_v2024
form_name: Customer Service Quality Evaluation Form
conversation_id: conv_20251120_cs_00234

# TRANSCRIPT
Agent:    hello hello this is carlos from power telecom i would like to tell you about our plan hi sir i would like to tell you about our new sales if you have a few minutes
Customer: hello okay i guess i have a minute what is
Agent:    perfect the plan includes unlimited talk text data and nationwide coverage for only fifty dollars a month
Customer: sixty dollars my current plan is forty five dollars so the price is not
Agent:    we have the best coverage and signal nationwide sir no weak signals
Customer: okay but i think i will pass on that i am happy with my current plan and i dont want to make another commitment what will bind me for years you understand what im zain
Agent:    the sale end the sale ends today i think its a great deal you will have unlimited talk text and data for only fifty dollars a month
Customer: i will pass on your offer im not interested i will call you if i decide to do so
Agent:    okay sir sorry for the inconvenience please feel free to call me back if there is anything i can help you with
Customer: okay have a good day
Agent:    you too sir i appreciate your time bye bye

# QUESTIONS

Q1.1: Avoid Distractions?
Type: Multiple_Choice
Options:
  - order1: Full Points
  - order2: Partial Points
  - order3: Zero Points
Hint: (none)

Q1.2: Maintained Friendly, Professional Tone?
Type: Multiple_Choice
Options:
  - order1: Full Points
  - order2: Partial Points
  - order3: Zero Points
Hint: (none)

Q1.3: Offer Further Assistance & Close Call Politely?
Type: Multiple_Choice
Options:
  - order1: Full Points
  - order2: Partial Points
  - order3: Zero Points
Hint: (none)

Q1.4: Avoid Distractions / Maintained Friendly, Professional Tone
Type: CheckBoxes
Options:
  - order1: Agent Avoided Distractions
  - order2: Agent Maintained Friendly, Professional Tone
Hint: (none)

Q2.1: Provide Correct Information?
Type: YesNo
Options:
  - order1: Yes
  - order2: No
Hint: (none)
  ",
  "type": "message",
  "channelId": "IVR",
  "conversation": {
    "id": "conv_20251120_cs_00212334567"
  },
  "attachments": [],
  "channelData": {
    "ResponseType": "Text"
  }
}'
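
The sample request shows the "text" value with literal line breaks for readability; a JSON serializer escapes them as \n when the body is built programmatically. The Python sketch below assembles the same headers and body under stated assumptions: `format_question` and `build_request` are hypothetical helper names, the section layout is copied from the sample, and a single conversation ID is used for the header, the conversation object, and the form info.

```python
import json
from typing import Dict, List, Optional, Tuple


def format_question(qid: str, text: str, qtype: str,
                    options: List[str], hint: Optional[str] = None) -> str:
    """Render one form question in the layout used by the sample request."""
    lines = [f"{qid}: {text}", f"Type: {qtype}", "Options:"]
    # Options are labelled order1, order2, ... in form order.
    lines += [f"  - order{i}: {label}" for i, label in enumerate(options, start=1)]
    lines.append(f"Hint: {hint if hint else '(none)'}")
    return "\n".join(lines)


def build_request(project: str, token: str, conversation_id: str,
                  form_id: str, form_name: str, transcript: str,
                  questions: List[str]) -> Tuple[Dict[str, str], str]:
    """Return (headers, JSON body) shaped like the sample VA Sync API call."""
    text = "\n".join([
        "# FORM INFO",
        f"form_id: {form_id}",
        f"form_name: {form_name}",
        f"conversation_id: {conversation_id}",
        "",
        "# TRANSCRIPT",
        transcript,
        "",
        "# QUESTIONS",
        "",
        "\n\n".join(questions),
    ])
    headers = {
        "Content-Type": "application/json",
        "Project": project,
        "Authorization": f"Bearer {token}",
        "X-Conversation-Id": conversation_id,
    }
    body = {
        "text": text,
        "type": "message",
        "channelId": "IVR",
        "conversation": {"id": conversation_id},
        "attachments": [],
        "channelData": {"ResponseType": "Text"},
    }
    # json.dumps escapes the newlines inside "text", so the body is valid JSON.
    return headers, json.dumps(body, ensure_ascii=False)
```

The returned headers and body can be passed to any HTTP client; the endpoint URL and token come from the AI Evaluator settings and are not hard-coded here.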

Example response from the VA Sync API:

{
  "request_id": "conv_20251120_cs_00234",
  "evaluation_status": "completed_partial_form",
  "processing_metadata": {
    "agent_id": "CAEvaluator",
    "agent_version": "1.0",
    "processing_time_ms": 0,
    "evaluation_timestamp": "2026-01-29T00:00:00Z",
    "kb_documents_accessed": [
      "QA_Form_CS_v2024 rubric snapshot (Communication Skills/Customer Relations/Work Quality)",
      "Customer Relations Monitoring Definitions 2023-24 (Avoid Distractions; Tone; Closing standards)"
    ]
  },
  "evaluation_results": [
    {
      "questions": [
        {
          "question_id": "Q1.1",
          "question_text": "Avoid Distractions?",
          "question_type": "Multiple_Choice",
          "answer": "order1",
          "confidence": "medium",
          "comment": "✓ No evidence of distraction. ✗ Screen/hold activity etc. cannot be observed; the recording is limited. KB: Avoid Distractions.",
          "evidence": {
            "conversation_quotes": [
              "Customer: hello okay i guess i have a minute what is",
              "Agent: perfect the plan includes unlimited talk text data..."
            ],
            "kb_sections_referenced": [
              "Customer Relations Monitoring Definitions 2023-24: Avoid Distractions"
            ]
          }
        },
        {
          "question_id": "Q1.2",
          "question_text": "Maintained Friendly, Professional Tone?",
          "question_type": "Multiple_Choice",
          "answer": "order2",
          "confidence": "high",
          "comment": "✓ Polite closing present: 'sorry... feel free to call me back'. ✗ Weak professionalism at the start: 'okay i guess i have a minute'. KB: Friendly/Professional Tone.",
          "evidence": {
            "conversation_quotes": [
              "Agent: hello okay i guess i have a minute what is",
              "Agent: okay sir sorry for the inconvenience please feel free to call me back..."
            ],
            "kb_sections_referenced": [
              "QA_Form_CS_v2024: Maintain Friendly, Professional Tone",
              "Customer Relations Monitoring Definitions 2023-24: Tone expectations"
            ]
          }
        },
        {
          "question_id": "Q1.3",
          "question_text": "Offer further Assistance & Close call Politely",
          "question_type": "Multiple_Choice",
          "answer": "order2",
          "confidence": "high",
          "comment": "✓ Offered help: 'please feel free to call me back if there is anything i can help you with'. ✗ No standard question ('anything else...?'). KB: Closing.",
          "evidence": {
            "conversation_quotes": [
              "Agent: okay sir sorry for the inconvenience please feel free to call me back if there is anything i can help you with",
              "Agent: you too sir i appreciate your time bye bye"
            ],
            "kb_sections_referenced": [
              "Customer Relations Monitoring Definitions 2023-24: Offer further assistance & close politely"
            ]
          }
        },
        {
          "question_id": "Q1.4",
          "question_text": "Avoid Distractions/Maintained Friendly, Professional Tone",
          "question_type": "CheckBoxes",
          "answer": [
            "order1",
            "order2"
          ],
          "confidence": "medium",
          "comment": "✓ order1: No evidence of distraction. ✓ order2: Generally polite language; however, the opening phrasing could be improved. KB: Avoid Distractions; Tone.",
          "evidence": {
            "conversation_quotes": [
              "Agent: hello okay i guess i have a minute what is",
              "Agent: ...sorry for the inconvenience..."
            ],
            "kb_sections_referenced": [
              "Customer Relations Monitoring Definitions 2023-24: Avoid Distractions",
              "QA_Form_CS_v2024: Maintain Friendly, Professional Tone"
            ]
          }
        }
      ]
    }
  ],
  "flagged_for_human_review": false
}
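
Because the response is generated by an LLM, it can be useful to check it against the answer rules from the agent instructions before converting it into evaluation results. The Python sketch below is one possible check, not the validation CA actually performs: `check_answer` and `validate_response` are hypothetical names, and the rules encoded are only those stated in the example instructions (answer format per question type, allowed confidence values, and flagging low-confidence answers for review).

```python
import json
from typing import Any, Dict, List

ALLOWED_CONFIDENCE = {"high", "medium", "low"}


def _is_order(value: Any) -> bool:
    """True for strings such as "order1" or "order3"."""
    return isinstance(value, str) and value.startswith("order") and value[5:].isdigit()


def check_answer(question: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in one question entry."""
    problems: List[str] = []
    qid = question.get("question_id", "?")
    qtype = question.get("question_type")
    answer = question.get("answer")

    if qtype == "CheckBoxes":
        if not (isinstance(answer, list) and all(_is_order(a) for a in answer)):
            problems.append(f"{qid}: CheckBoxes answer must be a list of order values")
    elif qtype == "YesNo":
        # null is allowed when no evidence was found
        if answer not in (None, "order1", "order2"):
            problems.append(f"{qid}: YesNo answer must be order1, order2, or null")
    elif qtype == "Multiple_Choice":
        if answer is not None and not _is_order(answer):
            problems.append(f"{qid}: Multiple_Choice answer must be an order value or null")
    else:
        problems.append(f"{qid}: unknown question_type {qtype!r}")

    if question.get("confidence") not in ALLOWED_CONFIDENCE:
        problems.append(f"{qid}: confidence must be high, medium, or low")
    return problems


def validate_response(raw: str) -> List[str]:
    """Parse the raw VA reply and collect rule violations (empty list means OK)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level JSON value must be an object"]

    problems: List[str] = []
    has_low = False
    for result in data.get("evaluation_results", []):
        for question in result.get("questions", []):
            problems += check_answer(question)
            has_low = has_low or question.get("confidence") == "low"
    if has_low and not data.get("flagged_for_human_review"):
        problems.append("low-confidence answer present but flagged_for_human_review is not true")
    return problems
```

An empty list means the reply satisfies these checks; any other result can be logged or routed to a human evaluator.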