Send Inference Request


Text Clustering Inference

This step uses the trained model to generate predictions on new data.

The published clustering pod times out automatically based on the latest_request_time parameter. When the configured inactivity period elapses without any inference requests, the pod enters a timeout status. To restore the pod, check the model's readiness status and send a training request with the appropriate forceTrain parameter.

When an inference request is submitted to a timed-out pod, the response will return 503 - Published pod is currently unavailable due to inactivity timeout.
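A client can branch on the 503 status to trigger recovery. The sketch below is a minimal example of that decision logic, paraphrased from the flow described above; the action names and the retrain request body are illustrative assumptions, not the actual API contract.

```python
# Minimal sketch of client-side recovery logic for a timed-out pod.
# The action names and the forceTrain request body are assumptions
# based on the recovery flow described above.

def next_action(status_code: int) -> str:
    """Decide what the client should do after an inference attempt."""
    if status_code == 200:
        return "use_response"                    # inference succeeded
    if status_code == 503:
        # Pod timed out due to inactivity: check readiness, then retrain.
        return "check_readiness_then_retrain"
    return "raise_error"                         # any other status is unexpected


def build_retrain_request(force: bool = True) -> dict:
    """Hypothetical training request body used to restore a timed-out pod."""
    return {"forceTrain": force}
```

For example, `next_action(503)` returns `"check_readiness_then_retrain"`, signaling the client to follow the recovery steps above.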

Possible Scenarios

This section explains how the system behaves depending on whether training and inference settings are provided. The logic is divided into two main cases:

Case 1: Training Settings Not Provided
If the training settings endpoint is not called, the system follows the default (legacy) behavior: it tries to infer which provider to use from the llm value in the request, or falls back to built-in defaults.

Scenario: llm = "" or null in the training request
What happens: The system assumes no LLM is required and uses BERT clustering by default.

Scenario: llm = "openai" in the training request
What happens: The system uses the default OpenAI configuration, which includes a hardcoded API key.


Case 2: Training Settings Provided
If training settings are submitted via the training settings endpoint, the system uses them as the default provider configuration, unless inference settings are provided separately.

Scenario: Only training settings are provided
What happens: The system uses the same provider settings for both training and inference. The client does not need to include providerSettings in the inference request.

Scenario: Different training and inference settings are provided
What happens: The system uses the training settings during training and overrides them with the inference settings during inference. The client must include the desired providerSettings in the inference request to customize inference behavior independently.


The providerSettings object has the following structure:

"providerSettings": {  
        "provider": "", 
        "OPENAI_API_KEY": "string",
        "AZURE_API_KEY": "string",
        "AZURE_API_BASE": "string",
        "AZURE_API_VERSION": "string",
        "AWS_ACCESS_KEY_ID": "string",
        "AWS_SECRET_ACCESS_KEY": "string",
        "AWS_REGION_NAME": "string",
        "GEMINI_API_KEY": "string",
        "TOGETHERAI_API_KEY": "string",
        "GROQ_API_KEY": "string"
    }


Important Note

Supported providers: openai, azure, bedrock, gemini, together_ai, groq, vllm, ""
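A client may want to validate providerSettings before sending a request. In the sketch below, the provider list comes from the note above, but the mapping from provider to required keys is an assumption inferred from the key names in the providerSettings schema, not a documented contract.

```python
# Providers listed in the note above.
SUPPORTED_PROVIDERS = {
    "openai", "azure", "bedrock", "gemini", "together_ai", "groq", "vllm", "",
}

# ASSUMPTION: which keys each provider needs, inferred from the key names
# in the providerSettings schema; verify against your deployment.
REQUIRED_KEYS = {
    "openai": ["OPENAI_API_KEY"],
    "azure": ["AZURE_API_KEY", "AZURE_API_BASE", "AZURE_API_VERSION"],
    "bedrock": ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_REGION_NAME"],
    "gemini": ["GEMINI_API_KEY"],
    "together_ai": ["TOGETHERAI_API_KEY"],
    "groq": ["GROQ_API_KEY"],
    "vllm": [],
    "": [],
}

def validate_provider_settings(settings: dict) -> list[str]:
    """Return a list of problems; an empty list means the settings look OK."""
    problems = []
    provider = settings.get("provider", "")
    if provider not in SUPPORTED_PROVIDERS:
        problems.append(f"unsupported provider: {provider!r}")
        return problems
    for key in REQUIRED_KEYS[provider]:
        if not settings.get(key):
            problems.append(f"missing {key} for provider {provider!r}")
    return problems
```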




Endpoint:

URL: {{baseUrl}}/published/:tenantName-modelId/v1/text/clustering/inference/dialogue
HTTP Method: POST
Content-Type: application/json


Here's an example of a request:

curl --location '{{baseUrl}}/published/:tenantName-modelId/v1/text/clustering/inference/dialogue' \
--header 'Content-Type: application/json' \
--data '{
    "inputs": [
        {
            "dialogue": {
                "id": "string",
                "version": "1.0.0",
                "metadata": {},
                "participants": [
                    {
                        "id": "string",
                        "role": "agent"
                    }
                ],
                "transcript": [
                    {
                        "id": "string",
                        "participantId": "agent",
                        "content": "string"
                    }
                ]
            }
        }
    ],
    "textNormalizationConfiguration": {
        "language": "en-US",
        "lowerCase": false,
        "spellCorrection": false,
        "stemming": false,
        "removeStopwords": false,
        "removePunctuation": false
    },
    "language": "en-US",
    "model": "",
    "providerSettings": {  
        "provider": "", 
        "OPENAI_API_KEY": "string",
        "AZURE_API_KEY": "string",
        "AZURE_API_BASE": "string",
        "AZURE_API_VERSION": "string",
        "AWS_ACCESS_KEY_ID": "string",
        "AWS_SECRET_ACCESS_KEY": "string",
        "AWS_REGION_NAME": "string",
        "GEMINI_API_KEY": "string",
        "TOGETHERAI_API_KEY": "string",
        "GROQ_API_KEY": "string"
    }
}'


Expected Response:

{
  "results": [
    {
      "dialogueId": "string",  
      "topPrediction": {  // The cluster that best matches new data.
        "clusterId": "string",  // A unique identifier for the cluster.
        "name": "string",  // Automatically generated name for the cluster.
        "confidence": 0,
        "keyphrases": [   
          {
            "original": "string",  // Primary words characterizing the cluster.
            "stemmed": "string",
            "weight": 0
          }
        ]
      },
      "predictions": [
        {
          "clusterId": "string",
          "name": "string",
          "confidence": 0,
          "keyphrases": [  // List of phrases associated with the cluster match.
            {
              "original": "string",
              "stemmed": "string",
              "weight": 0
            }
          ]
        }
      ],
      "dialogueTopic": ""
    }
  ]
}