Send Inference Request

Token Endpoint:

URL: {{baseUrl}}/published/:tenantName-modelId/v1/text/question-answering/inference
HTTP Method: POST
Content-Type: application/json

Here's an example of a request:

curl --location '{{baseUrl}}/published/:tenantName-modelId/v1/text/question-answering/inference' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <token>' \
--data '{
  "prompt": "Use the following pieces of context to answer the user's question. Keep your answer concise and adhere only to the information contained in the text. The answer must consist of a maximum of 20 words.",
  "inputs": [
    {
      "query": "",
      "chat_history": [
        [
          {
            "role": "user",
            "content": ""
          },
          {
            "role": "assistant",
            "content": ""
          }
        ]
      ],
      "use_semantic_cache": true,
      "semantic_cache_threshold": 0.95
    }
  ]
}'

Expected Response:

{
  "result": [
    {
      "answer": "answer",
      "standalone_question": "question",
      "docs": [
        {
          "source": "file.txt",
          "score": 0.20298055250630587,
          "content": "content"
        }
      ],
      "logs": {
        "openai_duration": 2.017857313156128,
        "full_duration": 2.024966239929199
      },
      "cache_hit": {
        "question": "cached question",
        "score": 1
      }
    }