Token Endpoint:
URL: {{baseUrl}}/published/:tenantName-modelId/v1/text/question-answering/inference
HTTP Method: POST
Content-Type: application/json
Here's an example of a request:
curl --location '{{baseUrl}}/published/:tenantName-modelId/v1/text/question-answering/inference' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <token>' \
--data '{
"prompt": "Use the following pieces of context to answer the user's question. Keep your answer concise and adhere only to the information contained in the text. The answer must consist of a maximum of 20 words.",
"inputs": [
{
"query": "",
"chat_history": [
[
{
"role": "user",
"content": ""
},
{
"role": "assistant",
"content": ""
}
]
],
"use_semantic_cache": true,
"semantic_cache_threshold": 0.95
}
]
}'
Expected Response:
{
"result": [
{
"answer": "answer",
"standalone_question": "question",
"docs": [
{
"source": "file.txt",
"score": 0.20298055250630587,
"content": "content"
}
],
"logs": {
"openai_duration": 2.017857313156128,
"full_duration": 2.024966239929199
},
"cache_hit": {
"question": "cached question",
"score": 1
}
}
