Once the availability of the dependency modules has been verified, the model training process can be initiated. Each training must be assigned a unique modelId parameter, and the training data must be sent in JSON format.
- Effective from v1.70, a new training request sent with the same `modelId` while a model training is already in progress will return the error message: "The training is still in progress for the same model ID: modelId"
- Effective from v1.69, the `modelId` parameter can only contain lowercase letters (a-z), numbers (0-9), and hyphens ("-"). Hyphens ("-") are not allowed at the beginning or end of the text.
- Using a different `modelId` while retraining a model will change the endpoint used for inference.
- Please ensure that the `modelId` does not include any Turkish characters and does not exceed 63 characters.
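The naming rules above can be checked locally before a request is sent. The following is a minimal sketch in POSIX shell; `is_valid_model_id` is a hypothetical helper written for illustration and is not part of the API.

```shell
#!/bin/sh
# Hypothetical helper: returns 0 if the candidate modelId satisfies the
# documented rules, 1 otherwise. Runs in a subshell so LC_ALL stays local.
is_valid_model_id() (
  LC_ALL=C   # byte-wise matching, so a-z means ASCII letters only
  id=$1
  # Length must be between 1 and 63 characters.
  [ "${#id}" -ge 1 ] && [ "${#id}" -le 63 ] || exit 1
  case $id in
    *[!a-z0-9-]*) exit 1 ;;  # anything other than a-z, 0-9 or "-"
    -*|*-)        exit 1 ;;  # hyphen at the beginning or end
  esac
  exit 0
)
```

For example, `is_valid_model_id "my-model-01"` succeeds, while `is_valid_model_id "-my-model"` and `is_valid_model_id "MyModel"` fail.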
Endpoint:
The following URL for sending training requests will be deprecated starting January 1, 2025:
{{baseUrl}}/sestekai-api/api/app/external-trainings/train
Please use the external URL provided below for sending training requests.
URL: {{baseUrl}}/sestekai-api/api/external/trainings/train
HTTP Method: POST
Content-Type: multipart/form-data
The access token acquired for the training must be included in the request header.
Text Clustering Training
A new semi-supervised text clustering feature has been introduced that combines labeled and unlabeled data for better model performance. In addition to Dialogue, you can now use the newly added Intent data schema for clustering. The new structure supports multiple data schemas in a single training request, including "dialogue," "dialogue & intent," and "intent."
The intent data schema refers to the output of a model trained with dialogue data.
The clustering training endpoint has been updated to accept the optional parameters secondaryDataSchema and secondaryDataFile, which allow dialogue and intent data schemas and files to be sent in the same request.
The forceTrain parameter determines whether to retrain the model or redeploy the existing model:
| Value | Description |
|---|---|
| false | Use when the published pod has timed out but the clustering data has not changed since the last training. The existing model is redeployed without retraining. |
| true | Use when the clustering data has been updated since the last training. The model is retrained with the updated data. |
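One way to pick the forceTrain value is to compare the current data file against a checksum saved after the previous training. The sketch below assumes a local state file kept by the caller; both helper names and the state file are hypothetical and are not provided by the API.

```shell
#!/bin/sh
# Hypothetical helper: prints "false" if the data file is unchanged since the
# checksum in the state file was recorded, "true" otherwise.
choose_force_train() {
  data_file=$1
  state_file=$2
  current=$(cksum < "$data_file")   # prints "<crc> <size>" for the file content
  if [ -f "$state_file" ] && [ "$(cat "$state_file")" = "$current" ]; then
    echo "false"   # data unchanged: redeploy the existing model
  else
    echo "true"    # data changed, or no previous record: retrain
  fi
}

# Hypothetical helper: record the checksum once a training request succeeds.
remember_checksum() {
  cksum < "$1" > "$2"
}
```

The printed value can then be passed to the request, for example as `--form "forceTrain=$(choose_force_train data.json .last-training)"`, where both file names are placeholders.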
1. Dialogue Data Schema
Here's an example of a request:
curl --location '{{baseUrl}}/sestekai-api/api/external/trainings/train' \
--header 'Authorization: Bearer <token>' \
--form 'modelId="\"unique id of the model\""' \
--form 'productName="<productName>"' \
--form 'languageCode="\"en-US\" | \"tr-TR\" | \"ar-SA\" | \"fr-FR\" | \"nl-NL\" | \"it-IT\" "' \
--form 'dataSchema="dialogue"' \
--form 'dataFile=@"/path/to/file"' \
--form 'numClusters="\"Number of required clusters. Set to \"0\" for model-driven determination\""' \
--form 'llm="\"openai\" // Optional parameter"' \
--form 'forceTrain="true"'
Expected Response: 200 OK & Response Body: Experiment Object
2. Dialogue & Intent Data Schemas
Here's an example of a request:
curl --location '{{baseUrl}}/sestekai-api/api/external/trainings/train' \
--header 'Authorization: Bearer <token>' \
--form 'modelId="\"unique id of the model\""' \
--form 'productName="<productName>"' \
--form 'languageCode="\"en-US\" | \"tr-TR\" | \"ar-SA\" | \"fr-FR\" | \"nl-NL\" | \"it-IT\" "' \
--form 'dataSchema="dialogue"' \
--form 'dataFile=@"/path/to/file"' \
--form 'secondaryDataSchema="intent"' \
--form 'secondaryDataFile=@"/path/to/file"' \
--form 'numClusters="\"Number of required clusters. Set to \"0\" for model-driven determination\""' \
--form 'llm="\"openai\" // Optional parameter"' \
--form 'forceTrain="true"'
Expected Response: 200 OK & Response Body: Experiment Object
3. Intent Data Schema
Here's an example of a request:
curl --location '{{baseUrl}}/sestekai-api/api/external/trainings/train' \
--header 'Authorization: Bearer <token>' \
--form 'modelId="\"unique id of the model\""' \
--form 'productName="<productName>"' \
--form 'languageCode="\"en-US\" | \"tr-TR\" | \"ar-SA\" | \"fr-FR\" | \"nl-NL\" | \"it-IT\" "' \
--form 'dataSchema="intent"' \
--form 'dataFile=@"/path/to/file"' \
--form 'numClusters="\"Number of required clusters. Set to \"0\" for model-driven determination\""' \
--form 'llm="\"openai\" // Optional parameter"' \
--form 'forceTrain="true"'
Expected Response: 200 OK & Response Body: Experiment Object
Training may sometimes take longer than expected, which is not uncommon when the OpenAI integration is used. If you send the training request again for the same modelId before training has ended, you will receive a 202 response code. In this case, wait until the training is completed. You can check the training status in the next step.
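A client script can tell a 202 apart from a 200 by capturing the HTTP status code of the training request. The following is a minimal sketch: `BASE_URL` and `TOKEN` are assumed environment variables, the function names are hypothetical, and the form values mirror the dialogue example above.

```shell
#!/bin/sh
# Hypothetical helper: map the HTTP status code to a short label.
describe_train_status() {
  case $1 in
    200) echo "accepted" ;;      # training request accepted
    202) echo "in-progress" ;;   # a training for this modelId is still running
    *)   echo "unexpected" ;;
  esac
}

# Hypothetical helper: send the request and print only the HTTP status code.
# Arguments: modelId, productName, languageCode, path to the data file.
# The response body is discarded here; write it to a file instead if the
# returned Experiment Object is needed.
send_training_request() {
  curl --silent --output /dev/null --write-out '%{http_code}' \
    --location "$BASE_URL/sestekai-api/api/external/trainings/train" \
    --header "Authorization: Bearer $TOKEN" \
    --form "modelId=$1" \
    --form "productName=$2" \
    --form "languageCode=$3" \
    --form "dataSchema=dialogue" \
    --form "dataFile=@$4" \
    --form "numClusters=0" \
    --form "forceTrain=true"
}
```

A caller might run `status=$(send_training_request "my-model-01" "<productName>" "en-US" /path/to/file)` and then `describe_train_status "$status"`; when the result is `in-progress`, wait and check the training status rather than resending the request.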
