Migrating projects to CAILA
Projects created using patterns or phrase examples can be migrated to the CAILA NLU kernel.
After migration, all CAILA features become available to the project: intent recognition, system and user-defined entities, slot markup and filling, the CAILA API, and more.
This article covers the following migration steps:
- Editing the project configuration file.
- Migrating NLU settings.
- Configuring activation rules.
- Testing and updating the script.
Configuration file
Specify the following parameters in the chatbot.yaml configuration file:
botEngine: v2
language: ru
sts:
  noMatchThreshold: 0.2
caila:
  noMatchThreshold: 0.2
- botEngine – classifier type; specify v2 to use CAILA.
- language – the project language; specify it for compatibility with the STS classifier.
- noMatchThreshold – the minimum similarity a phrase must have to one of the classes in order to be recognized. The value 0.2 was determined empirically to be optimal during the development of the NLU service.
Note that noMatchThreshold is also set for the STS classifier. This is required for backward compatibility of the project.
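To illustrate what noMatchThreshold does, here is a minimal Python sketch. It assumes a classifier has already produced a similarity score for each class; the pick_class function and the scores are hypothetical, and CAILA computes similarities internally in its own way.

```python
from typing import Optional


def pick_class(scores: dict[str, float], no_match_threshold: float = 0.2) -> Optional[str]:
    """Return the best-scoring class, or None (no match) if it is below the threshold.

    Hypothetical illustration: `scores` maps class names to similarity values
    that some classifier has already computed.
    """
    if not scores:
        return None
    best_class, best_score = max(scores.items(), key=lambda item: item[1])
    # The phrase is recognized only if its best similarity reaches the threshold.
    if best_score < no_match_threshold:
        return None
    return best_class


print(pick_class({"/Hello": 0.65, "/Bye": 0.10}))  # /Hello
print(pick_class({"/Hello": 0.15, "/Bye": 0.12}))  # None
```

With the recommended value of 0.2, a phrase whose best similarity is 0.15 falls through to no match, while a phrase scoring 0.65 is recognized.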
Migrating NLU settings
After migration to CAILA, most NLU settings in chatbot.yaml become inactive. Some of them can be redefined in the NLU settings section when you configure the project.
Consider how the settings change in a sample project that uses an STS classifier:
nlp:
  morphology: myStem
  tokenizer: myStem
  vocabulary: common-vocabulary.json
  lengthLimit:
    enabled: true
    symbols: 400
    words: 100000
  timeLimit:
    enabled: true
    timeout: 10000
  spellcheck:
    enabled: true
    dictionary: dict.txt
    frequency: frequency.txt
    minWordLengthForEditDistance: 3
    maxWordEditDistance: 0
  speller:
    dictionary: speller.dict
  classifier:
    enable: true
    engine: sts
    noMatchThreshold: 0.2
    parameters:
      algorithm: aligner2
- morphology – the parameter is inactive; text markup is performed in CAILA. Can be overridden in the advanced NLU settings.
- tokenizer – the parameter is inactive; text markup is performed in CAILA. Can be overridden in the advanced NLU settings.
- vocabulary – the parameter is inactive and cannot be overridden.
- timeLimit – the parameter is only active for the q, e, and eg tags. For the intent tag, a default value is set and cannot be overridden.
  - enabled – the parameter is inactive.
  - timeout – the parameter is inactive.
- spellcheck – spellchecker module; the parameter is inactive. Its dictionary format is not compatible with CAILA, so create and upload a dictionary in the new format.
- speller – new-format spellchecker module; the parameter is inactive. Its dictionary format is compatible with CAILA, and you can upload the dictionary using the CAILA API.
- classifier – the parameter is only active for the q, e, and eg tags. For the intent tag, a default value is set and cannot be overridden.
  - noMatchThreshold – the parameter is inactive; sts.noMatchThreshold is used instead.
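Before migrating, it can help to check which nlp settings in an existing chatbot.yaml will stop taking effect. The sketch below is hypothetical: NLP_STATUS simply transcribes the list above, audit_nlp is an illustrative helper, and chatbot.yaml is assumed to be already parsed into a Python dict (for example, with a YAML library of your choice). Keys the list does not describe, such as lengthLimit, are reported as not listed.

```python
# Status of nlp.* keys after migration to CAILA, transcribed from the list above.
NLP_STATUS = {
    "morphology": "inactive; can be overridden in advanced NLU settings",
    "tokenizer": "inactive; can be overridden in advanced NLU settings",
    "vocabulary": "inactive; cannot be overridden",
    "timeLimit": "active only for q, e, eg tags",
    "spellcheck": "inactive; upload a dictionary in the new format",
    "speller": "inactive; dictionary can be uploaded via the CAILA API",
    "classifier": "active only for q, e, eg tags",
}


def audit_nlp(config: dict) -> list[str]:
    """Report the status under CAILA of each nlp setting in a parsed chatbot.yaml."""
    nlp = config.get("nlp", {})
    report = []
    for key in nlp:
        status = NLP_STATUS.get(key, "status not listed")
        report.append(f"nlp.{key}: {status}")
    # classifier.noMatchThreshold is ignored; sts.noMatchThreshold applies instead.
    if "noMatchThreshold" in (nlp.get("classifier") or {}):
        report.append("nlp.classifier.noMatchThreshold: inactive; sts.noMatchThreshold is used")
    return report


# Hypothetical excerpt of a parsed chatbot.yaml.
config = {
    "nlp": {
        "tokenizer": "myStem",
        "lengthLimit": {"enabled": True, "symbols": 400, "words": 100000},
        "classifier": {"enable": True, "engine": "sts", "noMatchThreshold": 0.2},
    }
}
for line in audit_nlp(config):
    print(line)
```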
Spellchecker module
The built-in spellchecker module corrects spelling errors in client requests. It can be combined with a user-defined dictionary: the project dictionary then corrects words from the domain scope, and the global module corrects all other words.
If you used a .dict dictionary before, you can migrate it to your project using the CAILA API Direct.
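The two-stage correction described above (project dictionary for domain words, global module for everything else) can be sketched in Python. This illustrates the idea only: difflib string similarity stands in for CAILA's actual correction algorithm, and correct_word, the cutoff value, and the word lists are hypothetical.

```python
import difflib


def correct_word(word: str, project_dict: list[str], global_dict: list[str], cutoff: float = 0.8) -> str:
    """Correct a word, preferring domain terms from the project dictionary."""
    lowered = word.lower()
    # Known words are returned unchanged (lowercased).
    if lowered in project_dict or lowered in global_dict:
        return lowered
    # Stage 1: domain-specific words from the project dictionary.
    match = difflib.get_close_matches(lowered, project_dict, n=1, cutoff=cutoff)
    if match:
        return match[0]
    # Stage 2: fall back to the global dictionary for all other words.
    match = difflib.get_close_matches(lowered, global_dict, n=1, cutoff=cutoff)
    return match[0] if match else word


project_dict = ["caila", "chatbot", "jaicp"]
global_dict = ["hello", "help", "please"]
print(correct_word("chatbto", project_dict, global_dict))  # chatbot
print(correct_word("helo", project_dict, global_dict))     # hello
```

Checking the project dictionary first means a domain term wins over a similar-looking general word.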
Tokenization and lemmatization
By default, projects in Russian use the udpipe tokenizer: in tests, it gave the best tokenization and lemmatization results.
If your project was created using patterns or an STS classifier, we recommend that you use the morphsrus or mystem tokenizer.
NLU advanced configuration parameters
Open the project for editing. Here you can specify advanced configuration parameters: NLU language, classifier algorithm, timezone, and NLU settings.
Configuring activation rules
To detect client intent, you can combine patterns, STS classifier phrase examples, and the CAILA classifier. When intents, patterns, and example groups are used together, specify the state activation mechanism in the bot script.
Learn more about the activation rule mechanism
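A rough model of combined activation is sketched below: both patterns and classifier results are checked, and the noMatch event fires when neither produces a state. Everything here is hypothetical, including the state tables, the toy classifier, and the priority order (patterns before intents); the real order is defined by the activation rules in your bot script. The 0.2 threshold is borrowed from the configuration above.

```python
import re
from typing import Callable

# Hypothetical state tables: pattern-triggered states (q!) and intent-triggered states (intent!).
PATTERN_STATES = [
    (re.compile(r"\b(price|cost)\b", re.IGNORECASE), "Price"),
]
INTENT_STATES = {"/Hello": "Hello"}


def activate(query: str, classify: Callable[[str], dict[str, float]], threshold: float = 0.2) -> str:
    """Pick a state for the query, or 'noMatch' if nothing fires.

    Assumed priority: patterns are checked before intents.
    """
    for pattern, state in PATTERN_STATES:
        if pattern.search(query):
            return state
    scores = classify(query)
    if scores:
        intent, score = max(scores.items(), key=lambda item: item[1])
        if score >= threshold and intent in INTENT_STATES:
            return INTENT_STATES[intent]
    return "noMatch"


def toy_classifier(query: str) -> dict[str, float]:
    # Stand-in for CAILA: scores "/Hello" high when the word "hi" appears.
    return {"/Hello": 0.9 if "hi" in query.lower().split() else 0.05}


print(activate("What is the price?", toy_classifier))  # Price
print(activate("hi there", toy_classifier))            # Hello
print(activate("tell me a joke", toy_classifier))      # noMatch
```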
Example dictionary
If you used an STS classifier in your project, you can migrate your example dictionary to the updated project.
Open the Intents page, click Import at the top of the intents tree, and upload the .json file.
CatchAll
Note that when the NLU service is used in combination with patterns and classifier phrase examples, the following CatchAll state is not triggered:
state: CatchAll
    q!: *
    a: I did not get it
Instead, use the noMatch event to handle user requests that your script does not process:
state: CatchAll
    event: noMatch
    a: You said: {{ $request.query }}
Testing and updating the script
Use the test widget built into the script editor to debug your script.
To further develop your script, we recommend using intent activation rules, system and custom entities, slot filling, and other features of the CAILA NLU kernel.