Connecting the classifier to the bot (deprecated)
Please note that the use of the classifier by examples has been deprecated since version 1.10.0. Learn more about migrating projects to CAILA.
Do the following to connect the classifier to the bot:
- Specify the parameters for the platform’s nlp functions in the bot’s chatbot.yaml configuration file:
morphology
Selects the library for morphological analysis of words. The library is used for pattern processing in ~, $lemma, and $morph, and in the $nlp.parseMorph function.
Specify one of the libraries:
  - aot: the AOT.ru library is used.
  - myStem: the myStem utility is used.
  - pyMorphy: the pyMorphy library is used, which is the best analyzer for Russian.
tokenizer
Specifies the rules used to split the text into words.
Supported tokenizer types:
  - regexp: a simple tokenizer based on regular expressions.
  - srx: a configurable tokenizer based on customizable segmentation rules. When this tokenizer is specified, you also need to specify a grammar file in the srxPath parameter.
  - myStem: segmentation via the myStem utility. This is the preferred tokenizer for use with patterns and the classifier.
vocabulary
Word weight dictionary for pattern ranking. Default: common-vocabulary.json.
lengthLimit, timeLimit
These parameters modify the limits on the incoming message size and on the nlp module processing time.
Default parameters:
nlp:
  lengthLimit:
    enabled: true
    symbols: 400
    words: 100000
  timeLimit:
    enabled: true
    timeout: 10000

For lengthLimit:
  - symbols: sets the limit on the number of symbols in an incoming message. When this limit is exceeded, the lengthLimit event is triggered, which can be processed in the bot script with the event: lengthLimit tag.
  - words: sets the limit on the number of words in an incoming message. When this limit is exceeded, the lengthLimit event is triggered, which can be processed in the bot script with the event: lengthLimit tag.
When setting the limit, note that the words counter treats each of the symbols !,.:;?"'()*/[\]{|} as a word.
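To get a feel for how punctuation inflates the word count, here is a rough Python sketch. It is not the platform's tokenizer: the functions `estimate_word_count` and `exceeds_length_limit` are hypothetical helpers, and the sketch assumes that words are separated by whitespace and that each listed punctuation symbol adds one to the count.

```python
# Symbols that, according to the note above, are counted as words.
PUNCTUATION_COUNTED_AS_WORDS = set("!,.:;?\"'()*/[\\]{|}")


def estimate_word_count(message: str) -> int:
    """Estimate the word count: every listed symbol counts as one word.

    Assumption: the remaining text is split on whitespace. The real
    tokenizer may split differently.
    """
    punctuation = sum(1 for ch in message if ch in PUNCTUATION_COUNTED_AS_WORDS)
    # Replace the symbols with spaces so that "don't" yields two word parts.
    cleaned = "".join(
        " " if ch in PUNCTUATION_COUNTED_AS_WORDS else ch for ch in message
    )
    return punctuation + len(cleaned.split())


def exceeds_length_limit(message: str, symbols: int = 400, words: int = 100000) -> bool:
    """Return True if the message would exceed either default limit."""
    return len(message) > symbols or estimate_word_count(message) > words
```

Under these assumptions, `estimate_word_count("Hello, world!")` gives 4: two words plus the comma and the exclamation mark.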
For timeLimit:
  - timeout: sets the maximum request processing time (in milliseconds) for the nlp module. When this limit is exceeded, the timeLimit event is triggered, which can be processed in the bot script with the event: timeLimit tag.
Example of an nlp module:
nlp: # platform’s nlp function parameters
  morphology: myStem # library for morphological analysis of words
  tokenizer: myStem # tokenizer, specifies rules to be used to split the text into words
  vocabulary: common-vocabulary.json # word weight dictionary for pattern ranking
  lengthLimit:
    enabled: true
    symbols: 400 # limit on the number of symbols in an incoming message
    words: 100000 # limit on the number of words in an incoming message
  timeLimit:
    enabled: true
    timeout: 10000 # maximum request processing time (in milliseconds) for the nlp module
- Next, specify classification parameters:
engine
Classifier type (sts by default).
noMatchThreshold
The lower similarity threshold: phrases whose similarity falls below it are considered different. The optimal value, determined empirically during classifier development, is 0.2.
parameters: algorithm
The type of classification algorithm to use. The primary algorithm of the sts classifier is match-aligner. You can also use aligner and aligner2, which is an alternative implementation of the classification algorithm.
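The role of noMatchThreshold can be illustrated with a short Python sketch. This is not the sts classifier itself: the function `pick_match` and the similarity scores are hypothetical, and the sketch assumes that a score equal to the threshold still counts as a match.

```python
from typing import Dict, Optional

NO_MATCH_THRESHOLD = 0.2  # the empirically chosen value mentioned above


def pick_match(
    scores: Dict[str, float], threshold: float = NO_MATCH_THRESHOLD
) -> Optional[str]:
    """Return the best-scoring example phrase, or None if nothing is similar enough.

    `scores` maps an example phrase to its similarity with the user's message.
    Assumption: scores below the threshold mean the phrases are different.
    """
    if not scores:
        return None
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None
```

With this logic, a request whose best candidate scores 0.15 would be treated as unmatched, while a candidate scoring 0.5 would be returned.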
- Next, configure the classifier algorithm. All parameters are set to their default values, so you only need to specify the weight dictionary, which is identical to the dictionary specified in the nlp block. Default: common-vocabulary.json.
Here is an example of a chatbot.yaml configuration file with a classifier connected to it:
name: demo
entryPoint:
  - main.sc
tests:
  exclude:
    - tests.xml
messages:
  onError:
    defaultMessage: Oops, something has gone wrong.
    locales:
      ru: Oops, something has gone wrong.
nlp: # platform’s nlp function parameters
  morphology: myStem # library for morphological analysis of words
  tokenizer: myStem # tokenizer, specifies rules to be used to split the text into words
  vocabulary: common-vocabulary.json # word weight dictionary for pattern ranking
  lengthLimit:
    enabled: true
    symbols: 400 # limit on the number of symbols in an incoming message
    words: 100000 # limit on the number of words in an incoming message
  timeLimit:
    enabled: true
    timeout: 10000 # maximum request processing time (in milliseconds) for the nlp module
classifier: # classifier parameters
  enable: true
  engine: sts # classifier type
  noMatchThreshold: 0.2
  parameters:
    algorithm: aligner2 # classification algorithm
    aligner:
      vocabulary: common-vocabulary.json
  exampleGroups:
    - src/dictionaries/examples.json # is specified when a group of examples is used
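Because the aligner's weight dictionary is expected to be identical to the one in the nlp block, a quick consistency check may be useful before deploying. The Python sketch below works on a hand-written dictionary that mirrors the relevant keys of the example. The helper `vocabularies_match` is hypothetical, and the sketch assumes that aligner is nested under classifier.parameters.

```python
DEFAULT_VOCABULARY = "common-vocabulary.json"

# Hypothetical in-memory copy of the relevant chatbot.yaml sections.
# Assumption: `aligner` sits under `classifier.parameters`.
config = {
    "nlp": {"vocabulary": "common-vocabulary.json"},
    "classifier": {
        "engine": "sts",
        "noMatchThreshold": 0.2,
        "parameters": {
            "algorithm": "aligner2",
            "aligner": {"vocabulary": "common-vocabulary.json"},
        },
    },
}


def vocabularies_match(cfg: dict) -> bool:
    """Check that the aligner and the nlp block use the same weight dictionary.

    Missing keys fall back to the documented default file name.
    """
    nlp_vocabulary = cfg.get("nlp", {}).get("vocabulary", DEFAULT_VOCABULARY)
    aligner = cfg.get("classifier", {}).get("parameters", {}).get("aligner", {})
    return aligner.get("vocabulary", DEFAULT_VOCABULARY) == nlp_vocabulary
```

If the configuration is loaded from the actual file with a YAML parser, the same function can be applied to the resulting dictionary.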