{
    "componentChunkName": "component---src-templates-doc-page-js",
    "path": "/docs/en/classificator/connection_classificator/",
    "result": {"data":{"site":{"siteMetadata":{"title":"Gatsby-doc-engine"}},"markdownRemark":{"id":"7f7499dd-a722-53b5-8ed2-cacea2c5398a","excerpt":"Please note that the classifier by examples has been deprecated since version 1.10.0. Learn more about migrating projects to CAILA. Do the following to…","html":"<h1>Connecting the classifier to the bot <span class=\"tag-heading red\">Deprecated</span></h1>\n<hr>\n<p class='tip'>Please note that the classifier by examples has been deprecated since version <code class=\"language-text\">1.10.0</code>. Learn more about <a href=\"/1.10.3/docs/en/NLU_core/project_migration\">migrating projects to CAILA</a>.</p>\n<p>Do the following to connect the classifier to the bot:</p>\n</br>\n<ol>\n<li>Specify the parameters for the platform’s nlp functions in the bot’s <code class=\"language-text\">chatbot.yaml</code> configuration file:</li>\n</ol>\n<h5>morphology</h5>\n<p>Can be used to select the library for morphological analysis of words. 
It is used for pattern processing in <code class=\"language-text\">~</code>, <code class=\"language-text\">$lemma</code>, <code class=\"language-text\">$morph</code>, and in the <code class=\"language-text\">$nlp.parseMorph</code> function.</p>\n<p>Specify one of the libraries:</p>\n<ul>\n<li><code class=\"language-text\">aot</code> — the <a href=\"http://aot.ru/technology.html\" target=\"_blank\" rel=\"noopener noreferrer\">AOT.ru</a> library is used.</li>\n<li><code class=\"language-text\">myStem</code> — the <a href=\"https://yandex.ru/dev/mystem/\" target=\"_blank\" rel=\"noopener noreferrer\">myStem</a> utility is used.</li>\n<li><code class=\"language-text\">pyMorphy</code> — the pyMorphy library is used, which is the best analyzer for Russian.</li>\n</ul>\n</br>\n<h5>tokenizer</h5>\n<p>Specifies the rules used to split the text into words.</p>\n<p>Supported tokenizer types:</p>\n<ul>\n<li><code class=\"language-text\">regexp</code> — a simple tokenizer based on regular expressions.</li>\n<li><code class=\"language-text\">srx</code> — a configurable tokenizer based on <a href=\"https://en.wikipedia.org/wiki/Segmentation_Rules_eXchange\" target=\"_blank\" rel=\"noopener noreferrer\">customizable segmentation rules</a>. When this tokenizer is specified, you also need to specify a grammar file in the <code class=\"language-text\">srxPath</code> parameter.</li>\n<li><code class=\"language-text\">myStem</code> — segmentation via the myStem utility. This is the preferred tokenizer for use with patterns and the classifier.</li>\n</ul>\n</br>\n<h5>vocabulary</h5>\n<p>Word weight dictionary for pattern ranking. 
Default: <code class=\"language-text\">common-vocabulary.json</code>.</p>\n</br>\n<h5>lengthLimit, timeLimit</h5>\n<p>Use these parameters to change the limits on the incoming message size and the nlp module processing time.</p>\n<p>Default parameters:</p>\n<div class=\"gatsby-highlight\" data-language=\"yaml\"><pre class=\"language-yaml\"><code class=\"language-yaml\"><span class=\"token key atrule\">nlp</span><span class=\"token punctuation\">:</span>\n  <span class=\"token key atrule\">lengthLimit</span><span class=\"token punctuation\">:</span>\n    <span class=\"token key atrule\">enabled</span><span class=\"token punctuation\">:</span> <span class=\"token boolean important\">true</span>\n    <span class=\"token key atrule\">symbols</span><span class=\"token punctuation\">:</span> <span class=\"token number\">400</span>\n    <span class=\"token key atrule\">words</span><span class=\"token punctuation\">:</span> <span class=\"token number\">100000</span>\n  <span class=\"token key atrule\">timeLimit</span><span class=\"token punctuation\">:</span>\n    <span class=\"token key atrule\">enabled</span><span class=\"token punctuation\">:</span> <span class=\"token boolean important\">true</span>\n    <span class=\"token key atrule\">timeout</span><span class=\"token punctuation\">:</span> <span class=\"token number\">10000</span></code></pre></div>\n<p>For <code class=\"language-text\">lengthLimit</code>:</p>\n<ul>\n<li><code class=\"language-text\">symbols</code> — sets the limit on the number of characters in an incoming message. When this limit is exceeded, the <code class=\"language-text\">lengthLimit</code> event is triggered, which can be handled in the bot script with the <code class=\"language-text\">event: lengthLimit</code> tag.</li>\n<li><code class=\"language-text\">words</code> — sets the limit on the number of words in an incoming message. 
When this limit is exceeded, the <code class=\"language-text\">lengthLimit</code> event is triggered, which can be handled in the bot script with the <code class=\"language-text\">event: lengthLimit</code> tag.</li>\n</ul>\n<p class='tip'>When setting the limit, note that the <code class=\"language-text\">words</code> counter treats each of the <code class=\"language-text\">!,.:;?\"'()*/[\\]{|}</code> characters as a word.</p>\n<p>For <code class=\"language-text\">timeLimit</code>:</p>\n<ul>\n<li><code class=\"language-text\">timeout</code> — sets the maximum request processing time (in milliseconds) for the nlp module. When this limit is exceeded, the <code class=\"language-text\">timeLimit</code> event is triggered, which can be handled in the bot script with the <code class=\"language-text\">event: timeLimit</code> tag.</li>\n</ul>\n<p>Example of an nlp block:</p>\n<div class=\"gatsby-highlight\" data-language=\"yaml\"><pre class=\"language-yaml\"><code class=\"language-yaml\"><span class=\"token key atrule\">nlp</span><span class=\"token punctuation\">:</span>                                    <span class=\"token comment\"># platform’s nlp function parameters</span>\n  <span class=\"token key atrule\">morphology</span><span class=\"token punctuation\">:</span> myStem                    <span class=\"token comment\"># library for morphological analysis of words</span>\n  <span class=\"token key atrule\">tokenizer</span><span class=\"token punctuation\">:</span> myStem                     <span class=\"token comment\"># tokenizer, specifies the rules used to split the text into words</span>\n  <span class=\"token key atrule\">vocabulary</span><span class=\"token punctuation\">:</span> common<span class=\"token punctuation\">-</span>vocabulary.json    <span class=\"token comment\"># word weight dictionary for pattern ranking</span>\n  <span class=\"token key atrule\">lengthLimit</span><span class=\"token punctuation\">:</span>\n    <span class=\"token key atrule\">enabled</span><span class=\"token punctuation\">:</span> <span class=\"token boolean important\">true</span>\n    <span class=\"token key atrule\">symbols</span><span class=\"token punctuation\">:</span> <span class=\"token number\">400</span>                        <span class=\"token comment\"># limit on the number of characters in an incoming message</span>\n    <span class=\"token key atrule\">words</span><span class=\"token punctuation\">:</span> <span class=\"token number\">100000</span>                       <span class=\"token comment\"># limit on the number of words in an incoming message</span>\n  <span class=\"token key atrule\">timeLimit</span><span class=\"token punctuation\">:</span>\n    <span class=\"token key atrule\">enabled</span><span class=\"token punctuation\">:</span> <span class=\"token boolean important\">true</span>\n    <span class=\"token key atrule\">timeout</span><span class=\"token punctuation\">:</span> <span class=\"token number\">10000</span>                      <span class=\"token comment\"># maximum request processing time (in milliseconds) for the nlp module</span></code></pre></div>\n</br>\n<ol start=\"2\">\n<li>Next, specify classification parameters:</li>\n</ol>\n<h5>engine</h5>\n<p>Classifier type (<code class=\"language-text\">sts</code> by default).</p>\n</br>\n<h5>noMatchThreshold</h5>\n<p>The minimum similarity threshold: phrases that score below it are considered different. During classifier development, the optimal value of this parameter was determined empirically to be <code class=\"language-text\">0.2</code>.</p>\n</br>\n<h5>parameters: algorithm</h5>\n<p>The type of classification algorithm to use. <code class=\"language-text\">match-aligner</code> is the primary algorithm of the sts classifier. You can also use <code class=\"language-text\">aligner</code> and <code class=\"language-text\">aligner2</code>, which are alternative implementations of the classification algorithm.</p>\n</br>\n<ol start=\"3\">\n<li>Next, configure the classifier algorithm. All parameters are set to their default values. You only need to specify the weight dictionary, which must be the same as the dictionary specified in the <code class=\"language-text\">nlp</code> block. 
Default: <code class=\"language-text\">common-vocabulary.json</code>.</li>\n</ol>\n</br>\n<p>Here is an example of a <code class=\"language-text\">chatbot.yaml</code> configuration file with a classifier connected to it:</p>\n<div class=\"gatsby-highlight\" data-language=\"yaml\"><pre class=\"language-yaml\"><code class=\"language-yaml\"><span class=\"token key atrule\">name</span><span class=\"token punctuation\">:</span> demo\n\n<span class=\"token key atrule\">entryPoint</span><span class=\"token punctuation\">:</span>\n  <span class=\"token punctuation\">-</span> main.sc\n\n<span class=\"token key atrule\">tests</span><span class=\"token punctuation\">:</span>\n  <span class=\"token key atrule\">exclude</span><span class=\"token punctuation\">:</span>\n    <span class=\"token punctuation\">-</span> tests.xml\n\n<span class=\"token key atrule\">messages</span><span class=\"token punctuation\">:</span>\n    <span class=\"token key atrule\">onError</span><span class=\"token punctuation\">:</span> \n        <span class=\"token key atrule\">defaultMessage</span><span class=\"token punctuation\">:</span> Oops<span class=\"token punctuation\">,</span> something has gone wrong. \n        <span class=\"token key atrule\">locales</span><span class=\"token punctuation\">:</span> \n            <span class=\"token key atrule\">ru</span><span class=\"token punctuation\">:</span> Oops<span class=\"token punctuation\">,</span> something has gone wrong.\n\n<span class=\"token key atrule\">nlp</span><span class=\"token punctuation\">:</span>                                    <span class=\"token comment\"># platform’s nlp function parameters</span>\n  <span class=\"token key atrule\">morphology</span><span class=\"token punctuation\">:</span> myStem                    <span class=\"token comment\"># library for morphological analysis of words</span>\n  <span class=\"token key atrule\">tokenizer</span><span class=\"token punctuation\">:</span> myStem                     <span class=\"token comment\"># tokenizer, specifies the rules used to split the text into words</span>\n  <span class=\"token key atrule\">vocabulary</span><span class=\"token punctuation\">:</span> common<span class=\"token punctuation\">-</span>vocabulary.json    <span class=\"token comment\"># word weight dictionary for pattern ranking</span>\n  <span class=\"token key atrule\">lengthLimit</span><span class=\"token punctuation\">:</span>\n    <span class=\"token key atrule\">enabled</span><span class=\"token punctuation\">:</span> <span class=\"token boolean important\">true</span>\n    <span class=\"token key atrule\">symbols</span><span class=\"token punctuation\">:</span> <span class=\"token number\">400</span>                        <span class=\"token comment\"># limit on the number of characters in an incoming message</span>\n    <span class=\"token key atrule\">words</span><span class=\"token punctuation\">:</span> <span class=\"token number\">100000</span>                       <span class=\"token comment\"># limit on the number of words in an incoming message</span>\n  <span class=\"token key atrule\">timeLimit</span><span class=\"token punctuation\">:</span>\n    <span class=\"token key atrule\">enabled</span><span class=\"token punctuation\">:</span> <span class=\"token boolean important\">true</span>\n    <span class=\"token key atrule\">timeout</span><span class=\"token punctuation\">:</span> <span class=\"token number\">10000</span>                      <span class=\"token comment\"># maximum request processing time (in milliseconds) for the nlp module</span>\n\n<span class=\"token key atrule\">classifier</span><span class=\"token punctuation\">:</span>                             <span class=\"token comment\"># classifier parameters</span>\n  <span class=\"token key atrule\">enable</span><span class=\"token punctuation\">:</span> <span class=\"token boolean important\">true</span>\n  <span class=\"token key atrule\">engine</span><span class=\"token punctuation\">:</span> sts                           <span class=\"token comment\"># classifier type</span>\n  <span class=\"token key atrule\">noMatchThreshold</span><span class=\"token punctuation\">:</span> <span class=\"token number\">0.2</span>\n  <span class=\"token key atrule\">parameters</span><span class=\"token punctuation\">:</span>\n    <span class=\"token key atrule\">algorithm</span><span class=\"token punctuation\">:</span> aligner2                 <span class=\"token comment\"># classification algorithm</span>\n\n<span class=\"token key atrule\">aligner</span><span class=\"token punctuation\">:</span>\n  <span class=\"token key atrule\">vocabulary</span><span class=\"token punctuation\">:</span> common<span class=\"token punctuation\">-</span>vocabulary.json\n\n<span class=\"token key atrule\">exampleGroups</span><span class=\"token punctuation\">:</span>\n    <span class=\"token punctuation\">-</span> src/dictionaries/examples.json    <span class=\"token comment\"># specified when a group of examples is used</span></code></pre></div>","frontmatter":{"title":"","description":null},"headings":[{"value":"Connecting the classifier to the bot"}]}},"pageContext":{"slug":"/docs/en/classificator/connection_classificator/","previous":{"fields":{"slug":"/docs/en/classificator/e!/"},"frontmatter":{"title":"","description":null}},"next":{"fields":{"slug":"/docs/en/classificator/classificator_platform/"},"frontmatter":{"title":"","description":null}}}},
    "staticQueryHashes": ["1209419333"]}