{
    "componentChunkName": "component---src-templates-doc-page-js",
    "path": "/docs/en/telephony/recognition_synthesis_settings/",
    "result": {"data":{"site":{"siteMetadata":{"title":"Gatsby-doc-engine"}},"markdownRemark":{"id":"b4741145-1954-5344-b6a4-ada0938a747e","excerpt":"Speech recognition and synthesis Bots that make and accept calls use the voice synthesis (text-to-speech, TTS) and automatic speech recognition (ASR) features…","html":"<h1>Speech recognition and synthesis</h1>\n<hr>\n<p>Bots that make and accept calls use the voice synthesis (text-to-speech, <a href=\"#TTS\">TTS</a>) and automatic speech recognition (<a href=\"#ASR\">ASR</a>) features.</p>\n<ul>\n<li><em>Text-To-Speech (TTS)</em> (voice synthesis) is the process of generating speech from written text.</li>\n<li><em>Automatic Speech Recognition (ASR)</em> is the process of converting speech to text.</li>\n</ul>\n<p>You can do the following when you create a telephone channel:</p>\n<ul>\n<li><a href=\"#ASR-and-TTS-configuration\">Select providers and set up speech synthesis and recognition</a>, for example by selecting a specific voice or recognition mode. You can also keep the defaults.</li>\n<li><a href=\"/1.10.3/docs/en/telephony/own_telephony\">Create a connection</a> using a provider’s account to recognize and synthesize speech.</li>\n</ul>\n</br>\n<h3>Select a provider</h3>\n<p>You can select ASR and TTS providers when you <a href=\"/1.10.3/docs/en/telephony/telephone_channel\">create a telephone channel</a>. Open the <em>ASR</em> tab and select the connection, and then repeat these steps for <em>TTS</em>.</p>\n<p>Note that if you select a specific ASR or TTS provider, you will need to switch the channel to another provider manually if that provider fails.</p>\n<p>You can also keep the <em>Default</em> settings, in which case the settings of the most stable ASR and TTS providers will be applied. 
If the provider in use fails, the channel is switched to another provider automatically.</p>\n</br>\n<h3>ASR and TTS configuration</h3>\n<h4>ASR</h4>\n<p>You can select one of the connections for ASR and specify additional settings when you create a telephone channel.</p>\n</br>\n<table>\n<thead>\n<tr>\n<th>Connection</th>\n<th>Settings</th>\n<th>Description</th>\n</tr>\n</thead>\n<tbody>\n<tr>\n<td><strong>Google</strong></td>\n<td><em>Language</em></td>\n<td>The service can recognize speech in multiple languages. You can find the complete list <a href=\"https://cloud.google.com/speech-to-text/docs/languages\" target=\"_blank\" rel=\"noopener noreferrer\">here</a>. English (<code class=\"language-text\">en-US</code>) is used by default.</td>\n</tr>\n<tr>\n<td></td>\n<td><em>Model</em></td>\n<td>One of the <a href=\"https://cloud.google.com/speech-to-text/docs/basics#select-model\" target=\"_blank\" rel=\"noopener noreferrer\">machine learning models</a> is used for speech recognition. These models were trained by Google for certain sound types and sources. </br> </br> See the <a href=\"https://cloud.google.com/speech-to-text/docs/languages\" target=\"_blank\" rel=\"noopener noreferrer\">table</a> for the list of models available for each language: </br> </br> <code class=\"language-text\">Phone call</code> — Use this model to recognize speech in a phone call. </br> </br> <code class=\"language-text\">Command and search</code> — Use this model to recognize speech in short audio files, such as voice commands. 
</br> </br> <code class=\"language-text\">Default</code> — Use this model if none of the models above fits your use case.</td>\n</tr>\n<tr>\n<td><strong>Yandex</strong></td>\n<td><em>Language</em></td>\n<td>The service can recognize speech in the following languages: </br> </br> <code class=\"language-text\">ru-RU</code> (default) — Russian, </br> <code class=\"language-text\">en-US</code> — English, </br> <code class=\"language-text\">tr-TR</code> — Turkish.</td>\n</tr>\n<tr>\n<td></td>\n<td><em>Model</em></td>\n<td>One of the <a href=\"https://cloud.yandex.ru/docs/speechkit/stt/models\" target=\"_blank\" rel=\"noopener noreferrer\">machine learning models</a> is used for speech recognition. The models are trained on data from Yandex services and applications. </br> </br></td>\n</tr>\n<tr>\n<td><strong>Tinkoff</strong></td>\n<td></td>\n<td>This connection setting is currently not available.</td>\n</tr>\n</tbody>\n</table>\n</br>\n<h4>TTS</h4>\n<p>You can select one of the connections for TTS and specify additional settings when you create a telephone channel.</p>\n</br>\n<table>\n<thead>\n<tr>\n<th>Connection</th>\n<th>Settings</th>\n<th>Description</th>\n</tr>\n</thead>\n<tbody>\n<tr>\n<td><strong>Google</strong></td>\n<td><em>Language</em></td>\n<td>The service can synthesize speech in multiple languages. You can find the complete list <a href=\"https://cloud.google.com/speech-to-text/docs/languages\" target=\"_blank\" rel=\"noopener noreferrer\">here</a>.</td>\n</tr>\n<tr>\n<td></td>\n<td><em>Voice</em></td>\n<td>You can use multiple voice options in the service (see <a href=\"https://cloud.google.com/text-to-speech/docs/voices\" target=\"_blank\" rel=\"noopener noreferrer\">here</a> for the complete list). 
</br> </br> The following voices are used by default: </br> </br><code class=\"language-text\">en-US-Wavenet-A</code> for English;</br> <code class=\"language-text\">ru-RU-Wavenet-B</code> for Russian; </br> <code class=\"language-text\">cmn-CN-Wavenet-B</code> for Chinese; </br> <code class=\"language-text\">Wavenet-A</code> for other languages.</td>\n</tr>\n<tr>\n<td></td>\n<td><em>Speed</em></td>\n<td>Speech tempo or speed. Here <code class=\"language-text\">1</code> is the normal speed of the selected voice.</td>\n</tr>\n<tr>\n<td></td>\n<td><em>Voice pitch</em></td>\n<td>Voice pitch. Here <code class=\"language-text\">20</code> raises the pitch by 20 semitones from the original tone, and <code class=\"language-text\">-20</code> lowers it by 20 semitones.</td>\n</tr>\n<tr>\n<td></td>\n<td><em>Raise volume</em></td>\n<td>Volume gain in dB relative to the normal volume of the selected voice. At <code class=\"language-text\">+6.0</code> dB, the playback amplitude is approximately twice the normal one. We strongly recommend not exceeding <code class=\"language-text\">+10.0</code> dB.</td>\n</tr>\n<tr>\n<td><strong>Yandex</strong></td>\n<td><em>Language</em></td>\n<td>Speech can be synthesized in three languages:</br> </br> <code class=\"language-text\">ru-RU</code> (Russian); </br> <code class=\"language-text\">en-US</code> (English); </br> <code class=\"language-text\">tr-TR</code> (Turkish).</td>\n</tr>\n<tr>\n<td></td>\n<td><em>Voice</em></td>\n<td>You can use multiple voice options in the service (see <a href=\"https://cloud.yandex.ru/docs/speechkit/tts/voices\" target=\"_blank\" rel=\"noopener noreferrer\">here</a> for the complete list). The following voices are used by default: </br> </br><code class=\"language-text\">alyss</code> for English;</br> <code class=\"language-text\">alena</code> for Russian; </br> <code class=\"language-text\">alyss</code> for other languages.</td>\n</tr>\n<tr>\n<td></td>\n<td><em>Speed</em></td>\n<td>Speech tempo or speed. 
Here <code class=\"language-text\">1</code> is the normal speed of the selected voice.</td>\n</tr>\n</tbody>\n</table>","frontmatter":{"title":"","description":null},"headings":[{"value":"Speech recognition and synthesis"}]}},"pageContext":{"slug":"/docs/en/telephony/recognition_synthesis_settings/","previous":{"fields":{"slug":"/docs/en/telephony/script_voice_bot/"},"frontmatter":{"title":"","description":null}},"next":{"fields":{"slug":"/docs/en/telephony/own_telephony/"},"frontmatter":{"title":"","description":null}}}},
    "staticQueryHashes": ["1209419333"]}