
Audio playback


In the phone channel and in voice assistants, bot replies can use pre-recorded audio files as well as synthesized speech. The main reason to use audio is that a recorded human voice usually sounds more natural and engaging than synthesized speech, which improves the overall user experience.

If a reply has to include context-dependent values that change during the dialog, you can use speech synthesis with variables instead.

Audio file format

For the phone channel

  • File extension: .wav.
    • Constant bit rate: 128 kbps.
    • 1 channel (mono).
    • Sampling rate: 8–48 kHz.
    • Codec: 16-bit (PCM) little-endian.
  • File size: up to 10 MB.

For voice assistants

  • File extension: .mp3.
  • Duration: up to 4 minutes.
  • File size: up to 10 MB.
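
The phone channel requirements listed above can be checked before a file is uploaded. The following is a minimal sketch in Python using only the standard library; it is not part of the platform. The function name check_phone_audio is hypothetical, "10 MB" is assumed to mean 10 MiB, and the bit rate is not checked separately because for uncompressed PCM it follows from the sampling rate, sample width, and channel count.

```python
import os
import wave

# Assumption: the 10 MB limit is interpreted as 10 MiB.
MAX_SIZE_BYTES = 10 * 1024 * 1024


def check_phone_audio(path):
    """Return a list of problems found; an empty list means no problem was detected."""
    problems = []
    if not path.lower().endswith(".wav"):
        problems.append("file extension is not .wav")
    if os.path.getsize(path) > MAX_SIZE_BYTES:
        problems.append("file is larger than 10 MB")
    try:
        # The wave module reads uncompressed PCM WAV files only.
        with wave.open(path, "rb") as wav:
            if wav.getnchannels() != 1:
                problems.append("audio is not mono")
            if wav.getsampwidth() != 2:  # 2 bytes per sample = 16-bit
                problems.append("samples are not 16-bit")
            if not 8000 <= wav.getframerate() <= 48000:
                problems.append("sampling rate is outside 8-48 kHz")
    except (wave.Error, EOFError) as exc:
        problems.append(f"not a readable PCM WAV file: {exc}")
    return problems
```

Calling check_phone_audio("greeting.wav") on a local file returns an empty list when none of these checks fail. The .mp3 requirements for voice assistants are not covered, since the Python standard library has no MP3 reader.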

How to use

Audio can be played from the bot script in several ways.

In all cases, pass a direct link to the audio file. The file can be hosted on any storage that allows public access.

state: Playback
    audio: https://example.com/audio.wav

Audio caching

When an audio file is played for the first time, it is cached on the server where the bot is deployed. This avoids delays when the same audio is played again.

Audio URLs serve as cache keys. If you change the contents of an audio file, e.g., trim it or replace it altogether, but keep the original file name and therefore the same URL, the changes are ignored. The old version of the audio file from the cache is still used.

Rename your audio files if you make any changes to their contents.
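
One way to make sure a changed file always gets a new name is to embed a short hash of its contents in the file name. The sketch below shows the idea in Python; it is an illustration only, not a platform feature, and the function name versioned_name and the name layout are hypothetical.

```python
import hashlib
from pathlib import Path


def versioned_name(path, digest_length=8):
    """Build a file name that embeds a short SHA-256 hash of the file contents."""
    path = Path(path)
    digest = hashlib.sha256(path.read_bytes()).hexdigest()[:digest_length]
    # Layout: <original stem>.<hash prefix><original extension>
    return f"{path.stem}.{digest}{path.suffix}"
```

For a local file greeting.wav, versioned_name("greeting.wav") returns a name of the form greeting.<hash>.wav. Uploading the file under that name means any edit to the contents produces a different URL, so the cached copy of the old version is not reused.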