Access Voicegain ASR from FreeSWITCH using mod

The Voicegain STT platform has supported MRCP (Media Resource Control Protocol) for a long time. Our ASR can be accessed using MRCP, and we support both grammar-based recognition (e.g. GRXML) and large-vocabulary transcription. MRCP is a communication protocol designed to connect telephony-based IVRs and Voice Bots with speech recognizers (ASR) and speech synthesizers (TTS).

Previously we tested connecting to Voicegain using MRCP from VXML platforms like Dialogic PowerMedia XMS or Aspect Prophecy. We had not tested connecting from FreeSWITCH, a popular open source telephony platform, using its MRCP plugin mod_unimrcp.

We are pleased to announce that the Voicegain platform works out of the box with mod_unimrcp, the MRCP plugin for FreeSWITCH. However, getting mod_unimrcp to work on FreeSWITCH is not trivial. Here are some pointers for those who would like to use mod_unimrcp with our platform.

Deploying Voicegain unimrcp server

There are currently two options for deploying it, and we plan to add a third option soon:

For production deployments of Speech IVRs and Voice Bots on FreeSWITCH, we recommend an Edge Deployment of the Voicegain platform. This will deploy our unimrcp server that can communicate with a locally deployed FreeSWITCH using MRCP.
To use our Cloud ASR, you will need to download an MRCP IVR Proxy from the Voicegain Web Console. You will get a tar file containing a Docker Compose definition that you can then run on your Docker server. This deploys our preconfigured unimrcp server with a proxy for connecting to the Voicegain Cloud Speech-to-Text engine.
(Coming soon) We plan to implement a voicegain_asr plugin that can be deployed on a standard unimrcp server. The plugin will talk to our ASR in the cloud using gRPC.

Also, the TTS options currently accessible over MRCP are limited. Our focus has been on the use of prerecorded prompts for IVRs and Voice Bots. We plan to soon allow developers to access Google or Amazon TTS.

Configuring FreeSWITCH for mod_unimrcp

mod_unimrcp is not built by default when you build FreeSWITCH from source. To build it, enable it in build/modules.conf.in by uncommenting this line: #asr_tts/mod_unimrcp

After the build, and before starting FreeSWITCH, you will need to:

Add <load module="mod_unimrcp"/> to autoload_configs/modules.conf.xml (you can put it next to the other ASR/TTS modules, because that is where it logically belongs)
Create an MRCP profile for Voicegain in the mrcp_profiles folder (see below)
Modify the content of autoload_configs/unimrcp.conf.xml. If you want to use both ASR and TTS via Voicegain MRCP, point both default-asr-profile and default-tts-profile to the voicegain1-mrcp2 profile you created in the mrcp_profiles folder.
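
For orientation, the two file edits above might look roughly like this. This is a sketch that assumes the stock FreeSWITCH layout of conf/autoload_configs. The remaining unimrcp.conf.xml settings (log level, connection limits) are omitted. The profile name voicegain1-mrcp2 must match the name attribute of the profile you create.

```xml
<!-- autoload_configs/modules.conf.xml: inside <modules>, next to the other ASR/TTS modules -->
<load module="mod_unimrcp"/>

<!-- autoload_configs/unimrcp.conf.xml: route both ASR and TTS to the Voicegain profile -->
<configuration name="unimrcp.conf" description="UniMRCP Client">
  <settings>
    <param name="default-tts-profile" value="voicegain1-mrcp2"/>
    <param name="default-asr-profile" value="voicegain1-mrcp2"/>
  </settings>
  <profiles>
    <!-- pull in every profile file from the mrcp_profiles folder -->
    <X-PRE-PROCESS cmd="include" data="../mrcp_profiles/*.xml"/>
  </profiles>
</configuration>
```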

Here is an example MRCP v2 profile for connecting to Voicegain MRCP:
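
A minimal sketch of such a profile is shown below. It is modeled on the MRCPv2 profile files that ship in FreeSWITCH's mrcp_profiles folder. The file name, server address and ports are placeholders (assumptions), not Voicegain's actual values. Take the real values from your Edge Deployment or MRCP IVR Proxy.

```xml
<!-- mrcp_profiles/voicegain1-mrcp2.xml (file name is a placeholder) -->
<include>
  <profile name="voicegain1-mrcp2" version="2">
    <!-- local SIP endpoint of the UniMRCP client inside FreeSWITCH -->
    <param name="client-ip" value="auto"/>
    <param name="client-port" value="5069"/>
    <!-- placeholder address and SIP port of the Voicegain unimrcp server -->
    <param name="server-ip" value="192.0.2.10"/>
    <param name="server-port" value="8060"/>
    <param name="sip-transport" value="udp"/>
    <!-- local RTP address and port range used for audio -->
    <param name="rtp-ip" value="auto"/>
    <param name="rtp-port-min" value="4000"/>
    <param name="rtp-port-max" value="5000"/>
    <param name="codecs" value="PCMU PCMA L16/96/8000"/>
    <!-- default MRCP parameters for SPEAK requests -->
    <synthparams>
    </synthparams>
    <!-- default MRCP parameters for RECOGNIZE requests -->
    <recogparams>
    </recogparams>
  </profile>
</include>
```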

Here are some additional notes about the configuration file:

It is important that the port range used by the UniMRCP client (<param name="rtp-port-min" value="4000"/> and <param name="rtp-port-max" value="5000"/>) is accessible from outside; otherwise, TTS via MRCP will not work. These ports must also not overlap with the UDP ports used by FreeSWITCH.
In some setups, the "auto" values of <param name="client-ip" value="auto"/> and <param name="rtp-ip" value="auto"/> may not work, and you will have to specify the external IP manually.
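
Both notes are illustrated below. FreeSWITCH's own RTP range is set by rtp-start-port and rtp-end-port in autoload_configs/switch.conf.xml (16384 to 32768 by default), so a UniMRCP range of 4000 to 5000 stays clear of it. If auto-detection fails, client-ip and rtp-ip in the profile can be set to a fixed address. Here 203.0.113.25 is a placeholder documentation address, not a real value.

```xml
<!-- autoload_configs/switch.conf.xml: FreeSWITCH's own RTP port range -->
<param name="rtp-start-port" value="16384"/>
<param name="rtp-end-port" value="32768"/>

<!-- voicegain1-mrcp2 profile: a separate, non-overlapping range plus explicit IPs -->
<param name="client-ip" value="203.0.113.25"/>
<param name="rtp-ip" value="203.0.113.25"/>
<param name="rtp-port-min" value="4000"/>
<param name="rtp-port-max" value="5000"/>
```

With this layout, UDP ports 4000 to 5000 on the FreeSWITCH host would need to be reachable from the MRCP server for TTS audio to arrive.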

How to use mod_unimrcp

Here is an example of how to play a question prompt and to invoke the ASR via mod_unimrcp to recognize a spoken phone number:

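-- Use the Voicegain MRCP profile for text-to-speech
session:execute("set", "tts_engine=unimrcp:voicegain1-mrcp2");
-- Set the TTS voice (currently ignored)
session:execute("set", "tts_voice=Catherine");
-- Speak the prompt via TTS, then recognize against the built-in phone grammar
session:execute("play_and_detect_speech",
  "say:What is your phone number detect:unimrcp {start-input-timers=false,define-grammar=true,no-input-timeout=5000}builtin:grammar/phone");

-- The recognition result (an NLSML string) is stored in a channel variable
local asrResult = session:getVariable("detect_speech_result");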

This example:

tells FreeSWITCH which tts_engine to use
sets the TTS voice (currently ignored)
plays a question prompt using the specified TTS and launches the recognition
retrieves the result of the speech recognition
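
The argument passed to play_and_detect_speech packs four things into one string: the prompt, the ASR engine, the per-request MRCP parameters in braces, and the grammar. The sketch below assembles that string from its parts. build_detect_args is a hypothetical helper, not part of FreeSWITCH. Parameter names are sorted only so the output is deterministic. The session call runs only when the script executes inside FreeSWITCH.

```lua
-- Hypothetical helper (not part of FreeSWITCH): builds the argument string
-- for play_and_detect_speech in the form
--   <prompt> detect:<engine> {<name>=<value>,...}<grammar>
local function build_detect_args(prompt, engine, params, grammar)
  -- sort parameter names so the generated string is deterministic
  local names = {}
  for name in pairs(params) do
    names[#names + 1] = name
  end
  table.sort(names)

  -- render each parameter as name=value
  local assignments = {}
  for _, name in ipairs(names) do
    assignments[#assignments + 1] = name .. "=" .. tostring(params[name])
  end

  return string.format("%s detect:%s {%s}%s",
    prompt, engine, table.concat(assignments, ","), grammar)
end

local args = build_detect_args(
  "say:What is your phone number",
  "unimrcp",
  {
    ["start-input-timers"] = false,
    ["define-grammar"] = true,
    ["no-input-timeout"] = 5000,
  },
  "builtin:grammar/phone")

-- `session` only exists when this script runs inside FreeSWITCH
if session then
  session:execute("play_and_detect_speech", args)
end
```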

The result of the recognition is a string in XML format (NLSML). You will need to parse it to get the utterance and any semantic interpretations. The NLSML result also contains a confidence score.
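
A minimal sketch of that parsing in Lua is shown below. It uses string patterns rather than a real XML parser. It assumes a simple result with one interpretation whose confidence attribute sits on the interpretation element, following the NLSML shape described for MRCPv2 (RFC 6787). parse_nlsml and the sample document are illustrative and were not captured from Voicegain. For production use, an XML library such as LuaExpat would be more robust.

```lua
-- Trim leading/trailing whitespace; returns nil for nil input
local function trim(s)
  return s and s:match("^%s*(.-)%s*$")
end

-- Minimal, pattern-based extraction of the FIRST interpretation in an NLSML
-- result. Not a full XML parser: no CDATA, namespaces, or multiple results.
local function parse_nlsml(xml)
  if not xml or xml == "" then
    return nil
  end
  -- attribute text of the first <interpretation ...> tag
  local attrs = xml:match("<interpretation(.-)>")
  if not attrs then
    return nil
  end
  return {
    -- confidence attribute (single or double quotes)
    confidence = tonumber(attrs:match("confidence%s*=%s*[\"']([^\"']*)[\"']")),
    -- semantic interpretation, e.g. the normalized phone number
    instance = trim(xml:match("<instance[^>]*>(.-)</instance>")),
    -- the recognized utterance
    input = trim(xml:match("<input[^>]*>(.-)</input>")),
  }
end

-- Illustrative NLSML shape (made-up values)
local example_nlsml = [[
<?xml version="1.0"?>
<result>
  <interpretation grammar="builtin:grammar/phone" confidence="0.87">
    <instance>5551234567</instance>
    <input mode="speech">five five five one two three four five six seven</input>
  </interpretation>
</result>]]

-- Inside FreeSWITCH, parse the channel variable set by play_and_detect_speech
if session then
  local r = parse_nlsml(session:getVariable("detect_speech_result"))
  if r then
    freeswitch.consoleLog("INFO", "heard: " .. tostring(r.input) .. "\n")
  end
end
```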

The standard play_and_detect_speech application holds on to the ASR session until the end of the call. This makes subsequent recognitions more responsive, but you keep paying for the MRCP session. You can also use play_and_detect_speech_close_asr to release the ASR session immediately after recognition.
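
The sketch below shows the release-after-recognition variant. It assumes your FreeSWITCH version reads play_and_detect_speech_close_asr as a channel variable that play_and_detect_speech checks after recognizing; verify this against the mod_dptools documentation for your version. recognize_once is a hypothetical helper. It takes the session as a parameter so it can also be exercised with a stub outside FreeSWITCH.

```lua
-- Hypothetical helper: run one recognition, then release the ASR (MRCP)
-- session. ASSUMPTION: play_and_detect_speech_close_asr is honored as a
-- channel variable by play_and_detect_speech in your FreeSWITCH version.
local function recognize_once(sess, args)
  sess:execute("set", "play_and_detect_speech_close_asr=true")
  sess:execute("play_and_detect_speech", args)
  -- NLSML result of this recognition
  return sess:getVariable("detect_speech_result")
end

-- `session` only exists when this script runs inside FreeSWITCH
if session then
  local nlsml = recognize_once(session,
    "say:What is your phone number detect:unimrcp {start-input-timers=false,define-grammar=true,no-input-timeout=5000}builtin:grammar/phone")
  freeswitch.consoleLog("INFO", "ASR result: " .. tostring(nlsml) .. "\n")
end
```

Passing the session in explicitly, instead of using the global, keeps the helper reusable and testable.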

If you have any questions about the use of Voicegain ASR via MRCP please contact us at: support@voicegain.ai

Coming Soon

On our roadmap we have a mod_voicegain plugin for FreeSWITCH. It will bypass the need for mod_unimrcp and the unimrcp server and will talk directly from FreeSWITCH to the Voicegain ASR using gRPC.

Casey
