Our Blog

News, Insights, sample code & more!

ASR
Voicegain MRCP ASR - Quick, Affordable and Simple replacement for the Nuance Recognizer which is rapidly approaching EOL

This article outlines how the modern Voicegain deep-learning-based Speech-to-Text/ASR can be a simple and affordable alternative for businesses that are looking for a quick and easy replacement for their on-premise Nuance Recognizer. Nuance has announced that it is going to end support for Nuance Recognizer, its grammar-based ASR which uses the MRCP protocol, sometime in 2026 or 2027. Organizations that have a speech-enabled IVR as the front door to their contact center therefore need to start planning now.

The future belongs to Generative AI powered Voice Agents

With the rise of Generative AI and highly accurate, low-latency speech-to-text models, the front door of the call center is poised for a major transformation. The infamous and highly frustrating IVR phone menu will be replaced by Conversational AI Voicebots, but this will likely happen over the next 3-5 years. As enterprises start to plan their migration from these tree-based IVRs to an Agentic AI future, they would like to do so on their own timelines. In other words, they do not want to be forced into it under deadline pressure because their vendor's product is reaching EOL.

Staying On-Premise or in a VPC for both the IVR platform and the ASR

In addition, the migration path proposed by Nuance is a multi-tenant cloud offering. While a cloud based ASR/Speech-to-Text engine is likely to make sense for most businesses, there are companies in regulated sectors that are prevented from sending their sensitive audio data to a multi-tenant cloud offering. 

In addition to the EOL announcement by Nuance for their on-premise ASR, Genesys, a major IVR platform vendor, has announced that its premise-based offerings - Genesys Engage and Genesys Connect - will approach EOL at the same time as the Nuance ASR.

So businesses that want a modern Gen AI powered Voice Assistant but want to keep the IVR on-premise in their datacenter or behind their firewall in a VPC will need to start planning very quickly what their strategy is going to be.

At Voicegain, we enable enterprises in this situation to remain on-premise or in their VPC while moving to a modern Voicebot platform. This Voicebot platform runs on modern Kubernetes clusters and leverages the latest NVIDIA GPUs.

Replacing Nuance Recognizer with Voicegain is quick and easy!

Rewriting the IVR application logic to migrate from a tree-based IVR menu to a conversational Voice Assistant is a journey. It requires investment and allocation of resources. Hence a good first step is to simply replace the underlying Nuance ASR (and possibly the IVR platform too). This allows a company to migrate to a modern Gen-AI Voice Assistant on its own timeline.

Voicegain offers a modern, highly accurate deep-learning-based Speech-to-Text engine trained on hundreds of thousands of hours of telephone conversations. It is integrated into our native modern telephony stack. It can also talk over the MRCP protocol with VoiceXML-based IVR platforms, and it supports the traditional speech grammars (SRGS, JSGF). Voicegain also supports a range of built-in grammars (such as zip codes and dates).
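
For readers less familiar with speech grammars, here is a minimal sketch of what a grammar in the XML form of SRGS looks like, and how the phrases it accepts can be listed. The yes/no grammar and the `list_phrases` helper are hypothetical illustrations and do not come from a Voicegain or Nuance deployment; only the element names and namespace follow the W3C SRGS specification.

```python
import xml.etree.ElementTree as ET

SRGS_NS = "http://www.w3.org/2001/06/grammar"

# Hypothetical yes/no grammar in W3C SRGS XML form (illustration only).
YES_NO_GRAMMAR = """\
<grammar xmlns="http://www.w3.org/2001/06/grammar"
         version="1.0" xml:lang="en-US" mode="voice" root="yesno">
  <rule id="yesno" scope="public">
    <one-of>
      <item>yes</item>
      <item>no</item>
      <item>correct</item>
    </one-of>
  </rule>
</grammar>
"""


def list_phrases(grammar_xml: str) -> list[str]:
    """Return the phrases of the root rule.

    Assumes a flat grammar: the root rule contains a single <one-of>
    whose <item> elements hold plain text (no nested rules or repeats).
    """
    root = ET.fromstring(grammar_xml)
    root_rule_id = root.get("root")
    for rule in root.findall(f"{{{SRGS_NS}}}rule"):
        if rule.get("id") == root_rule_id:
            return [
                item.text.strip()
                for item in rule.iter(f"{{{SRGS_NS}}}item")
                if item.text and item.text.strip()
            ]
    return []


if __name__ == "__main__":
    print(list_phrases(YES_NO_GRAMMAR))
```

A grammar like this is what an existing VoiceXML application already references, which is why grammar support matters for a replacement recognizer.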

As a result, it is a simple "drop-in" replacement for the Nuance Recognizer. There is no need to rewrite the current IVR application. Instead of pointing to the IP address of the Nuance server, the VoiceXML platform just needs to be reconfigured to point to the IP address of the Voicegain ASR server. This should take no more than a couple of minutes.

Voicegain Telephony Bot API - a Callback API for Telephony-based AI Voice Assistant

In addition to the Voicegain ASR/STT engine, we also offer a Telephony Bot API. This is a callback-style API that includes our native IVR platform and ASR/STT engine and can be used to build Gen AI powered Voicebots. It integrates with leading LLMs, both cloud-based and open-source premise-based, to drive a natural language conversation with callers.

Talk to us about your IVR migration journey!

If you would like to discuss your IVR migration journey, please email us at sales@voicegain.ai. At Voicegain, we have decades of experience in designing, building and launching conversational IVRs and Voice Assistants.

Here is also a link to more information. Please feel free to schedule a call directly with one of our Co-founders.

Edge
Raspberry Pi as Audio Streaming Client

You can stream audio to the Voicegain transcription API from any computer, but sometimes it is handy to have a dedicated, inexpensive device just for this task. Below we relay the experience of one of our customers in using a Raspberry Pi to stream audio for real-time transcription. It replaced a Mac Mini which was initially used for that purpose. Using a Pi had two benefits: a) obviously the cost, and b) it is less likely than a Mac Mini to be "hijacked" for other purposes.

Hardware

The Voicegain Audio Streaming Daemon requires very little in terms of computing resources, so even a Raspberry Pi Zero is sufficient; however, we recommend using a Raspberry Pi 3 B+, mainly because it has an on-board 1Gbps wired Ethernet port. WiFi connections are more likely to have problems with streaming over the UDP protocol.

Here is a list of all hardware used in the project (with Amazon prices as of July 2019):

  • Element14 Raspberry Pi 3 B+ Motherboard - $37.78
  • Miuzei Raspberry Pi 3 b+ Screen, 3.5 Inch - $23.99
  • Miuzei 3.5 Inch Screen Case for 3.5 LCD - $9.99
  • iPazzPort Wireless Mini Handheld Keyboard - $13.99
  • UGREEN USB Audio Adapter - $8.99
  • SanDisk Ultra 32GB microSDHC UHS-I card - $7.23
  • plus an existing USB 5V power supply, which was already on hand.

All the components added up to a total of $101.97. The mini monitor and mini keyboard were included because they make it more convenient to control the device while it is in the audio rack. For example, the ALSA audio mixer can easily be adjusted this way while monitoring the audio level via headphones.
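
As a quick check of the total, the prices from the list above can be summed with exact decimal arithmetic. This is only a small sketch; the `PARTS` dictionary and `total_cost` helper are names introduced here for illustration.

```python
from decimal import Decimal

# Prices copied from the parts list above (Amazon, July 2019).
PARTS = {
    "Element14 Raspberry Pi 3 B+ Motherboard": Decimal("37.78"),
    "Miuzei Raspberry Pi 3 b+ Screen, 3.5 Inch": Decimal("23.99"),
    "Miuzei 3.5 Inch Screen Case for 3.5 LCD": Decimal("9.99"),
    "iPazzPort Wireless Mini Handheld Keyboard": Decimal("13.99"),
    "UGREEN USB Audio Adapter": Decimal("8.99"),
    "SanDisk Ultra 32GB microSDHC UHS-I card": Decimal("7.23"),
}


def total_cost(parts: dict) -> Decimal:
    """Sum part prices using Decimal to avoid float rounding errors."""
    return sum(parts.values(), Decimal("0"))


if __name__ == "__main__":
    print(total_cost(PARTS))  # 101.97
```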



Raspberry Pi running AudioDaemon

Software

The device is running standard Raspbian, which can easily be installed from an image using e.g. balenaEtcher. After the base install, the following was needed to get things running:

  • enable ssh access
  • change the default audio device to the USB sound card (Raspbian comes with ALSA and basic USB sound drivers by default)
  • install the driver for the display (otherwise the output font is too tiny to read)
  • install OpenJDK 9
  • use the link generated from the Voicegain Portal to download the Voicegain AudioDaemon jar file and the correct JSON config
  • set the correct audio source number in the AudioDaemon start script and launch the daemon
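
As a rough sketch of the "change default audio device" step: ALSA lists capture hardware via `arecord -l`, and the default card can be selected with `defaults.pcm.card` and `defaults.ctl.card` entries in `/etc/asound.conf`. The helper names and the sample `arecord -l` output below are illustrative assumptions; card indexes and device names will differ between devices, and the original setup may have been configured in another way.

```python
import re

# Illustrative sample of `arecord -l` output; real output varies by device.
SAMPLE_ARECORD_OUTPUT = """\
**** List of CAPTURE Hardware Devices ****
card 1: Device [USB Audio Device], device 0: USB Audio [USB Audio]
  Subdevices: 1/1
  Subdevice #0: subdevice #0
"""

# Matches lines such as: "card 1: Device [USB Audio Device], device 0: ..."
CARD_LINE = re.compile(
    r"^card (\d+): (\S+) \[(.+?)\], device (\d+):", re.MULTILINE
)


def find_usb_card(arecord_output: str):
    """Return the index of the first card whose name mentions USB, else None."""
    for match in CARD_LINE.finditer(arecord_output):
        index, _short_name, long_name, _device = match.groups()
        if "usb" in long_name.lower():
            return int(index)
    return None


def asound_conf_for(card_index: int) -> str:
    """Build asound.conf content that makes the given card the ALSA default."""
    return (
        f"defaults.pcm.card {card_index}\n"
        f"defaults.ctl.card {card_index}\n"
    )


if __name__ == "__main__":
    card = find_usb_card(SAMPLE_ARECORD_OUTPUT)
    if card is not None:
        print(asound_conf_for(card), end="")
```

On a real device, the text passed to `find_usb_card` would come from running `arecord -l`, and the generated lines would be written to `/etc/asound.conf` with root privileges.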

Observations

Here are some lessons learned from using this setup over the past 6 months:

  • While streaming, CPU use stays under 10%
  • Java heap is set to 128m, which seems to be more than enough because GCs manage to reduce it to about 54m
  • Raspberry Pi turned out to be very reliable - we have not had a single issue with the hardware nor with the Raspbian OS
  • Cheap USB audio card delivers very good sound quality (for speech recognition at least)
  • Very cheap USB power supplies should be avoided - sometimes they cause a hum in the audio (but that also depends on what audio device is being connected).

Announcement
Voicegain Story

The team behind VoiceGain has more than 12 years of experience using Automated Speech Recognition in the real world, developing and hosting complete IVR systems for large enterprises.

We started off as Resolvity, Inc., back in 2005. We built our own IVR Dialog platform, utilizing AI to guide the dialog and to improve the recognition results from commercial ASR engines.

Resolvity Dialog Platform

The Resolvity Dialog platform had some advanced AI modules. For example:


  • It had an ontology that could be used to model the Dialog Domain. This ontology could then be used to automatically drive the dialog. It would automatically generate follow-up questions based on the information that was already acquired. We often used this in IVR applications that required recognition of product names.
  • It had an Incremental Case-Based Reasoning (CBR) troubleshooting engine which together with Ontology could be used to diagnose technical problems based on presented symptoms.
  • It had a module to correct systematic errors of the ASR engine to improve the accuracy (we received a US Patent for this)
  • It had an NLCR module that could automatically handle "How may I help you?" type of interactions. It used a combination of Ontology, Bayesian and Neural Network classifiers.


Hosted IVR

Starting in 2007, we built complete IVR applications for Customer Support and hosted them on our servers in data centers. We built a Customer Solutions team that worked with our customers to ensure that the IVR applications were always up to date, and an Operations team that ensured that we ran the IVRs 24/7 with very high SLAs.

The Resolvity Dialog Platform had a set of tools that allowed us to analyze speech recognition accuracy in great detail and to tune various ASR parameters (thresholds, grammars).

Moreover, because that platform was ASR-engine agnostic, we were able to see how a number of ASR engines from various brands performed in real life.



VoiceGain 1.0 Cloud PBX

In 2012-2013 Resolvity built a complete low-cost Cloud PBX platform on top of Open Source projects. We launched it for the India market under the brand name VoiceGain. The platform was providing complete end-to-end PBX+IVR functionality.

The version that we used in production supported only DTMF, but we also had a functional ASR version. However, at that time it was built using conventional ASR technologies (GMM+HMM), and we found that training it for new languages presented quite a few challenges.

VoiceGain was growing quite fast. We had a presence in data centers in Bangalore and Mumbai. We were able to provision both landline and mobile numbers for our PBX+IVR customers. Eventually, although our technology was performing quite well, we found it expensive to run a very hands-on business in India from the USA and sold our India operations.

Augmented Recognition

When the combination of hardware and AI developments made Deep Neural Networks possible, we decided to start working on our own DNN Speech Recognizer, initially with the goal of augmenting the results from the ASR engines that we used in our IVRs. Very quickly we noticed that with our new customized ASR used for IVR tasks we could achieve better results than with the commercial ASRs. We were able to confirm this by running comparison tests across data sets containing thousands of examples. The key to higher accuracy was the ability to customize the ASR Acoustic Models to the specific IVR domain and user population.

Own ASR Platform

Great results with augmented recognition led us to launch a full-scale effort to build a complete ASR platform, again under the Voicegain (.ai) brand name, that would allow for easy model customization and be easy to use in IVR applications.

From our IVR experience we knew that large enterprise IVR users (a) are very price sensitive and (b) require tight security compliance, which is why from day 1 we also worked on making the Voicegain platform deployable on the Edge.
