MRCP ASR allows telephony developers to integrate with Voicegain’s deep learning ASR using the MRCP protocol. Integrate with FreeSWITCH or any VoiceXML platform that can talk MRCP.
Invoke our ASR with Speech Grammars or use large vocabulary transcription for Voice Bots, IVRs, real-time captioning and more.
On a broad benchmark, our accuracy of 89% is on par with the very best
Talk to us in English, Spanish, German, Portuguese, Korean, Hindi (more coming)
Tested on compute instances on Google, AWS, Azure, IBM & Oracle
Integrates with VXML, FreeSWITCH and other platforms that talk MRCP
This blog post is intended for anyone responsible for upgrading/migrating an MRCP-based Nuance ASR nearing EOL (End of Life). They can explore how Voicegain ASR simplifies and economically extends the life of existing speech-IVR platforms. It serves as a 'drop-in' replacement for grammar-based Nuance ASR.
There are several hundred (if not thousands) telephony-based speech-enabled IVRs that act as the 'front-door' for all customer service phone calls for enterprises of all sizes. These speech-enabled IVRs are built on platforms like Genesys Voice Portal (GVP), Genesys Engage, Avaya Aura Experience Portal (AAEP)/Avaya Voice Portal, Cisco Voice Portal (CVP), Aspect or Voxeo Prophecy VoiceXML platform, and several other such VoiceXML-based IVR solutions. These systems predominantly use Nuance ASR as the speech recognition engine.
Unlike contemporary large vocabulary neural-network-based ASR/STT engines, the traditional Nuance ASR is a grammar-based ASR. It uses the MRCP protocol to talk to VoiceXML-based IVR platforms. Most of these systems were purchased in the last two decades (2000s and 2010s). Customers typically paid a port-based perpetual license fee (the IVR platforms were also licensed similarly). Most enterprises have a software maintenance/AMC contract for the Nuance ASR, and this is usually bundled along with the IVR platform. The Nuance Recognizer versions in the market vary between 9.0 and 11.0. As of June 2022, Nuance had announced end of support for Nuance 10.0. It is our understanding from speaking with customers that the last version of Nuance sold, Nuance 11.0 Recognizer, will approach either end-of-life or end-of-orderability sometime in 2025*.
Also from speaking with customers, we understand that customers who currently license the MRCP grammar-based Nuance ASR would have to upgrade to Nuance’s Krypton engine, the new deep-learning based ASR, in 2025. Nuance Krypton can only be accessed using the modern gRPC-based API and not over MRCP, which makes this upgrade expensive and time-consuming. Customers would need to upgrade not just the ASR but also the entire IVR platform, because most legacy IVR platforms do not support gRPC. The existing call flow logic, which is likely written in a VoiceXML app studio or written in a build tool and generated as VoiceXML pages, would also need to be ported.
All of the above steps make the upgrade process very challenging. While there is a strong case to be made for the merits of upgrading to a deep-learning based ASR to support conversational interactions (better automation rates and more natural user-experience), it is critical for customers that this upgrade/migration is done on the customer’s timelines and not under the gun on the vendor’s clock.
Voicegain offers a drop-in replacement for the Nuance grammar-based ASR. We are the only modern deep-learning/AI (neural-network-based) ASR in the market that natively supports both traditional speech grammars (GRXML, SRGS) and large-vocabulary conversational interactions. We are also one of the very few ASR vendors that can be accessed both over a traditional telephony-based protocol like MRCP and a modern web-based method like web-sockets (or gRPC). So the same neural-network model supports both the old and the new protocols. This gives you a future-proof method of replacing Nuance ASR with minimal effort while safeguarding this investment for the long term.
Net-net, by just "pointing" the ASR resource on the VoiceXML platform to the IP-address of the Voicegain MRCP ASR in your network, you can replace the entire Nuance ASR with the Voicegain ASR. Customers would not need to even change or modify a single line of code of the speech-IVR application logic.
In other words, a client can retain the existing telephony/IVR setup and just perform a "drop-in replacement" of Nuance MRCP ASR with Voicegain MRCP ASR.
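To make the "drop-in" idea concrete, here is a minimal sketch of the kind of recognition request a VoiceXML platform sends to an MRCP ASR. It assumes MRCPv2 framing (the post does not say which MRCP version is in use), and the channel identifier and request id are made-up illustration values. A real session also needs SIP setup and an RTP audio stream, which are omitted here. The point is only that the request carries a standard grammar and is not tied to a particular ASR vendor, so the same request can be sent to a different server address.

```python
def build_recognize(channel_id: str, request_id: int, grammar_xml: str) -> bytes:
    """Frame an MRCPv2 RECOGNIZE request carrying an inline SRGS grammar.

    Illustrative only: channel_id and request_id are hypothetical values,
    and no network I/O is performed.
    """
    body = grammar_xml.encode("utf-8")
    headers = (
        f"Channel-Identifier: {channel_id}\r\n"
        "Content-Type: application/srgs+xml\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    ).encode("ascii")

    # The start line contains the length of the whole message, including
    # the start line itself, so iterate until the value stops changing.
    length = 0
    while True:
        start = f"MRCP/2.0 {length} RECOGNIZE {request_id}\r\n".encode("ascii")
        total = len(start) + len(headers) + len(body)
        if total == length:
            return start + headers + body
        length = total


if __name__ == "__main__":
    message = build_recognize("32AECB23433801@speechrecog", 543257, "<grammar/>")
    print(message.decode("utf-8"))
```

Nothing in this message names the recognizer vendor, which is why repointing the platform's ASR resource can leave the IVR application untouched.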
Longer-term the same Voicegain ASR can perform large vocabulary transcription because it is a neural-network based ASR; so when the customer is ready to replace the directed-dialog Speech IVR with a conversational interaction, the Voicegain platform will already support it.
To discuss your upgrade situation in more detail, please contact us over email at sales@voicegain.ai. We can answer any questions that you have. You could also get started with a free developer account; no credit card is required and we offer 1500 hours of usage for free. After you sign up, please contact us at support@voicegain.ai and request MRCP access.
* Nuance ASR and Nuance Krypton are trademarks of Nuance, Inc which is now part of Microsoft. Please confirm the End of Life announcement and the protocol capability directly with the company. Our information in this blog post is anecdotal and has not been verified with Nuance.
Most enterprise IT organizations have mature telephony based IVR applications that serve as the “front door” for all voice based customer support calls. These applications use a combination of touchtone (DTMF) and speech to interact with callers. They have been carefully designed, developed and tuned over the years.
The objectives of any IVR are twofold: 1) automate simple routine queries (like balance inquiry, payment status, etc.) and 2) authenticate and intelligently route calls that require live support to the appropriate agent.
IT organizations across industry verticals like financial services, travel, media, telecom, retail or health-care have a small staff of in-house or outsourced IVR developers to maintain these applications. While enterprises have been focused on scaling and upgrading their digital support channels (like chat and email), IVR applications have largely remained un-touched for years.
As CIOs and CDOs (Chief Digital Officers) embark on strategic initiatives to migrate enterprise workloads to the Cloud, one "niche" workload on this list is the IVR. However, migrating IVRs "as-is" to the cloud is tricky. The languages, protocols and platforms that these telephony based IVRs were built on are from the early 2000s and are approaching obsolescence. Also, while they support directed dialogs with limited customer spoken utterances, they are not a good fit for conversational bot interactions.
So IT organizations are faced with a Catch-22 situation. On the one hand, it is cumbersome to maintain these IVR workloads. On the other hand, the rationale to migrate existing platforms "as-is" to modern cloud infrastructure is questionable. Why bear the trouble and expense if IVRs are eventually going to be replaced by conversational bots?
So there is a real need to modernize these IVRs as part of their cloud migration strategy.
Traditionally speech IVR applications ran on on-premise Contact Center telephony platforms. Companies like Avaya, Nortel, Cisco, Intervoice, Genesys and Aspect dominated the vendor landscape. In the early to mid-2000s, these vendors worked collaboratively as part of the W3C consortium to develop VoiceXML, an open vendor agnostic language for speech-enabled IVR applications.
VoiceXML enabled developers to build interactive voice dialogs and provided a standard way to interact with an automatic speech recognizer (ASR). This was done using a telephony based protocol called MRCP. The standard also provided a method to define speech grammars called SRGS and a format called GRXML.
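As an illustration of what an SRGS grammar in GRXML form looks like, the following sketch builds a tiny yes/no grammar with Python's standard library. The rule id and phrases are made-up examples. The namespace and the `grammar`, `rule`, `one-of` and `item` elements come from the W3C SRGS specification.

```python
import xml.etree.ElementTree as ET
from typing import Iterable

SRGS_NS = "http://www.w3.org/2001/06/grammar"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def build_grammar(rule_id: str, phrases: Iterable[str], lang: str = "en-US") -> str:
    """Return a GRXML document whose root rule accepts any one of `phrases`."""
    # Serialize SRGS elements without a prefix (default namespace).
    ET.register_namespace("", SRGS_NS)
    grammar = ET.Element(
        f"{{{SRGS_NS}}}grammar",
        {
            "version": "1.0",
            f"{{{XML_NS}}}lang": lang,
            "root": rule_id,
            "mode": "voice",
        },
    )
    rule = ET.SubElement(grammar, f"{{{SRGS_NS}}}rule", {"id": rule_id, "scope": "public"})
    one_of = ET.SubElement(rule, f"{{{SRGS_NS}}}one-of")
    for phrase in phrases:
        ET.SubElement(one_of, f"{{{SRGS_NS}}}item").text = phrase
    return ET.tostring(grammar, encoding="unicode")


if __name__ == "__main__":
    print(build_grammar("yesno", ["yes", "no"]))
```

A VoiceXML page would typically reference such a grammar by URL, and the ASR would load it when asked to recognize the caller's answer.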
The architecture and supporting jargon/terminology around VoiceXML borrowed heavily from the web world. The VoiceXML platform was referred to as a “Voice browser” that could “render VoiceXML pages” just like how a web browser could render HTML pages. Most contact center platforms provided visual IDEs to help build and maintain these interactive call flows. Some also automated the generation of the VoiceXML pages. The IDE generated code that could run on an application server (like Apache Tomcat), which in turn generated VoiceXML pages that were sent to a VoiceXML platform over standard HTTP. The application server was also responsible for making web-services requests to enterprise database resources that were required for the IVR interaction, e.g. billing/payment systems or CRM systems.
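The page-generation step described above can be sketched as follows. This is a simplified stand-in for what an application server does, written in Python rather than the Java/JSP stack such servers typically used. The function name, field name and URLs are hypothetical. The VoiceXML elements used (`vxml`, `form`, `field`, `prompt`, `grammar`, `filled`, `submit`) are standard VoiceXML 2.x.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"


def render_vxml_page(field_name: str, prompt_text: str,
                     grammar_url: str, submit_url: str) -> str:
    """Render one VoiceXML page that asks a question and posts the answer back."""
    ET.register_namespace("", VXML_NS)

    def tag(name: str) -> str:
        return f"{{{VXML_NS}}}{name}"

    vxml = ET.Element(tag("vxml"), {"version": "2.1"})
    form = ET.SubElement(vxml, tag("form"), {"id": "main"})
    field = ET.SubElement(form, tag("field"), {"name": field_name})
    ET.SubElement(field, tag("prompt")).text = prompt_text
    # The voice browser passes this grammar to the ASR for recognition.
    ET.SubElement(field, tag("grammar"),
                  {"src": grammar_url, "type": "application/srgs+xml"})
    # Once the field is filled, send the result to the application server,
    # which responds with the next VoiceXML page.
    filled = ET.SubElement(field, tag("filled"))
    ET.SubElement(filled, tag("submit"), {"next": submit_url, "namelist": field_name})
    return ET.tostring(vxml, encoding="unicode")


if __name__ == "__main__":
    print(render_vxml_page(
        "answer",
        "Would you like to hear your balance?",
        "http://example.invalid/grammars/yesno.grxml",  # hypothetical URL
        "http://example.invalid/ivr/next",              # hypothetical URL
    ))
```

The voice browser fetches a page like this over HTTP, plays the prompt, collects the answer, and requests the next page, much as a web browser follows a form submission.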
Also most ASRs from the late 90s and early 2000s were based on Hidden Markov Models and Gaussian Mixture models. They mainly supported grammar-based recognition - which meant that as a Speech IVR developer you had to anticipate all possible utterances that a user could say in response to a question/prompt. There were some options to build open-ended statistical language models but these were tricky and required careful selection of the training corpus.
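The practical consequence of grammar-based recognition can be shown with a toy example. This sketch works on text rather than audio, so it is not how an ASR actually matches speech. It only illustrates that anything the developer did not anticipate comes back as a no-match. The function name and phrases are made up.

```python
from typing import Optional, Set


def match_utterance(utterance: str, grammar_phrases: Set[str]) -> Optional[str]:
    """Return the matched phrase, or None when the utterance is out of grammar."""
    normalized = " ".join(utterance.lower().split())
    return normalized if normalized in grammar_phrases else None


if __name__ == "__main__":
    yes_no = {"yes", "no"}
    for said in ["Yes", "no", "yeah sure", "I want to talk to an agent"]:
        result = match_utterance(said, yes_no)
        print(f"{said!r} -> {result if result is not None else 'NO-MATCH'}")
```

Covering "yeah sure" means going back and adding it to the grammar, which is the tuning work that IVR developers have carried out on these applications over the years.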
Why modernize now?
While VoiceXML worked well in the past, it is a niche and outdated language. The last release, VoiceXML 2.1, was back in 2007. That is more than a decade ago.
And a lot has changed in the web world since then. VoiceXML was developed at a time when JSP (Java Server Pages) was widely used, before JSON, YAML, RESTful APIs and AJAX became mainstream.
For enterprises, it is expensive to maintain a dedicated staff - whether in-house or outsourced - with niche skills in technologies like VoiceXML and MRCP.
Enterprises should ideally be able to run an IVR app like any other modern web application. Most enterprise web apps are built on languages and runtimes like Python and Node.js that are popular with web developers. They are containerized using Docker and orchestrated using Kubernetes.
It would be ideal for an enterprise IT organization for its IVR app to be built on similar programming languages so that it can be supported or maintained just like other applications in the IT portfolio.
In addition to the obsolescence of VoiceXML, the speech recognition engine (ASR) that was deployed in the early 2000s has also become outdated. Modern speech-to-text engines are built on Deep Neural Networks that run on powerful GPU infrastructure. They offer much higher accuracy and allow the use of a very large vocabulary, which is what is needed for a bot-like conversational experience. Also, modern NLU engines allow you to easily extract intents from the transcribed text.
So if an enterprise wants to offer a voice bot that supports an open conversational experience, they need to move to a modern DNN based Speech-to-Text platform that can integrate with such NLU engines.
At Voicegain, we recommend that an enterprise first modernize the underlying infrastructure while retaining the existing IVR application logic. This is a great first step. It allows an enterprise to continue serving existing users while taking a step towards providing a more conversational user experience.
We suggest that the existing call flow logic, which is typically maintained using visual IDEs of contact center platforms, get rewritten (ideally with the help of automated tools) into a modern programming language or runtime like Python or Node.js.
Instead of generating legacy VoiceXML pages, enterprises should use web friendly data representation languages like JSON or YAML to interact with modern RESTful Speech-to-Text APIs using web callbacks.
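A callback-style dialog step could look roughly like the sketch below. All event types and field names (`type`, `text`, `action`, `prompt`, `then`) are hypothetical and chosen for illustration. This is not the actual Voicegain API schema. The idea is only that the application receives an event as JSON and answers with the next step as JSON, instead of rendering a VoiceXML page.

```python
import json


def handle_callback(event: dict) -> str:
    """Return the next dialog step as a JSON string.

    The event and response shapes are hypothetical, for illustration only.
    """
    if event.get("type") == "call_started":
        step = {"action": "ask", "name": "intent",
                "prompt": "How can I help you today?"}
    elif event.get("type") == "answer" and event.get("text"):
        step = {"action": "say",
                "prompt": f"You said: {event['text']}",
                "then": "hangup"}
    else:
        # Nothing usable was recognized; ask again.
        step = {"action": "ask", "name": "intent",
                "prompt": "Sorry, I did not catch that."}
    return json.dumps(step)


if __name__ == "__main__":
    print(handle_callback({"type": "call_started"}))
    print(handle_callback({"type": "answer", "text": "check my balance"}))
```

In a real deployment a function like this would sit behind an HTTP endpoint in whatever web framework the team already uses, which is what lets the IVR logic be maintained like any other web application.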
How does Voicegain support IVR app modernization?
At Voicegain, we provide a modern Voice AI platform that includes
Voicegain is developing tools to automatically convert VoiceXML to equivalent JSON/YAML representation that talks to our callback APIs.
How is this a "future proof" architecture for an enterprise?
The Voicegain platform is capable of large vocabulary transcription which is a requirement for NLU based Voice Bots. This will be the way customers interact with enterprises in the future.
We allow developers to switch between grammar based recognition and large vocabulary recognition at each and every turn of the dialog; or you could simultaneously use both to achieve more flexibility.
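One way to picture per-turn switching is a dialog definition in which each turn declares which recognition modes it wants. The `Turn` class, its field names and the mode labels below are hypothetical and are not taken from the Voicegain API. They only model the idea that a turn may use a grammar, large vocabulary transcription, or both.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Turn:
    """One question in a dialog (hypothetical structure for illustration)."""
    prompt: str
    grammar_url: Optional[str] = None   # set to enable grammar-based recognition
    transcribe: bool = False            # set to enable large vocabulary transcription

    def recognition_modes(self) -> List[str]:
        modes = []
        if self.grammar_url:
            modes.append("grammar")
        if self.transcribe:
            modes.append("large_vocabulary")
        if not modes:
            raise ValueError("a turn needs at least one recognition mode")
        return modes


if __name__ == "__main__":
    dialog = [
        Turn("How can I help you today?", transcribe=True),
        Turn("Please say your account number.",
             grammar_url="http://example.invalid/grammars/digits.grxml"),
        Turn("Shall I proceed?",
             grammar_url="http://example.invalid/grammars/yesno.grxml",
             transcribe=True),
    ]
    for turn in dialog:
        print(turn.prompt, "->", turn.recognition_modes())
```

Here the open question uses transcription, the account number uses a grammar, and the confirmation uses both, so a caller who says something outside the yes/no grammar still yields text that can be passed on to an NLU engine.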
Our Telephony Bot APIs can also integrate with Bot Frameworks like Google Dialogflow.
We are inviting enterprise web developers for a free trial of our platform.