This article outlines how the modern Voicegain deep-learning-based Speech-to-Text/ASR can be a simple and affordable alternative for businesses that are looking for a quick and easy replacement for their on-premise Nuance Recognizer. Nuance has announced that it is going to end support for Nuance Recognizer, its grammar-based ASR which uses the MRCP protocol, sometime in 2026 or 2027. So organizations that have a speech-enabled IVR as the front door to their contact center need to start planning now.
With the rise of Generative AI and highly accurate, low-latency speech-to-text models, the front door of the call center is poised for a major transformation. The infamous and highly frustrating IVR phone menu will be replaced by Conversational AI Voicebots, but this will likely happen over the next 3-5 years. As enterprises start to plan their migration journey from tree-based IVRs to an Agentic AI future, they would like to do this on their own timelines. In other words, they do not want to be forced to do it under the pressure of a deadline created by their vendor's EOL.
In addition, the migration path proposed by Nuance is a multi-tenant cloud offering. While a cloud-based ASR/Speech-to-Text engine is likely to make sense for most businesses, there are companies in regulated sectors that are prevented from sending their sensitive audio data to a multi-tenant cloud.
In addition to the EOL announcement by Nuance for its on-premise ASR, Genesys, a major IVR platform vendor, has announced that its premise-based offerings - Genesys Engage and Genesys Connect - will approach EOL around the same time as the Nuance ASR.
So businesses that want a modern Gen AI powered Voice Assistant, but want to keep the IVR on-premise in their datacenter or behind their firewall in a VPC, will need to start planning their strategy very quickly.
At Voicegain, we enable enterprises in this situation to remain on-premise or in their VPC with a modern Voicebot platform. This Voicebot platform runs on modern Kubernetes clusters and leverages the latest NVIDIA GPUs.
Rewriting the IVR application logic to migrate from a tree-based IVR menu to a conversational Voice Assistant is a journey. It requires investment and allocation of resources. Hence a good first step is to simply replace the underlying Nuance ASR (and possibly the IVR platform too). This ensures that a company can migrate to a modern Gen-AI Voice Assistant on its own timeline.
Voicegain offers a modern, highly accurate deep-learning-based Speech-to-Text engine trained on hundreds of thousands of hours of telephone conversations. It is integrated into our native, modern telephony stack. It can also talk over the MRCP protocol with VoiceXML-based IVR platforms, and it supports traditional speech grammars (SRGS, JJSGF). Voicegain also supports a range of built-in grammars (zip code, date, etc.).
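For illustration, a minimal SRGS (GRXML) grammar of the kind an existing directed-dialog IVR already uses is shown below. It is a generic yes/no grammar, not taken from any particular application; grammars of this shape can be passed to the recognizer over MRCP unchanged.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative SRGS (GRXML) yes/no grammar; an existing speech IVR would
     already have grammars like this and can continue to use them as-is. -->
<grammar xmlns="http://www.w3.org/2001/06/grammar"
         version="1.0" xml:lang="en-US" mode="voice" root="yesno">
  <rule id="yesno" scope="public">
    <one-of>
      <item>yes</item>
      <item>yes that is correct</item>
      <item>no</item>
    </one-of>
  </rule>
</grammar>
```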
As a result, it is a simple "drop-in" replacement for the Nuance Recognizer. There is no need to rewrite the current IVR application. Instead of pointing to the IP address of the Nuance server, the VoiceXML platform just needs to be reconfigured to point to the IP address of the Voicegain ASR server. This should take no more than a couple of minutes.
In addition to the Voicegain ASR/STT engine, we also offer a Telephony Bot API. This is a callback-style API that combines our native IVR platform and ASR/STT engine and can be used to build Gen AI powered Voicebots. It integrates with leading LLMs - both cloud-based and open-source on-premise - to drive a natural-language conversation with callers.
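As a rough sketch of what a callback-style integration looks like, the example below shows a small webhook that receives a caller event and returns the next prompt. The field names and response shape are hypothetical placeholders, not the actual Voicegain Telephony Bot API schema; consult the API documentation for the real contract.

```python
# Minimal sketch of a callback-style voicebot webhook (Flask).
# NOTE: "user_input", "prompt" and "action" are hypothetical placeholder fields,
# not the actual Voicegain Telephony Bot API schema.
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/voicebot-callback", methods=["POST"])
def voicebot_callback():
    event = request.get_json()                  # callback carrying the caller's latest utterance
    caller_text = event.get("user_input", "")   # hypothetical field name

    # Hand the transcribed utterance to dialog logic (in practice, an LLM) to pick the reply.
    reply = decide_next_prompt(caller_text)

    # Respond with what the voicebot should say or do next (hypothetical response shape).
    return jsonify({"prompt": reply, "action": "continue"})

def decide_next_prompt(text: str) -> str:
    # Placeholder dialog logic; a real bot would call an LLM here.
    return "Thanks. Could you tell me a bit more about why you are calling?"

if __name__ == "__main__":
    app.run(port=8080)
```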
If you would like to discuss your IVR migration journey, please email us at sales@voicegain.ai. At Voicegain, we have decades of experience in designing, building and launching conversational IVRs and Voice Assistants.
Here is also a link to more information. Please feel free to schedule a call directly with one of our Co-founders.
Voicegain, the leading Privacy-first Voice AI platform for enterprises and AI SaaS companies, is thrilled to announce the successful completion of a System and Organization Controls (SOC) 2 Type 2 audit performed by Johanson LLP. This achievement underscores Voicegain's ongoing commitment to the highest standards of security, availability, and confidentiality for customer data.
Developed by the American Institute of Certified Public Accountants (AICPA), the SOC 2 Information security audit provides a report on the examination of controls relevant to the trust services criteria categories covering security, availability, processing integrity, confidentiality, and privacy. Voicegain’s SOC 2 Type 2 report did not have any noted exceptions and was therefore issued with a “clean” audit opinion from Johanson LLP. The SOC 2 Type 2 certification is widely recognized as a rigorous measure of an organization’s systems and controls over an extended period.
"As a Privacy first Voice AI Platform, data protection and trust are at the heart of Voicegain. Whether you are a developer working for a startup using our APIs or a Fortune 500 enterprise user of our platform, you shouldn’t have to worry about the controls in place for your sensitive voice data. It has been close to 2 years since we embarked on our SOC 2 Type 1 journey and I am really proud of what our team has accomplished. We look forward to providing businesses worldwide with Voice AI solutions that deliver true peace of mind when it comes to security and compliance" said Dr Jacek Jarmulak, Co-founder, CTO & CISO Of Voicegain.
Service Organization Control 2 (SOC 2) is a set of criteria established by the American Institute of Certified Public Accountants (AICPA) to assess controls relevant to the security, availability, and processing integrity of the systems a service organization uses to process users’ data, and the confidentiality and privacy of the information processed by these systems. SOC 2 compliance is important for Voice AI platforms like Voicegain, as it demonstrates that we have implemented controls to safeguard users’ data.
There are two types of SOC 2 compliance: SOC 2 Type 1, which assesses the design of an organization’s controls at a single point in time, and SOC 2 Type 2, which assesses how effectively those controls operate over an extended observation period.
From a functional standpoint, achieving SOC 2 Type 2 compliance doesn’t change anything: our APIs and apps work exactly as they always have. However, SOC 2 Type 2 compliance means that we have established a set of controls and processes to ensure the security of our users’ data. It demonstrates that we have the necessary measures in place to protect sensitive information from unauthorized access and disclosure.
Our commitment to security doesn’t end with SOC 2 Type 2. We are looking forward to increasing the maturity of the entire security process. This includes the following:
"We understand that in today's fast moving industry landscape, data security is non-negotiable," added Arun Santhebennur, Co-founder & CEO of Voicegain. "By achieving SOC 2 Type 2 compliance, we aim to set a high watermark in the Voice AI market. Our customers can have full confidence that their sensitive information is protected throughout its lifecycle."
To request a copy of our SOC 2 Type 2 report, please email security.it@voicegain.ai
Nuance just announced that Nuance Recognizer, its MRCP grammar-based ASR, will reach EOL in May 2027. This decision affects a significant number of on-premise speech-enabled IVR systems that rely on Nuance Recognizer, creating uncertainty for many businesses.
If you are impacted by this decision, this post outlines an immediate fix while also preparing your company for an AI future.
The decision appears to be driven by two primary factors:
Nuance provides two upgrade options, but neither is fully compatible with existing IVRs:
The EOL announcement introduces two major hurdles for businesses:
If your business relies on Nuance’s MRCP-based ASR (as of November 2024), now is the time to plan for a replacement. Below, we outline a solution that allows you to continue using your existing IVR without major disruptions.
Voicegain offers a seamless alternative to Nuance's grammar-based MRCP ASR. Our platform:
This allows you to maintain your current IVR workflow until you're ready to upgrade on your terms.
Over the next few years, many businesses will transition to generative AI-powered phone agents to improve caller experiences and increase automation rates. While this is a promising future, businesses shouldn’t feel forced to move to the cloud just to access these capabilities.
Voicegain’s deep-learning-based large-vocabulary STT engine is designed to evolve with your needs:
To discuss your upgrade options, email us at sales@voicegain.ai. If you'd like to test our solution, sign up for a free developer account (no credit card required) and get 1,500 free hours of usage. Visit the link in the instructions, and once signed up, contact support@voicegain.ai to request MRCP access.
Start future-proofing your IVR system today with Voicegain.
This article provides an overview of the Voicegain SIP Media Stream Back-to-Back User Agent (B2BUA), a contact-center-platform-agnostic solution that forks real-time SIP media streams from premise-based contact centers to Voicegain Speech-to-Text for real-time transcription. In SIP telephony, a B2BUA (Back-to-Back User Agent) is a network element that can both terminate and originate media streams. This is explained further in this post.
The Voicegain SIP Media B2BUA is a containerized solution that is deployed in the same network as the contact center platform. Once it is configured, developers or enterprise customers get real-time access to speaker-separated transcripts over a WebSocket connection.
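As an illustration, a downstream application could consume the speaker-separated real-time transcript roughly as in the sketch below. The WebSocket URL and JSON message fields are assumptions made for the example, not the documented Voicegain wire format.

```python
# Sketch of a client consuming real-time, speaker-separated transcript messages
# over a WebSocket. The URL and message shape are assumptions for illustration.
import asyncio
import json
import websockets   # pip install websockets

async def consume_transcript(url: str) -> None:
    async with websockets.connect(url) as ws:
        async for message in ws:
            event = json.loads(message)
            # Assumed fields: "channel" (agent/caller), "text", "is_final"
            speaker = event.get("channel", "unknown")
            text = event.get("text", "")
            if event.get("is_final"):
                print(f"[{speaker}] {text}")

if __name__ == "__main__":
    asyncio.run(consume_transcript("wss://example.internal/voicegain/transcript"))
```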
The Voicegain SIP Media B2BUA is a solution for enterprises and SaaS ISVs looking to extend their real-time LLM-powered Voice AI application to premise-based contact centers. Examples of such on-premise contact center platforms include Avaya, Genesys or Cisco. The Generative Voice AI applications supported include Real-Time Agent Assist or Voice AI Co-Pilots, Real-time sentiment analysis, voice biometrics and other types of real-time speech analytics apps.
Most premise-based Contact Center platforms - whether it is from Avaya, Genesys, or Cisco - do not provide programmatic access to real-time media streams. While these systems are reliable for call routing and management, they were not designed for modern LLM-powered AI applications.
Traditionally, forking of media streams has been supported by a Session Border Controller (SBC), a separate network element that sits "in front of" the contact center platform. SBCs rely on a protocol called SIPREC to fork these media streams. However, SIPREC is primarily intended for network-based compliance call recording, and commercial compliance recording vendors like NICE or Verint leverage the SIPREC protocol to access real-time media streams from premise-based contact center platforms.
However, this approach has several pain points:
1) Only large enterprises have implemented Session Border Controllers.
2) Even if an enterprise has an SBC, the forking media capacity is used up by the call recording solution. Adding an additional streaming option for generative AI requires upgrades to hardware and software licensing on the SBC.
Voicegain offers a highly scalable, reliable and self-contained SIP Media Stream B2BUA to address the challenges discussed above. The B2BUA is a containerized network element that is deployed in the same network as the premise-based contact center. From a SIP protocol standpoint, this Media Stream Back-to-Back User Agent (B2BUA) acts as a transparent media relay while forking SIP RTP media streams to real-time Speech-to-Text.
What is a B2BUA? Unlike a simple SIP proxy that only handles signaling, a Back-to-Back User Agent terminates and re-originates both signaling and media streams, allowing it to manipulate call flows while maintaining access to the audio content.
The Voicegain SIP Media Stream B2BUA is:
The diagram above showcases the call flow used to fork media streams for real-time transcription.
At a high level, most SIP-based contact center ACDs - like Avaya Communication Manager, Genesys Engage and Cisco UCCE - support the creation of SIP trunks. The Voicegain SIP Media Stream B2BUA is a SIP server/SIP peer that connects to the premise-based ACD over a dedicated SIP trunk. It can receive calls from and make calls to the premise-based ACD.
The overall call flow has the following steps:
To summarize, what is needed is to configure a SIP trunk on the current contact center ACD and use it to transfer calls to the Voicegain Media Stream B2BUA (the equivalent of initiating a SIP INVITE). The Media Stream SIP B2BUA in turn bridges the call to the DID of the contact center queue or IVR (a SIP INVITE to the destination SIP URI) over the same SIP trunk, and then forks the two RTP streams (caller and agent) to Voicegain STT.
The Voicegain SIP Media Stream B2BUA has been deployed in a production application for a leading healthcare customer with an on-premise Avaya contact center.
To deploy the Voicegain SIP Media Stream B2BUA, you'll need:
If you have an on-premise contact center and you would like to discuss getting access to the real-time media stream, please contact us at support@voicegain.ai
This article outlines the evaluation criteria involved in selecting a real-time Speech-to-Text or ASR for LLM-powered AI Copilots and Real-time agent assist applications in the contact center. This article is intended for Product Managers and Engineering leads in Contact Center AI SaaS companies and CIO/CDO organizations in enterprises that are looking to build such AI co-pilots.
A very popular use case for Generative AI & LLMs is the AI Co-pilot or Real-time Agent Assist in contact centers. By transcribing an agent-customer conversation in real time and feeding the transcript to modern LLMs like OpenAI's GPT, Meta's Llama 2 or Google's Gemini, contact centers can guide their agents to handle calls more effectively and efficiently.
An AI Co-pilot can deliver significant business benefits. It can improve CSAT and NPS, as the AI can quickly search and surface relevant knowledge-base content to the agent, making them more knowledgeable and productive. It can also save agent FTE costs by reducing AHT and eliminating wrap time.
In addition, by building a library of "gold-standard" calls across key call types, an LLM can deliver personalized coaching to agents in an automated way. Companies are finding that while Gen AI-powered Co-Pilots are especially beneficial to new hires, they also deliver benefits to tenured agents.
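To make the co-pilot loop concrete, here is a minimal sketch of the step where the live transcript is fed to an LLM for guidance. The model name and prompts are placeholders, and in production the transcript turns would arrive incrementally from the real-time ASR rather than from a static list.

```python
# Minimal sketch of the "feed the live transcript to an LLM" step of an AI Co-pilot.
# The model name and prompts are placeholders; error handling and streaming are omitted.
from openai import OpenAI   # pip install openai

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def suggest_next_step(transcript_turns: list[str]) -> str:
    """Ask the LLM for real-time guidance based on the conversation so far."""
    conversation = "\n".join(transcript_turns)
    response = client.chat.completions.create(
        model="gpt-4o-mini",   # placeholder model choice
        messages=[
            {"role": "system",
             "content": "You are a contact-center co-pilot. Suggest the agent's next step in one sentence."},
            {"role": "user", "content": conversation},
        ],
    )
    return response.choices[0].message.content

# Example usage; in practice these turns arrive incrementally from the real-time ASR.
turns = [
    "Caller: Hi, I was double charged on my last invoice.",
    "Agent: I'm sorry about that, let me pull up your account.",
]
print(suggest_next_step(turns))
```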
Building an AI-powered Co-Pilot requires three main components: 1) a real-time ASR/Speech-to-Text engine for transcription, 2) an LLM to understand the transcript, and 3) agent- and supervisor/manager-facing web applications. The focus of this blog post is on the first component - the real-time ASR/Speech-to-Text engine.
Now here are the four key factors that you should look at while evaluating the real-time ASR/Speech-to-Text engine.
The first step for any AI Co-Pilot is to stream the agent and customer real-time media to an ASR that supports streaming Speech-to-Text. This is easily the most involved engineering design decision in this process.
There are two main approaches: 1) Streaming audio from the server side. In an enterprise contact center, that means forking the media from either an enterprise Session Border Controller or the contact center platform (the IP-PBX). 2) Streaming audio from the client side, i.e., from the agent desktop. An agent desktop can be an OS-based thick client or a browser-based thin client, depending on the actual CCaaS/contact-center platform being used.
Selecting the method of integration is an involved decision. While there are advantages and disadvantages to both approaches, server-side streaming has been the preferred option because it avoids the need to install client software and plan for compute resources at the agent-desktop level.
However, if you have an on-premise contact center like Avaya, Cisco or Genesys, the integration can become more involved. This is because each platform has its own mechanism for forking media streams, and you also need to install the ASR/STT behind the corporate firewall (or open the firewall to access a cloud-based ASR/STT).
Net-net, there is a case to be made for client-side streaming too, because not all companies have the server-side integration expertise in-house.
There are modern CCaaS platforms like Amazon Connect, Twilio Flex, Genesys Cloud and Five9 that offer APIs/programmatic access to the media streams. You are in luck if you have one of these platforms. Likewise, if PSTN access is through a programmable CPaaS platform like Twilio, SignalWire or Telnyx, getting programmatic access to the media streams is quite straightforward.
Once you finalize a method to fork the audio, you need to consider the standard protocols supported by the ASR/Speech-to-Text engine. Ideally, the ASR/STT engine should be flexible and support multiple options. One of the most common approaches today is to stream audio over WebSockets. It is important to confirm that the ASR/Speech-to-Text vendor supports two-channel/stereo audio submission over WebSockets. Other approaches include sharing audio over gRPC or over raw RTP.
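For illustration, streaming two-channel (agent + caller) audio to a real-time STT endpoint over a WebSocket can look roughly like the sketch below. The endpoint URL, audio framing and end-of-stream handling are assumptions for the example; every vendor defines its own wire format.

```python
# Sketch of streaming stereo (agent + caller) PCM audio to a real-time STT
# WebSocket endpoint. The URL and framing are assumptions for illustration.
import asyncio
import websockets   # pip install websockets

CHUNK_MS = 100          # send ~100 ms of audio per frame
SAMPLE_RATE = 8000      # typical telephony rate (Hz)
BYTES_PER_SAMPLE = 2    # 16-bit linear PCM
CHANNELS = 2            # channel 0 = caller, channel 1 = agent
CHUNK_BYTES = SAMPLE_RATE * BYTES_PER_SAMPLE * CHANNELS * CHUNK_MS // 1000

async def stream_audio(url: str, pcm_path: str) -> None:
    async with websockets.connect(url) as ws:
        with open(pcm_path, "rb") as audio:
            while chunk := audio.read(CHUNK_BYTES):
                await ws.send(chunk)                  # binary audio frame
                await asyncio.sleep(CHUNK_MS / 1000)  # pace at real time
        # Most APIs expect an explicit end-of-stream message here; the format varies by vendor.

if __name__ == "__main__":
    asyncio.run(stream_audio("wss://example.internal/stt/stream", "call-stereo.raw"))
```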
The next big consideration is the latency of the real-time ASR/Speech-to-Text model, which in turn depends on the underlying neural-network architecture of the model. In order to provide timely recommendations to the agent, it is important to target ASRs that can deliver a word-by-word transcript in less than one second, and ideally in about 500 milliseconds. This is because there is additional latency associated with collecting and submitting the transcript to LLMs and then delivering the insights to the agent desktop.
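As a rough, assumed latency budget (actual numbers vary by deployment), the arithmetic below shows why roughly 500 ms of ASR latency matters once LLM and UI delays are added on top.

```python
# Illustrative end-to-end latency budget for surfacing a suggestion to the agent.
# All numbers are assumptions, not measurements.
asr_partial_result_ms = 500    # streaming ASR emits the latest words
llm_completion_ms     = 900    # LLM call that generates the suggestion
backend_and_ui_ms     = 300    # backend processing + push to the agent desktop

total_ms = asr_partial_result_ms + llm_completion_ms + backend_and_ui_ms
print(total_ms)  # 1700 ms; a slower ASR (e.g. 2 s) would push this past 3 s and feel stale
```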
Last but not least, it is really important that the price of real-time transcription is affordable in order to build a strong business case for the AI Co-Pilot. It is important to confirm that the agent and caller channels are not priced independently, as that very often kills the business case.
If you are building an LLM-powered AI Co-pilot and would like to engage in a deeper discussion, please give us a shout! You can reach us at sales@voicegain.ai.
This blog post is intended for anyone responsible for upgrading or migrating an MRCP-based Nuance ASR that is nearing EOL (End of Life). It explains how the Voicegain ASR simply and economically extends the life of existing speech-IVR platforms by serving as a 'drop-in' replacement for the grammar-based Nuance ASR.
There are several hundred (if not thousands of) telephony-based speech-enabled IVRs that act as the 'front door' for customer service phone calls at enterprises of all sizes. These speech-enabled IVRs are built on platforms like Genesys Voice Portal (GVP), Genesys Engage, Avaya Aura Experience Portal (AAEP)/Avaya Voice Portal, Cisco Voice Portal (CVP), Aspect, the Voxeo Prophecy VoiceXML platform and several other VoiceXML-based IVR solutions. These systems predominantly use the Nuance ASR as the speech recognition engine.
Unlike contemporary large-vocabulary neural-network-based ASR/STT engines, the traditional Nuance ASR is a grammar-based ASR. It uses the MRCP protocol to talk to VoiceXML-based IVR platforms. Most of these systems were purchased in the last two decades (the 2000s and 2010s). Customers typically paid a port-based perpetual license fee (the IVR platforms were licensed similarly). Most enterprises have software maintenance/AMC contracts for the Nuance ASR, usually bundled with the IVR platform. The Nuance Recognizer versions in the market vary between 9.0 and 11.0. As of June 2022, Nuance had announced end of support for Nuance 10.0. It is our understanding, from speaking with customers, that the last version of Nuance Recognizer sold - 11.0 - will reach either end-of-life or end-of-orderability sometime in 2025*.
Also, in speaking with customers, we have learned that customers who currently license the MRCP grammar-based Nuance ASR would have to upgrade to Nuance’s Krypton engine, the new deep-learning-based ASR, in 2025. Nuance Krypton can only be accessed using a modern gRPC-based API and not over MRCP, which makes this upgrade expensive and time-consuming. Because of this, customers would need to upgrade not just the ASR but also the entire IVR platform, since most legacy IVR platforms do not support gRPC. The existing call-flow logic - which is typically written in a VoiceXML app studio, or written in a build tool and generated as VoiceXML pages - would also need to be ported.
All of the above makes the upgrade process very challenging. While there is a strong case to be made for the merits of upgrading to a deep-learning-based ASR to support conversational interactions (better automation rates and a more natural user experience), it is critical for customers that this upgrade/migration happens on the customer’s timeline and not under the gun on the vendor’s clock.
Voicegain offers a drop-in replacement for the Nuance grammar-based ASR. We are the only modern deep-learning (neural-network-based) ASR in the market that natively supports both traditional speech grammars (GRXML, SRGS) and large-vocabulary conversational interactions. We are also one of the very few ASR vendors that can be accessed both over a traditional telephony protocol like MRCP and over modern web-based methods like WebSockets (or gRPC). The same neural-network model supports both the old and the new protocols. This gives you a future-proof way to replace the Nuance ASR with minimal effort while safeguarding the investment for the long term.
Net-net, by just "pointing" the ASR resource on the VoiceXML platform to the IP address of the Voicegain MRCP ASR in your network, you can replace the Nuance ASR with the Voicegain ASR. Customers do not need to change or modify a single line of code in the speech-IVR application logic.
In other words, a client can retain the existing telephony/IVR setup and just perform a "drop-in replacement" of Nuance MRCP ASR with Voicegain MRCP ASR.
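To illustrate why the application does not change, below is a generic VoiceXML fragment of the kind found in existing speech-IVR applications; it is a hypothetical example, not taken from a real deployment. Nothing in it references a specific ASR vendor - the recognizer is selected by the platform's MRCP configuration, which is the only thing that needs to be repointed.

```xml
<!-- Illustrative VoiceXML field from a hypothetical speech IVR. The grammar
     reference and dialog logic are vendor-neutral, so this code is untouched
     when the MRCP recognizer behind the platform is swapped. -->
<form id="confirm-payment">
  <field name="confirmation">
    <grammar src="grammars/yesno.grxml" type="application/srgs+xml"/>
    <prompt>You want to pay fifty dollars. Is that correct?</prompt>
    <filled>
      <if cond="confirmation == 'yes'">
        <goto next="#process-payment"/>
      <else/>
        <goto next="#get-amount"/>
      </if>
    </filled>
  </field>
</form>
```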
Longer term, the same Voicegain ASR can perform large-vocabulary transcription because it is a neural-network-based ASR; so when the customer is ready to replace the directed-dialog speech IVR with a conversational interaction, the Voicegain platform will already support it.
To discuss your upgrade situation in more detail, please contact us over email at sales@voicegain.ai. We can answer any questions that you have. You can also get started with a free developer account by following these instructions. There is no credit card required and we offer 1,500 hours of usage for free. Here is a link to the instructions; after you sign up, please contact us at support@voicegain.ai and request MRCP access.
* Nuance ASR and Nuance Krypton are trademarks of Nuance, Inc which is now part of Microsoft. Please confirm the End of Life announcement and the protocol capability directly with the company. Our information in this blog post is anecdotal and has not been verified with Nuance.