Our Blog

News, Insights, sample code & more!

Contact Center
Voicegain Acquires TrampolineAI to deliver End-to-End Contact Center AI for Healthcare Payers


New unified platform combines AI voice agent automation with Real-time agent assistance and Auto QA, enabling healthcare payers to reduce average handle time (AHT) and improve first contact resolution (FCR) in their call centers.

IRVING, Texas and SAN FRANCISCO, Jan. 7, 2026 /PRNewswire-PRWeb/ -- Voicegain, a leader in AI Voice Agents and Infrastructure, today announced the acquisition of TrampolineAI, a venture-backed healthcare payer-focused Contact Center AI company whose products supports thousands of member interactions. The acquisition unifies Voicegain's AI Voice Agent automation with Trampoline's real-time agent assistance and Auto QA capabilities, enabling healthcare payers to optimize their entire contact center operation—from fully automated interactions to AI-enhanced human agent support.

Healthcare payer contact centers face mounting pressure to reduce costs while improving member experience. The reasons vary from CMS pressure, Medicaid redeterminations, Medicare AEP volume and staffing shortages. The challenge lies in balancing automation for routine inquiries with personalized support for complex interactions. The combined Voicegain and TrampolineAI platform addresses this challenge by providing a comprehensive solution that spans the full spectrum of contact center needs—automating high-volume routine calls while empowering human agents with real-time intelligence for interactions that require specialized attention.

"We're seeing strong demand from healthcare payers for a production-ready Voice AI platform. TrampolineAI brings deep payer contact center expertise and deployments at scale, accelerating our mission at Voicegain." — Arun Santhebennur

Over the past two years, Voicegain has scaled Casey, an AI Voice Agent purpose-built for health plans, TPAs, utilization management, and other healthcare payer businesses. Casey answers and triages member and provider calls in health insurance payer call centers. After performing HIPAA validation, Casey automates routine caller intents related to claims, eligibility, coverage/benefits, and prior authorization. For calls requiring live assistance, Casey transfers the interaction context via screen pop to human agents.

TrampolineAI has developed a payer-focused Generative AI suite of contact center products—Assist, Analyze, and Auto QA—designed to enhance human agent efficiency and effectiveness. The platform analyzes conversations between members and agents in real-time, leveraging real-time transcription and Gen AI models. It provides real-time answers by scanning plan documents such as Summary of Benefits and Coverage (SBCs) and Summary Plan Descriptions (SPDs), fills agent checklists automatically, and generates payer-optimized interaction summaries. Since its founding, TrampolineAI has established deployments with leading TPAs and health plans, processing hundreds of thousands of member interactions.

"Our mission at Voicegain is to enable businesses to deploy private, mission-critical Voice AI at scale," said Arun Santhebennur, Co-founder and CEO of Voicegain. "As we enter 2026, we are seeing strong demand from healthcare payers for a comprehensive, production-ready Voice AI platform. The TrampolineAI team brings deep expertise in healthcare payer operations and contact center technology, and their solutions are already deployed at scale across multiple payer environments."

Through this acquisition, Voicegain expands the Casey platform with purpose-built capabilities for payer contact centers, including AI-assisted agent workflows, real-time sentiment analysis, and automated quality monitoring. TrampolineAI customers gain access to Voicegain's AI Voice Agents, enterprise-grade Voice AI infrastructure including real-time and batch transcription, and large-scale deployment capabilities, while continuing to receive uninterrupted service.

"We founded TrampolineAI to address the significant administrative cost challenges healthcare payers face by deploying Generative Voice AI in production environments at scale," said Mike Bourke, Founder and CEO of TrampolineAI. "Joining Voicegain allows us to accelerate that mission with their enterprise-grade infrastructure, engineering capabilities, and established customer base in the healthcare payer market. Together, we can deliver a truly comprehensive solution that serves the full range of contact center needs."

A TPA deploying TrampolineAI noted the platform's immediate impact, stating that the data and insights surfaced by the application were fantastic, allowing the organization to see trends and issues immediately across all incoming calls.

The combined platform positions Voicegain to deliver a complete contact center solution spanning IVA call automation, real-time transcription and agent assist, Medicare and Medicaid compliant automated QA, and next-generation analytics with native LLM analysis capabilities. Integration work is already in progress, and customers will begin seeing benefits of the combined platform in Q1 2026.

Following the acquisition, TrampolineAI founding team members Mike Bourke and Jason Fama have joined Voicegain's Advisory Board, where they will provide strategic guidance on product development and AI innovation for healthcare payer applications.

The terms of the acquisition were not disclosed.

About Voicegain

Voicegain offers healthcare payer-focused AI Voice Agents and a private Voice AI platform that enables enterprises to build, deploy, and scale voice-driven applications. Voicegain Casey is designed specifically for healthcare payers, supporting automated and assisted customer service interactions with enterprise-grade security, scalability, and compliance. For more information, visit voicegain.ai.

About TrampolineAI

TrampolineAI was a venture-backed voice AI company focused on healthcare payer solutions. The company applies Generative Voice AI to contact centers to improve operational efficiency, member experience, and compliance through real-time agent assist, sentiment analysis, and automated quality assurance technologies. For more information, visit trampolineai.com.

Media Contact:

Arun Santhebennur

Co-founder & CEO, Voicegain

press@voicegain.ai

Media Contact

Arun Santhebennur, Voicegain, 1 9725180863 701, arun@voicegain.ai, https://www.voicegain.ai

SOURCE Voicegain

Read more → 
Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.
Modernize your VoiceXML IVR into Conversational Voice Bots
Voice Bot
Modernize your VoiceXML IVR into Conversational Voice Bots

The urgent need for IVR platform modernization

Most enterprise IT organizations have mature telephony based IVR applications that serve as the “front door” for all voice based customer support calls. These applications use a combination of touchtone (DTMF) and speech to interact with callers. They have been carefully designed, developed and tuned over the years.


The objectives of any IVR are two fold 1) Automate simple routine queries (like balance inquiry, payment status, etc) and 2) Authenticate and intelligently route calls that require live support to the appropriate agent.


IT organizations across industry verticals like financial services, travel, media, telecom, retail or health-care have a small staff of in-house or outsourced IVR developers to maintain these applications. While enterprises have been focused on scaling and upgrading their digital support channels (like chat and email),  IVR applications have largely remained un-touched for years.


As CIOs and CDOs (Chief Digital Officers) embark on strategic initiatives to migrate enterprise workloads to the Cloud, one "niche" workload on this list is the IVR. However migrating IVRs "as-is" to the cloud is tricky. The languages, protocols and platforms that these telephony based IVRs were built on is from the early 2000s and are approaching obsolescence. Also while they support directed dialogs with limited customer spoken utterances, they are not a good fit for conversational bot interactions.


So IT organizations are faced with a Catch 22 situation. On one-hand, it is cumbersome to maintain these IVR workloads. On the other hand, the rationale to migrate existing platforms "as-is" to modern cloud infrastructure is questionable. Why bear the trouble and expense if IVRs are eventually are going to be replaced by conversational bots?


So there is a real need to modernize these IVRs as part of their cloud migration strategy.


A brief look at the underlying infrastructure of these IVR applications

Traditionally speech IVR applications ran on on-premise Contact Center telephony platforms. Companies like Avaya, Nortel, Cisco, Intervoice, Genesys and Aspect dominated the vendor landscape. In the early to mid-2000s, these vendors worked collaboratively as part of the W3C consortium to develop VoiceXML, an open vendor agnostic language for speech-enabled IVR applications.


VoiceXML enabled developers to build interactive voice dialogs and provided a standard way to interact with an automatic speech recognizer (ASR). This was done using a telephony based protocol called MRCP. The standard also provided a method to define speech grammars called SRGS and a format called GRXML.


The architecture and supporting jargon/terminology around VoiceXML borrowed heavily from the web world. The VoiceXML platform was referred to as a “Voice browser” that could “render VoiceXML pages” just like how a web browser could render HTML pages. Most contact center platforms provided visual IDEs to help   build and maintain these interactive call flows. Some also automated the generation of the VoiceXML pages. The IDE generated code that could run on  application server (like Apache Tomcat) which in turn generated VoiceXML pages that were sent to a VoiceXML platform over standard HTTP. The application server was also responsible for making web-services requests to enterprise database resources that were required for the IVR interaction; for e.g. billing/payment systems or CRM systems.


Also most ASRs from the late 90s and early 2000s were based on Hidden Markov Models and Gaussian Mixture models. They mainly supported grammar-based recognition - which meant that as a Speech IVR developer you had to anticipate all possible utterances that a user could say in response to a question/prompt. There were some options to build open-ended statistical language models but these were tricky and required careful selection of the training corpus.

Why modernize now?

While VoiceXML worked well in the past, it is a niche and outdated language. The last release of VoiceXML 2.1 was back in 2007!! That is more than a decade ago.

And a lot has changed in the web world since then. VoiceXML was developed at a time when JSP (Java Server Pages) was widely used. So it was before JSON, YAML, RESTful APIs & AJAX.


For enterprises, it is expensive to maintain a dedicated staff - whether in-house or outsourced - with niche skills in technologies like VoiceXML and MRCP.


Enterprises should ideally be able to run IVR app like any other modern web application. Most enterprise web apps are built on programming languages like Python, Node.JS that are popular with web developers. They are containerized using docker and orchestrated using Kubernetes.


It would be ideal for an enterprise IT organization for its IVR app to be built on similar programming languages so that it can be supported or maintained just like other applications in the IT portfolio.


In addition to the obsolescence of VoiceXML, the speech recognition engine (ASR) that was deployed in the early 2000s has also become outdated. Modern speech-to-text engines are built on Deep Neural Networks that run on powerful GPU infrastructure. They offer amazing accuracy and allow the use of a very large vocabulary - which is what is needed for bot like conversational experience. Also modern NLU engines allow you to easily extract intents from the transcribed text.


So if an enterprise wants to offer a voice bot that supports an open conversational experience, they need to move to a modern DNN based Speech-to-Text platform that can integrate with such NLU engines.


Our recipe for IVR App modernization



At Voicegain, we recommend that an enterprise first modernize the underlying infrastructure while retaining the existing IVR application logic. This is a great first step. It allows an enterprise to continue serving existing users while taking a step towards providing a more conversational user experience.

How can an enterprise modernize its legacy IVR app?

We suggest that the existing call flow logic - which is typically maintained using visual IDEs of contact center platforms - get rewritten (ideally with the help of automated tools) into a modern programming language like Python or Node.Js.

Instead of generating legacy VoiceXML pages, enterprises should use web friendly data representation languages like JSON or YAML to interact with modern RESTful Speech-to-Text APIs using web callbacks.

How Voicegain supports IVR App modernization?

At Voicegain, we provide a modern Voice AI platform that includes

  1. A modern DNN based speech recognizer accessible using RESTful APIs
  2. Ability to interface directly with telephone calls delivered over SIP/RTP
  3. JSON style callback APIs to replace functionality of a VoiceXML
  4. Ability to deploy on your VPC/Private Cloud or use as a cloud service
  5. Fully feature compatible with legacy standards (support SRGS grammars,  universals)
  6. Training of the underlying acoustic model & language models to get high recognition accuracy

Voicegain is developing tools to automatically convert VoiceXML to equivalent JSON/YAML representation that talks to our callback APIs.


How is this a "future proof" architecture for an enterprise?

The Voicegain platform is capable of large vocabulary transcription which is a requirement for NLU based Voice Bots. This will be the way customers interact with enterprises in the future.


We allow developers to  switch between grammar based recognition and large vocabulary recognition at each and every turn of the dialog; or you could simultaneously use both to achieve more flexibility.


Our Telephony Bot APIs can also integrate with Bot Frameworks like Google Dialog Flow, .


We are inviting enterprise web developers for a free trial of our platform.






Read more → 
Why Voice AI is critical for enterprises in a post Covid world
Enterprise
Why Voice AI is critical for enterprises in a post Covid world

Digital Transformation efforts in most enterprises have only gained pace as a result of the pandemic. The maxim going around in corporate circles in 2020 (and very likely to continue in 2021) is that the coronavirus was the real Chief Digital Officer (CDO) for most enterprises!! CIOs, CTOs and the CDOs today have stronger and bolder mandates to fundamentally alter the economics of their businesses.

They are increasingly being asked by their CEOs to make big bets and take on initiatives that can "materially" transform the underlying economics of their businesses.

A significant area of focus for digital enterprises is what is being referred to as "Practical AI". How businesses use AI and ML in a practical yet fundamental manner to transform themselves? Enterprises in different industries - financial services, travel, telecommunications, media and retail - are realizing that investing in strong AI & ML capabilities in their teams is critical to their post-pandemic digital future. In many Fortune 1000 companies, businesses are 'insourcing' and aggressively hiring AI & ML teams even as they outsource maintenance of legacy back-end systems to gain competitive advantage.

And one of the most practical AI applications in the enterprise is Voice AI - which refers to the use of AI & ML on voice conversations within the enterprise.

Why Voice will remain significant & relevant for the Enterprise?

Despite the proliferation of digital channels like chat/text messaging, email and social, higher value sales conversations, meetings, and involved customer service discussions are conducted pre-dominantly over voice. Speaking is not just more efficient than typing, it is also more engaging!! The human touch with voice is something that we as humans will always value. Voice is here to stay and its enduring significance is as immutable as the laws of gravity!

So what is changing in the world of Voice? It is just that the underlying plumbing is transforming - voice conversations traditionally took place over legacy telephony networks. They are quickly moving to meeting platforms like Zoom, Microsoft Teams and Webex; so a voice only conversation is being replaced by a richer voice & video conversation conducted over the internet.

The barriers historically associated with voice - costs and complexity of voice infrastructure- have been eliminated with technologies like WebRTC, 4G/5G and cloud computing. For consumers, the cost of making a voice call is now zero - it is the cost of their WiFi or 4G/5G bandwidth (as consumers use free mobile apps like Facetime, Skype and WhatsApp).

What is Voice AI? And why is it exciting?

Voice AI is highly accurate Speech-to-Text and NLU that is built on highly specialized and customizable (trainable) Deep Neural Networks running on GPUs.

What is unique about Deep Neural Networks is that the underlying Speech-to-Text and NLU models can be trained - easily and affordably - on enterprise specific datasets. You can leverage enterprise's lexicon and corpus - both voice & text. So instead of a 'one-size-fits-all approach', each enterprise can have its own Voice AI infrastructure - that is trained on its product names, industry jargon, employee & customer names, unique accents etc. Once it is trained, there are two big applications - 1) Voice AI for Automation and using 2) Voice AI for Analytics.

Voice AI for Automation

Enterprises can build Voice bots to intelligently respond to contact requests from their prospects and customers anytime anywhere. Voice Bots may also be used to respond to internal employees queries in a service/help desk context. The automation use-case is one that has really accelerated during the pandemic. Bots can help businesses deal with massive disruption caused by everyone - in sales, customer success and service - working from home during the pandemic. McKinsey has written about automation using AI.

Voice AI for Analytics

Voice AI also makes it possible for businesses to transcribe 100% of their voice conversations and subsequently mine the text for sentiment and analytics/insights.

With Voice AI, businesses can ensure that its frontline sales staff is able to pitch its core value proposition, benefits, product and service features in a consistent and compelling manner. This can be a massive boost to sales teams as they can improve conversion ratios and accurately forecast pipeline with Voice AI.

Voice AI can also ensure that customer success and service personnel are provided with tailored/customized insights to improve not just their efficiency (metrics like AHT in contact center) and but also enhance effectiveness measures like CSAT and NPS scores.

At Voicegain, we are passionate about helping enterprises, small and mid-size businesses, entrepreneurs and startup companies with their Voice AI efforts. Our mission is to build the world's most open developer friendly Voice AI platform. Be a part of our mission by signing up here. You can transcribe your calls/meetings, try out our APIs, building amazing telephony bots and more !

About the Author:

Arun Santhebennur is the Co-founder & CEO of Voicegain. To have a more in-depth conversation, please connect with Arun on LinkedIn or send us an email.

Read more → 
Selecting a Speech-to-Text API for your SaaS app is not a slam dunk!
Developers
Selecting a Speech-to-Text API for your SaaS app is not a slam dunk!

Developers building voice-enabled SaaS applications that embed Speech-to-Text or Transcription as part of their product have multiple vendors to choose from.

However, the decision to pick the right Speech-to-Text platform or API is rather involved. This writeup outlines three types of vendors and the three key criteria (summarized as the 3 As - Accuracy, Affordability and Accessibility)  to consider while making that choice.

Most voice-enabled SaaS apps that incorporate Speech-to-Text APIs broadly fall into two categories 1) Analytics and 2) Automation.

Whether you are developing an analytics app or an automation app, developers have the following vendor choices.

The Vendor Landscape

There are 3 distinct types of vendors

  1. Big Three Cloud Providers
  2. Large Enterprise ASR Platforms
  3. Pure-play Speech-to-Text/Voice AI Startups

1. Big Three Cloud Providers



The first set of choices for most developers are Speech-to-Text APIs from the big cloud companies - Google, Amazon and Microsoft. These big players offer Speech-to-Text APIs as part of their portfolio of Cloud AI & ML services. The strategy for the Big Cloud providers is to sell their entire stack - from cloud infrastructure to APIs and even products.

However the Cloud service providers may compete directly with the developers they look to serve. E.g. Amazon Connect directly competes with Contact Center platforms that are hosted on AWS. Google Dialogflow directly competes with other NLU startups that may be looking to build and offer Voice bots and Voice Assistants to enterprises.  

2. Large Enterprise ASR Platforms

Other than the big 3, Nuance and IBM Watson are large companies that have a rich history of providing Automated Speech Recognition (ASR). Of the two, Nuance is better known and has been a dominant player both in the enterprise call center market with its Nuance ASR engine and in the medical transcription space with its Dragon offering. IBM has a long history of fundamental speech recognition and IBM Watson Speech-to-Text is their developer oriented offering.

3. Pure-play Speech-to-Text/Voice AI Startups

Voicegain.ai, our company, plays alongside other startup companies like Deepgram  that target SaaS developers with their best-of-breed DNN based speech-to-text. Since these startups are specialized providers, they are focused on beating the big cloud providers and legacy players with respect to price, performance and ease of use.

Key Criteria - Accuracy, Affordability, Accessibility

The key criteria while picking an ASR or Speech-to-Text platform are the 3 As - Accuracy, Affordability and Accessibility.

1. Accuracy - Establish Target & Baseline accuracy

The first and most important criteria for any Speech-to-Text platform is recognition accuracy. However accuracy is a tricky metric to assess and measure. There is no 'one-size-fits-all' approach to accuracy. We have shared our thoughts & benchmarks here. While Voicegain matches or exceeds the "out-of-the-box" transcription accuracy of most of the larger players, we suggest that you do additional diligence before making a choice. The audio datasets used in these benchmarks may or may not be similar to the use case or context for which the developer intends to use the API.

While accuracy is usually measured using Word Error Rate (WER), it is important to note that this metric too has limitations. For a SaaS app, getting some important and critical words right may be even more important than just a low overall WER.

That being said, it is important for developers to establish and calculate a quick baseline "out-of-the-box" accuracy for their application with their audio datasets.

At Voicegain, we have open sourced tools to benchmark our performance against the very best in business. We strongly recommend that developers & ML Engineers calculate a benchmark baseline accuracy for their vendor choices using a statistically significant volume of audio datasets for their application.

From a developer perspective, a baseline accuracy measure will provide insights into the how closely your datasets match the datasets that the underlying STT models from the vendors have been trained on.

Here are a set of important factors that may affect your "out-of-the-box" accuracy:

  1. Length of audio: Does your application involve audio data that is comprised or short words/phrases or full sentences? Bots involve use of short words and phrases while analytics apps involve transcription of long sentences
  2. Industry jargon: Does your audio data have industry specific jargon and terms that are not part of normal vocabulary?
  3. Audio quality - 8kHz or 16 kHz: Is the source of your audio data - telephony - that is sampled at 8 kHz or is it 16 kHz data captured in a meeting platform like Zoom or Webex? Does the vendor have models that are tuned to 8 kHz and 16 kHz?
  4. Separate channels: If there are multiple speakers, are you able to provide separate channels for each speaker to the Speech-to-Text engine? Accuracy could be higher if you are able to.
  5. Background noise: Does your audio have a lot of background noise - e.g say news playing in the background or cross talk in a call center context. If so, how "sensitive" is the Speech-to-Text engine to such background noise
  6. Accents: Does your application support speakers with different accents?

Developers also need to establish a "Target" accuracy that their SaaS application or product requires. Usually Product Managers determine this based on their needs.

It is possible to bridge the gap between the Target Accuracy and the Baseline "out-of-the-box" accuracy. While it is outside the scope of this post, here is an overview of some ways in which developers can improve upon the Baseline accuracy.

  1. Acoustic Model Training. Voicegain allows developers to train the underlying acoustic model. This is the best way to address issues related to accents and background noise. Here is a link to some results we have demonstrated with model training. Of the larger players, currently only Microsoft and IBM allow for acoustic model customization whereas most players only allow customizing the language model (described below)
  2. Language Model: Customizing the language model is usually the fastest and easiest way to boost the accuracy of the Speech-to-Text engine - especially for things like product names. This is accomplished in a few different ways. Some platforms allow developers to pass hints along with a recognition request while others allow you to load an entire corpus as a domain specific language model. Voicegain allows both options.
  3. Speech Grammars: We have written extensively about Speech Grammars here, here and here and how they really simplify development of Voice Bots and Assistants. They boost accuracy of specific entities like zipcode, addresses, dates etc. They also improve recognition of short phrases like "card", "cash", etc. usually with better end results than using hints mentioned above. While Speech Grammars were commonly used for building telephony based Speech-enabled IVRs in the past (which were based on Speech-to-Text platforms based on HMMs and GMMs), most modern back-end developers are not familiar with the use of Grammars.

However not all Speech-to-Text platforms support one or more of these options.

At Voicegain.ai, we support all the above options. Picking the right approach  involves a more in-depth technical conversation. We invite you to get in touch with us.

To summarize, the choice may not be as simple as picking the one with the best "out-of-the-box' accuracy. It could in fact be a platform that provides the most convenient and least expensive path to bridge the gap between Target and Baseline accuracy.

2. Affordability

The second most important factor after accuracy is price. Most SaaS products are very disruptively priced. It is not uncommon for the SaaS product to be sold at 'tens of dollars' ($35-100) per user per month. It is critical that Speech-to-Text APIs make up as small a fraction of the SaaS price as possible. The price directly impacts the "gross-margin" of the SaaS application, a critical financial metric/KPI that SaaS companies care dearly about.

In addition to the top-line usage based price for the platform, it is also important to understand what the minimum billable time and billing increment for each interaction. Many of the large Cloud providers have a very high minimum billable times - 12 or 18 seconds. This makes it very expensive for Voice Bots or Voice Assistant.

Another cost related aspect is the price for transcribing multi-channel audio, where only one speaker is active at the time. Does the platform charge for transcribing silence on the inactive channel ?

3. Accessibility - Ease/Simplicity of Integration

The last (but not the least!) important criterion is how accessible - or in other words how simple and easy is it to integrate the Speech-to-Text platform with the SaaS Application.

This ease of integration becomes even more important if the SaaS Application streams audio real-time to the Speech-to-Text platform. Another important criterion for real-time streaming is latency - which is the time to receive recognition results from the platform. For a Bot or Voice Assistant, it is important to get API latency down to 500 milliseconds or lower. Also, reliable and fast end-of-speech detection is crucial in those scenarios for natural dialog turn taking.


About Voicegain


At Voicegain, we support multiple options - ranging from TCP-based methods like gRPC and Websockets to telephony/UDP protocols like SIP/RTP, MRCP and SIPREC.

The choice made by the developer depends on the following factors:

  1. The actual backend programming language or web framework that the SaaS app is built on (i.e., the libraries that they support).
  2. Familiarity or past-experience in usage of certain protocols for developers
  3. For apps that are accessed over traditional telephony (PSTN), integration with modern telephony platforms becomes really important (CCaaS, On Premise Contact Center or CPaaS Platforms like Twilio & SignalWire). At Voicegain, we integrate with most prominent Cloud and Premise based contact center platforms. We also allow you to use our JSON Callback based APIs with any platform that supports SIP Invite.

In conclusion, selecting the right Speech-to-Text or ASR platform for a SaaS application is a diligent exercise; it is by no means a slam dunk!!

We really dig having a conversation with you about this. We are keen to learn about what you are building Connect with us on LinkedIn, give us a shout!! Or email us at info@voicegain.ai.
Take Voicegain for a test drive!

1. Click here for instructions to access our live demo site.

2. If you are building a cool voice app and you are looking to test our APIs, click hereto sign up for a developer account  and receive $50 in free credits

3. If you want to take Voicegain as your own AI Transcription Assistant to meetings, click here.


Read more → 
SIPREC support in Voicegain platform
Platform
SIPREC support in Voicegain platform

Voicegain Speech-to-Text and Speech Analytics platform supports SIPREC protocol as one of the ways an audio stream of a telephone call can be fed to the speech recognizer.

The Session Recording Protocol (SIPREC) is an open SIP-based protocol for call recording. The standard is defined by Internet Engineering Task Force. It is supported by many phone platforms and call recording system vendors.

The SIPREC standard defines a protocol used to interact between a Session Recording Client (the role generally performed by PBX system or Session Border Controller) and a Session Recording Server (a third party call recorder, in our case a Voicegain-provided SIPREC server). SIPREC opens two RTP streams (one for inbound and one for outbound audio of the call) to the Recording Server. SIPREC protocol also is able to transfer call metadata to the Recorder, this is important so that the recordings can be tied to the information about the calls.

Use Cases

SIPREC is usually used for call recording but the standard essentially provides a real-time audio stream from the telephone call which makes it suitable for applications which have to work real-time like, e.g., agent assist or agent monitoring. Using the SIPREC interface Voicegain can provide real-time transcript of the call as well as perform speech analytics tasks in real time, e.g., keyword and phrase detection, personally-identifiable information scrubbing, sentiment and mood estimation, named-entity recognition, and variety of metrics (like silence, overtalk, etc.).

Audio obtained via SIPREC can also be recorded and transcribed, analyzed, or retrieved at a later time.

Supported Clients

Voicegain SIPREC interface has been tested with the following platforms:

Voicegain can capture relevant call metadata in addition to obtaining the audio (the metadata capture functionality may differ in capabilities depending on the client platform).

Voicegain platform can be configured to automatically launch transcription and speech-analytics as soon as the new SIPREC session gets established.

The output from transcription and  speech analytics is available via a Web API. We also support websockets for more convenient streaming of the transcription and/or speech analytics data.  SIPREC support is available both in the Cloud and the Edge (OnPrem) deployments of the Voicegain Platform.

SIPREC is an Enterprise feature of the Voicegain platform and is not included in the base package. Please contact support@voicegain.ai or submit a Zendesk ticket for more information about SIPREC and if you would like to use it with your existing Voicegain account.

Notes about Genesys Platform

Genesys Voice Platform does not support SIPREC directly. However, it does support streaming of the inbound and outbound RTP media to two separate SIP endpoints - the end result being pretty much the same as if SIPREC was used. We are currently working on implementing support for this feature of the Genesys  Voice Platform for real-time audio streaming to Voicegain Platform. It should be available in Q1 2021.

Read more → 
Voice command applications made easier
Announcements
Voice command applications made easier

New continuous recognition option

In latest Voicegain release (1.16.0) we have added a new option to our /asr/recognize/async API for ASR/speech-to-text. It is called continuousRecognition and if enabled modifies the default behavior of the grammar-based recognition.


Normally when /asr/recognize/async API is used the recognizer will return when the grammar is matched and the complete timeout expires. That means that it is only possible to get a single recognition in one /asr/recognize/async API request. If a no-match or no-input is detected the recognition will terminate.


However, sometimes there are use cases which demand that the recognizer e.g. ignores all no-matches until a match is found. This is what the continuousRecognition option is for.


With continuousRecognition you have fine control over which of the 4 events - no-input, no-match, match, and error - will be returned in a callback and which (if any) event will terminate recognition. If you do not set any event to terminate recogntion, the recognition session can be stopped by closing the audio stream or by returning stop:true from the callback.


What is it good for?

An example might be a use case where a voicemail is being played to a caller and during the playback we want to interpret caller commands like: stop, next, previous, save, delete. If we used normal recognition we would encounter situations where what is said was not understood. Stopping recognition on no-match would not make much sense because either: (1) re-prompting would mess up the flow of the call, or (2) restarting recognition  might introduce a gap in recognition that may result in missing a part what the caller said.


In scenario like this it is best to ignore no-match and continue to listen, the caller will notice no response to what he said and will naturally repeat that.


The settings for continuous recognition that would work in this case would be:

  • stopOn : match, error
  • noCallbackFor : no-input, no-match - notes: (1) in this case we suggest setting a noinputTimeout very long so that internally no no-inputs are generated, (2) application could also decide to accept no-match callbacks - they could be tracked and if too numerous acted upon.

Continuous Recognition is supported in Voicegain integration for Twilio Media Streams - either TwiML <Stream> or <Connect><Stream> in Twilio Programmable Voice

It is not yet supported in Voicegain Telephony Bot APIs.

Read more → 
Python script for testing automated speech recognition (ASR) accuracy
Developers
Python script for testing automated speech recognition (ASR) accuracy

Many of our customers have been asking us for help in benchmarking Voicegain speech-to-text recognizer (ASR) on their specific audio files. To make this benchmarking easier we have released a python script that accomplishes just that. With a single command line you can transcribe all audio files from the input directory and compare them against reference transcripts - calculating the WER for each file. You can also do a 2-way comparison of reference vs Voicegain transcript vs Google Speech-to-Text transcript.

The script and the documentation is available at: https://github.com/voicegain/platform/tree/master/utility-scripts/test-transcribe

See our benchmark blog post to give you an idea of what kind of accuracy to expect from the Voicegain recognizer.


Read more → 
Category 1
This is some text inside of a div block.
by Jacek Jarmulak • 10 min read

Donec sagittis sagittis ex, nec consequat sapien fermentum ut. Sed eget varius mauris. Etiam sed mi erat. Duis at porta metus, ac luctus neque.

Read more → 
Category 1
This is some text inside of a div block.
by Jacek Jarmulak • 10 min read

Donec sagittis sagittis ex, nec consequat sapien fermentum ut. Sed eget varius mauris. Etiam sed mi erat. Duis at porta metus, ac luctus neque.

Read more → 
Category 1
This is some text inside of a div block.
by Jacek Jarmulak • 10 min read

Donec sagittis sagittis ex, nec consequat sapien fermentum ut. Sed eget varius mauris. Etiam sed mi erat. Duis at porta metus, ac luctus neque.

Read more → 
Category 1
This is some text inside of a div block.
by Jacek Jarmulak • 10 min read

Donec sagittis sagittis ex, nec consequat sapien fermentum ut. Sed eget varius mauris. Etiam sed mi erat. Duis at porta metus, ac luctus neque.

Read more → 
Category 1
This is some text inside of a div block.
by Jacek Jarmulak • 10 min read

Donec sagittis sagittis ex, nec consequat sapien fermentum ut. Sed eget varius mauris. Etiam sed mi erat. Duis at porta metus, ac luctus neque.

Read more → 
Category 1
This is some text inside of a div block.
by Jacek Jarmulak • 10 min read

Donec sagittis sagittis ex, nec consequat sapien fermentum ut. Sed eget varius mauris. Etiam sed mi erat. Duis at porta metus, ac luctus neque.

Read more → 
Sign up for an app today
* No credit card required.

Enterprise

Interested in customizing the ASR or deploying Voicegain on your infrastructure?

Contact Us → 
Voicegain - Speech-to-Text
Under Your Control