Our Blog

News, Insights, sample code & more!

ASR
Announcing the launch of Voicegain Whisper ASR/Speech Recognition API for Gen AI developers

Today we are really excited to announce the launch of Voicegain Whisper, an optimized version of OpenAI's Whisper speech recognition/ASR model that runs on Voicegain managed cloud infrastructure and is accessible via Voicegain APIs. Developers can use the same well-documented, robust APIs and infrastructure that process over 60 million minutes of audio every month for leading enterprises like Samsung and Aetna, and for innovative startups like Level.AI, Onvisource and DataOrb.

The Voicegain Whisper API is a robust and affordable batch Speech-to-Text API for developers who are looking to combine conversation transcripts with LLMs like GPT-3.5 and GPT-4 (from OpenAI), PaLM 2 (from Google), Claude (from Anthropic), LLaMA 2 (open-sourced by Meta), or their own private LLMs to power generative AI apps. OpenAI has open-sourced several sizes of the Whisper model; with today's release Voicegain supports Whisper-medium, Whisper-small and Whisper-base. Voicegain now supports transcription in the multiple languages that Whisper supports.

Here is a link to our product page.


There are four main reasons for developers to use Voicegain Whisper over other offerings:

1. Support for Private Cloud/On-Premise deployment (integrate with Private LLMs)

While developers can use Voicegain Whisper on our multi-tenant cloud offering, a big differentiator for Voicegain is our support for the Edge. The Voicegain platform has been architected and designed for single-tenant private cloud and datacenter deployment. In addition to the core deep-learning-based Speech-to-Text model, our platform includes our REST API services, logging and monitoring systems, auto-scaling, and offline task and queue management. Today the same APIs enable Voicegain to process over 60 million minutes a month. We bring this practical, real-world experience of running AI models at scale to our developer community.

Since the Voicegain platform is deployed on Kubernetes clusters, it is well suited for modern AI SaaS product companies and innovative enterprises that want to integrate with their private LLMs.

2. Affordable pricing - 40% less expensive than OpenAI

At Voicegain, we have optimized Whisper for higher throughput. As a result, we are able to offer access to the Whisper model at a price that is 40% lower than what OpenAI charges.

3. Enhanced features for Contact Centers & Meetings.

Voicegain also offers critical features for contact centers and meetings. Our APIs support two-channel stereo audio, which is common in contact center recording systems. Word-level timestamps are another important feature of our API, needed to map audio to text. Enhanced diarization, a feature we already provide for the Voicegain models and a requirement for contact center and meeting use cases, will soon be made available on Whisper as well.

4. Premium Support and uptime SLAs.

We also offer premium support and uptime SLAs for our multi-tenant cloud offering. These APIs today process over 60 million minutes of audio every month for our enterprise and startup customers.

About the OpenAI Whisper Model

OpenAI Whisper is an open-source automatic speech recognition (ASR) system trained on 680,000 hours of multilingual and multitask supervised data collected from the web. The model is based on an encoder-decoder transformer architecture and has shown significant performance improvements over previous models because it was trained on a variety of speech processing tasks, including multilingual speech recognition, speech translation, spoken language identification, and voice activity detection.

[Figure: OpenAI Whisper encoder-decoder transformer architecture]

Getting Started with Voicegain Whisper

Learn more about Voicegain Whisper by clicking here. Any developer - whether a one-person startup or a large enterprise - can access the Voicegain Whisper model by signing up for a free developer account. We offer 15,000 minutes of free credits when you sign up today.

There are two ways to test Voicegain Whisper; they are outlined here. If you would like more information or have any questions, please drop us an email at support@voicegain.ai.
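
For a quick feel of the API, here is a minimal sketch of submitting a file for asynchronous batch transcription in Python. The request fields and the "whisper:medium" model id shown are illustrative assumptions, not the definitive schema - please consult the API documentation linked above for the exact request format:

import requests  # pip install requests

JWT = "<your-JWT-from-the-Voicegain-console>"
API = "https://api.voicegain.ai/v1"

# Submit an audio file (by URL) for asynchronous batch transcription.
# NOTE: the body fields and model id below are illustrative assumptions.
resp = requests.post(
    f"{API}/asr/transcribe/async",
    headers={"Authorization": f"Bearer {JWT}"},
    json={
        "audio": {"source": {"fromUrl": {"url": "https://example.com/call.wav"}}},
        "settings": {"asr": {"acousticModel": "whisper:medium"}},
    },
)
resp.raise_for_status()
print(resp.json())  # contains the session info to poll for the finished transcript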

Developers
Test Voicegain realtime Speech-to-Text from your browser

You can now test the accuracy of both our real-time and offline speech-to-text by visiting our demo page.

Read out paragraphs of your favorite book, give a speech that inspires, mimic your favorite actor, or just play a podcast or YouTube video!

Health check for the demo

  1. We currently support the Chrome and Edge browsers only.
  2. Please ensure that your CPU utilization is not too high (below 50%) and that your internet bandwidth is reasonable (10 Mbps in both directions).
  3. Ensure that your microphone is not being used by another program like Zoom, Teams, Skype or Webex.

If you are noticing delays in real-time transcription results, they are likely caused by resource issues on your computer.

Real-time Transcription

Simply click on the microphone icon to get started. You can either speak into your microphone or stream audio to it from your browser for a full minute.

You can also play back the audio to make sure that it was indeed streamed to us accurately.

Offline Transcription

Click on the upload recording icon to get started. You can upload a mono or stereo recording - WAV or FLAC - that is up to 15 MB in size. If you need to transcribe a larger file, please sign up for a free account.

Drop us an email (support@voicegain.ai) if you have any comments.

Benchmark
Speech-to-Text Accuracy Benchmark - June 2021

[UPDATE - October 31st, 2021: Current benchmark results from the end of October 2021 are available here. In the most recent benchmark Voicegain performs better than Google Enhanced.]

It has been over 8 months since we published our last speech recognition accuracy benchmark (described here). Back then the results were as follows (from most accurate to least): Microsoft, with Google Enhanced a close second, then Voicegain, with Amazon a close fourth, and then, far behind, Google Standard.

Methodology

We have repeated the test using the same methodology as before: take 44 files from the Jason Kincaid data set and 20 files published by rev.ai, and remove all files on which the best recognizer could not achieve a Word Error Rate (WER) lower than 20%. Last time we removed 10 files; this time, as the recognizers have improved, only 8 files had a WER higher than 20%.

The files removed fall into 3 categories:

  • recordings of meetings - 3 files (3 out of the 7 meeting recordings in the original set),
  • telephone conversations - 3 files (3 out of the 11 phone conversations in the original set),
  • multi-presenter, very animated podcasts - 2 files (many other podcasts in the set did meet the cutoff).
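
As a refresher, the WER numbers throughout this post are the word-level edit distance (substitutions + insertions + deletions) divided by the number of words in the reference transcript. A minimal implementation, for readers who want to score recognizer output on their own files:

def wer(reference: str, hypothesis: str) -> float:
    # word-level edit distance (Levenshtein) divided by reference length
    ref, hyp = reference.split(), hypothesis.split()
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # deleting all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # inserting all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + sub)  # substitution / match
    return dp[len(ref)][len(hyp)] / len(ref)

print(wer("the art of war by sun tzu", "the art of war by son two"))  # 2 errors / 7 words ≈ 0.286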

Some of our customers told us that they had previously used IBM Watson, so we decided to add it to the test as well.

Results

In the new test, as you can see in the results chart above, the order has changed: Amazon has leapfrogged everyone, improving its median WER by over 3% to just 10.02%, and is now in pole position. Microsoft, Google Enhanced and Google Standard performed at approximately the same level as in the previous benchmark. The Voicegain recognizer improved by about 2%. The newly tested IBM Watson is better than Google Standard but lags the other recognizers.

Voicegain is tied with Google Enhanced

The new results put the Voicegain recognizer very close to Google Enhanced:

  1. The average WER of Voicegain is just 0.66% behind Google, while the median WER is just 0.63% behind. To put it in context: Voicegain makes one additional mistake every 155 words compared to Google Enhanced.
  2. Voicegain was actually marginally better than Google Enhanced on the min error, 1st quartile, 3rd quartile, and max error.
  3. Overall, Voicegain was better on 20 files while Google was better on 36 files.

However, the results for any given use case depend on the specific audio: on some audio Voicegain will perform slightly better, and on some Google may perform marginally better. As always, we invite you to review our apps, sign up, and test our accuracy with your own data.

What about Open Source recognizers?

We have looked at both the Mozilla DeepSpeech and Kaldi projects. We ran our complete benchmark on Mozilla DeepSpeech and found that it trails significantly behind the Google Standard recognizer. Out of 64 audio files, Mozilla was better than Google Standard on only 5 files and tied on 1; it was worse on the remaining 58 files. The median WER was 15.63% worse for Mozilla than for Google Standard. Mozilla DeepSpeech's lowest WER, 9.66%, was on the LibriVox recording of "The Art of War" by Sun Tzu. For comparison, Voicegain achieves 3.45% WER on that file.

We have not yet benchmarked Kaldi, but from the research published online it looks like Kaldi trails Google Standard too, at least when used with its standard ASpIRE and LibriSpeech models.

Out-of-the-box accuracy is not everything

When you have to select speech recognition/ASR software, there are other factors beyond out-of-the-box recognition accuracy, for example:

  • Ability to customize the acoustic model - the Voicegain model may be trained on your audio data; we have demonstrated accuracy improvements of 7-10%. In fact, for one of our customers with adequate training data and good-quality audio we were able to achieve a WER of 0.5% (99.5% accuracy).
  • Ease of integration - many Speech-to-Text providers offer limited APIs, especially for developers building applications that require interfacing with telephony or on-premise contact center platforms.
  • Price - Voicegain is 60%-75% less expensive than other Speech-to-Text/ASR software providers while offering almost comparable accuracy. This makes it affordable to transcribe and analyze speech in large volumes.
  • Support for on-premise/Edge deployment - the cloud Speech-to-Text service providers offer limited support for deploying their speech-to-text software in client datacenters or on the private clouds of other providers. In contrast, Voicegain can be installed on any Kubernetes cluster - whether managed by a large cloud provider or by the client.

Take Voicegain for a test drive!

1. Click here for instructions to access our live demo site.


2. If you are building a cool voice app and want to test our APIs, click here to sign up for a developer account and receive $50 in free credits.


3. If you want to take Voicegain as your own AI Transcription Assistant to meetings, click here.

Languages
Voicegain offers Automatic Speech Recognition in German

We are pleased to announce the availability of German speech recognition on the Voicegain platform. It is the third language that Voicegain supports, after English and Spanish.

The recognition accuracy of the German model depends on the type of speech audio. In general, we are only a few percent behind the accuracy offered by the Speech-to-Text engines from Amazon or Google. The advantages of our recognizer are the significantly lower price and the ability to train custom acoustic models. Custom models can achieve higher accuracy than Amazon or Google. We encourage you to use our Web Console and/or API to test the real-world performance on your own data.

Of course, the Voicegain platform also offers other advantages, such as support for Edge (on-prem) deployment and an extensive API with many options for out-of-the-box integration with, for example, telephony environments.

Currently, our Speech-to-Text API is fully functional with the German model. Some Speech Analytics API features are not yet available for German, e.g. Named Entity Recognition and Sentiment/Mood Detection.

The German model is initially only available in the version that supports offline transcription. The real-time version of the model will be available in the near future.

To tell the API that you want to use the German acoustic model, you just need to select it in the Context settings. German models have 'de' in their name, e.g. VoiceGain-ol-de:1.

If you would like to use German speech recognition, please send us an email at support@voicegain.ai and we will enable it for your account. If your application requires the real-time model, please let us know as well.

Languages
Voicegain offers German Speech-to-Text

We are pleased to announce the availability of German Speech-to-Text on the Voicegain Platform. It is the third language that Voicegain supports, after English and Spanish.

The recognition accuracy of the German model depends on the type of speech audio. Generally, we are just a few percent behind the accuracy offered by the Speech-to-Text engines of the larger players (Amazon, Google, etc.). The advantages of our recognizer are its affordability, the ability to train customized acoustic models, and the option to deploy in your datacenter or VPC. Custom models can have higher accuracy than Amazon's or Google's. We also offer extensive support for integrating with telephony.

We encourage you to sign up for a developer account and use our Web Console and/or our APIs to test the real-life performance on your own data.

Currently, our Speech-to-Text API supports the German model for offline transcription. A real-time/streaming version of the model will be available in the near future.

To use the German Acoustic Model in Voicegain Web Console, select "de" under Languages in the Speech Recognition settings.
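
If you are calling the Transcribe API directly rather than using the Web Console, the acoustic model is selected in the request settings. The fragment below is an illustrative sketch - the field names are assumptions (see the API docs for the exact schema), and VoiceGain-ol-de:1 is the offline German model named in the announcement above:

{
  "settings": {
    "asr": {
      "acousticModel": "VoiceGain-ol-de:1"
    }
  }
}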

Developers
Access Voicegain ASR from FreeSWITCH using mod_unimrcp

The Voicegain STT platform has supported MRCP (Media Resource Control Protocol) for a long time now. Our ASR can be accessed using MRCP, and we support both grammar-based recognition (e.g., GRXML) and large-vocabulary transcription. MRCP is a communication protocol designed to connect telephony-based IVRs and Voice Bots with speech recognizers (ASR) and speech synthesizers (TTS).

Previously we had tested connecting to Voicegain over MRCP from VXML platforms like Dialogic PowerMedia XMS and Aspect Prophecy, but not from FreeSWITCH, a popular open-source telephony platform, using its MRCP plugin mod_unimrcp.

We are pleased to announce that the Voicegain platform works out-of-the-box with mod_unimrcp, the MRCP plugin for FreeSWITCH. However, getting the mod_unimrcp plugin to work on FreeSWITCH is not trivial. Here are some pointers to help those who would like to use mod_unimrcp with our platform.


Deploying Voicegain unimrcp server

There are currently two options to do this; we plan to add a third option very soon:

  1. For production deployments of speech IVRs and Voice Bots on FreeSWITCH, we recommend an Edge Deployment of the Voicegain platform. This deploys our unimrcp server, which can communicate with a locally deployed FreeSWITCH using MRCP.
  2. To use our Cloud ASR, you will need to download an MRCP IVR Proxy from the Voicegain Web Console. The download is a tar file containing a Docker Compose definition that you can run on your Docker server. It deploys our preconfigured unimrcp server with a proxy that connects to the Voicegain Cloud Speech-to-Text engine.
  3. (Coming soon) We plan to implement a voicegain_asr plugin that can be deployed on a standard unimrcp server. The plugin will talk to our ASR in the cloud using gRPC.

Also, the current TTS options accessible over MRCP are not great; our focus has been on the use of prerecorded prompts for IVRs and Voice Bots. We plan to shortly allow developers to access Google or Amazon TTS.


Configuring FreeSWITCH for mod_unimrcp

mod_unimrcp is not built by default when you build FreeSWITCH from source. To get it built, enable it in build/modules.conf.in by uncommenting this line: #asr_tts/mod_unimrcp


After the build, before starting FreeSWITCH you will need to:

  • Add <load module="mod_unimrcp"/> to autoload_configs/modules.conf.xml (you can put it in the <!-- ASR /TTS --> section, because that is where it logically belongs)
  • Create an mrcp_profile for Voicegain (see below)
  • Modify the content of autoload_configs/unimrcp.conf.xml. If you want to use both ASR and TTS via Voicegain MRCP, point both default-asr-profile and default-tts-profile to the voicegain1-mrcp2 profile you will create in the mrcp_profiles folder.

Here is an example MRCP v2 profile for connecting to Voicegain MRCP:
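
(The profile below is a representative sketch to go in the mrcp_profiles folder, e.g. as voicegain1-mrcp2.xml; the server IP, ports, and codec list are placeholders - use the values for your Voicegain deployment.)

<include>
  <!-- sketch of an MRCPv2 client profile; server-ip/server-port are placeholders -->
  <profile name="voicegain1-mrcp2" version="2">
    <param name="client-ip" value="auto"/>
    <param name="client-port" value="5090"/>
    <param name="server-ip" value="10.0.0.100"/>
    <param name="server-port" value="5060"/>
    <param name="sip-transport" value="udp"/>
    <param name="rtp-ip" value="auto"/>
    <param name="rtp-port-min" value="4000"/>
    <param name="rtp-port-max" value="5000"/>
    <param name="codecs" value="PCMU PCMA L16/96/8000"/>
  </profile>
</include>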

Here are some additional notes about the configuration file:

  • It is important that the port range used by the unimrcp client - <param name="rtp-port-min" value="4000"/> and <param name="rtp-port-max" value="5000"/> - is accessible from outside; otherwise, TTS via MRCP will not work. Also, these ports may not overlap with the UDP ports used by FreeSWITCH.
  • In some setups the "auto" values in <param name="client-ip" value="auto"/> and <param name="rtp-ip" value="auto"/> may not work, and you will have to manually specify the external IP.

How to use mod_unimrcp

Here is an example of how to play a question prompt and invoke the ASR via mod_unimrcp to recognize a spoken phone number:


session:execute("set", "tts_engine=unimrcp:voicegain1-mrcp2");
session:execute("set", "tts_voice=Catherine");
session:execute("play_and_detect_speech", 
"say:What is your phone number detect:unimrcp {start-input-timers=false,define-grammar=true,no-input-timeout=5000}builtin:grammar/phone")

asrResult = session:getVariable("detect_speech_result");


What this example does is:

  • tells FreeSWITCH which tts_engine to use
  • sets the TTS voice - currently ignored
  • plays a question prompt using the specified TTS and launches the recognition
  • retrieves the result of the speech recognition

The result of the recognition is a string in XML format (NLSML). You will need to parse it to get the utterance and any semantic interpretation. The NLSML result also contains a confidence value.
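
For illustration, here is a minimal sketch of extracting those fields from an NLSML string (shown in Python for brevity; in a FreeSWITCH dialplan you would do the equivalent in Lua). The sample result below is representative of the NLSML format, not actual Voicegain output:

import xml.etree.ElementTree as ET

# representative NLSML result, not actual Voicegain output
nlsml = """<result>
  <interpretation confidence="0.87">
    <instance>6505551212</instance>
    <input mode="speech">six five zero five five five one two one two</input>
  </interpretation>
</result>"""

root = ET.fromstring(nlsml)
interp = root.find("interpretation")
print(interp.get("confidence"))     # confidence value, e.g. 0.87
print(interp.findtext("input"))     # the recognized utterance
print(interp.findtext("instance"))  # the semantic interpretation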


The normal "play_and_detect_speech" command holds onto the ASR session until the end of the call. This makes subsequent recognitions more responsive, but you are paying for the MRCP session the whole time. You can instead use the "play_and_detect_speech_close_asr" command to release the ASR session immediately after recognition.


If you have any questions about the use of Voicegain ASR via MRCP please contact us at: support@voicegain.ai


Coming Soon

On our roadmap we have a mod_voicegain plugin for FreeSWITCH, which will bypass the need for mod_unimrcp and the unimrcp server and will talk directly from FreeSWITCH to the Voicegain ASR using gRPC.

Use Cases
Implementing Real-time Agent Assist with Voicegain

As the pandemic forces contact centers to operate with work-from-home agents, managers are increasingly looking to real-time speech analytics to drive improvements in agent efficiency (via reduction in AHT) and effectiveness (improvements in FCR and NPS) and to achieve 100% compliance.

Before the pandemic, contact center managers relied on a combination of in-person supervision and speech analytics of recorded calls to drive improvements in agent efficiency and effectiveness.

However, the pandemic has upended everything. It has forced contact centers to support work-from-home agents in multiple locations. Team leads who "walked the floor" to monitor and assist agents in real time are no longer available. The offline speech analytics process - which is still available remotely - is limited and manual: a call coach or QA analyst coaches an agent using a sample of 1-2% of the calls that have been transcribed and analyzed.

There is now an urgent need to monitor and support agents in real time and provide them with all the tools and support they had while working in the office.

Real-time Agent Assist is the use of Artificial Intelligence - more specifically, Speech Recognition and Natural Language Processing - to help agents in real time during the call in the following ways:

  1. Agents can be presented with knowledge-base articles and next-best actions based on intents extracted from the transcribed text
  2. Using NLU algorithms and the extracted intents, the call can be summarized automatically, saving disposition/wrap time
  3. Supervisors can monitor sentiment in real time

Real-time Agent Assist can reduce AHT by 30 seconds to 1 minute, improve FCR by 3-5% and improve NPS/CSAT.

What does it take to implement Real-time Agent Assist?

Real-time Agent Assist involves transcribing the agent-caller interaction in real time, extracting keywords, insights, and intents from the transcribed text, and making them available in a user-friendly manner to agents as well as to team leads and supervisors.

There are 4 key steps involved:

  1. Audio Capture: The first step is to stream the two channels of audio (i.e., the agent and caller streams) from the contact center platform that the client is using (whether premise-based or cloud-based). Voicegain supports a variety of protocols to stream audio; we have described them here and here. We have integrated with major premise-based contact center platforms like Avaya, Cisco and Genesys, as well as with the Media Stream APIs of programmable CCaaS platforms like Twilio and SignalWire.
  2. Transcription: The next step is to transcribe the audio streams into text. Voicegain offers Transcription APIs to convert the audio into text in real time. We can stream the text in real time (using WebSockets or gRPC) so that it can be easily integrated into any NLU engine (see the consumer sketch after this list).
  3. NLU/Text Analytics: In this step, the NLU engine extracts intents from the transcribed text. These intents are trained in an earlier phase using phrases and sentences. Voicegain integrates with leading NLU engines like RASA, Google Dialogflow, Amazon Lex and Salesforce Einstein.
  4. Integration with Agent Desktop: The last and final step is to integrate the results of the NLU with the Agent Desktop.
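
To give a feel for step 2, here is a minimal sketch of a real-time transcript consumer over a WebSocket. The URL and the JSON message shape are illustrative assumptions - the actual protocol is described in our Transcription API documentation:

import asyncio
import json
import websockets  # pip install websockets

async def consume_transcript(ws_url: str):
    # connect to the transcript stream and print phrases as they arrive
    async with websockets.connect(ws_url) as ws:
        async for message in ws:
            event = json.loads(message)
            # hypothetical message shape: {"utterance": "...", "start": <ms>}
            print(event.get("utterance", ""), flush=True)

# placeholder URL - a session-specific URL is returned when the
# real-time transcription session is started via the API
asyncio.run(consume_transcript("wss://api.voicegain.ai/v1/<session-specific-path>"))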

At Voicegain, we make it really easy to develop real-time agent assist applications. Sign up to test the accuracy of our real-time model.
