Our Blog

News, Insights, sample code & more!

Enterprise
Announcing Voicegain Casey, a Generative AI Voice Agent for Health Plan and TPA Call Centers

Voicegain is excited to announce the launch of Voicegain Casey, a payer-focused AI Voice Agent that transforms the end-to-end call center experience with the power of generative AI. Voicegain Casey is a suite of three Voice AI SaaS applications that help a health plan or TPA call center improve operational efficiency and increase CSAT and NPS (Net Promoter Score):

A. Voicegain Casey - Suite of Generative AI-Powered SaaS Applications

1. AI Voice Assistant:

The AI Voice Assistant replaces a touch-tone IVR with a modern LLM-powered conversational AI Phone Agent. The AI Phone Agent can answer all calls received at a health plan or TPA call center. It engages callers in a natural conversation and automates routine telephone calls such as claims status inquiries, eligibility inquiries and eligibility verifications. In our experience, there is a very compelling business case for automating provider phone calls in health plan and TPA call centers, and Voicegain Casey is specifically designed to do this. The AI Voice Assistant is also trained to perform HIPAA validation and to triage calls: if the AI has not been trained to answer a specific question, it routes the call to the call center for live assistance.

2. AI Co-Pilot: 

Voicegain AI Co-Pilot is a browser extension that runs in a browser side panel next to the call center agent's CRM. The Co-Pilot is integrated with the payer's Contact Center/CCaaS platform. When a call transferred by the AI Voice Assistant is answered by a live agent, all the information collected by the AI Voice Assistant is presented as a "screen pop" on the live agent's desktop (a capability also referred to as CTI). This screen pop ensures that front-line call center staff do not have to ask the caller to repeat any information already provided to the AI Voice Assistant. In addition to the screen pop, the AI Co-Pilot guides front-line staff in real time by listening to, transcribing and analyzing the conversation. It also generates a summary of the conversation within five seconds of the end of the call. This automated summarization easily saves 1-2 minutes of wrap-up time (after-call work), which is very common in health plan and TPA call centers.

3. AI QA & Coach:

Voicegain AI QA & Coach is a browser-based AI SaaS application used by team leaders, QA call coaches/analysts and operations managers in a call center. The app can record and measure caller sentiment, analyze the QA score and provide automated coaching tips to agents. Voicegain uses the latest open-source reasoning LLMs (such as Llama 3 and Gemma) and closed-source reasoning models such as o3 from OpenAI. With modern reasoning models, almost the entire QA scorecard (at least 80% of the questions) can be answered automatically. The app also provides a database of whole-call recordings covering the caller's entire conversation: the AI Voice Assistant part, the transfer to the specific call center queue, and the conversation between the live agent and the caller.

B. Integrations

Voicegain Casey requires the following 3 key integrations to help with automation and real-time assistance.

1. Contact Center Platform/CCaaS Platform

Voicegain Casey integrates with modern CCaaS platforms. Current integrations include Aircall, Five9 and Genesys Cloud. Planned integrations include RingCentral, NICE CXone and Dialpad.

2. CRM Software

Voicegain Casey integrates with the CRM software of the health plan or TPA. This can be an off-the-shelf CRM like Zendesk or Salesforce, or a proprietary/homegrown CRM. As long as the CRM is a browser-based application, this should not be an issue. The Voicegain Casey AI Co-Pilot is a browser extension that runs in the side panel of the same browser tab as the CRM. The call summary is generated automatically and is available in the browser extension within 5 seconds of the end of the call.

3. Eligibility & Claims

Voicegain Casey needs access to the member data (for HIPAA Validation) and claims data.

C. Demo and Additional Information

For further information on Voicegain Casey, including a demo, please visit this link.

D. Give us a shout!

If you would like to understand Voicegain Casey in more detail, or if you would prefer a detailed product demo over a Zoom video call, please do not hesitate to send us an email. You can reach us at sales@voicegain.ai or support@voicegain.ai.

Developers
"Hello World" Example

In this post we show, in three steps, what is needed to run your first transcription using the Voicegain API.

We assume that you have already signed up for a Voicegain account and logged into the portal.


Step 1: Create new Context

The main reason to create a new Context is to establish a new authentication realm. Access to each Context can be controlled separately, so it is easy to disable access to a certain Context without affecting other Contexts.

Contexts are also used for specifying default ASR settings.

You can create a new Context from the Context Dash



Step 2: Generate Authentication token

Voicegain APIs use JWT (JSON Web Tokens) to identify and authenticate the account making the request. In order to make API requests you need to generate a JWT which can easily be done from the portal.
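
A JWT consists of three base64url-encoded segments (header, payload, signature) separated by dots, so the claims a token carries can be inspected locally. The Python sketch below decodes the payload without verifying the signature. The token it builds is a made-up example, not a real Voicegain credential, and the claims inside a real token from the portal may look different.

```python
import base64
import json


def decode_jwt_payload(token: str) -> dict:
    """Return the claims in a JWT payload WITHOUT verifying the signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("a JWT has three dot-separated segments")
    payload_b64 = parts[1]
    # base64url segments omit '=' padding; restore it before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def _b64url(data: dict) -> str:
    """Encode a dict the way JWT segments are encoded (base64url, no padding)."""
    raw = json.dumps(data).encode("utf-8")
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


# Hypothetical token assembled locally for illustration only.
example_token = ".".join(
    [_b64url({"alg": "HS256", "typ": "JWT"}), _b64url({"sub": "demo-context"}), "signature"]
)
print(decode_jwt_payload(example_token))
```

This is only useful for debugging (for example, checking which token you copied from the portal); it does not prove that the token is valid.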



Step 3: Run the curl command

Below is the complete input and output from the curl command that submits a Web API request to the Voicegain Synchronous Speech-to-Text API at https://api.voicegain.ai/v1/asr/transcribe


In this case, the audio to be transcribed was retrieved from a URL. Alternatively, audio can be submitted inline (within the request).
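
A request of this kind, with the audio referenced by URL, might be built in Python roughly as follows. This is a sketch under assumptions: the endpoint URL comes from this post, but the JSON body layout (`audio.source.fromUrl.url`), the `Bearer` authorization scheme and the sample audio URL are assumptions that should be checked against the Voicegain API documentation. The request is only constructed here, not sent.

```python
import json
import urllib.request

# Endpoint named in this post.
TRANSCRIBE_URL = "https://api.voicegain.ai/v1/asr/transcribe"


def build_transcribe_request(jwt: str, audio_url: str) -> urllib.request.Request:
    """Build (but do not send) a synchronous transcription request."""
    # Assumed body layout; verify field names against the API documentation.
    body = {"audio": {"source": {"fromUrl": {"url": audio_url}}}}
    return urllib.request.Request(
        TRANSCRIBE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumed scheme: the JWT from Step 2 passed as a Bearer token.
            "Authorization": f"Bearer {jwt}",
        },
        method="POST",
    )


# Placeholder token and audio URL, for illustration only.
request = build_transcribe_request("YOUR_JWT", "https://example.com/sample.wav")
print(request.get_method(), request.full_url)
```

To actually submit it, pass `request` to `urllib.request.urlopen` and parse the JSON response.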

Note that synchronous transcription has an audio length limit of 60 seconds. Longer audio requires the asynchronous transcription API.
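
One way to respect this limit on the client side is to measure the audio before choosing an API. The sketch below does that for WAV input using Python's standard `wave` module. The helper names are illustrative, and treating exactly 60 seconds as still acceptable for the synchronous API is an assumption.

```python
import io
import wave

SYNC_LIMIT_SECONDS = 60  # limit stated in this post


def wav_duration_seconds(wav_file) -> float:
    """Duration of a WAV file (path or file-like object) in seconds."""
    with wave.open(wav_file, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def pick_transcription_mode(duration_seconds: float) -> str:
    """Return 'sync' for short audio, 'async' for anything over the limit."""
    return "sync" if duration_seconds <= SYNC_LIMIT_SECONDS else "async"


def make_silent_wav(seconds: int, rate: int = 8000) -> io.BytesIO:
    """Create an in-memory mono 16-bit WAV of silence, used here as test input."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * (seconds * rate))
    buffer.seek(0)
    return buffer


short_clip = make_silent_wav(3)
print(pick_transcription_mode(wav_duration_seconds(short_clip)))
```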

For asynchronous transcription requests it is possible to stream the audio, e.g., via websocket. You can see some of the Voicegain API documentation at: https://www.voicegain.ai/api

Edge
Benefits of Edge Deployment

There is no denying that services available in the Cloud have significant benefits and are hence a popular choice. That is why the Voicegain Speech-to-Text Platform is available both in the Cloud and at the Edge. The key benefits of accessing Voicegain as a Cloud service are:

  • Ease of Use - All it takes to start accessing Voicegain on the Cloud is to create an account on the Voicegain Web Console and get the developer API keys/security tokens. You can immediately start accessing the APIs that have been extensively documented.
  • No Maintenance - Voicegain ensures availability of the infrastructure and is responsible for the software updates and patches, backups, resources, etc.
  • High Security - The provider makes a one-time effort to secure the Cloud services for all of its tenants. Although the Cloud is potentially more exposed, the provider can devote more resources to addressing security in a systematic way.
  • High Availability - The Cloud provides redundancy of the virtual platform and often geographic distribution. Geographic distribution provides more resiliency to network-wide outages, etc.
  • Scalability - The Cloud provider takes care of the growing demand for resources.
  • Lower Sys Admin, DBA etc. costs - This is largely related to the No Maintenance point.


What is Edge Deployment?

Before we discuss the benefits of Edge Deployment let's define what we mean by it.

  • Edge Computing is defined broadly as all computing outside the cloud happening at the edge of the network, and more specifically in applications where real-time processing of data is required. Edge of the network, in turn, is usually understood as within the "last mile", that part of the network that physically reaches the end-user's premises.
  • What we call Edge Deployment is a deployment of Edge Computing (in our case specifically Speech-to-Text services) either on customer premises (datacenter) or in a VPC of a cloud provider. Compute resources are either owned or rented by the customer. However, the Cloud 'orchestrates' the deployed application: the application and the services it provides are deployed and managed from the Cloud. These services run in a virtualized environment (in our case Kubernetes).

Benefits of Edge Deployment

Edge Computing for Speech-to-Text services has many advantages:

  1. Low Network Latencies & High Network Reliability - With Edge Computing, processing of speech audio is brought close to where the audio originates. For example, all processing can be done in the same location where the Telco phone lines terminate for an IVR application. If the speech processing were to happen in the Cloud, the audio data would need to be sent over the Internet, which would introduce additional latency and jitter and would make the service susceptible to occasional incidents on the wider Internet, such as trunks overloaded by DDoS attacks, fiber cuts, etc. One can avoid some of those issues by deploying more reliable network connectivity to the Cloud, e.g., Google Cloud Interconnect, but that comes at a cost and still does not remove the extra latency.
  2. Lower Bandwidth Cost - Some Speech-to-Text applications generate a lot of data, e.g., a Call Analytics application that processes 100% of the calls. Edge Deployment allows for putting processing resources right next to where the data is generated, e.g., right at the Call Center.
  3. Data Privacy and Control - With all the incoming and generated data confined to the Edge Computing environment and none of it going to the Voicegain Cloud, customers can apply their own security protocols to protect the data.
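
To get a feel for the bandwidth point, the following back-of-the-envelope Python sketch estimates how much audio a call center would have to ship to the Cloud. It assumes G.711 telephony audio at 64 kbit/s per direction and two directions per call, and it ignores RTP/IP packet overhead. The call volume and hours are made-up illustration values, not Voicegain figures.

```python
def uplink_mbps(concurrent_calls: int,
                kbps_per_stream: float = 64.0,
                streams_per_call: int = 2) -> float:
    """Sustained audio bandwidth in Mbit/s (payload only, no packet overhead)."""
    return concurrent_calls * streams_per_call * kbps_per_stream / 1000.0


def gigabytes_per_day(mbps: float, busy_hours: float) -> float:
    """Audio volume in decimal gigabytes for the given number of busy hours."""
    seconds = busy_hours * 3600
    return mbps * seconds / 8 / 1000  # Mbit -> MB -> GB


# Hypothetical call center: 500 concurrent calls for 10 busy hours a day.
rate = uplink_mbps(500)
print(f"{rate:.0f} Mbit/s sustained, {gigabytes_per_day(rate, 10):.0f} GB per day")
```

With Edge Deployment this traffic stays on the local network instead of crossing the Internet link.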


Does Edge provide some of the benefits of the Cloud?

You may ask - what about the benefits of the Cloud, mentioned upfront? Do I get some of these with the Edge Deployment?

The answer is (qualified) "yes", and specifically:

  • Ease of Use - Edge Deployment is fully managed from the Cloud. Deployment of the entire application stack takes a few mouse clicks.
  • No Maintenance - Voicegain takes care of managing the components of the application: all the application components are automatically updated and/or patched. The customer still needs to take care of the hardware and the Kubernetes cluster.
  • High Security - The same core application is deployed for all our customers and we have made sure that it is secure. Any newly found vulnerabilities are patched automatically. The network entry and exit points of the Edge environment are well defined, and customers can provide additional network security for these.
  • High Availability - Running on the Kubernetes platform, our application has been designed with high availability in mind: there are multiple instances of each service, and Kubernetes takes care of failover in case of a hardware node failure. Because deployment is easy, our customers can also deploy multiple Edge instances, for example, to achieve geographic distribution.
  • Scalability - Again, thanks to the underlying Kubernetes platform, new processing resources can be added by adding new hardware nodes to the Kubernetes cluster; the Voicegain application will take advantage of them automatically.

Transcription
Real-Time Transcription for the Hearing Impaired

Countryside Bible Church (CBC) has been using the Voicegain platform for real-time transcription since September 2018 (when our platform was still in alpha).

How it Started

In August 2018 one of our employees was approached by staff at CBC with a question about software that would allow a deaf person to follow sermons live via transcription. One of the members at CBC is both hearing and vision impaired and cannot easily follow sign language; however, she can read large font on a computer screen from close by.

At that time, Voicegain had just started alpha tests of the platform, so his response was that he did indeed know of such software: Voicegain. Our testing was then focused on IVR use cases, so we still needed a few weeks to polish the transcription APIs and develop a web app that could consume the transcript stream (via websocket) and present it as scrolling text in a browser.

To improve recognition, we used about 200 hours of previously transcribed sermons from CBC to adapt our Acoustic DNN Model. Additionally, we created a specific CBC Language Model by adding a corpus of text from several Bible translations, various transcribed sermons, a list of CBC staff names, etc.

As far as the input audio is concerned, we initially streamed audio using the standard RTP protocol from the ffmpeg tool. We had some issues with the reliability of raw RTP, so we later switched to a custom Java client that sends the audio using a proprietary protocol. The client runs as a daemon on a small Raspberry Pi device.




Current State

The CBC audio-visual team has been running real-time transcription using our platform since September 2018, pretty much every Sunday. You can see an example of the transcription in action in the video below.


Plans

The current plan for the transcription service is to integrate it into the CBC website and to make it available together with streamed video. This will allow the hearing impaired to follow the services at home via streaming. For now, the transcription text will be presented as an embedded web page element under the embedded video.

Because the streamed video is delayed by more than 30 seconds relative to real time, we will be feeding the audio simultaneously to two ASR engines, one optimized for real-time response and one optimized for accuracy. This is easy because the Voicegain Web API provides methods that allow for attaching two ASR sessions to a single audio stream. Each session can in turn feed its own websocket stream. By accessing the appropriate websocket stream, the web UI can display either the real-time or the delayed transcript.
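
The idea of one audio stream feeding two recognition sessions can be illustrated with a small fan-out sketch in Python. This is not the Voicegain API: the two "sessions" below are local stand-ins that simply collect the audio chunks they receive, and all names are hypothetical. It only shows the pattern of delivering every chunk of a single stream to two independent consumers.

```python
import asyncio


async def fan_out(chunks, queues):
    """Send every audio chunk to every session queue, then signal end of stream."""
    for chunk in chunks:
        for queue in queues:
            await queue.put(chunk)
    for queue in queues:
        await queue.put(None)  # end-of-stream marker


async def asr_session(name, queue, results):
    """Stand-in for an ASR session: collects the audio it was fed."""
    received = []
    while True:
        chunk = await queue.get()
        if chunk is None:
            break
        received.append(chunk)
    results[name] = b"".join(received)


async def run_demo(chunks):
    realtime_queue = asyncio.Queue()
    accurate_queue = asyncio.Queue()
    results = {}
    await asyncio.gather(
        fan_out(chunks, [realtime_queue, accurate_queue]),
        asr_session("realtime", realtime_queue, results),
        asr_session("accurate", accurate_queue, results),
    )
    return results


demo_results = asyncio.run(run_demo([b"ab", b"cd", b"ef"]))
print(demo_results)
```

In the real setup each session would publish its transcript on its own websocket, and the web UI would subscribe to whichever one matches the video it is showing.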

Example transcribed sermons

Because of their Terms of Use, we cannot provide direct results for any of the major ASR engines, but you can download the audio linked below, as well as the corresponding exact transcripts, and run comparison tests on a recognizer of your choice. Note that Voicegain ASR ignores most of the duplicated words in the audio, which is why the transcript has those duplicates removed.

The audio is Copyright of Countryside Bible Church and the transcripts are Copyright of Voicegain.

1.  God's Plan for Human History (Part 2)

Tom Pennington  |  Daniel 2  |  2018-11-04 PM

55 minutes 13 seconds, 7475 words

Audio Transcript VoiceGain Output

Accuracy: 1.08% character error rate

Note: Voicegain output is formatted to match the Transcript. Normally it also includes timing information. This specific output was obtained on 4/30/19 from the real-time recognizer, which has slightly lower accuracy than the off-line recognizer.
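
If you want to run your own comparison, character error rate is commonly computed as the Levenshtein (edit) distance between the reference transcript and the recognizer output, divided by the number of characters in the reference. The Python sketch below implements that common definition. Whether the figure above used exactly this calculation, and what text normalization (case, punctuation, whitespace) was applied, is not stated here, so treat those details as assumptions.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_char != hyp_char)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance divided by the length of the reference text."""
    if not reference:
        raise ValueError("reference transcript must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Tiny illustration: one wrong character out of four.
print(character_error_rate("abcd", "abxd"))
```

For a full sermon you would read the transcript and the recognizer output from files, apply the same normalization to both, and pass the two strings to `character_error_rate`.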


Edge
Raspberry Pi as Audio Streaming Client

You can stream audio to the Voicegain transcription API from any computer, but sometimes it is handy to have a dedicated inexpensive device just for this task. Below we relay the experiences of one of our customers in using a Raspberry Pi to stream audio for real-time transcription. It replaced a Mac Mini that was initially used for that purpose. Using the Pi had two benefits: a) obviously the cost, and b) it is less likely than a Mac Mini to be "hijacked" for other purposes.

Hardware

Voicegain Audio Streaming Daemon requires very little in the way of computing resources, so even a Raspberry Pi Zero is sufficient; however, we recommend using a Raspberry Pi 3 B+, mainly because it has an on-board 1 Gbps wired Ethernet port. WiFi connections are more likely to have problems with streaming over UDP.

Here is a list of all hardware used in the project (with Amazon prices as of July 2019):

  • Element14 Raspberry Pi 3 B+ Motherboard - $37.78
  • Miuzei Raspberry Pi 3 b+ Screen, 3.5 Inch - $23.99
  • Miuzei 3.5 Inch Screen Case for 3.5 LCD - $9.99
  • iPazzPort Wireless Mini Handheld Keyboard - $13.99
  • UGREEN USB Audio Adapter - $8.99
  • SanDisk Ultra 32GB microSDHC UHS-I card - $7.23
  • An existing USB 5V power supply was also used.

All the components added up to a total of $101.97. A mini monitor and a mini keyboard were included because they make it more convenient to control the device while it is in the audio rack. For example, the ALSA audio mixer can be easily adjusted this way while monitoring the audio level via headphones.



Raspberry Pi running AudioDaemon

Software

The device is running standard Raspbian, which can easily be installed from an image using, e.g., balenaEtcher. After the base install, the following was needed to get things running:

  • enabling ssh access
  • changing the default audio device to the USB sound card (Raspbian comes with ALSA and basic USB sound drivers by default)
  • installing a driver for the display (otherwise the output font is too tiny to read)
  • installing OpenJDK 9
  • using the link generated from the Voicegain Portal to download the Voicegain AudioDaemon jar file and the correct JSON config
  • setting the correct audio source number in the AudioDaemon start script and launching the daemon

Observations

Here are some lessons learned from using this setup over the past 6 months:

  • While streaming the CPU use stays under 10%
  • Java heap is set to 128m, which seems to be more than enough because GCs manage to reduce it to about 54m
  • Raspberry Pi turned out to be very reliable - we have not had a single issue with the hardware nor with the Raspbian OS
  • Cheap USB audio card delivers very good sound quality (for speech recognition at least)
  • Very cheap USB power supplies should be avoided - sometimes they cause a hum in the audio (but that also depends on what audio device is being connected).

Announcement
Voicegain Story

The team behind Voicegain has more than 12 years of experience using Automated Speech Recognition in the real world, developing and hosting complete IVR systems for large enterprises.

We started off as Resolvity, Inc., back in 2005. We built our own IVR Dialog platform, utilizing AI to guide the dialog and to improve the recognition results from commercial ASR engines.

Resolvity Dialog Platform

The Resolvity Dialog platform had some advanced AI modules. For example:


  • It had an ontology that could be used to model the Dialog Domain. This ontology could then be used to drive the dialog automatically: it would generate follow-up questions based on the information that had already been acquired. We often used this in IVR applications that required recognition of product names.
  • It had an Incremental Case-Based Reasoning (CBR) troubleshooting engine which, together with the ontology, could be used to diagnose technical problems based on presented symptoms.
  • It had a module to correct systematic errors of the ASR engine to improve accuracy (we received a US Patent for this).
  • It had an NLCR module that could automatically handle "How may I help you?" type of interactions. It used a combination of Ontology, Bayesian and Neural Network classifiers.


Hosted IVR

Starting in 2007, we built complete IVR applications for Customer Support and hosted them on our servers in data centers. We built a Customer Solutions team that interacted with our customers, ensuring that the IVR applications were always up to date, and an Operations team that ensured that we ran the IVRs 24/7 with very high SLAs.

Resolvity Dialog Platform had a set of tools available that allowed us to analyze speech recognition accuracy in high detail and also allowed us to tune various ASR parameters (thresholds, grammars).

Moreover, because that platform was ASR-engine agnostic, we were able to see how a number of ASR engines from various brands performed in real life.



VoiceGain 1.0 Cloud PBX

In 2012-2013 Resolvity built a complete low-cost Cloud PBX platform on top of Open Source projects. We launched it for the India market under the brand name VoiceGain. The platform was providing complete end-to-end PBX+IVR functionality.

The version that we used in production supported only DTMF, but we also had a functional ASR version. However, at that time it was built using conventional ASR technologies (GMM+HMM), and we found that training it for new languages presented quite a few challenges.

VoiceGain was growing quite fast. We had a presence in data centers in Bangalore and Mumbai. We were able to provision both landline and mobile numbers for our PBX+IVR customers. Eventually, although our technology was performing quite well, we found it expensive to run a very hands-on business in India from the USA and sold our India operations.

Augmented Recognition

When the combination of hardware and AI developments made Deep Neural Networks possible, we decided to start working on our own DNN Speech Recognizer, initially with the goal of augmenting the results from the ASR engines that we used in our IVRs. Very quickly we noticed that with our new customized ASR used for IVR tasks we could achieve better results than with the commercial ASRs. We were able to confirm this by running comparison tests across data sets containing thousands of examples. The key to higher accuracy was the ability to customize the ASR Acoustic Models to the specific IVR domain and user population.

Own ASR Platform

Great results with augmented recognition led us to launch a full-scale effort to build a complete ASR platform, again under the Voicegain (.ai) brand name, that would allow for easy model customization and be easy to use in IVR applications.

From our IVR experience we knew that large enterprise IVR users are (a) very price sensitive and (b) require tight security compliance. That is why, from day 1, we also worked on making the Voicegain platform deployable on the Edge.
