Today we are excited to announce the launch of Voicegain Whisper, an optimized version of OpenAI's Whisper speech recognition (ASR) model that runs on Voicegain-managed cloud infrastructure and is accessible through Voicegain APIs. Developers can use the same well-documented, robust APIs and infrastructure that process over 60 million minutes of audio every month for leading enterprises like Samsung and Aetna and for innovative startups like Level.AI, Onvisource and DataOrb.
The Voicegain Whisper API is a robust and affordable batch Speech-to-Text API for developers who are looking to integrate conversation transcripts with LLMs like GPT 3.5 and 4 (from OpenAI), PaLM2 (from Google), Claude (from Anthropic), LLAMA 2 (open source from Meta), and their own private LLMs to power generative AI apps. OpenAI has open-sourced several versions of the Whisper model. With today's release, Voicegain supports Whisper-medium, Whisper-small and Whisper-base. Voicegain now supports transcription in the multiple languages that are supported by Whisper.
Here is a link to our product page
There are four main reasons for developers to use Voicegain Whisper over other offerings:
While developers can use Voicegain Whisper on our multi-tenant cloud offering, a big differentiator for Voicegain is our support for the Edge. The Voicegain platform has been architected and designed for single-tenant private cloud and datacenter deployment. In addition to the core deep-learning-based Speech-to-Text model, our platform includes our REST API services, logging and monitoring systems, auto-scaling, and offline task and queue management. Today the same APIs enable Voicegain to process over 60 million minutes a month. We can bring this practical real-world experience of running AI models at scale to our developer community.
Since the Voicegain platform is deployed on Kubernetes clusters, it is well suited for modern AI SaaS product companies and innovative enterprises that want to integrate with their private LLMs.
At Voicegain, we have optimized Whisper for higher throughput. As a result, we are able to offer access to the Whisper model at a price that is 40% lower than what OpenAI offers.
Voicegain also offers critical features for contact centers and meetings. Our APIs support two-channel stereo audio, which is common in contact center recording systems. Our API also offers word-level timestamps, which are needed to map audio to text. Enhanced diarization, a required feature for contact center and meeting use-cases that is already available with the Voicegain models, will soon be made available on Whisper.
We also offer premium support and uptime SLAs for our multi-tenant cloud offering. These APIs today process over 60 million minutes of audio every month for our enterprise and startup customers.
OpenAI Whisper is an open-source automatic speech recognition (ASR) system trained on 680,000 hours of multilingual and multitask supervised data collected from the web. The model architecture is an encoder-decoder Transformer, and it has shown significant performance improvement compared to previous models because it has been trained on various speech processing tasks, including multilingual speech recognition, speech translation, spoken language identification, and voice activity detection.
Learn more about Voicegain Whisper by clicking here. Any developer - whether a one-person startup or a large enterprise - can access the Voicegain Whisper model by signing up for a free developer account. We offer 15,000 minutes of free credits when you sign up today.
There are two ways to test Voicegain Whisper. They are outlined here. If you would like more information or if you have any questions, please drop us an email at support@voicegain.ai
The purpose of this blog post is to further elaborate on other posts in which we described various ways you can build a Voice Bot using Voicegain ASR/Speech-to-Text. We also plan to announce a new feature that will soon make Voice Bot development even easier.
Just a quick recap - what is a Voice Bot? A Voice Bot allows users to speak freely and naturally in response to questions asked by the Bot. It can extract multiple "intents" from what a customer says and can respond intelligently. By implementing Voice bots, customers can retire their legacy IVRs and also use a unified Bot platform to power both chatbots and Voice Bots.
It is important to note that Voicegain ASR/Speech-to-Text only provides the "mouth" and the "ear" of the Voice Bot. For building the bot logic and all the back-end integrations (i.e., the brains), a developer has to select a bot framework like Google Dialogflow, RASA, Kore.ai, Microsoft Azure Bot Service, or AWS Lex.
So here are ways you can build a Voice Bot.
This method is described in the blog post: How to build a Voicebot using Voicegain, Twilio, RASA, and AWS Lambda
The important thing to note is that the described setup, which uses AWS Lambda and S3 to handle the callbacks, is for demo purposes only and is not ideal for production deployment. The callback server has to handle callbacks from both Twilio and Voicegain and pass information between the two. Because AWS Lambda is stateless, this example passes the information via S3, which makes the end-to-end process slow because of the need for polling. That will not provide a fast response time for your Voice Bot.
For a production-ready setup we suggest replacing AWS Lambda and S3 with a proper web-server that is able to maintain session state - you could use Node.js or Python Flask for that.
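To illustrate why in-process session state removes the need for S3 polling, here is a minimal sketch in Python. It assumes a single-process web server in which the handler for the speech recognition callback and the handler for the telephony callback share memory. The class `CallSessionStore` and its method names are hypothetical and are not part of any Twilio or Voicegain API.

```python
import threading
from typing import Dict, Optional


class CallSessionStore:
    """Hypothetical in-memory store shared by the two callback handlers."""

    def __init__(self) -> None:
        self._results: Dict[str, str] = {}
        self._cond = threading.Condition()

    def put_result(self, call_id: str, transcript: str) -> None:
        # Called by the handler that receives the speech recognition callback.
        with self._cond:
            self._results[call_id] = transcript
            self._cond.notify_all()  # wake any handler waiting on this call

    def wait_for_result(self, call_id: str, timeout: float = 5.0) -> Optional[str]:
        # Called by the handler that has to answer the telephony callback.
        # Blocks until the transcript arrives or the timeout expires.
        with self._cond:
            self._cond.wait_for(lambda: call_id in self._results, timeout=timeout)
            return self._results.pop(call_id, None)
```

The waiting handler is woken as soon as the result is stored, so there is no polling interval to add latency. A deployment with several server processes would need a shared store in place of the in-memory dictionary.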
This method is described in the blog post: Easy How-To: Build a Voicebot using Voicegain, RASA, and AWS Lambda
This is easier than the method described above. The Voicegain Telephony Bot API uses the Amazon Chime CPaaS to provide the functionality otherwise provided by Twilio, and this is internally integrated with the Voicegain STT API. It uses callbacks, so it needs an intermediate web-service to handle the interaction with a bot platform, e.g. RASA. This web-service may be stateless because the Telephony Bot API is capable of maintaining state information.
The example described in the above blog post uses SIP Trunks and phone numbers provided by Amazon Chime, which is embedded as part of the Voicegain Telephony Bot API. If you would rather retain your CPaaS/Telephony provider (e.g. SignalWire, Twilio, Telnyx, or Bandwidth.com), you can do that and connect to the Telephony Bot API using SIP INVITE. This is described in the blog post: SIP INVITE Voicegain from Twilio, SignalWire, Telnyx CPaaS
This method is described in the blog post: Voicegain announces integration with Audiocodes Voice AI connect.
AudioCodes VoiceAI Connect (VAIC) enables enterprises to connect a bot framework and speech services, such as text-to-speech (TTS) and speech-to-text (STT), to the enterprises’ voice and telephony channels to power Voice Bots, conversational IVRs and Agent Assist use-cases.
AudioCodes provides native integration with Bot Frameworks like Kore.ai, Google Dialogflow and Microsoft Bot Framework.
This setup allows you to directly specify a Voice Bot endpoint instead of specifying a generic http callback destination. The benefit of this is that you do not have to deal with having to provide the callback web-service. Notice that in this setup any back-end requests from your application logic to e.g. data services will now need to be done from the bot platform.
The bot platforms that we already support are RASA and Google Dialogflow. We are currently working on integrating with Microsoft Bot Framework. We hope to have this integration finished in time for the first release of Voicegain-Bot Platform integration. We also plan to very soon work on an integration with Kore.ai.
FreeSWITCH is a very capable telephony platform suitable for building various telephony applications. Some of those applications will rely on speech-to-text conversion, for example: ACDs (automatic call distribution), IVRs, Voice-Bots, Real-Time Agent Assist, real-time conference call transcription, call monitoring, etc.
Voicegain Speech-to-Text platform can be used with FreeSWITCH in a variety of ways.
Voicegain STT platform has supported MRCP (Media Resource Control Protocol) for a long time now. Our ASR can be accessed using MRCP and we support both grammar-based recognition (e.g. GRXML) and large-vocabulary transcription. MRCP is a communication protocol designed to connect telephony based IVRs and Voice Bots with speech recognizers (ASR) and speech synthesizers (TTS).
FreeSWITCH can interact with MRCP based recognizers using the included mod_unimrcp module. Voicegain STT has been tested with mod_unimrcp and interfaces with it without problems. You can learn more about using Voicegain STT via mod_unimrcp in this blog post.
Voicegain supports MRCP both in the Cloud and on the Edge (on-prem). We will soon be releasing, as open source, a recognizer plugin for the unimrcp server that will give you even more options in deploying FreeSWITCH with Voicegain and MRCP.
Voicegain provides a Telephony Bot API, which is a callback API similar in style to Twilio TwiML. You can place a call to a Voicegain endpoint either using a phone number obtained from Voicegain or using a SIP endpoint unique to your Voicegain application. When a call arrives you will get a web callback, and the response you provide will determine the actions that the Voicegain platform performs, e.g. play a prompt, recognize speech, detect DTMF, etc.
You can learn more about this API from the following blog posts:
If you have a FreeSWITCH application and you would like to recognize speech, you can bridge into the Voicegain SIP endpoint and, in a callback, specify a prompt and the type of speech capture (grammar-based or large vocabulary). Once the recognition finishes you will get a callback, and then you can either issue a disconnect command, which will transfer call flow back to your FreeSWITCH app, or continue with additional questions and recognitions on the Voicegain platform as needed.
Below is an example of a simple interaction with 4 participants:
This is still not Generally Available - please contact us if you are interested in testing.
mod_voicegain will give you capabilities similar to using mod_unimrcp with Voicegain but without the overhead of the MRCP protocol - mod_voicegain talks directly to Voicegain ASR.
mod_voicegain taps into the FreeSWITCH inbound audio stream and sends the audio data to Voicegain ASR in the Cloud or on the Edge. Voicegain ASR processes the audio according to the invocation parameters specified in the data argument. It then communicates the result of transcription or recognition in an Event.
mod_voicegain installs on FreeSWITCH as an app and can be invoked as such, e.g.:
or from LUA script:
Results will always be returned as a FreeSWITCH event, but it is also possible to get the results in a callback to the URL specified in callback.uri.
The FreeSWITCH event will be of custom type (Event-Name: CUSTOM) and Event-Subclass will be "voicegain_asr_update". The relevant payload will be in the "ASR-Response" field formatted as JSON.
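As a minimal sketch of consuming such an event, the Python function below filters on the event name and subclass described above and decodes the JSON payload. It assumes the event headers have already been parsed (and URL-decoded, if needed) into a dictionary by whatever event socket client you use. The function name is hypothetical, and the sample payload in the usage example is made up for illustration; it does not reflect the actual structure of the Voicegain response.

```python
import json
from typing import Any, Mapping, Optional


def extract_asr_response(headers: Mapping[str, str]) -> Optional[Any]:
    """Return the decoded ASR-Response payload, or None for unrelated events."""
    # Only custom events with the Voicegain subclass carry ASR results.
    if headers.get("Event-Name") != "CUSTOM":
        return None
    if headers.get("Event-Subclass") != "voicegain_asr_update":
        return None
    raw = headers.get("ASR-Response")
    if raw is None:
        return None
    # The payload is JSON text; its inner structure is not assumed here.
    return json.loads(raw)


# Usage with a made-up payload (illustrative only):
event = {
    "Event-Name": "CUSTOM",
    "Event-Subclass": "voicegain_asr_update",
    "ASR-Response": '{"example_field": "example value"}',
}
payload = extract_asr_response(event)
```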
You can read more about mod_voicegain in this Knowledge Base article.
mod_vg_tap has been developed with applications like Real-Time Agent Assist in mind. These apps need access to the audio stream from a FreeSWITCH call but do not otherwise need to interact with FreeSWITCH (unlike IVR and Voice-Bots).
mod_vg_tap installs as an app and has simple commands to start and stop streaming to the Voicegain Speech-to-Text engine.
The start command can specify the following destinations:
The transcription results are generally not returned to a FreeSWITCH app but are delivered to the destination specified when starting the speech-to-text session - the results can be delivered via websocket, polling, or callback.
If you want more information about any of these methods of integrating Voicegain with FreeSWITCH, please email us at support@voicegain.ai.
Dallas, Texas - October 26, 2021: OnviSource, a leading provider of intelligent automation solutions for workforce optimization, contact center operation analytics and automation, customer experience management, and business process automation, announced today a strategic partnership with Voicegain, an innovative Speech-to-Text/ASR company. OnviSource has integrated Voicegain’s deep learning-based speech-to-text platform into its Intellecta™ multichannel analytics solution which utilizes speech-to-text and natural language understanding to analyze customer interactions and audio-based content to discover actionable knowledge and extract business insights.
OnviSource will leverage the Voicegain platform to serve its growing enterprise client base from various industries such as nationwide wireless service providers, banking, financial services, utilities, insurance and others.
“We are pleased to announce this partnership with Voicegain as their AI-driven ASR further augments our AI-driven intelligent automation solutions and our hyper-automation platform that offers integrated AI, conversational AI, RPA, BPA and analytics,” said Ray Naeini, Chairman and CEO of OnviSource. “Our partnership will allow both companies to jointly develop highly sophisticated and customized AI models for various applications and industries in order to deliver unmatched accuracy and performance.”
To achieve high performance, OnviSource deployed the Voicegain ASR Engine on servers with NVIDIA GPUs in its data center. This architecture is referred to as an Edge deployment. While Voicegain also offers a multi-tenant cloud solution, an Edge deployment architecture has two important benefits for OnviSource.
The first major benefit is that it allows OnviSource to meet strict customer contractual commitments related to data privacy, security and control. The second benefit is that it delivers approximately a 75% reduction in costs for OnviSource compared to usage-based pricing models provided by other providers, empowering OnviSource to offer its feature-rich solutions at highly affordable and flexible prices.
“We are excited to be selected by OnviSource for its call center and enterprise speech analytics products. This decision validates the ‘3As’ on which Voicegain differentiates itself in the ASR market – Accuracy, Affordability and Accessibility,” said Arun Santhebennur, Co-founder & CEO of Voicegain. “Our joint product enhancements will deliver highly accurate Speech-to-Text models for complex business applications.”
Selection of the Voicegain product by OnviSource was based on comprehensive trials and pilot programs related to accuracy, performance and applicability of Voicegain’s product, combined with detailed comparative analysis with other products in the market.
Additionally, the Voicegain product offers simplicity in deployment and usage as the entire platform is deployed on a Kubernetes cluster. Its Edge deployment offers a simple script to download and deploy all the packages and dependencies on any server with NVIDIA GPUs.
About OnviSource
For more than a decade, OnviSource has enabled several hundred small-to-large companies across a broad range of industries to cost-effectively manage, automate and improve their customer experience and business processes by offering advanced solutions in multichannel data and media capture, unification, analysis, decision making and automation for their entire enterprise, including their contact centers, back offices and IT organizations.
OnviSource ia.Enterprise Intelligently Automated (IA) solutions offer Workforce Optimization and Workforce Management (WFO/WFM), inclusive Teleservice Customer Engagement Management, Multichannel Customer Engagement Analytics, intelligently automated Customer Survey, Process Automation through Robotic Process Automation (RPA) and Intelligent Process Automation (IPA) and Intelligent Virtual Agent (IVA). The Company delivers its solutions as software products, cloud or Software-as-a-Service (SaaS), managed services, or any combination. OnviSource’s special Advantage Platinum program assures that solutions work for customers’ specific needs by offering a series of customer assistance programs with no obligations. These programs include consultation, proof-of-concept and hands-on operation assistance. OnviSource is headquartered in Plano, Texas (North Dallas area), with an additional operation center in Oklahoma.
About Voicegain
Voicegain is a deep neural network-based Speech-to-Text platform that is focused on developers of voice applications. Voicegain offers a full suite of APIs, SDKs and SaaS apps on top of its platform to automate and analyze voice-based interactions in contact centers, sales and meetings. To learn more, visit Voicegain.ai or create a free account to get started.
Press Contact:
Voicegain: Arun Santhebennur, CEO
OnviSource: Deborah Cromwell, Marketing Manager
deborah.cromwell@onvisource.com
One of the previous blog posts described a Voice Bot built using Twilio, Voicegain, RASA, and AWS Lambda. Twilio was used for telephony (phone numbers, SIP Trunking, TwiML for call control), Voicegain provided the ASR/speech recognition, and AWS Lambda coordinated the actions. The setup works but is involved. The need to pass the speech recognition results via S3 (as Lambda is stateless and does not have memory between function calls) may occasionally cause delays in requests and responses.
Voicegain now integrates with Amazon Chime Voice Connector to offer a pay-as-you-go SIP Trunking service directly from the Voicegain web console. You can also purchase phone numbers and receive inbound calls. Support for making outbound Speech IVR calls is in the works.
Of course, we continue to support developers who use Twilio and SignalWire via a simple SIP INVITE - this blog describes how.
The sequence diagram is provided below. It is very simple. Basically, the sequence of operations is as follows:
The sample code for the Lambda function (in Python and Node.js versions) is available on our github.
1. Click here for instructions to access our live demo site.
2. If you are building a cool voice app and you are looking to test our APIs, click here to sign up for a developer account and receive $50 in free credits
The Voicegain Speech-to-Text platform has for a while supported many Twilio features, such as:
Release 1.26.0 of the Voicegain platform finally offers full 2-channel support for Twilio Media Streams. This enables real-time transcription of both the inbound and outbound channels at the same time.
Twilio <Stream> command takes a websocket url parameter as a target to which the selected channels are streamed, for example:
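As a minimal sketch, TwiML of this kind could be generated in Python as shown below. The websocket URL is a placeholder (in practice it would come from the Voicegain session response), the function name is hypothetical, and wrapping `<Stream>` in `<Start>` with `track="both_tracks"` is our assumption about how both channels would be requested.

```python
import xml.etree.ElementTree as ET

# Track values accepted by the Twilio <Stream> noun.
VALID_TRACKS = {"inbound_track", "outbound_track", "both_tracks"}


def build_stream_twiml(wss_url: str, track: str = "both_tracks") -> str:
    """Build a TwiML document that starts streaming call audio to wss_url."""
    if track not in VALID_TRACKS:
        raise ValueError(f"unsupported track: {track}")
    response = ET.Element("Response")
    start = ET.SubElement(response, "Start")
    # url is the websocket target; track selects which channels are streamed.
    ET.SubElement(start, "Stream", {"url": wss_url, "track": track})
    return ET.tostring(response, encoding="unicode")


# Placeholder URL for illustration only.
twiml = build_stream_twiml("wss://example.invalid/voicegain-session")
```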
The wss url can be obtained by starting a new Voicegain real-time transcription session using the https://api.voicegain.ai/v1/asr/transcribe/async API. The session part of the request may look like this (notice that two sessions are started and each will be fed a different channel, left or right, of the audio stream):
We also need to tell Voicegain to take input in TWIML protocol in stereo:
Notice that we can also enable audio capture, which will give us a stereo recording of the call once the session is complete.
In the response to the request that starts the Voicegain session we get 3 websocket urls:
On our github we provide example Python code that starts a simple outbound Twilio phone call and then transcribes both inbound and outbound audio in real time.
The sample code illustrates an outbound calling example, which is somewhat simpler because there are no callbacks involved. In the case of an inbound call, the request to Voicegain would have to be made from your Twilio callback function that gets invoked when a new call comes in; otherwise, the rest of the code would be very similar to our github example.
Some of these are already listed on Twilio Media Streams page:
We will be testing the <Stream> functionality with the LaML command language provided by the SignalWire platform, which is very similar to Twilio TwiML - we will update our blog with the results of those tests.
We are also working on a real-time version of our Speech Analytics API. Once it is complete, all Speech Analytics functionality will be available in real time to users of the Twilio and SignalWire platforms.
1. Click here for instructions to access our live demo site.
2. If you are building a cool voice app and you are looking to test our APIs, click here to sign up for a developer account and receive $50 in free credits
3. If you want to take Voicegain as your own AI Transcription Assistant to meetings, click here.
We are excited to announce a new Speech-to-Text (STT) API that works with AudioCodes VoiceAI Connect*. AudioCodes VoiceAI Connect (VAIC) enables enterprises to connect a bot framework and speech services, such as text-to-speech (TTS) and speech-to-text (STT), to the enterprises’ voice and telephony channels to power Voice Bots, conversational IVRs and Agent Assist use-cases.
With this new API, enterprises and NLU/Conversational AI platform companies can leverage the capabilities of AudioCodes VAIC with Voicegain as the ASR or STT engine for their contact center AI initiatives.
The two main use-cases in Contact Centers are (1) building Voice Bots (or voice-enabling a chatbot) and (2) building real-time Agent Assist.
While AudioCodes supports Cloud STT options from the large players Microsoft, Google and Amazon, introducing Voicegain as an additional ASR option offers three key benefits to prospective customers. These benefits can be summarized as the 3 As - Accuracy, Affordability and Accessibility.
To get very high STT accuracy, companies now understand the importance of training the underlying acoustic models on application-specific audio data. While reasonable out-of-the-box accuracy is necessary, building voice bots or extracting high-quality analytics from voice data requires more than an out-of-the-box model offers. Voicegain offers a full-fledged training data pipeline and easy-to-use APIs to help speed up the building of custom acoustic models. We have demonstrated significant reductions in Word Error Rate even with a few hundred hours of client-specific audio files.
Because AudioCodes VAIC makes it very easy to switch between various STT services, you can easily compare the performance of Voicegain STT to any of the other STT providers supported on AudioCodes.
Voicegain offers disruptive pricing compared to the big 3 STT providers at essentially the same out-of-the-box accuracy. Our pricing is 40%-75% lower than the big 3 Cloud Speech-to-Text providers. This is especially important for real-time analytics (real-time agent assist) use case in contact centers as the audio/transcription volumes are very large. In addition to APIs, we also provide a white-label reference UI that contact centers can use to reduce the cost and time-to-market associated with deploying AI apps.
In addition to accessing STT as a cloud service, Voicegain can be deployed onto a Kubernetes cluster in a client's datacenter or in a dedicated VPC with any of the major cloud providers. This addresses applications where compliance, privacy and data control concerns prevent use of STT engines on public cloud infrastructure.
Connecting AudioCodes VAIC to Voicegain is done in 3 simple steps. They are:
1) Add Voicegain as the ASR/STT provider on VAIC. This is done through an API (provided by AudioCodes). In this step, you would need to enter a JWT token from the Voicegain Web Console for authentication (instructions provided below).
2) Enter the web-socket entry URL for Voicegain ASR on VAIC. You can get this URL from the Voicegain Web Console (instructions provided below).
3) Configure the Speech Recognition engine settings. This includes picking the right model and having the correct timeout and model sensitivity settings. This is done on the Voicegain Web Console (instructions to sign up provided below).
Please contact your AudioCodes customer success contact for Steps 1 & 2.
You would need to sign up for a developer account on the Voicegain Web Console. Voicegain offers an open developer platform and there is no need to enter your credit card. We provide 300 minutes of free Speech-to-Text API access every month. You can test out our APIs and check out our accuracy.
After you sign up, please go to Settings-> API Security. The JWT Token required for Step 1 and the API entry URL for Step 2 are provided here.
You would also need to pick the right acoustic model and set the complete timeout and sensitivity settings specified in Step 3. Please navigate to Settings-> Speech Recognition -> ASR Transcription settings.
If you have any questions please email us at support@voicegain.ai
* VoiceAI Connect is a product and trademark owned by AudioCodes.
Interested in customizing the ASR or deploying Voicegain on your infrastructure?