Pricing - Developers | Speech-to-Text Platform

Voicegain Cloud

Pay-as-you-go, usage-based pricing with no commitments. $50 in credits is provided on signup, and no credit card is required to start. Rate limits apply; custom rate limits are available with revenue commitments. Additional costs apply for premium support. Please contact us for details.

Developer Product                   | Per Second | Per Minute | Per Hour
STT - Offline - Basic               | $0.000025  | $0.0015    | $0.090
STT - Offline - Enhanced            | $0.00005   | $0.0030    | $0.180
STT - Realtime - Basic              | $0.00005   | $0.0030    | $0.180
STT - Realtime - Enhanced/MRCP      | $0.00009   | $0.0054    | $0.324
Telephony Bot API (IVR + STT + TTS) | Contact Us

Voicegain Cloud - Assumptions

1. Platform usage is measured and billed per second, but the invoices generated by our billing system report usage in hours.

2. Each API request is subject to a minimum billing of 6 seconds, with 1-second increments after that. For example, at a rate of $0.00020 per second, a 4-second request is billed for 6 seconds, or $0.0012 ($0.00020*6), and a 7-second real-time request is billed for 7 seconds, or $0.0014 ($0.00020*7).

3. The Basic model offers STT on a mono channel with no diarization or PII redaction. The Enhanced model offers STT for two-channel call center audio (Agent and Caller on separate channels). It also includes diarization (mono channel with multiple speakers) and PII redaction.

4. STT Realtime-Basic and STT Realtime-Enhanced are for streaming audio over WebSocket. Basic is for mono-channel audio with no diarization. Enhanced is for two-channel/stereo call center audio in call center applications.

5. Telephony Bot API is an API for building telephony-based AI Voice Agents. It includes Voicegain's IVR, Speech-to-Text, and Text-to-Speech resources, as well as the connector to LLMs and chatbot frameworks.

6. MRCP ASR is real-time Speech-to-Text/ASR delivered as part of an MRCP Session. This price applies for the entire duration of the MRCP Session. It does not include the cost of 100% whole-call recording of sessions.

7. Rate limits apply to pay-as-you-go usage. We offer higher rate limits and lower pricing with volume and term commitments. Please contact us at sales@voicegain.ai for details.
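
The per-request billing rule in assumption 2 can be sketched in a few lines of Python. This is an illustration only, not Voicegain's billing code: the product keys are hypothetical labels for the rows of the Cloud pricing table, and rounding a fractional duration up to the next whole second is an assumption (the page only states a 6-second minimum and 1-second increments).

```python
import math
from decimal import Decimal

# Per-second list prices copied from the Voicegain Cloud table.
# The dictionary keys are hypothetical labels, not API identifiers.
RATES_PER_SECOND = {
    "offline-basic": Decimal("0.000025"),
    "offline-enhanced": Decimal("0.00005"),
    "realtime-basic": Decimal("0.00005"),
    "realtime-enhanced-mrcp": Decimal("0.00009"),
}

MIN_BILLED_SECONDS = 6  # minimum billing per API request


def billed_seconds(duration_seconds: float) -> int:
    """Apply the 6-second minimum and 1-second increments.

    Assumption: a fractional second is rounded up to the next whole second.
    """
    return max(MIN_BILLED_SECONDS, math.ceil(duration_seconds))


def request_cost(product: str, duration_seconds: float) -> Decimal:
    """Cost of a single request in USD at the listed per-second rate."""
    return RATES_PER_SECOND[product] * billed_seconds(duration_seconds)


if __name__ == "__main__":
    for seconds in (4, 7, 90):
        print(seconds, billed_seconds(seconds), request_cost("realtime-basic", seconds))
```

Decimal is used instead of float so that very small per-second prices multiply without binary rounding noise. Volume discounts, premium support, and custom rate limits are outside the scope of this sketch.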

Voicegain Edge (Datacenter/Private Cloud)

Deploy Voicegain on your private infrastructure. A free 30-day trial is provided. Licensing is session/port-based. Port prices are shown in the table as monthly amounts but are paid yearly in advance. A minimum purchase of ports/usage applies. In addition, there is an Annual Support Cost. Discounted OEM pricing is available.

Developer Product                        | Per Port/Month | Per Audio Hour
STT - Offline (Enhanced & Multi-channel) | $60            | $0.16
STT - Realtime - Transcription           | $72            | $0.20
STT - Custom                             | Contact Us
MRCP ASR (Tier 1, Tier 2)                | $35, $65       | Not offered

Voicegain Edge - Assumptions

1. Voicegain Edge refers to our platform being deployed on the client's private infrastructure (bare metal, VMs, or Virtual Private Cloud). Voicegain can be deployed using RPM/DEB packages, Docker Compose on VMs, OVA/OVF, or a Kubernetes cluster.

2. For high throughput/concurrency, we recommend NVIDIA GPU-based VMs or Kubernetes clusters. CPU-based VMs are recommended for low-concurrency use cases. We also offer fully air-gapped deployments where the Licensing Server is deployed in the client's datacenter.

3. The client incurs the infrastructure costs and is responsible for monitoring the platform's resource usage. For Private Cloud, we recommend managed Kubernetes from the cloud provider. For Datacenter, please contact us for support options.

4. For STT Offline, a "Port" is defined as throughput: 25 Ports would allow transcription of 25 hours of offline audio per hour. For Real-time STT and MRCP ASR, a Port is the number of concurrent WebSocket sessions or MRCP Sessions respectively. E.g., a 25-Port license would allow a maximum of 25 concurrent WebSocket or MRCP Sessions.

5. MRCP Tier 1 provides access to our grammar-based ASR. Voicegain supports GRXML and JSGF grammars. Tier 2 provides access to our large-vocabulary transcription.

6. For usage-based licensing (STT-Offline & STT-Realtime), each request is subject to a minimum billing of 6 seconds, with 1-second increments after that. E.g., at a rate of $0.00020 per second, a real-time request of 4 seconds is billed for 6 seconds, or $0.0012 ($0.00020*6), and a real-time request of 7 seconds is billed for 7 seconds.

7. Voicegain offers discounts for volume and term commitments. Please contact us at sales@voicegain.ai to receive custom pricing.
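
The port definition for offline STT and the "listed monthly, paid yearly" rule lend themselves to a small sizing sketch. The function names and product keys below are hypothetical, and the result deliberately ignores the Annual Support Cost, minimum purchase requirements, and any negotiated discounts, none of which are quantified on this page.

```python
import math
from decimal import Decimal

# Monthly per-port list prices copied from the Voicegain Edge table.
# The dictionary keys are hypothetical labels.
PORT_PRICE_PER_MONTH = {
    "offline": Decimal("60"),
    "realtime": Decimal("72"),
    "mrcp-tier1": Decimal("35"),
    "mrcp-tier2": Decimal("65"),
}


def offline_ports_needed(audio_hours: float, wall_clock_hours: float) -> int:
    """Ports required to transcribe `audio_hours` within `wall_clock_hours`.

    One offline port is one hour of audio transcribed per wall-clock hour.
    """
    return math.ceil(audio_hours / wall_clock_hours)


def annual_port_cost(product: str, ports: int) -> Decimal:
    """Yearly license cost: the monthly list price times 12, paid in advance."""
    return PORT_PRICE_PER_MONTH[product] * ports * 12


if __name__ == "__main__":
    ports = offline_ports_needed(audio_hours=100, wall_clock_hours=8)
    print(ports, annual_port_cost("offline", ports))
```

For real-time STT and MRCP the port count is simply the peak number of concurrent sessions, so only `annual_port_cost` applies there.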

Check out our blog for insights, benchmarks, sample code, and more

Voicegain Blog

Voice Bot

How to build a Voicebot using Voicegain, Twilio, RASA, and AWS Lambda

Jacek Jarmulak

You can find the complete code (minus the RASA logic, which you will have to supply yourself) at our GitHub repository.

What does it do?

The setup allows you to call a phone number and then interact with a Voicebot that uses RASA as the dialog logic engine.

How does it work?

The Components

Twilio Programmable Voice - we configure a Twilio phone number to point to a TwiML App that has the AWS Lambda function as the callback URL.
AWS Lambda function - a single Node.js function with an API Gateway trigger (simple HTTP API type).
Voicegain STT API - we use the /asr/transcribe/async API with input via a websocket stream and output via a callback. The callback goes to the same AWS Lambda function, but the Voicegain callback is a POST while the Twilio callback is a GET.
RASA - dialog logic is provided by a RASA NLU Dialog server, which is accessible over the RestInput API.
AWS S3 - stores the transcription results at each dialog turn.
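
Because one Lambda function receives both callbacks, it has to tell them apart by HTTP method. The original function is written in Node.js; the sketch below shows the same idea in Python for an API Gateway HTTP API trigger using payload format 2.0. The two handler functions are hypothetical placeholders, and the event is assumed to arrive without base64 body encoding.

```python
def handle_twilio_callback(query: dict) -> dict:
    """Hypothetical placeholder: would return TwiML for the next dialog turn."""
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "text/xml"},
        "body": "<Response/>",
    }


def handle_voicegain_callback(body: str) -> dict:
    """Hypothetical placeholder: would persist the transcription result."""
    return {"statusCode": 200, "body": ""}


def lambda_handler(event: dict, context: object) -> dict:
    # With payload format 2.0 the HTTP method is under requestContext.http.method.
    method = event.get("requestContext", {}).get("http", {}).get("method", "")
    if method == "GET":
        # Twilio callback: parameters arrive in the query string.
        return handle_twilio_callback(event.get("queryStringParameters") or {})
    if method == "POST":
        # Voicegain callback: the transcription result arrives in the request body.
        return handle_voicegain_callback(event.get("body") or "")
    return {"statusCode": 405, "body": "Method Not Allowed"}
```

Splitting on the method keeps a single callback URL for both services, which is what lets the Twilio TwiML App and the Voicegain session point at the same endpoint.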

November 2021 Update: We do not recommend S3 and AWS Lambda for a production setup. A more up-to-date review of the options for building a Voice Bot is described here. You should consider replacing the functionality of S3 and AWS Lambda with a web server that is able to maintain state, such as Node.js or Python Flask.

The Steps

The sequence diagram is provided below. Basically, the sequence of operations is as follows:

1. Call a Twilio phone number.
2. Twilio makes an initial callback to the Lambda function.
3. The Lambda function sends "Hi" to RASA, and RASA responds with the initial dialog prompt.
4. The Lambda function calls Voicegain to start an async transcription session. Voicegain responds with the URL of a websocket for audio streaming.
5. The Lambda function responds to Twilio with a TwiML command <Connect><Stream> to open a Media Stream to Voicegain. The command also contains the text of the question prompt.
6. Voicegain uses TTS to generate an audio prompt from the text of the RASA question and streams it via websocket to Twilio for playback.
7. The Caller hears the prompt and says something in response.
8. Twilio streams the caller audio to Voicegain ASR for speech recognition.
9. Voicegain ASR transcribes the speech to text and makes a callback to the Lambda function with the transcription result.
10. The Lambda function stores the transcription result in S3.
11. Voicegain closes the websocket session with Twilio.
12. Twilio notices the end of the session with ASR and makes a callback to the Lambda function to find out what to do next.
13. The Lambda function retrieves the recognition result from S3 and passes it to RASA.
14. RASA processes the answer and generates the next question in the dialogue.
15. The next turn continues in the same way from step 4.
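
The TwiML response that opens the Media Stream to Voicegain could be built as follows. This is a sketch in Python rather than the Node.js used in the original function. `<Response>`, `<Connect>`, `<Stream>` and `<Parameter>` are standard Twilio TwiML elements, but the post does not say how the prompt text is carried, so passing it as a `<Parameter>` named `prompt` is purely an assumption, as is the example URL.

```python
import xml.etree.ElementTree as ET


def connect_stream_twiml(websocket_url: str, prompt_text: str) -> str:
    """Build a <Connect><Stream> TwiML document pointing at a websocket URL."""
    response = ET.Element("Response")
    connect = ET.SubElement(response, "Connect")
    stream = ET.SubElement(connect, "Stream", {"url": websocket_url})
    # Assumption: the prompt travels as a custom stream parameter named "prompt".
    ET.SubElement(stream, "Parameter", {"name": "prompt", "value": prompt_text})
    # ElementTree escapes special characters in attribute values for us.
    return ET.tostring(response, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical URL; in the flow above it would come from the Voicegain response.
    print(connect_stream_twiml("wss://example.invalid/stream", "How can I help you?"))
```

Building the document with an XML library rather than string concatenation avoids malformed TwiML when the prompt text returned by RASA contains quotes or ampersands.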


What our customers are saying

“We selected Voicegain because they are accurate, affordable and easy to use. We deployed the entire platform in our datacenter in under 30 minutes.”

Ray Naeini - Chairman and CEO, Onvisource

“We selected Voicegain for Sutherland CX360, our AI/ML SaaS offering to evaluate all of Sutherland’s CX interactions. We were looking for an accurate PCI-compliant ASR/STT offering for our enterprise customers and we found that in Voicegain.”

Doug Gilbert - CIO and CDO, Sutherland

“Voicegain is amazing! They have a great ASR and a modern architecture. But what we really value is their prompt and timely support. We use both their MRCP ASR and STT APIs and they work great.”

Chirayu Oza - Director of Engineering, Hammer
