Pricing | Speech-to-Text Platform

Voicegain Cloud

Pay-as-you-go, usage-based pricing with no commitments. You receive $50 in credits on signup, and no credit card is required to start. Rate limits apply; custom rate limits are available with revenue commits. Please contact us for details.

| Developer Product | Per Second | Per Minute | Per Hour |
|---|---|---|---|
| STT - Offline - Basic³ | $0.00005 | $0.0030 | $0.180 |
| STT - Offline - Enhanced³ | $0.00006 | $0.0036 | $0.216 |
| STT - Offline - Multi-Channel³ | $0.00010 | $0.0060 | $0.36 |
| STT - Realtime - Transcription⁴ | $0.00009 | $0.0054 | $0.324 |
| STT - Custom⁵ | Contact Us | Contact Us | Contact Us |
| STT - Realtime - Bots/IVR (MRCP & Bot API)⁶ | $0.00015 | $0.0090 | $0.54 |

Voicegain Cloud - Assumptions

1. Platform usage is measured and billed per second, but our billing system displays usage in hours.

2. Each API request is subject to a minimum billing of 6 seconds, with 1-second increments after that. For example, at a rate of $0.00020 per second, a real-time request of 4 seconds is billed for 6 seconds, or $0.0012 ($0.00020*6), and a real-time request of 7 seconds is billed for 7 seconds, or $0.0014 ($0.00020*7).

3. STT Offline-Basic offers STT on a mono channel with no Diarization and no PII Redaction. Voicegain Whisper-small is provided at the Basic price. STT Offline-Enhanced offers Diarization and PII Redaction in addition to Transcription. Voicegain Whisper-medium is provided at the Enhanced price. Enhanced also supports 2-channel audio for Call Center recordings where the Agent and the Caller are on separate channels. STT Offline-Multi-Channel is for meeting recordings on Zoom or any other meeting platform where each speaker is in a separate audio file.

4. STT Realtime-Transcription is for Voicegain's streaming Speech-to-Text over WebSockets. The price in the table is per channel. We provide a 50% discount to call center customers where the Agent Channel and the Caller Channel are streamed over separate channels.

5. A Custom Speech-to-Text model is built by training our standard model on additional client data (using transfer learning). Please contact us for pricing.

6. STT-Realtime with MRCP or Telephony Bot API is the price for use of our Speech-to-Text/ASR as part of an MRCP or Telephony Bot API Session. This price applies to the entire duration of the MRCP or Telephony Bot/SIP Session. It does not include 100% whole-call recording of sessions.

7. Rate limits apply to pay-as-you-go usage. We offer higher rate limits and lower pricing with volume and term commits. Please contact us at sales@voicegain.ai for details.
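
The billing rule in assumption 2 can be sketched in a few lines of Python. This is a minimal illustration, not Voicegain's billing code: the rates are copied from the table above, the function and dictionary names are hypothetical, and rounding fractional durations up to the next whole second is an assumption (the text only says billing is in 1-second increments).

```python
from decimal import Decimal
import math

# Per-second pay-as-you-go rates copied from the Voicegain Cloud table above.
RATES_PER_SECOND = {
    "offline_basic": Decimal("0.00005"),
    "offline_enhanced": Decimal("0.00006"),
    "offline_multi_channel": Decimal("0.00010"),
    "realtime_transcription": Decimal("0.00009"),
    "realtime_bots_ivr": Decimal("0.00015"),
}

MINIMUM_BILLED_SECONDS = 6  # minimum billing per API request (assumption 2)


def billed_seconds(duration_seconds: float) -> int:
    """Round up to whole seconds (an assumption), then apply the 6-second minimum."""
    return max(MINIMUM_BILLED_SECONDS, math.ceil(duration_seconds))


def request_cost(duration_seconds: float, rate_per_second: Decimal) -> Decimal:
    """Cost of a single request at the given per-second rate."""
    return billed_seconds(duration_seconds) * rate_per_second


# The worked example from assumption 2, which uses a rate of $0.00020 per second.
print(request_cost(4, Decimal("0.00020")))  # billed as 6 seconds
print(request_cost(7, Decimal("0.00020")))  # billed as 7 seconds

# The same 4-second request at the real-time transcription rate from the table.
print(request_cost(4, RATES_PER_SECOND["realtime_transcription"]))
```

Using `Decimal` rather than `float` keeps the small per-second amounts exact, which matters when many short requests are summed.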

Voicegain Edge (Datacenter/Private Cloud)

Deploy Voicegain on your private infrastructure. A free 30-day trial is provided. Port-based or usage-based licensing is offered. A minimum purchase of ports or usage applies. Additional annual support costs may be applicable.

| Developer Product | Per Port/Month | Per Audio/Hour |
|---|---|---|
| STT - Offline - (Enhanced & Multi-Channel) | $60 | $0.15 |
| STT - Realtime - Transcription | $72 | $0.20 |
| STT - Custom | Contact Us | Contact Us |
| STT - Realtime - Bots/IVR (Bot API/MRCP) | $66 | $0.18 |

Voicegain Edge - Assumptions

1. Voicegain Edge refers to our platform being deployed on client infrastructure (bare-metal or VPC). Voicegain is deployed on a Kubernetes cluster. We prefer NVIDIA GPUs for apps that require high concurrency; CPUs are supported for low-concurrency apps. The cluster is orchestrated from Voicegain Cloud.

2. The client incurs the infrastructure costs and is responsible for monitoring Kubernetes. For a VPC, we recommend managed Kubernetes from the cloud provider; for a datacenter, you can contact us for support options.

3. For STT Offline, a "Port" is defined as throughput, so 25 Ports would allow you to transcribe 25 hours of offline audio per hour. For Real-time STT, a Port is a concurrent WebSocket session. E.g., 25 Ports means a maximum of 25 concurrent Real-time STT sessions during a month.

4. For usage-based licensing, each request is subject to a minimum billing of 6 seconds, with 1-second increments after that. E.g., at a rate of $0.00020 per second, a real-time request of 4 seconds is billed for 6 seconds, or $0.0012 ($0.00020*6), and a real-time request of 7 seconds is billed for 7 seconds.

5. Voicegain offers discounts for volume and term commits. Please contact us at sales@voicegain.ai to receive custom pricing.
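
To compare the two Edge licensing models, one rough yardstick is the number of audio hours per month at which a single port costs the same as usage-based billing. The sketch below uses only the list prices from the table above; the names are hypothetical, and it deliberately ignores minimum purchases, support costs, discounts, and the fact that a port is a concurrency or throughput limit rather than an hour allowance.

```python
from decimal import Decimal

# Voicegain Edge list prices from the table above:
# (price per port per month, price per audio hour).
EDGE_PRICES = {
    "offline": (Decimal("60"), Decimal("0.15")),
    "realtime_transcription": (Decimal("72"), Decimal("0.20")),
    "realtime_bots_ivr": (Decimal("66"), Decimal("0.18")),
}


def break_even_hours(per_port_month: Decimal, per_audio_hour: Decimal) -> Decimal:
    """Monthly audio hours at which one port costs the same as usage-based billing."""
    return per_port_month / per_audio_hour


for product, (port_price, hourly_price) in EDGE_PRICES.items():
    hours = break_even_hours(port_price, hourly_price)
    print(f"{product}: {hours:.1f} audio hours per port per month")
```

Below the break-even volume usage-based licensing is cheaper at list price; above it, a port is cheaper, provided the port's concurrency limit fits the workload.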

Check out our blog for insights, benchmarks, sample code, and more

Voicegain Blog

Edge

Raspberry Pi as Audio Streaming Client

Jacek Jarmulak

You can stream audio to the Voicegain transcription API from any computer, but sometimes it is handy to have a dedicated, inexpensive device just for this task. Below we relay the experience of one of our customers in using a Raspberry Pi to stream audio for real-time transcription. It replaced a Mac Mini that was initially used for that purpose. Using the Pi had two benefits: a) the lower cost, and b) it is less likely than a Mac Mini to be "hijacked" for other purposes.

Hardware

The Voicegain Audio Streaming Daemon requires very little in the way of computing resources, so even a Raspberry Pi Zero is sufficient; however, we recommend using a Raspberry Pi 3 B+, mainly because it has an on-board 1 Gbps wired Ethernet port. WiFi connections are more likely to have problems when streaming over UDP.

Here is a list of all hardware used in the project, with Amazon prices as of July 2019:

Element14 Raspberry Pi 3 B+ Motherboard - $37.78
Miuzei Raspberry Pi 3 b+ Screen, 3.5 Inch - $23.99
Miuzei 3.5 Inch Screen Case for 3.5 LCD - $9.99
iPazzPort Wireless Mini Handheld Keyboard - $13.99
UGREEN USB Audio Adapter - $8.99
SanDisk Ultra 32GB microSDHC UHS-I card - $7.23
An existing USB 5 V power supply was also used.

All the components added up to a total of $101.97. A mini monitor and a mini keyboard were included because they make it more convenient to control the device while it is in the audio rack. For example, the ALSA audio mixer can easily be adjusted this way while monitoring the audio level via headphones.

Raspberry Pi running AudioDaemon

Software

The device runs standard Raspbian, which can easily be installed from an image using, e.g., balenaEtcher. After the base install, the following steps were needed to get things running:

enable SSH access
change the default audio device to the USB sound card (Raspbian comes with ALSA and basic USB sound drivers by default)
install the driver for the display (otherwise the output font is too tiny to read)
install OpenJDK 9
use the link generated by the Voicegain Portal to download the Voicegain AudioDaemon jar file and the correct JSON config
set the correct audio source number in the AudioDaemon start script and launch the daemon
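
When changing the default audio device, it helps to know which ALSA card number the USB sound card was assigned. The sketch below parses the output of `arecord -l` to find it. The sample listing is illustrative (real device names will differ), the function name is hypothetical, and the post does not say how the ALSA card number relates to the audio source number in the AudioDaemon start script, so treat this only as an aid for the ALSA step.

```python
import re

# Illustrative `arecord -l` output; real device names will differ.
SAMPLE_LISTING = """\
**** List of CAPTURE Hardware Devices ****
card 1: Device [USB Audio Device], device 0: USB Audio [USB Audio]
  Subdevices: 1/1
  Subdevice #0: subdevice #0
"""

# Matches lines such as "card 1: Device [USB Audio Device], device 0: ..."
CARD_LINE = re.compile(r"^card (\d+): \S+ \[(.*?)\], device (\d+):", re.MULTILINE)


def find_usb_capture_device(listing: str):
    """Return (card, device) for the first capture entry whose name mentions USB, else None."""
    for match in CARD_LINE.finditer(listing):
        card, description, device = match.groups()
        if "usb" in description.lower():
            return int(card), int(device)
    return None


print(find_usb_capture_device(SAMPLE_LISTING))
```

On the Pi itself, the listing can be obtained with `subprocess.run(["arecord", "-l"], capture_output=True, text=True).stdout` and passed to the same function.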

Observations

Here are some lessons learned from using this setup over the past 6 months:

While streaming, CPU use stays under 10%
The Java heap is set to 128m, which seems to be more than enough because GCs manage to reduce it to about 54m
The Raspberry Pi turned out to be very reliable: we have not had a single issue with the hardware or with the Raspbian OS
A cheap USB audio card delivers very good sound quality (for speech recognition at least)
Very cheap USB power supplies should be avoided: sometimes they cause a hum in the audio (but that also depends on what audio device is being connected).

Voice Bot

How to build a Voicebot using Voicegain, Twilio, RASA, and AWS Lambda

Jacek Jarmulak

You can find the complete code (minus the RASA logic, which you will have to supply yourself) at our GitHub repository.

What does it do?

The setup allows you to call a phone number and then interact with a Voicebot that uses RASA as the dialog logic engine.

How does it work?

The Components

Twilio Programmable Voice - We configure a Twilio phone number to point to a TwiML App that has the AWS Lambda function as the callback URL.
AWS Lambda function - A single Node.js function with an API Gateway trigger (simple HTTP API type).
Voicegain STT API - We use the /asr/transcribe/async API with input via a websocket stream and output via a callback. The callback goes to the same AWS Lambda function, but the Voicegain callback is a POST while the Twilio callback is a GET.
RASA - Dialog logic is provided by a RASA NLU dialog server, which is accessible over the RestInput API.
AWS S3 - Stores the transcription results at each dialog turn.

November 2021 Update: We do not recommend S3 and AWS Lambda for a production setup. A more up-to-date review of the various options for building a Voice Bot is described here. You should consider replacing S3 and AWS Lambda with a web server that is able to maintain state, such as Node.js or Python Flask.
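
The way a single endpoint can serve both callbacks and keep per-call state might look like the following Python sketch. It is not the code from the repository (which is a Node.js Lambda function): the handler name, the `callId` query parameter, and the in-memory dictionary standing in for S3 are all hypothetical, and the event shape assumes the API Gateway HTTP API payload format 2.0.

```python
import json

# Stand-in for S3 (or a stateful web server's session store):
# the latest transcription result per call.
TRANSCRIPTS = {}


def handle_request(event: dict) -> dict:
    """Dispatch one callback by HTTP method: GET from Twilio, POST from Voicegain."""
    method = event["requestContext"]["http"]["method"]
    params = event.get("queryStringParameters") or {}
    call_id = params.get("callId", "unknown")  # hypothetical parameter name

    if method == "POST":
        # Voicegain callback: remember the transcription result for this call.
        TRANSCRIPTS[call_id] = json.loads(event.get("body") or "{}")
        return {"statusCode": 200, "body": ""}

    if method == "GET":
        # Twilio callback: pick up the last result (None on the first turn).
        last_result = TRANSCRIPTS.pop(call_id, None)
        return {"statusCode": 200, "body": json.dumps({"lastResult": last_result})}

    return {"statusCode": 405, "body": "method not allowed"}
```

In a real handler the GET branch would pass the last result to RASA and answer Twilio with TwiML rather than JSON. An in-memory dictionary only works when one long-running process handles every request, which is the reason a stateful web server is suggested over Lambda.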

The Steps

The sequence diagram is provided below. Basically, the sequence of operations is as follows:

1. Call a Twilio phone number.
2. Twilio makes an initial callback to the Lambda function.
3. The Lambda function sends "Hi" to RASA, and RASA responds with the initial dialog prompt.
4. The Lambda function calls Voicegain to start an async transcription session. Voicegain responds with the URL of a websocket for audio streaming.
5. The Lambda function responds to Twilio with a TwiML command <Connect><Stream> to open a Media Stream to Voicegain. The command also contains the text of the question prompt.
6. Voicegain uses TTS to generate an audio prompt from the text of the RASA question and streams it via websocket to Twilio for playback.
7. The Caller hears the prompt and says something in response.
8. Twilio streams the caller audio to Voicegain ASR for speech recognition.
9. Voicegain ASR transcribes the speech to text and makes a callback with the transcription result to the Lambda function.
10. The Lambda function stores the transcription result in S3.
11. Voicegain closes the websocket session with Twilio.
12. Twilio notices the end of the session with ASR and makes a callback to the Lambda function to find out what to do next.
13. The Lambda function retrieves the recognition result from S3 and passes it to RASA.
14. RASA processes the answer and generates the next question in the dialogue.
15. The next turn continues in the same way, starting from step 4.
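
The TwiML that the Lambda function returns to Twilio (the <Connect><Stream> command in the steps above) can be sketched as follows. This is an illustration in Python rather than the repository's Node.js code: the function name is hypothetical, the websocket URL is a placeholder for the one Voicegain returns, and `prompt` is an assumed parameter name, since the post does not say how the prompt text is passed.

```python
import xml.etree.ElementTree as ET


def build_connect_stream_twiml(websocket_url: str, prompt_text: str) -> str:
    """Build a TwiML response that opens a Media Stream to the given websocket URL."""
    response = ET.Element("Response")
    connect = ET.SubElement(response, "Connect")
    stream = ET.SubElement(connect, "Stream", url=websocket_url)
    # Custom parameters travel with the stream; "prompt" is a hypothetical name.
    ET.SubElement(stream, "Parameter", name="prompt", value=prompt_text)
    return ET.tostring(response, encoding="unicode")


# Placeholder URL; in the real flow this comes from the Voicegain session response.
print(build_connect_stream_twiml("wss://example.invalid/stream", "How can I help you?"))
```

Building the document with an XML library rather than string concatenation ensures that quotes or ampersands in the prompt text are escaped correctly.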
