Our Blog

News, Insights, sample code & more!

ASR
Voicegain MRCP ASR - Quick, Affordable and Simple replacement for the Nuance Recognizer which is rapidly approaching EOL

This article outlines how Voicegain's modern deep-learning-based Speech-to-Text/ASR can be a simple and affordable alternative for businesses looking for a quick and easy replacement for their on-premise Nuance Recognizer. Nuance has announced that it is going to end support for the Nuance Recognizer, its grammar-based ASR that uses the MRCP protocol, sometime in 2026 or 2027. So organizations that have a speech-enabled IVR as the front door to their contact center need to start planning now.

The future belongs to Generative AI powered Voice Agents

With the rise of Generative AI and highly accurate, low-latency speech-to-text models, the front door of the call center is poised for a major transformation. The infamous and highly frustrating IVR phone menu will be replaced by Conversational AI Voicebots, but this will likely happen over the next 3-5 years. As enterprises plan their migration from tree-based IVRs to an Agentic AI future, they want to do it on their own timeline. In other words, they do not want to be forced into it under the pressure of a vendor's EOL deadline.

Staying On-Premise or in a VPC for both the IVR platform and the ASR

In addition, the migration path proposed by Nuance is a multi-tenant cloud offering. While a cloud-based ASR/Speech-to-Text engine is likely to make sense for most businesses, there are companies in regulated sectors that are prevented from sending their sensitive audio data to a multi-tenant cloud offering.

In addition to Nuance's EOL announcement for its on-premise ASR, Genesys, a major IVR platform vendor, has announced that its premise-based offerings - Genesys Engage and Genesys Connect - will reach EOL at around the same time as the Nuance ASR.

So businesses that want a modern Gen AI powered Voice Assistant but need to keep the IVR on-premise in their datacenter, or behind their firewall in a VPC, need to start planning their strategy very soon.

At Voicegain, we offer enterprises in this situation a modern Voicebot platform that can remain on-premise or in their VPC. This Voicebot platform runs on modern Kubernetes clusters and leverages the latest NVIDIA GPUs.

Replacing the Nuance Recognizer with Voicegain is quick and easy!

Rewriting the IVR application logic to migrate from a tree-based IVR menu to a conversational Voice Assistant is a journey; it requires investment and allocation of resources. Hence a good first step is to simply replace the underlying Nuance ASR (and possibly the IVR platform too). This ensures that a company can migrate to a modern Gen-AI Voice Assistant on its own timeline.

Voicegain offers a modern, highly accurate deep-learning-based Speech-to-Text engine trained on hundreds of thousands of hours of telephone conversations. It is integrated into our native modern telephony stack. It can also talk over the MRCP protocol to VoiceXML-based IVR platforms, and it supports the traditional speech grammars (SRGS, JSGF). Voicegain also supports a range of built-in grammars (e.g., zip codes, dates).

As a result, it is a simple "drop-in" replacement for the Nuance Recognizer. There is no need to rewrite the current IVR application. Instead of pointing to the IP address of the Nuance server, the VoiceXML platform just needs to be reconfigured to point to the IP address of the Voicegain ASR server. This should take no more than a couple of minutes.

Voicegain Telephony Bot API - a Callback API for Telephony-based AI Voice Assistant

In addition to the Voicegain ASR/STT engine, we also offer a Telephony Bot API. This is a callback-style API that combines our native IVR platform with the ASR/STT engine and can be used to build Gen AI powered Voicebots. It integrates with leading LLMs - both cloud-based and open-source premise-based - to drive a natural language conversation with callers.
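
To make the callback style concrete, below is a minimal sketch of such a webhook turn handler, written in Python with Flask. The endpoint path and the JSON fields are illustrative placeholders, not the actual Telephony Bot API schema.

    # Minimal sketch of a callback-style voicebot turn handler (Flask).
    # NOTE: the request/response fields below are illustrative placeholders,
    # not the actual Voicegain Telephony Bot API schema.
    from flask import Flask, request, jsonify

    app = Flask(__name__)

    @app.route("/voicebot-callback", methods=["POST"])
    def voicebot_callback():
        event = request.get_json()
        # Hypothetical field: the caller's transcribed utterance for this turn.
        utterance = event.get("utterance", "")
        # Here the utterance would be passed to an LLM or dialog engine
        # to compute the next prompt to speak to the caller.
        reply = f"You said: {utterance}. How else can I help?"
        # Hypothetical response: text to speak via TTS, plus a flag telling
        # the platform to keep listening for the caller's next turn.
        return jsonify({"say": reply, "listen": True})

    if __name__ == "__main__":
        app.run(port=8080)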

Talk to us about your IVR migration journey!

If you would like to discuss your IVR migration journey, please email us at sales@voicegain.ai. At Voicegain, we have decades of experience designing, building, and launching conversational IVRs and Voice Assistants.

Here is also a link to more information. Please feel free to schedule a call directly with one of our Co-founders.

Benchmark
Speech-to-Text Accuracy Benchmark - June 2020 Results

[UPDATE - October 31st, 2021:  Current benchmark results from end October 2021 are available here. In the most recent benchmark Voicegain performs better than Google Enhanced.]

"What is the accuracy of your recognizer?"

That is the question we are frequently asked by potential customers. Often we answer "that depends", and we get the feeling that the other side thinks "it must be really bad if they won't give a straight answer". However, "that depends" really is the right answer. The accuracy of automated speech recognition (ASR) depends on the audio in many ways, and the effect is not small. Accuracy can be all over the place depending on factors like:

  • Does the speech follow proper grammar, or is the speaker making things up as they go? Prepared speeches will have better, i.e. lower, WER (word error rate) scores compared to unscripted speech.
  • What is the subject of the speech? Rare and obscure words or word combinations, e.g. people's names or other proper names, will make life difficult for the NLM (natural language model).
  • Is there more than one speaker? Are they constantly switching, or even talking over one another?
  • Is there music in the background? This is very common for YouTube productions.
  • Is there background noise? What type of noise is it?
  • Are parts of the speech audio unusually slow or fast?
  • Is there room reverb or echo in the recording?
  • Is the recording volume very low? Are there variations in the recording volume (e.g. a recorder placed at one end of a very long table)?
  • Is the recording quality bad, e.g. due to a codec or extreme archival compression?
  • etc., etc.

Testing / Benchmarking Speech-to-Text Accuracy

Because the accuracy or Word Error Rate questions are somewhat meaningless without specifying the type of speech audio, it is important to do your own testing when choosing a speech recognizer. As a test set, one would choose a set of audio files that accurately represent the spectrum of speech that the recognizer will encounter in the expected use cases. For each speech audio file in the set, one would obtain a gold/reference transcript that is 100% accurate. After that, things can be automated -- transcribe each file on the recognizers being evaluated, compute the WER of each generated transcript against the reference, and collate the results. The combined results will present a clear picture of how the recognizers perform on the specific speech audio that we care about. If you are going to repeat this process often, e.g., to evaluate new candidates on the recognizer market, it is good to standardize the test set, basically creating a repeatable benchmark that can be referenced in the future.
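
The WER computation itself is standard and easy to automate: word-level edit distance between reference and hypothesis, divided by the number of reference words. A minimal sketch in Python:

    # Word Error Rate: Levenshtein distance between the reference and
    # hypothesis word sequences, divided by the reference length.
    def wer(reference: str, hypothesis: str) -> float:
        ref, hyp = reference.split(), hypothesis.split()
        # d[i][j] = edit distance between the first i reference words
        # and the first j hypothesis words.
        d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            d[i][0] = i
        for j in range(len(hyp) + 1):
            d[0][j] = j
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                sub = 0 if ref[i - 1] == hyp[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,        # deletion
                              d[i][j - 1] + 1,        # insertion
                              d[i - 1][j - 1] + sub)  # substitution or match
        return d[len(ref)][len(hyp)] / len(ref)

    print(wer("the quick brown fox", "the quick brown socks"))  # 0.25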

Our benchmark

The benchmark results that we are presenting here are somewhat different from such use-case-driven tests. Because we are building a general recognizer for an unspecified use case, we intentionally decided to use a very broad set of audio files. Rather than collecting the test files ourselves, we used the data set described in "Which Automatic Transcription Service is the Most Accurate? — 2018", published in September 2018 by Jason Kincaid. The article compares speech recognizers from various companies on a set of 48 YouTube videos (taking 5 minutes of audio from each video). By the time we decided to re-run Jason's benchmark, 4 of the videos were no longer accessible, so the benchmark presented here uses data from only 44 videos.

We compared the results presented by Jason to the June 2020 results from the big 3 recognizers - Google, Amazon, and Microsoft. Of course, we also included our own Voicegain recognizer, because we wanted to see how we stacked up against them. All the tested recognizers use Deep Neural Networks. The Voicegain speech recognizer ran on the Google Cloud Platform using Nvidia T4 GPUs. All recognizers were run with default settings; no hints or user language models were used.

It is important to mention that none of the benchmark files are included in the training set that Voicegain uses - nor is any other audio from the speakers in the benchmark files, nor the same content spoken by other speakers.

So what are the results? Who has the best recognizer?

Again, "the best recognizer" is not the right question, because it all depends on the actual speech audio it is used on. But the key results from testing on the 44 files are as follows:

  • Every recognizer has improved. The biggest improvement in median WER was by Microsoft Speech to Text.
  • The best recognizer on our data set was Google Speech to Text - Enhanced (video), but the new Microsoft Speech to Text is a very close second.
  • Taking price into consideration, Microsoft might be declared the Best Buy.
  • The Voicegain recognizer is definitely the Best Value.
  • Google Speech to Text - Standard, although somewhat improved, is still clearly the worst performer on the data set.
  • The single bad data point for Google Enhanced (video) is real. We ran repeated tests on the file and got the same result. The old Google Enhanced recognizer did not have problems with that file.

How does the Voicegain recognizer stack up?

Here are our thoughts and some details:

  • Up until October 2019 the training set we used to train our recognizer was relatively unchanged. Moreover, it was heavily biased towards certain categories of speech audio. You can see that in the chart: our best results were better than the old Amazon Transcribe, but our worst results were quite a bit worse than Amazon Transcribe's.
  • Based on the first results from the benchmark, we analyzed what kind of audio gave us trouble and collected data with those particular characteristics, sourced very broadly (to avoid training to the benchmark), to make our recognizer more robust. That effort paid off: the Voicegain recognizer's WER spread is now much tighter, and overall it is now very close to the new Amazon Transcribe.
  • Overall, Voicegain is the most improved recognizer. Just over 6 months ago we were only better than Google Standard, but now we are closing in on Amazon Transcribe. This is the result of both changes to the Neural Network architecture and a large increase in training data hours.
  • Looking at the details, the Voicegain recognizer was better than the new Amazon on 11 out of 44 files, better than Google Video on 5 files, and better than Microsoft also on 5 out of 44 files.
  • If you consider the price, we think Voicegain presents great value. We have talked to customers who were not doing large-scale transcription due to the high cost of the 3 big platforms; our low pricing suddenly made new uses of transcription viable.

We welcome anyone to test our platform and see how it performs on speech audio types that matter for your use cases.

Any software that can help me in testing recognizers?

We have open-sourced the key component of our benchmark suite, the transcribe_compare Python utility. It is available at https://github.com/voicegain/transcription-compare under the MIT license.

It is useful for automated benchmarking, but it can also output results to an HTML file that can be viewed in a web browser. We often use it this way to manually review transcription errors, or the differences in errors between two recognizers or recognizer versions.

How can I test drive Voicegain?

If you are building an app that requires transcription, sign up today for a developer account and get $50 in free credits (~5,000 minutes of platform use). You can check out our accuracy and test our APIs. Instructions to sign up for a developer account are provided here.

If you want to make Voicegain your own AI Transcription Assistant, click here. You can take Voicegain to meetings, webinars, talks, lectures and more.

We expect to catch up soon

We are still in the middle of an extensive data collection effort, and the training is not over yet. We are seeing continuing improvement in our recognizer, with new, improved versions of the acoustic model deployed to production about twice a month. We will report updated benchmark results on our blog in a few months.

User-Customized Acoustic Model

We have another blog post planned that will quantify the benefit one can expect from using additional user data to train the acoustic model used in the recognizer. We have selected a large data set with a very specific English accent that currently has a higher WER. We will report the impact on WER of training on such a data set, quantifying the improvement as a function of the data set size and the duration of training.

Voicegain provides easy-to-use tools that allow users to build their own custom acoustic models. This upcoming post will provide clear insight into what improvements to expect and how much data is needed to make a difference in reducing WER.

References

Jason Kincaid, "Which Automatic Transcription Service is the Most Accurate? — 2018", September 2018.

Contact Us

If you have any questions regarding this article or our platform and recognizer, you can contact us at info@voicegain.ai.


Use Cases
Transcription for Live Streamed Event - an example

The video below shows an example of Voicegain Live Transcribe used to provide transcription for an event streamed over video.


Here are some details about this particular setup:

  • the video is streamed using BoxCast
  • the audio for transcription is tapped live at the source, on site
  • the audio is streamed to the Voicegain Cloud for processing by a small Java client running on a Raspberry Pi computer
  • the audio client was downloaded pre-configured from the Voicegain portal and reads audio directly from a USB audio device plugged into the Raspberry Pi
  • speech is transcribed in the Cloud using Voicegain's semi-real-time mode, which delivers results in about 30 seconds (the real-time mode delivers results with less than 1 second of delay)
  • the transcription output goes through a delay component that allows us to dial in a precise delay to match the video streaming delay - in this case 35.5 seconds (see the sketch after this list)
  • the transcribed words are sent to a web client over a websocket - each word is sent with the set delay
  • the words are displayed with a gray font shade corresponding to the confidence in each word, and with gaps proportional to the gaps between the spoken words
  • the Acoustic Model used here has been custom-trained with an additional 200+ hours of audio from this particular speaker
  • the custom training data consisted simply of previously transcribed speeches by the speaker that were readily available on the website
  • we are also using a custom Language Model (on top of the base NLM) created from a user-provided corpus
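
For illustration, below is a minimal sketch of such a delay component in Python, assuming the transcript arrives as JSON messages over a websocket. The URL and the message shape are assumptions, not the exact Voicegain wire format.

    # Sketch of the delay component: hold each transcribed word for a fixed
    # delay so the captions line up with the delayed video stream.
    # NOTE: the websocket URL and the message format are assumptions.
    import asyncio
    import json

    import websockets

    DELAY_SECONDS = 35.5  # dialed in to match the video streaming delay

    async def relay(url: str) -> None:
        loop = asyncio.get_running_loop()
        async with websockets.connect(url) as ws:
            async for message in ws:
                word = json.loads(message)  # assumed: {"text": ..., "confidence": ...}
                # Schedule the word DELAY_SECONDS after its arrival, without
                # blocking receipt of the next word.
                loop.call_later(
                    DELAY_SECONDS,
                    lambda w=word: print(w.get("text", ""), end=" ", flush=True),
                )
        # Give any still-pending words time to print before exiting.
        await asyncio.sleep(DELAY_SECONDS)

    asyncio.run(relay("wss://example.invalid/transcript-stream"))
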
Insights
Key Differentiators

The current enterprise speech-to-text market can be divided into 3 distinct groups of players. Note that we are focusing here on speech-to-text platforms rather than complete end-user products (so we do not include consumer products like Dragon NaturallySpeaking, etc.)

  • The old ASRs - for example Nuance (and every speech company that Nuance acquired over the years) and Lumenvox. These speech-to-text engines go back to the late 1990s and early 2000s. They were built using technology relying on Gaussian Mixture Models and Hidden Markov Models. They require an on-prem install.
  • Established Cloud Speech-to-Text services - like Google, AWS, Microsoft Azure, and IBM. Some of these also began with recognizers built using Gaussian Mixture Models and Hidden Markov Models, but by 2012 they had started transitioning to recognizers using Deep Neural Network models for speech recognition.
  • New players - new companies going back to about 2015, when Nvidia made it possible for pretty much anyone to train DNNs on its new GPUs. A lot of small companies arose that built their own speech-to-text engines, either from scratch or on open-source foundations. Now, 5 years later, many of them are entering the speech-to-text market with mature products delivering high recognition accuracy.

Where does Voicegain fit here?

We consider ourselves one of the new players, as we started working on our own DNN-based speech-to-text engine at the end of 2016. However, we have been working with old-style ASRs since 2006, and as a result we knew their limitations very well. That is what motivated us to develop an ASR of our own.

We are also very familiar with employing ASRs in real-world, large-volume applications, so we know which features ASR users want - be it the developers who build the applications or the IT personnel who have to host and maintain them.

All of this guided the decisions we made when developing our speech-to-text platform.

So how is Voicegain product different?

Below we list what we think are the 4 key differentiators of our speech-to-text platform compared to the competition. Note that the competitive field is pretty broad, and we consider a particular feature a differentiator if it is not a common feature in the market.

1) Edge Deployment

By Edge Deployment we mean a deployment on customer premises (datacenter) or in a VPC. Moreover, the deployment is fully orchestrated and managed from the Cloud (for more information see our blog post about the Benefits of Edge Deployment). The built-in orchestration and management make it essentially different from the old ASRs, which were also deployed on-prem but required support contracts to deploy successfully and to maintain over time.

We think that Edge Deployment is critical for a speech-to-text platform which is to replace many of the old ASRs in their applications.

2) Acoustic Model Customization

Over the years of working with ASRs, we noticed that there were cases where an ASR would show consistently higher error rates. Usually this was related to IVR calls coming from customers in regions of the country with distinct accents.

In some of our use cases so far, the ability to customize models has allowed us to reduce WER very significantly (e.g. from 8% WER to 3%).

We are currently working on a rigorous experiment where we are customizing our model to support Irish English. We plan to report in detail on the results in April.

3) Targeted support for IVR

The Voicegain speech-to-text platform was developed specifically with IVR use cases in mind. Currently the platform supports the following 3 IVR use cases, and we are working on adding conversational NLU later this year.

a) ASR with support for legacy IVR Standards

In order to make our speech-to-text engine an attractive replacement for the old ASRs, we implemented support for legacy standards like MRCP and GRXML. That support is not a mere add-on - not simply a Web API tacked onto the back of an MRCP server - but is more integral: our core speech-to-text engine directly interprets a superset of MRCP protocol commands.

We also support GRXML and JSGF grammars - via MRCP, in IVR callbacks, and over Web API.

When used with grammars, a big advantage of the Voicegain recognizer is that at its core it is a large-vocabulary recognizer. Grammars are used to constrain the recognized utterances to facilitate semantic mapping, but the recognizer can also recognize Out-of-Grammar utterances, which opens new possibilities for IVR tuning.
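
For illustration, here is a minimal grammar of that kind, expressed in JSGF and held in a Python string; how it gets attached to a recognition session (MRCP header, IVR callback field, or Web API parameter) depends on the integration.

    # A minimal JSGF grammar constraining a yes/no confirmation step.
    # How it is attached to a recognition session (MRCP, IVR callback,
    # or Web API) is integration-specific.
    CONFIRM_GRAMMAR = """\
    #JSGF V1.0;
    grammar confirm;
    public <confirm> = <yes> | <no>;
    <yes> = yes | yeah | correct | that is right;
    <no> = no | nope | that is wrong;
    """

With a large-vocabulary engine underneath, an utterance that matches neither rule can still be returned as out-of-grammar text, which is what makes the IVR tuning mentioned above possible.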

b) Web-hook IVR Support (without VXML)

Flow-based IVR systems have traditionally been built using two approaches: (i) having the dialog interactions interpreted on a VXML platform (VXML browser), or (ii) using webhooks that invoke application logic running on standard web back-end platforms (examples of the latter are the offerings of e.g. Twilio, Plivo, or Tropo).

Our platform supports webhook-style IVRs. Incoming calls can be interfaced via standard telephony SIP/RTP, and the IVR dialog can be directed from any platform that implements web-hooks (e.g. Node.js, Django).

c) Enabling IVRs that use chatbot back-end

Many companies have invested significant effort into building their own text-based chatbots rather than using products like Google Dialogflow. What the Voicegain platform provides is an easy way to deploy that existing chatbot logic on a telephony speech channel. This takes advantage of our platform's webhook IVR support and can feed real-time text (including multiple alternatives) to a chatbot platform. We also provide audio output, either via TTS or prerecorded clips.

4) End-to-end support for Real-Time Continuous Speech-to-Text

Because IVR has always been our focus, we built our Acoustic Models to support low-latency real-time speech-to-text (both continuous large-vocabulary and with context-free grammars). We also focused on convenient ways to stream audio into our speech-to-text platform and to consume the generated transcript.

One of our products is Live Transcribe, which performs real-time transcription (with just a few seconds of delay) that is then broadcast over websockets and can be consumed in the provided web clients. This opens the possibility of live speaker transcription, with use cases including conferences, lectures, etc., making these events easier to follow for hearing-impaired audience members.

Developers
"Hello World" Example

In this post we show, in three steps, what is needed to run your first transcription using the Voicegain API.

We assume that you have already signed up for a Voicegain account and logged into the portal.


Step 1: Create a new Context

The main reason to create a new Context is to establish a new authentication realm. Access to each Context can be controlled separately, so it is easy to disable access to a certain Context without affecting the other Contexts.

Contexts are also used for specifying default ASR settings.

You can create a new Context from the Context Dash



Step 2: Generate an Authentication Token

Voicegain APIs use JWTs (JSON Web Tokens) to identify and authenticate the account making the request. In order to make API requests you need to generate a JWT, which can easily be done from the portal.



Step 3: Run the curl command

This step submits a Web API request to the Voicegain Synchronous Speech-to-Text API https://api.voicegain.ai/v1/asr/transcribe (the original post showed the complete input and output of the corresponding curl command).
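
A minimal sketch of such a request in Python (the JWT value and the request body fields are illustrative placeholders - check the API documentation for the exact schema):

    # Minimal sketch of a synchronous transcription request.
    # NOTE: the JWT value and the body fields are illustrative placeholders;
    # consult the Voicegain API documentation for the exact schema.
    import requests

    JWT = "<JWT generated in Step 2>"

    resp = requests.post(
        "https://api.voicegain.ai/v1/asr/transcribe",
        headers={"Authorization": f"Bearer {JWT}"},
        json={
            # Illustrative: point the service at audio hosted at a URL.
            "audio": {"source": {"fromUrl": {"url": "https://example.com/hello.wav"}}},
        },
    )
    resp.raise_for_status()
    print(resp.json())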


In this case, the audio to be transcribed was retrieved from a URL. Audio can alternatively be submitted inline (within the request).

Note that synchronous transcription has an audio length limit of 60 seconds. Longer audio requires the use of the asynchronous transcription API.

For asynchronous transcription requests it is also possible to stream the audio, e.g. via websocket. You can see some of the Voicegain API documentation at: https://www.voicegain.ai/api
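
For longer audio, the flow becomes submit-then-poll. A hypothetical sketch, assuming an async endpoint and polling fields that are placeholders rather than the documented contract:

    # Hypothetical submit-then-poll flow for asynchronous transcription.
    # NOTE: the endpoint path and the response fields are assumptions made
    # for illustration; see the API documentation for the real contract.
    import time

    import requests

    JWT = "<JWT generated in Step 2>"
    headers = {"Authorization": f"Bearer {JWT}"}

    # Submit the long audio for asynchronous processing.
    session = requests.post(
        "https://api.voicegain.ai/v1/asr/transcribe/async",  # assumed path
        headers=headers,
        json={"audio": {"source": {"fromUrl": {"url": "https://example.com/long.wav"}}}},
    ).json()

    # Poll until the transcript is ready.
    while True:
        status = requests.get(session["pollUrl"], headers=headers).json()  # assumed field
        if status.get("phase") == "DONE":  # assumed field
            print(status["transcript"])
            break
        time.sleep(2)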

Edge
Benefits of Edge Deployment

There is no denying that services available in the Cloud have significant benefits, which is why the Cloud is such a popular choice. That is also why the Voicegain Speech-to-Text Platform is available both in the Cloud and at the Edge. The key benefits of accessing Voicegain as a Cloud service are:

  • Ease of Use - All it takes to start using Voicegain in the Cloud is to create an account on the Voicegain Web Console and get the developer API keys/security tokens. You can immediately start accessing the APIs, which are extensively documented.
  • No Maintenance - Voicegain ensures the availability of the infrastructure and is responsible for software updates and patches, backups, resources, etc.
  • High Security - The provider spends a one-time effort on securing the Cloud services for all of its tenants. Although the Cloud is potentially more exposed, the provider can devote more resources to addressing security in a systematic way.
  • High Availability - The Cloud provides redundancy of the virtual platform and often geographic distribution, which adds resiliency to network-wide outages, etc.
  • Scalability - The Cloud provider takes care of the growing demand for resources.
  • Lower sysadmin, DBA, etc. costs - This is largely related to the No Maintenance point.


What is Edge Deployment?

Before we discuss the benefits of Edge Deployment let's define what we mean by it.

  • Edge Computing is defined broadly as all computing outside the cloud that happens at the edge of the network, more specifically in applications where real-time processing of data is required. The edge of the network, in turn, is usually understood as within the "last mile" - the part of the network that physically reaches the end-user's premises.
  • What we call Edge Deployment is a deployment of Edge Computing (in our case, specifically Speech-to-Text services) either on customer premises (datacenter) or in a VPC of a cloud provider. The compute resources are either owned or rented by the customer, but the deployed application and the services it provides are orchestrated and managed from the Cloud. These services run in a virtualized environment (in our case, Kubernetes).

Benefits of Edge Deployment

Edge Computing for Speech-to-Text services has many advantages:

  1. Low Network Latencies & High Network Reliability - With Edge Computing, the processing of speech audio is brought close to where the audio originates. For example, all processing can be done in the same location where the Telco phone lines terminate for an IVR application. If the speech processing were to happen in the Cloud, the audio data would need to be sent over the Internet, which would introduce additional latency and jitter and would make the service susceptible to occasional incidents on the wider internet, like trunks overloaded by DDoS attacks, fiber cuts, etc. One can avoid some of those issues by deploying more reliable network connectivity to the Cloud, e.g., Google Cloud Interconnect, but that comes at a cost and still does not change the basic reality of extra latency.
  2. Lower Bandwidth Cost - Some Speech-to-Text applications generate a lot of data, e.g., a Call Analytics application that processes 100% of the calls. Edge Deployment allows for putting processing resources right next to where the data is generated, e.g. right at the Call Center.
  3. Data Privacy and Control - With all the incoming and generated data confined to the Edge Computing environment, and none of it going to the Voicegain Cloud, customers can apply their own security protocols to protect the data.


Does Edge provide some of the benefits of the Cloud?

You may ask: what about the benefits of the Cloud mentioned up front? Do I get some of these with an Edge Deployment?

The answer is (qualified) "yes", and specifically:

  • Ease of Use - An Edge Deployment is fully managed from the Cloud. Deployment of the entire application stack takes a few mouse clicks.
  • No Maintenance - Voicegain takes care of managing the components of the application - all application components are automatically updated and/or patched. The customer still needs to take care of the hardware and the Kubernetes cluster.
  • High Security - The same core application is deployed for all our customers, and we have made sure that it is secure. If any new vulnerabilities are found, they are automatically patched. The network entry and exit points of the Edge environment are well defined, and customers can provide additional network security around them.
  • High Availability - Running on the Kubernetes platform, our application has been designed with high availability in mind - there are multiple instances of each service, and Kubernetes takes care of failover in case of a hardware node failure. And because deployment is so easy, our customers can readily deploy multiple Edge instances, for example to achieve geographic distribution.
  • Scalability - Again, thanks to the underlying Kubernetes platform, new processing resources can be added simply by adding hardware nodes to the Kubernetes cluster; the Voicegain application will automatically take advantage of them.

Transcription
Real-Time Transcription for the Hearing Impaired

Countryside Bible Church (CBC) has been using the Voicegain platform for real-time transcription since September 2018 (when our platform was still in alpha).

How it Started

In August 2018, one of our employees was approached by staff at CBC with a question about software that would allow a deaf person to follow sermons live via transcription. One of the members at CBC is both hearing and vision impaired and cannot easily follow sign language; however, she can read large font on a computer screen from close by.

In August, Voicegain had just started alpha tests of the platform, so his response was that he did indeed know of such software - and it was Voicegain. At that time our testing was focused on IVR use cases, so we still needed a few weeks to polish the transcription APIs and develop a web app that could consume the transcript stream (via websocket) and present it as scrolling text in a browser.

To improve recognition, we used about 200 hours of previously transcribed sermons from CBC to adapt our Acoustic DNN Model. Additionally, we created a CBC-specific Language Model by adding a corpus of text from several Bible translations, various transcribed sermons, a list of CBC staff names, etc.

As far as the input audio is concerned, we initially streamed audio using the standard RTP protocol from the ffmpeg tool. We had some issues with the reliability of raw RTP, so we later switched to a custom Java client that sends the audio using a proprietary protocol. The client runs as a daemon on a small Raspberry Pi device.




Current State

The CBC audio-visual team has been running real-time transcription using our platform pretty much every Sunday since September 2018. You can see an example of the transcription in action in the video below.


Plans

The current plan for the transcription service is to integrate it into the CBC website and make it available together with the streamed video. This will allow the hearing impaired to follow the services at home via streaming. For now, the transcription text will be presented as an embedded web page element below the embedded video.

Because the streamed video is delayed by more than 30 seconds with respect to real time, we will be feeding the audio simultaneously to two ASR engines: one optimized for real-time response, and one optimized for accuracy. This is easy, because the Voicegain Web API provides methods that allow for attaching two ASR sessions to a single audio stream. Each session can, in turn, feed its own websocket stream. By accessing the appropriate websocket stream, the web UI can display either the real-time or the delayed transcript.

Example transcribed sermons

Because of their Terms of Use, we cannot provide direct results from any of the major ASR engines, but you can download the audio linked below, as well as the corresponding exact transcripts, and run comparison tests on a recognizer of your choice. Note that the Voicegain ASR ignores most duplicated words in the audio, which is why the transcript has those duplicates removed.

The audio is Copyright of Countryside Bible Church and the transcripts are Copyright of Voicegain.

1.  God's Plan for Human History (Part 2)

Tom Pennington  |  Daniel 2  |  2018-11-04 PM

55 minutes 13 seconds, 7475 words

Audio | Transcript | Voicegain Output

Accuracy: 1.08% character error rate

Note: the Voicegain output is formatted to match the Transcript. Normally it also includes timing information. This specific output was obtained on 4/30/19 from the real-time recognizer, which has slightly lower accuracy than the off-line recognizer.

