The Voicegain Speech-to-Text platform has for some time supported many Twilio features, including:
- <Connect> <Stream> for speech-enabled IVR / Voicebot applications
- SIP INVITE - for integration of the Voicegain Callback API into Twilio-originated calls - also mainly focused on IVR / Voicebot applications
- SIPREC - for either real-time speech-to-text or offline speech-to-text and speech analytics
- plain media <Stream> - but so far only in single-channel applications, with a focus on offering an alternative to <Gather>
Release 1.26.0 of the Voicegain platform finally offers full 2-channel support for Twilio Media Streams. This enables real-time transcription of both the inbound and outbound channels at the same time.
How does it work?
The Twilio <Stream> command takes a WebSocket URL parameter as the target to which the selected channels are streamed.
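As a rough illustration, the TwiML for forking both tracks of a call might be generated as below. This is only a sketch: the URL and phone number are placeholders, the helper name `build_stream_twiml` is made up for this post, and the `<Dial>` verb is included just to give the call something to do after streaming starts. The `track="both_tracks"` attribute is what asks Twilio to send both the inbound and outbound audio.

```python
import xml.etree.ElementTree as ET


def build_stream_twiml(wss_url: str, dial_number: str) -> str:
    """Return TwiML that forks both call tracks to wss_url, then dials a number.

    Hypothetical helper for illustration; not part of any Twilio or Voicegain SDK.
    """
    response = ET.Element("Response")
    start = ET.SubElement(response, "Start")
    # track="both_tracks" requests inbound and outbound audio in one stream.
    ET.SubElement(start, "Stream", {"url": wss_url, "track": "both_tracks"})
    dial = ET.SubElement(response, "Dial")
    dial.text = dial_number
    return ET.tostring(response, encoding="unicode")


if __name__ == "__main__":
    # Placeholder values; the real wss URL comes from the Voicegain session response.
    print(build_stream_twiml("wss://example.invalid/audio", "+15555550100"))
```

In a real application the same XML would be returned from your TwiML endpoint or passed when creating the call.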
The wss URL can be obtained by starting a new Voicegain real-time transcription session using the https://api.voicegain.ai/v1/asr/transcribe/async API. The sessions part of the request starts two sessions, and each one is fed a different channel (left or right) of the audio stream.
We also need to tell Voicegain to take input in the TWIML protocol, in stereo.
Note that we can also enable audio capture, which gives us a stereo recording of the call once the session is complete.
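The request described above could be assembled roughly as follows. Treat this as a sketch under assumptions rather than the definitive schema: the field names (`asyncMode`, `audioChannelSelector`, `websocket`, `capture`, and so on) are our best illustration of the shape of the request and should be checked against the Voicegain API documentation. The one firm fact is that Twilio Media Streams deliver 8 kHz mu-law audio, which is why the format and rate are set as shown.

```python
import json


def build_transcribe_request() -> dict:
    """Build an illustrative body for POST /v1/asr/transcribe/async.

    Field names are assumptions for this sketch; verify them against the API docs.
    """

    def session(channel: str) -> dict:
        # One real-time session per audio channel, each with its own results websocket.
        return {
            "asyncMode": "REAL-TIME",
            "audioChannelSelector": channel,
            "websocket": {"adHoc": True, "useSTOMP": False},
        }

    return {
        "sessions": [session("left"), session("right")],
        "audio": {
            "source": {"stream": {"protocol": "TWIML"}},
            "format": "PCMU",    # Twilio Media Streams send mu-law audio
            "channel": "stereo",
            "rate": 8000,        # sampled at 8 kHz
            "capture": True,     # keep a stereo recording of the call
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_transcribe_request(), indent=2))
```

The resulting dictionary would be sent as the JSON body of the request, together with your Voicegain authorization header.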
The response to the Voicegain session start request contains 3 WebSocket URLs:
- one for the inbound audio - this one we pass to Twilio TwiML <Stream> command
- two for receiving transcription results in real time - an individual message looks like, for example, {"utt": "one", "conf": 0.4047, "start": 440}
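Since results arrive on two separate websockets, one per channel, an application will usually want to combine them into a single timeline. The sketch below parses messages of the form shown above and orders them by their `start` value. It assumes `start` is a time offset that is comparable across the two channels, and the `Word` type and function names are hypothetical, chosen for this example only.

```python
import json
from typing import List, NamedTuple


class Word(NamedTuple):
    channel: str
    text: str
    confidence: float
    start: int


def parse_message(channel: str, raw: str) -> Word:
    """Parse one result message such as {"utt": "one", "conf": 0.4047, "start": 440}."""
    msg = json.loads(raw)
    return Word(channel, msg["utt"], float(msg["conf"]), int(msg["start"]))


def merge_channels(left: List[str], right: List[str]) -> List[Word]:
    """Tag each message with its channel and order all words by start offset."""
    words = [parse_message("left", m) for m in left]
    words += [parse_message("right", m) for m in right]
    return sorted(words, key=lambda w: w.start)


if __name__ == "__main__":
    # The first message is the sample from this post; the second is made up for the demo.
    left = ['{"utt": "one", "conf": 0.4047, "start": 440}']
    right = ['{"utt": "hello", "conf": 0.9, "start": 120}']
    for word in merge_channels(left, right):
        print(word.start, word.channel, word.text)
```

In a live application the messages would be read from the two results websockets as they arrive, instead of from in-memory lists.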
Example code
On our GitHub we provide example Python code that starts a simple outbound Twilio phone call and then transcribes both the inbound and outbound audio in real time.
The sample code illustrates outbound calling, which is somewhat simpler because no callbacks are involved. For an inbound call, the request to Voicegain would have to be made from your Twilio callback function, which is invoked when a new call comes in. Otherwise, the rest of the code would be very similar to our GitHub example.
Use Cases
Some of these are already listed on the Twilio Media Streams page:
- real-time transcription
- NLU - e.g. detect and respond to events during the call
- automated Knowledge-Base lookup
- sentiment analysis - use the transcribed text to determine sentiment during the call
Coming Soon
We will be testing the <Stream> functionality on the LaML command language provided by the SignalWire platform, which is very similar to Twilio TwiML. We will update our blog with the results of those tests.
We are also working on a real-time version of our Speech Analytics API. Once it is complete, all Speech Analytics functionality will be available in real time to users of the Twilio and SignalWire platforms.
Interested in Voicegain? Take us for a test drive!
1. Click here for instructions to access our live demo site.
2. If you are building a cool voice app and you are looking to test our APIs, click here to sign up for a developer account and receive $50 in free credits.
3. If you want to take Voicegain as your own AI Transcription Assistant to meetings, click here.