This article outlines options for how developers and builders of real-time Gen AI voice applications for the contact center can design and architect access to streaming audio data from IP-based contact center systems. These systems can be premise-based contact center platforms like Avaya, Cisco, and Genesys, or CCaaS platforms like Five9, Genesys Cloud, NICE CXone, and Aircall.
Use Case for Realtime Generative AI Voice for Contact Center
One of the main use cases for Realtime Generative Voice AI in a contact center is Realtime Agent Assist (RTAA), also called a generative AI Co-Pilot. The first step for any such real-time application is to stream audio from the contact center platform to a streaming Speech-to-Text model and obtain a speaker-separated transcript. This transcript can in turn be passed to an LLM for real-time sentiment analysis, QA automation, agent assist, summarization, and other real-time AI use cases in the contact center.
Voicegain's in-house Kappa model is one such streaming Speech-to-Text model. Voicegain makes the real-time transcript available over websockets.
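As an illustration of the hand-off from transcript to LLM described above, here is a minimal Python sketch. The `Segment` shape, the speaker labels, and the `build_prompt` helper are all hypothetical and chosen for illustration; they are not Voicegain's websocket message format, and the resulting prompt would still have to be sent to whichever LLM you use.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Segment:
    """One piece of a speaker-separated transcript (hypothetical shape)."""
    speaker: str   # e.g. "agent" or "caller" (assumed labels)
    start_ms: int  # start time of the segment within the call
    text: str


def build_prompt(segments: Iterable[Segment], task: str, max_turns: int = 20) -> str:
    """Order segments by time and render the most recent ones as a dialogue."""
    ordered = sorted(segments, key=lambda s: s.start_ms)
    recent = ordered[-max_turns:]  # keep the prompt bounded on long calls
    dialogue = "\n".join(f"{s.speaker}: {s.text}" for s in recent)
    return f"{task}\n\nTranscript so far:\n{dialogue}"


segments = [
    Segment("caller", 1200, "My invoice shows a double charge."),
    Segment("agent", 0, "Thank you for calling, how can I help?"),
]
prompt = build_prompt(segments, "Suggest the agent's next best response.")
```

Because the two speakers usually arrive as separate streams, sorting by start time before rendering keeps the dialogue in conversational order.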
Architecture Options to get Real-time Audio data
Overall, there are three main approaches to getting access to real-time audio streams:
- Voicegain SIP Media Stream B2BUA (For On-Premise Systems)
- SIPREC from the SBC (currently in Beta)
- Programmable Integration (leveraging APIs provided by CCaaS platforms)
Each of these approaches is described in detail below.
SIP Media Stream B2BUA
Most on-premise contact center platforms, like Avaya, Genesys, and Cisco, do not provide programmatic access to the media streams. Instead, they all offer the ability to transfer a call to a SIP destination/URI. The Voicegain SIP Media Stream B2BUA can serve as that destination. In other words, it can accept the call arriving via the resulting SIP INVITE.
More details of the SIP Media Stream B2BUA can be found here
SIPREC from Session Border Controller (currently in Beta)
Most enterprise premise-based contact center platforms include a network element called the Session Border Controller (SBC). An SBC can be thought of as a SIP-aware firewall that sits "in front" of a premise-based IP contact center. SBCs can fork audio streams using a protocol called SIPREC, which has been used for years by active/compliance call recording vendors like NICE and Verint.
With SIPREC, an SBC essentially provides a mirror or fork of the real-time RTP stream from the telephone call. This can be sent to Voicegain's SIPREC Server (currently in beta).
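To make the "fork of the RTP stream" concrete, the sketch below parses the fixed RTP header defined in RFC 3550 and extracts the audio payload from a single packet. It illustrates the packet format only; it is not Voicegain's SIPREC server code, and the `RtpPacket` and `parse_rtp` names are made up for this example. The payload type numbers 0 (PCMU) and 8 (PCMA) are the static G.711 assignments from RFC 3551.

```python
import struct
from dataclasses import dataclass

# Static RTP payload types for G.711 (RFC 3551).
G711_PAYLOAD_TYPES = {0: "PCMU (uLaw)", 8: "PCMA (aLaw)"}


@dataclass
class RtpPacket:
    payload_type: int
    marker: bool
    sequence: int
    timestamp: int
    ssrc: int
    payload: bytes


def parse_rtp(packet: bytes) -> RtpPacket:
    """Parse one RTP packet: 12-byte fixed header, optional CSRCs/extension/padding."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the fixed RTP header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    if b0 >> 6 != 2:
        raise ValueError("unsupported RTP version")
    offset = 12 + 4 * (b0 & 0x0F)          # skip CSRC identifiers
    if b0 & 0x10:                          # header extension present
        if len(packet) < offset + 4:
            raise ValueError("truncated header extension")
        (ext_words,) = struct.unpack("!H", packet[offset + 2:offset + 4])
        offset += 4 + 4 * ext_words
    end = len(packet)
    if b0 & 0x20:                          # last byte holds the padding length
        end -= packet[-1]
    if offset > end:
        raise ValueError("malformed RTP packet")
    return RtpPacket(
        payload_type=b1 & 0x7F,
        marker=bool(b1 & 0x80),
        sequence=seq,
        timestamp=ts,
        ssrc=ssrc,
        payload=packet[offset:end],
    )


# Example: a hand-built packet carrying 20 ms of uLaw audio at 8 kHz.
example = struct.pack("!BBHII", 0x80, 0, 1, 160, 0x1234) + b"\xff" * 160
parsed = parse_rtp(example)
```

The sequence number and timestamp are what a receiver would use to reorder packets and detect loss before handing the audio to a transcriber.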
The beta version of Voicegain's SIPREC interface has been tested with the following platforms:
- Avaya Enterprise SBC
- Ribbon/Sonus SBC
- Broadsoft SIPREC sipua
- Cisco Unified Border Element (CUBE)
- Metaswitch SIPREC sipua - The minimum version of Metaswitch that supports SIPREC is 9.0.10
- Oracle SBC SIPREC - Selective Call Recording SIPREC (oracle.com)
- Twilio TwiML <Siprec>
Voicegain can capture relevant call metadata in addition to obtaining the audio (the metadata capture functionality may differ in capabilities depending on the client platform).
The Voicegain platform can be configured to automatically launch transcription and speech analytics as soon as a new SIPREC session is established.
SIPREC support is available both in the Cloud and the Edge (OnPrem) deployments of the Voicegain Platform.
SIPREC is an Enterprise feature of the Voicegain platform and is not included in the base package. Please contact support@voicegain.ai or submit a Zendesk ticket for more information about SIPREC and if you would like to use it with your existing Voicegain account.
Programmable Integration with CCaaS real-time audio streaming APIs
Some CCaaS platforms, in particular the more modern ones, provide APIs for programmatic access to the real-time audio stream. In many of them, this capability was added specifically to simplify integration with cloud Speech-to-Text services.
Examples of such platforms and APIs are:
- Five9 VoiceStream
- Genesys Audiohook
- Avaya DMCC (part of Avaya Aura® Application Enablement (AE) Services), which can open RTP streams with the content of the call
- Cisco Extended Media Forking (XMF), provided by Cisco Unified Communications Gateway Services
To integrate with these APIs, the Voicegain Platform supports multiple protocols that allow for flexible programmable integration:
- websockets - sending binary audio data over websocket is supported. In addition to binary data, message protocols used in Twilio and SignalWire for audio streaming over websocket are also supported. (If required, we can easily add support for additional message protocols.)
- gRPC - binary audio data may also be sent using the gRPC protocol. Note that this capability is currently in beta.
- plain RTP - Voicegain also supports plain RTP. The IP/port/encoding negotiation, however, has to be done using our HTTP API. We support neither RTCP nor RTSP. The HTTP API is very simple, and some of our customers have already integrated this type of plain RTP streaming using XMF within the Cisco UC environment.
All of these protocols support uLaw, aLaw, and Linear 16-bit encoding at either an 8 kHz or a 16 kHz sample rate.
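For a sense of what these encodings mean on the wire, the sketch below computes the size of one audio frame for each encoding, splits a raw audio buffer into frames that could be sent as binary messages, and converts uLaw to Linear 16-bit using the standard G.711 expansion. The 20 ms frame duration and mono audio are assumptions made for the example (20 ms is a common telephony packetization interval), not Voicegain requirements, and all function names are illustrative.

```python
import struct
from typing import Iterator

# Bytes per sample for each supported encoding (mono audio assumed).
BYTES_PER_SAMPLE = {"ulaw": 1, "alaw": 1, "linear16": 2}


def frame_bytes(encoding: str, sample_rate_hz: int, frame_ms: int = 20) -> int:
    """Size in bytes of one frame of audio of the given duration."""
    samples = sample_rate_hz * frame_ms // 1000
    return samples * BYTES_PER_SAMPLE[encoding]


def chunk(audio: bytes, size: int) -> Iterator[bytes]:
    """Split a raw audio buffer into frames of at most `size` bytes."""
    for start in range(0, len(audio), size):
        yield audio[start:start + size]


def ulaw_to_linear16(byte: int) -> int:
    """Expand one G.711 uLaw byte to a signed 16-bit linear sample."""
    u = ~byte & 0xFF                       # uLaw bytes are stored inverted
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if u & 0x80 else magnitude


def decode_ulaw(payload: bytes) -> bytes:
    """Convert a uLaw buffer to little-endian Linear 16-bit PCM."""
    samples = [ulaw_to_linear16(b) for b in payload]
    return struct.pack(f"<{len(samples)}h", *samples)


ulaw_frame = frame_bytes("ulaw", 8000)           # 160 bytes per 20 ms
linear_frame = frame_bytes("linear16", 16000)    # 640 bytes per 20 ms
frames = list(chunk(b"\xff" * 400, ulaw_frame))  # two full frames, one partial
```

At 8 kHz, uLaw and aLaw therefore need 8,000 bytes per second per channel, while Linear 16-bit at 16 kHz needs 32,000, which is worth knowing when sizing buffers or estimating bandwidth for a streaming integration.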
Contact us to discuss or brainstorm!
If you are building a voice Gen AI application and would like to discuss getting access to real-time audio data, please contact us at support@voicegain.ai.