Our latest release (1.24.0) expands the Voicegain Speech Analytics and Transcription API with the ability to redact sensitive data in both the transcript and the audio. This allows our customers to comply with standards and regulations such as HIPAA, GDPR, CCPA, PCI, or PIPEDA.
Any of the following types of named entities can be redacted in the transcript text, the audio file, or both:
- ADDRESS - Postal address.
- CARDINAL - Numerals that do not fall under another type.
- CC - Credit card number.
- DATE - Absolute or relative dates or periods.
- EMAIL - (coming soon) Email address.
- EVENT - Named hurricanes, battles, wars, sports events, etc.
- FAC - Buildings, airports, highways, bridges, etc.
- GPE - Countries, cities, states.
- NORP - Nationalities or religious or political groups.
- MONEY - Monetary values, including unit.
- ORDINAL - "first", "second", etc.
- ORG - Companies, agencies, institutions, etc.
- PERCENT - Percentage, including "%".
- PERSON - People, including fictional.
- PHONE - (coming soon) Phone number.
- QUANTITY - Measurements, as of weight or distance.
- SSN - Social Security number.
- TIME - Times smaller than a day.
- ZIP - (coming soon) ZIP code (if not part of an address).
In the audio, redacted entities are replaced with silence; in the transcript, they are replaced with a string specified in the API request.
This feature is supported both in the Cloud and on the Edge (on-prem).
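To illustrate the idea, here is a minimal sketch of what "replace with a string in the transcript, replace with silence in the audio" means, assuming each detected entity comes with character offsets into the transcript and start/end times in the audio. The `Entity` structure, both function names, and the raw PCM sample list are hypothetical choices for this example; this is not Voicegain's implementation or API.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Set


@dataclass
class Entity:
    """One detected entity (hypothetical shape assumed for this sketch)."""
    label: str          # entity type, e.g. "CC", "SSN", "PERSON"
    char_start: int     # start offset in the transcript text (inclusive)
    char_end: int       # end offset in the transcript text (exclusive)
    audio_start: float  # start time in the audio, in seconds
    audio_end: float    # end time in the audio, in seconds


def redact_text(text: str, entities: Sequence[Entity], replacements: Dict[str, str]) -> str:
    """Replace each entity whose label has a placeholder string in `replacements`.

    Assumes entity character spans do not overlap.
    """
    parts: List[str] = []
    pos = 0
    for e in sorted(entities, key=lambda ent: ent.char_start):
        if e.label not in replacements:
            continue  # this entity type was not requested for redaction
        parts.append(text[pos:e.char_start])
        parts.append(replacements[e.label])
        pos = e.char_end
    parts.append(text[pos:])
    return "".join(parts)


def silence_audio(samples: Sequence[int], sample_rate: int,
                  entities: Sequence[Entity], labels: Set[str]) -> List[int]:
    """Return a copy of PCM samples with each selected entity's time range set to 0 (silence)."""
    out = list(samples)
    for e in entities:
        if e.label not in labels:
            continue
        start = max(0, int(e.audio_start * sample_rate))
        end = min(len(out), int(e.audio_end * sample_rate))
        for i in range(start, end):
            out[i] = 0
    return out


if __name__ == "__main__":
    text = "John paid with 4111 1111 1111 1111 today"
    entities = [
        Entity("PERSON", 0, 4, 0.0, 0.25),
        Entity("CC", 15, 34, 0.5, 1.0),
    ]
    print(redact_text(text, entities, {"CC": "[CC]"}))
    # John paid with [CC] today
    print(silence_audio([1] * 16, 8, entities, {"CC"}))
    # [1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1]
```

Only the entity types the caller lists are touched, which mirrors the per-request choice of which entities to redact and what placeholder string to use.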
Two typical use cases are:
- Enabling redaction as part of normal processing, e.g. of call center calls.
- Bulk-processing previously unredacted audio in storage to achieve compliance. Combined with the low per-minute price of Voicegain APIs, this lets our customers cost-effectively process large quantities of audio data.
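For the bulk use case, a client-side driver might look like the sketch below. It assumes the stored audio is reachable as local files and that `redact_file` is your own (hypothetical) function that submits one file for redaction and returns the result; neither function here is part of any Voicegain SDK.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path
from typing import Callable, Dict, Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def find_audio_files(root: str, extensions=(".wav", ".mp3")) -> List[Path]:
    """List audio files under `root` (recursively), sorted for a stable processing order."""
    return sorted(p for p in Path(root).rglob("*")
                  if p.is_file() and p.suffix.lower() in extensions)


def bulk_redact(paths: Iterable[T], redact_file: Callable[[T], object],
                max_workers: int = 8) -> Tuple[Dict[T, object], Dict[T, Exception]]:
    """Run `redact_file` on every path with at most `max_workers` calls running at once.

    Failures are collected instead of raised, so one bad file does not stop the batch.
    """
    results: Dict[T, object] = {}
    failures: Dict[T, Exception] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(redact_file, p): p for p in paths}
        for fut in as_completed(futures):
            path = futures[fut]
            try:
                results[path] = fut.result()
            except Exception as exc:  # record the failure and keep going
                failures[path] = exc
    return results, failures
```

Usage would be along the lines of `bulk_redact(find_audio_files("/data/calls"), redact_file)`, where the directory is a placeholder. A thread pool fits here because each call mostly waits on network I/O, and the `failures` map makes it easy to retry only the files that did not go through.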