Amazon is adding a new privacy-focused feature to its business transcription service, one that automatically redacts personally identifiable information (PII) such as names, social security numbers, and credit card details.
Amazon Transcribe is part of Amazon Web Services (AWS), Amazon’s cloud unit, and became generally available in 2018. In a nutshell, Transcribe is an automatic speech recognition (ASR) service that lets enterprise customers convert speech into text. This can make audio content searchable in a database, for example, and contact centers can use it to mine call data for insights and to carry out sentiment analysis. However, privacy is now front and center in public debate, particularly how technology companies store and manage consumers’ data.
The problem Amazon is looking to solve is ultimately about minimizing access to sensitive information. Speech-to-text services can be useful for searching keywords and gauging sentiment at a later date, but because phone calls often contain significant private data, that data may also be transcribed by Amazon and stored in a searchable database, even when it is not needed for analysis. Moreover, a growing number of regulations around the world are designed to protect consumer data, including the recently implemented California Consumer Privacy Act (CCPA) and Europe’s General Data Protection Regulation (GDPR).
It’s against that backdrop that Amazon Transcribe will now let companies automatically redact personal data, including: credit/debit card number, expiration date, CVV code, and PIN; social security number; bank account number; customer name; email address; phone number; and postal address. It’s worth noting that Google Cloud Platform offers a data loss prevention (DLP) API that could be used in conjunction with its speech-to-text service to identify and redact sensitive data. But building automated redaction directly into Amazon Transcribe should make the process a lot more straightforward to implement.
Companies using Amazon Transcribe can apply automatic redaction as they see fit and can choose which PII elements they wish to obfuscate. The transcribed text then displays a [PII] tag in place of the sensitive information, and the corresponding timestamps mean that those with sufficient system access can still locate the PII in the original audio file. This may also prove useful if a company wishes to carry out extra audio processing to fully redact the information from the original recording.
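To illustrate how the timestamps could drive that extra audio processing, here is a minimal sketch. It assumes a simplified, hypothetical transcript format (a flat list of dicts with "content", "start_time", and "end_time" keys, loosely modeled on word-level ASR output; Transcribe's actual JSON nests things differently) and mono PCM samples held in a Python list. The helper names `pii_time_ranges` and `mute_ranges` are illustrative only, not part of any AWS SDK.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical, simplified transcript format: each item is a dict with
# "content" and, for spoken words, "start_time"/"end_time" in seconds
# (as strings). Redacted words are assumed to read "[PII]", and items are
# assumed to be in chronological order.
PII_TAG = "[PII]"


def pii_time_ranges(items: List[Dict[str, str]],
                    gap: float = 0.0) -> List[Tuple[float, float]]:
    """Return (start, end) ranges in seconds covering every [PII] item.

    Tagged items separated by at most `gap` seconds are merged into one range.
    """
    ranges: List[Tuple[float, float]] = []
    for item in items:
        if item["content"] != PII_TAG:
            continue  # skip ordinary words and untimed punctuation
        start, end = float(item["start_time"]), float(item["end_time"])
        if ranges and start - ranges[-1][1] <= gap:
            # Adjacent tagged words (e.g. digits of a card number) become one span.
            ranges[-1] = (ranges[-1][0], max(ranges[-1][1], end))
        else:
            ranges.append((start, end))
    return ranges


def mute_ranges(samples: List[int], sample_rate: int,
                ranges: List[Tuple[float, float]]) -> List[int]:
    """Return a copy of mono PCM samples with each time range set to silence."""
    muted = list(samples)
    for start, end in ranges:
        # Round outward so a boundary sample is silenced rather than leaked.
        first = max(0, math.floor(start * sample_rate))
        last = min(len(muted), math.ceil(end * sample_rate))
        for i in range(first, last):
            muted[i] = 0
    return muted
```

For instance, two consecutive [PII] words spanning 1.0 to 1.5 s and 1.5 to 2.0 s collapse into a single 1.0 to 2.0 s range, so a spoken number is muted as one continuous span. Rounding the sample boundaries outward is a deliberate choice: over-muting a sample is cheaper than leaking part of a sensitive word.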
Amazon Transcribe is available in 31 languages, six of which support real-time transcription, though for now automatic redaction is limited to U.S. English. The feature is billed monthly at a rate of $0.00004 per second of audio.
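As a rough sense of scale, the per-second rate can be turned into a cost estimate. This is a minimal sketch that assumes straightforward per-second billing at the quoted rate; it ignores any minimum charge or rounding rules AWS may apply, and it covers only the redaction charge, not the underlying transcription fee. The function names are hypothetical.

```python
from decimal import Decimal
from typing import Iterable

# Rate as reported in the article: $0.00004 per second of audio for the
# redaction feature. Assumes simple per-second billing; minimum charges,
# rounding rules, and base transcription fees are not modeled.
REDACTION_RATE_PER_SECOND = Decimal("0.00004")


def redaction_cost(seconds: int) -> Decimal:
    """Estimated redaction charge, in US dollars, for one recording."""
    if seconds < 0:
        raise ValueError("duration must be non-negative")
    return REDACTION_RATE_PER_SECOND * seconds


def monthly_redaction_cost(durations_seconds: Iterable[int]) -> Decimal:
    """Sum the estimated redaction charges for a month's recordings."""
    return sum((redaction_cost(s) for s in durations_seconds), Decimal("0"))
```

At that rate, one minute of audio works out to $0.0024, one hour to $0.144, and 1,000 hours to $144. `Decimal` is used instead of `float` so that such small per-second amounts add up without binary rounding error.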