DLP for Developers Overview: Amazon Macie, Google DLP API and VoiceBase

John Wang
RingCentral Developers
5 min read · Aug 9, 2018

We manage cloud business communications for customers at RingCentral, so our servers host a lot of data including call recordings, voicemail, SMS, MMS, and fax. Some of our customers are interested in Data Loss Prevention (DLP) to detect and redact Personally Identifiable Information (PII) and other sensitive data, so it is useful to review some of the solutions provided by our partners, namely Amazon Macie, Google DLP API and VoiceBase. These solutions work differently, so we’ll take a quick overview of each. If you have any questions on how these can work with your RingCentral data, please reach out to us.

TL;DR

These three tools have different capabilities and can be used in different ways.

  • Amazon Macie is a security tool for Information Security staff to monitor the data in their S3 buckets. Audio and video data must be transcribed first. There is no API, aside from CloudWatch alerts.
  • Google DLP API can classify and redact sensitive data. It supports a number of customizations including dictionaries, regular expression patterns and detection rules. It requires text data as input and can work on data already in GCS, BigQuery and Cloud Datastore. Audio must be transcribed first, for example with the Google Speech-to-Text API.
  • VoiceBase API supports classification and redaction and can act on audio directly as well as on text transcripts. It is the only one of the three that can redact audio. You can also send it a link to a media file, such as a RingCentral media URL with an access token query parameter.

All three of these can be used with RingCentral’s voice and video data. The approaches include transcribing the data and copying it to the relevant storage location such as S3 or GCS, or submitting the data via API.

Amazon Macie

Amazon Macie is a service that automatically discovers, classifies and protects sensitive information in S3. It is designed for the Information Security (InfoSec) team to monitor data in their S3 buckets.

Amazon Macie Dashboard

Features

  • Data Classification: It identifies PII using a set of regular expression detectors against a variety of file types including MS Office and PDF files. It can then classify files using rules-based themes and a Support Vector Machine (SVM) classifier. Custom detectors are not supported at this time.
  • Access Monitoring: In addition to classification, Macie can monitor access and send alerts when there is anomalous activity such as a sudden increase in download activity.
  • Alerting via CloudWatch: CloudWatch alerts can be created to monitor events.

Considerations

  • Macie only monitors data in S3, so the data must be in an S3 bucket that your organization manages. You can use RingCentral’s Call Log and Message Store APIs to retrieve data to store in your S3 buckets.
  • Macie supports a variety of file types including MS Office and PDF, but some files, including audio, video and images, will need pre-processing. The Amazon Transcribe API can be used for speech-to-text transcription.
  • Macie is an information security tool first and, as such, is not API-driven. It is primarily a UI-based classification and alerting tool for InfoSec teams monitoring their S3 content.
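
To make the first consideration concrete, here is a minimal sketch of writing a transcript to an S3 bucket you manage so that Macie can classify it. It rests on several assumptions: the key layout and both helper functions are hypothetical, and `s3_client` is assumed to be a boto3 S3 client (for example `boto3.client("s3")`), whose `put_object` method accepts `Bucket`, `Key`, `Body` and `ContentType`.

```python
from datetime import date


def transcript_key(account_id: str, recording_id: str, day: date) -> str:
    """Build a hypothetical key layout:
    transcripts/<account>/<YYYY>/<MM>/<DD>/<recording>.txt
    """
    return f"transcripts/{account_id}/{day:%Y/%m/%d}/{recording_id}.txt"


def store_transcript(s3_client, bucket: str, key: str, text: str) -> None:
    """Upload a plain-text transcript to S3.

    s3_client is assumed to be a boto3 S3 client; it is passed in so the
    function can be exercised with a stand-in object during testing.
    """
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=text.encode("utf-8"),
        ContentType="text/plain; charset=utf-8",
    )
```

Grouping keys by account and date is only one possible layout; it makes it easier to point Macie at a prefix and to expire old transcripts.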

Google DLP API

The Google DLP API classifies and redacts sensitive information in text content. It offers language-specific SDKs, customization support, redaction support, and the ability to operate on storage such as Google Cloud Storage (GCS) and BigQuery. It can also operate on images.

Google DLP API Dashboard

Features

  • Data Classification: 90+ pre-built detectors with the ability to support custom detectors
  • Filetypes: This API expects data in text format, so the text content needs to be extracted first, for example by running the Google Speech-to-Text API on audio files. It can handle text supplied via API or stored in Google Cloud Storage (GCS), Google BigQuery, and Google Cloud Datastore.
  • Text Redaction: The DLP API has a number of ways to protect sensitive data, including (a) replacement with generic text, (b) replacement with a custom text or token, or (c) encryption of the data.
  • Image processing including JPEG, BMP, PNG, and SVG.
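
As a rough illustration of redaction options (a) and (b), the sketch below builds the JSON body for the v2 `content:deidentify` REST method. It only constructs the request and does not send it. The project ID, sample text and chosen infoTypes are placeholders, and `build_deidentify_request` is a hypothetical helper, not part of any Google SDK.

```python
import json

DLP_ENDPOINT = "https://dlp.googleapis.com/v2/projects/{project}/content:deidentify"


def build_deidentify_request(text, info_types, replacement=None):
    """Return a JSON-serializable body for a content:deidentify call.

    With replacement=None, each finding is replaced by its infoType name
    (option a). Otherwise each finding is replaced by the given custom
    string (option b).
    """
    if replacement is None:
        primitive = {"replaceWithInfoTypeConfig": {}}
    else:
        primitive = {"replaceConfig": {"newValue": {"stringValue": replacement}}}
    return {
        "item": {"value": text},
        "inspectConfig": {"infoTypes": [{"name": name} for name in info_types]},
        "deidentifyConfig": {
            "infoTypeTransformations": {
                "transformations": [{"primitiveTransformation": primitive}]
            }
        },
    }


body = build_deidentify_request(
    "Call me at 650-555-0100.",           # placeholder transcript text
    ["PHONE_NUMBER", "EMAIL_ADDRESS"],
    replacement="[REDACTED]",
)
print(DLP_ENDPOINT.format(project="my-project"))  # hypothetical project ID
print(json.dumps(body, indent=2))
```

The body would be sent as an authenticated POST to the printed endpoint; the client libraries accept the same structure through their own request types.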

Considerations

  • Google DLP API is a flexible API solution that supports customization and works with Google Cloud services including GCS and BigQuery.
  • It requires audio to be transcribed, and text to be extracted from PDF files, before processing.
  • It does not support audio file redaction.

VoiceBase PCI, SSN, PII Detection

VoiceBase is specifically focused on voice. As such, it can receive an audio file directly and detect and redact sensitive data in the audio itself, as well as produce and redact text transcriptions.

VoiceBase Redaction

Features

  • Works directly with audio. Detection includes start and end timestamps for each occurrence.
  • Detects credit, debit and other payment card numbers; expiration dates; CVV values; SSNs; and other PII numbers.
  • Redaction for both text transcripts and audio files.

Considerations

  • Especially useful for call recordings and voicemail when redacted audio is necessary. This can support anonymized audio analytics including words per minute, voice volume, etc.
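
Here is a hedged sketch of preparing a VoiceBase request from a RingCentral media URL with an access token query parameter. Everything specific to VoiceBase is an assumption based on our reading of its v3 API and should be checked against the current documentation: the `/v3/media` endpoint, the `mediaUrl` and `configuration` form fields, the `PCI` detector name, and the `redactor` layout. The helper names are hypothetical, and nothing is sent over the network.

```python
import json
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed v3 endpoint; verify against the VoiceBase documentation.
VOICEBASE_MEDIA_ENDPOINT = "https://apis.voicebase.com/v3/media"


def with_access_token(media_url: str, access_token: str) -> str:
    """Append (or overwrite) an access_token query parameter on a media URL."""
    parts = urlsplit(media_url)
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "access_token"
    ]
    query.append(("access_token", access_token))
    return urlunsplit(parts._replace(query=urlencode(query)))


def build_redaction_form(media_url: str, detector: str = "PCI") -> dict:
    """Build form fields asking for transcript and audio redaction.

    The configuration structure below is an assumption, not a confirmed schema.
    """
    configuration = {
        "prediction": {
            "detectors": [
                {
                    "detectorName": detector,
                    "redactor": {
                        "transcript": {"replacement": "[redacted]"},
                        "audio": {"tone": 270, "gain": 0.5},
                    },
                }
            ]
        }
    }
    return {"mediaUrl": media_url, "configuration": json.dumps(configuration)}
```

The resulting fields would be posted as a multipart form to `VOICEBASE_MEDIA_ENDPOINT` with a VoiceBase bearer token. Because the media URL carries a RingCentral access token, treat it as a secret and keep it out of logs.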

Connecting to RingCentral

Data Loss Prevention is an important aspect of any enterprise. Using solutions from AWS, Google and VoiceBase, you can put together the right solution for your RingCentral data. We have used these APIs and will be putting together example code for hooking up these solutions to your RingCentral data.

The RingCentral Call Log and Message Store APIs are generally used to retrieve call and text data for use with these services.
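
As a starting point, here is a minimal sketch of finding call recordings through the Call Log API. It builds the query URL for the current extension and pulls the recording content URIs out of a response. The helper names are hypothetical and the sample response is invented for illustration; a real request also needs an OAuth bearer token.

```python
from urllib.parse import urlencode

RC_SERVER = "https://platform.ringcentral.com"


def call_log_url(date_from: str, per_page: int = 100) -> str:
    """URL listing the current extension's calls that have recordings."""
    query = urlencode({
        "view": "Detailed",
        "withRecording": "true",
        "dateFrom": date_from,
        "perPage": per_page,
    })
    return f"{RC_SERVER}/restapi/v1.0/account/~/extension/~/call-log?{query}"


def recording_content_uris(call_log_response: dict) -> list:
    """Collect recording.contentUri from each record that has a recording."""
    return [
        record["recording"]["contentUri"]
        for record in call_log_response.get("records", [])
        if "recording" in record
    ]


# Hypothetical, heavily trimmed response used only to show the shape.
sample = {
    "records": [
        {
            "id": "A1",
            "recording": {
                "id": "1001",
                "contentUri": "https://media.ringcentral.com/restapi/v1.0/account/~/recording/1001/content",
            },
        },
        {"id": "A2"},  # a call without a recording
    ]
}
print(call_log_url("2018-08-01T00:00:00Z"))
print(recording_content_uris(sample))
```

Each content URI can then be downloaded for transcription and storage, or passed on as a media link once an access token has been added.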

If you have any questions, please post here, on our Developer Community or on Developer Glip Chat. If you have a RingCentral account or wish to enroll for a free developer account, please visit https://developer.ringcentral.com.

John Wang
RingCentral Developers

AVP Platform Products for @RingCentral with a focus on improving life through innovative products and software