DLP for Developers Overview: Amazon Macie, Google DLP API and VoiceBase
We manage cloud business communications for customers at RingCentral, so our servers host a lot of data including call recordings, voicemail, SMS, MMS, and fax. Some of our customers are interested in Data Loss Prevention (DLP) to detect and redact Personally-Identifiable Information (PII) and other sensitive data, so it is useful to review some of the solutions provided by our partners, namely Amazon Macie, Google DLP API and VoiceBase. These solutions work differently, so we’ll take a quick look at each. If you have any questions on how these can work with your RingCentral data, please reach out to us.
TL;DR
These three tools have different capabilities and can be used in different ways.
- Amazon Macie is a security tool for Information Security staff to monitor the data in their S3 buckets. Audio and video data must be transcribed first. There is no API, aside from CloudWatch alerts.
- Google DLP API can classify and redact sensitive data. It supports a number of customizations including dictionaries, regular expression patterns and detection rules. It requires text data as input and can work on data already in GCS, BigQuery and Cloud Datastore. Audio must be transcribed first, for example with the Google Speech-to-Text API.
- VoiceBase API supports classification and redaction and can act on audio directly as well as on text transcripts. This is the only solution of the three that can redact audio. You can also send it a link to a media file such as a RingCentral media URL with an access token query parameter.
All three of these can be used with RingCentral’s voice and video data. The approaches include transcribing the data and copying it to the relevant storage location such as S3 or GCS, or submitting the data via API.
Amazon Macie
Amazon Macie is a service that automatically discovers, classifies and protects sensitive information in S3. It is designed for the Information Security (InfoSec) team to monitor data in their S3 buckets.
Features
- Data Classification: It identifies PII using a set of regular expression detectors against a variety of file types including MS Office and PDF files. It can then classify files using rules-based themes and a Support Vector Machine (SVM) classifier. Custom detectors are not supported at this time.
- Access Monitoring: In addition to classification, Macie can monitor access and send alerts when there is anomalous activity such as a sudden increase in download activity.
- Alerting via CloudWatch: CloudWatch Alerts can be created to monitor events.
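To make the regular-expression approach to classification more concrete, here is a minimal Python sketch of a regex-based PII detector. The patterns, the `DETECTORS` table and the `luhn_valid` and `find_pii` helpers are our own illustrations and are not Macie's actual detectors. The sketch only shows the general technique: match a pattern, then apply a checksum to reduce false positives.

```python
import re

# Illustrative patterns only; these are NOT Macie's actual detectors.
DETECTORS = {
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "CARD_NUMBER": re.compile(r"\b(?:\d[ -]?){12,18}\d\b"),
}


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:  # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return len(digits) >= 13 and total % 10 == 0


def find_pii(text: str) -> list[tuple[str, str]]:
    """Return (detector_name, matched_text) pairs found in `text`."""
    findings = []
    for name, pattern in DETECTORS.items():
        for match in pattern.finditer(text):
            value = match.group()
            # Drop digit runs that look like cards but fail the checksum.
            if name == "CARD_NUMBER" and not luhn_valid(value):
                continue
            findings.append((name, value))
    return findings


if __name__ == "__main__":
    sample = "SSN 123-45-6789, card 4111 1111 1111 1111, order 1234567890123456"
    print(find_pii(sample))
    # [('US_SSN', '123-45-6789'), ('CARD_NUMBER', '4111 1111 1111 1111')]
```

The 16-digit order number matches the card pattern but fails the Luhn check, so it is not reported.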
Considerations
- Macie only works on data in S3, so the data needs to be in an S3 bucket that your organization manages. You can use RingCentral’s Call Log and Message Store APIs to retrieve data to store in your S3 buckets.
- Macie supports a variety of file types including MS Office and PDF, but some content, including audio, video and images, needs pre-processing. The Amazon Transcribe API can be used for speech-to-text transcription.
- Macie is an information security tool first and, as such, is not API-driven. It is primarily a UI-based classification and alerting tool for InfoSec teams monitoring their S3 content.
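As a sketch of the pre-processing step for audio, the snippet below pulls the plain transcript text out of an Amazon Transcribe result file so that it can be stored as a text object in S3. It assumes the standard Transcribe output JSON layout (`results.transcripts[].transcript`); the `transcript_text` helper name and the sample document are ours, for illustration only.

```python
import json


def transcript_text(transcribe_output: str) -> str:
    """Extract the full transcript text from a Transcribe result JSON string."""
    document = json.loads(transcribe_output)
    transcripts = document["results"]["transcripts"]
    # Join the transcript entries into a single block of text.
    return " ".join(entry["transcript"] for entry in transcripts)


if __name__ == "__main__":
    # Hypothetical, trimmed-down result document for illustration.
    sample = json.dumps({
        "jobName": "voicemail-demo",
        "results": {
            "transcripts": [{"transcript": "My number is 650 555 0100."}],
            "items": [],
        },
        "status": "COMPLETED",
    })
    print(transcript_text(sample))
```

The resulting text can then be written to a bucket that Macie monitors, next to or instead of the original recording.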
Links
- Homepage: https://aws.amazon.com/macie/
- Documentation: https://docs.aws.amazon.com/macie/latest/userguide/
Google DLP API
The Google DLP API classifies and redacts sensitive text content. It offers language-specific SDKs, custom detectors and redaction, can operate on data stored in Google Cloud Storage (GCS) and BigQuery, and can also inspect images.
Features
- Data Classification: 90+ pre-built detectors with the ability to support custom detectors
- Filetypes: This API expects data in text format, so the text content needs to be extracted first, for example by using the Google Speech-to-Text API on audio files. It can handle text supplied via API or stored in Google Cloud Storage (GCS), Google BigQuery, and Google Cloud Datastore.
- Text Redaction: The DLP API can protect sensitive data in several ways, including (a) replacing it with generic text, (b) replacing it with custom text or a token, or (c) encrypting it.
- Image processing including JPEG, BMP, PNG, and SVG.
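The classification and redaction features above come together in a single request body. The following Python sketch builds a body for the DLP v2 `content:deidentify` REST method, combining built-in detectors, an optional custom regular-expression detector and a replacement transformation. The `build_deidentify_request` helper, the `RC_ACCOUNT_ID` detector name and its pattern are hypothetical; the JSON field names follow our understanding of the v2 REST reference and should be checked against the docs linked below.

```python
import json


def build_deidentify_request(text, info_types, custom_regexes=None, replacement=None):
    """Build a JSON-serializable body for the DLP `content:deidentify` method.

    text           -- the text (for example a call transcript) to redact
    info_types     -- built-in detector names, e.g. ["PHONE_NUMBER"]
    custom_regexes -- optional {detector_name: regex_pattern} custom detectors
    replacement    -- custom replacement string; when None, each finding is
                      replaced with the name of its detector
    """
    inspect_config = {"infoTypes": [{"name": name} for name in info_types]}
    if custom_regexes:
        inspect_config["customInfoTypes"] = [
            {"infoType": {"name": name}, "regex": {"pattern": pattern}}
            for name, pattern in custom_regexes.items()
        ]

    if replacement is None:
        transformation = {"replaceWithInfoTypeConfig": {}}
    else:
        transformation = {"replaceConfig": {"newValue": {"stringValue": replacement}}}

    return {
        "item": {"value": text},
        "inspectConfig": inspect_config,
        "deidentifyConfig": {
            "infoTypeTransformations": {
                # No "infoTypes" filter here, so the transformation applies
                # to every finding requested in inspectConfig.
                "transformations": [{"primitiveTransformation": transformation}]
            }
        },
    }


if __name__ == "__main__":
    body = build_deidentify_request(
        "Call me at 650-555-0100 about account RC-123456.",
        ["PHONE_NUMBER"],
        custom_regexes={"RC_ACCOUNT_ID": r"RC-\d{6}"},  # hypothetical detector
    )
    print(json.dumps(body, indent=2))
```

The body would be sent with an authenticated POST to `https://dlp.googleapis.com/v2/projects/PROJECT_ID/content:deidentify`, where `PROJECT_ID` is your own Google Cloud project. The language-specific SDKs accept the same structure.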
Considerations
- Google DLP API is a flexible API solution that supports customization and works with the Google Cloud including GCS and BigQuery.
- It requires audio to be transcribed, and text to be extracted from PDF files, before processing.
- It does not support audio file redaction.
Links
- Homepage: https://cloud.google.com/dlp/
- Docs: https://cloud.google.com/dlp/docs/
- Blog: https://cloudplatform.googleblog.com/2018/03/take-charge-of-your-sensitive-data-with-the-Cloud-DLP-API.html
VoiceBase PCI, SSN, PII Detection
VoiceBase is specifically focused on voice. As such, it can receive an audio file directly and detect and redact sensitive data in the audio itself, as well as produce and redact text transcripts.
Features
- Works directly with audio. Detection includes start and end timestamps for each occurrence.
- Detects credit, debit and payment card numbers; expiration dates; CVV values; SSN; and other PII numbers.
- Redaction for both text transcripts and audio files.
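As a rough sketch of how detection and redaction might be requested together, the Python snippet below builds the form fields for a VoiceBase v3 media request that points at a media URL. The `build_voicebase_form` helper is hypothetical, and the field names (`mediaUrl`, `configuration`, `detectorName`, `redactor`) and the tone and gain values reflect our reading of the v3 how-to guides linked below, so please verify them against those docs before relying on this.

```python
import json


def build_voicebase_form(media_url, detectors=("PCI", "SSN"), redact_audio=True):
    """Return form fields for a VoiceBase v3 media request with redaction."""
    redactor = {"transcript": {"replacement": "[redacted]"}}
    if redact_audio:
        # Ask for each detected span in the audio to be replaced with a tone.
        redactor["audio"] = {"tone": 270, "gain": 0.5}
    configuration = {
        "prediction": {
            "detectors": [
                {"detectorName": name, "redactor": redactor} for name in detectors
            ]
        }
    }
    # The configuration is sent as a JSON string alongside the media URL.
    return {"mediaUrl": media_url, "configuration": json.dumps(configuration)}


if __name__ == "__main__":
    form = build_voicebase_form("https://example.com/recording.mp3")
    print(form["configuration"])
```

These fields would be submitted as a multipart form POST to the v3 media endpoint with your VoiceBase bearer token. Setting `redact_audio=False` limits redaction to the text transcript.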
Considerations
- Especially useful for call recordings and voicemail when redacted audio is necessary. Redacted audio can still support anonymized audio analytics such as words per minute and voice volume.
Links
- Solutions Page: https://www.voicebase.com/use-cases/compliance-monitoring/
- Detection Docs: http://voicebase.readthedocs.io/en/v3/how-to-guides/pci-ssn-pii-detection.html
- Redaction Docs: http://voicebase.readthedocs.io/en/v3/how-to-guides/pci-ssn-pii-redaction.html
Connecting to RingCentral
Data Loss Prevention is an important aspect of any enterprise. Using solutions from AWS, Google and VoiceBase, you can put together the right solution for your RingCentral data. We have used these APIs and will be putting together example code for hooking up these solutions to your RingCentral data.
The following RingCentral APIs are generally used to retrieve call and text data for use with these services:
- Company Call Log
- Company Call Log Sync
- User Call Log
- User Call Log Sync
- User Message List
- User Message Sync
- Subscriptions / Webhooks
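As a starting point, here is a minimal Python sketch that uses the Company Call Log API to find call recordings and turn them into media URLs carrying an access token query parameter, suitable for handing to a transcription or redaction service. The `call_log_url` and `recording_media_urls` helpers are our own illustrations. The sketch assumes the call log is requested with `view=Detailed` and `withRecording=true`, and that each recorded call carries a `recording.contentUri` field in the response.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

CALL_LOG_PATH = "/restapi/v1.0/account/~/call-log"


def call_log_url(server, date_from, per_page=100):
    """Build a Company Call Log URL that returns only calls with recordings."""
    query = urlencode({
        "view": "Detailed",
        "withRecording": "true",
        "dateFrom": date_from,
        "perPage": per_page,
    })
    return f"{server}{CALL_LOG_PATH}?{query}"


def recording_media_urls(call_log_response, access_token):
    """Return one tokenized media URL per recording in a call log response."""
    urls = []
    for record in call_log_response.get("records", []):
        content_uri = record.get("recording", {}).get("contentUri")
        if not content_uri:
            continue  # this call has no recording
        parts = urlsplit(content_uri)
        query = parse_qsl(parts.query) + [("access_token", access_token)]
        urls.append(urlunsplit(parts._replace(query=urlencode(query))))
    return urls


if __name__ == "__main__":
    print(call_log_url("https://platform.ringcentral.com", "2018-06-01T00:00:00Z"))
    # Hypothetical, trimmed-down response for illustration.
    response = {"records": [
        {"id": "1", "recording": {"contentUri":
            "https://media.ringcentral.com/restapi/v1.0/account/1/recording/2/content"}},
        {"id": "2"},
    ]}
    print(recording_media_urls(response, "ACCESS_TOKEN"))
```

Because the token is part of the URL, treat these URLs as secrets: keep them out of logs and use short-lived tokens where possible.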
If you have any questions, please post here, on our Developer Community or on Developer Glip Chat. If you have a RingCentral account or wish to enroll for a free developer account, please visit https://developer.ringcentral.com.