AWS Summit 2017: Glue, Macie, Spectrum, and more…
The scene was buzzing as tens of thousands gathered on Monday, August 14th, at the Javits Center in New York for the AWS Summit New York 2017. The day began with registration and a light breakfast, followed by the opening keynote.
Technologists from every walk of life, from partner companies and startups to independent developers and tech evangelists, filled the keynote auditorium to overflowing to kick off the day.
Amazon’s keynote speakers covered new integrations with existing products, but also announced a few entirely new services, such as Glue and Macie. Alluding to the struggles of ETL (Extract, Transform, and Load) prep work, they announced Glue, which is billed as a “Simple, flexible, and cost-effective ETL” service.
AWS Glue follows many data cataloging services from other providers, such as Microsoft’s Azure Data Catalog, and arrives amid the now very much de rigueur shift toward serverless applications. What’s a bit different about Glue, though, is that it’s a fully managed data catalog and ETL service that automatically crawls your data sources and builds catalog entries using pre-built classifiers.
The unlock? Light-touch ETL for most of your data, and automatic generation of code to execute transformations. You can even start digging into the data with Python and Spark with just a click. Lastly, job bookmarks let your scripts keep track of data they have already processed, so each run works only on what’s new, and jobs can be kicked off from other services such as Lambda. Glue can:
- Serve as a fully managed data catalog & ETL service
- Provision & manage infrastructure under the hood
- Discover & catalog data on a schedule or when data changes
- Auto-generate transform code in Python
- Run ETL jobs using cataloged data and scripts
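To make the bookmark idea concrete, here is a minimal pure-Python sketch of incremental processing. It is not Glue's implementation or API: `Bookmark` and `run_incremental_job` are hypothetical names, and the sketch assumes each input is identified by a unique key, such as an S3 object path.

```python
from dataclasses import dataclass, field


@dataclass
class Bookmark:
    """Hypothetical bookmark: remembers which input keys earlier runs already processed."""
    processed: set = field(default_factory=set)


def run_incremental_job(keys, bookmark, transform):
    """Transform only keys not seen in earlier runs, then advance the bookmark."""
    # Skip anything the bookmark has already recorded; sort for a stable order.
    new_keys = [k for k in sorted(keys) if k not in bookmark.processed]
    results = [transform(k) for k in new_keys]
    # Record the new keys so the next run ignores them.
    bookmark.processed.update(new_keys)
    return results


bm = Bookmark()
first = run_incremental_job(["logs/b.csv", "logs/a.csv"], bm, str.upper)
second = run_incremental_job(["logs/a.csv", "logs/b.csv", "logs/c.csv"], bm, str.upper)
```

Here the second run transforms only `logs/c.csv`, since the first two files were recorded by the earlier run. A real service has to persist this state between runs and cope with files that change after being processed, which this sketch deliberately ignores.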
Speakers at the opening keynote also announced Macie, after noting how widely machine learning would be covered in the sessions that followed. Macie is “A machine learning-powered security service to discover, classify, and protect sensitive data.”
What’s so special about Macie? Macie is a bit like LifeLock for your S3 data, in that it detects egress traffic for directories and “continuously monitors data access activity for anomalies, and generates detailed alerts when it detects risk of unauthorized access or inadvertent data leaks.”
Slide notes indicate that “identifying and protecting sensitive data is a challenging task today,” and Macie addresses that problem by providing a dashboard of globally shared content and tracking ongoing compliance with standards and data categories such as GDPR, PCI, and PII.
With Macie, companies will have much more insight into how effectively access is provisioned, and can protect themselves by identifying potential PII leaks before they become serious problems.
AWS Athena & Redshift Spectrum
Athena is not really a new service offering from AWS, since it launched in November 2016. However, it’s worth noting that some sessions focused on its ability to query S3 content using standard ANSI SQL, hinting at how integral it might become alongside services such as Glue.
Redshift Spectrum gives you the ability to run queries on very large data sets (up to exabytes) in an S3 data lake with virtually no loading or transformation of that data, even for colder datasets.
There were many popular sessions that filled up quickly, such as the Alexa Skills workshop and the Security Jam, as well as sessions with very few seats, like the Lightning Talk with BuzzFeed in the “Startup Loft.”
Many sessions also revealed a little of the magic behind Amazon services, like the “7 Things You Must Know to Build Better Alexa Skills” session, which opened with the Alexa team’s humble account of how Alexa began.
The “Building Your Data Lake on AWS” session covered many services for stocking, fishing, searching, and indexing the lake. The presenters also shared a neat tool called the Kinesis Data Generator, which generates test data you can stream into an S3 location.
The day was a great overview of both existing and new services and products. Between the sessions I attended on Glue, Macie, and data lakes, the free event was well worth a full day of learning about cloud technologies. Lastly, I highly recommend checking out the other resources below: