Evolving Landscape of Analytics on AWS

Jeremy Gilmore
Published in State of Analytics
Dec 10, 2017
SageMaker announcement at AWS re:Invent 2017

Amazon Web Services (AWS) re:Invent 2017 produced far more analytics-related announcements than one post can cover. Recurring themes across the new services and the improvements to existing ones support analytics in small businesses and large enterprises alike. Development environments have become much more consistent and integrated. Building on the platform is easier than ever, but managing and supporting your applications and infrastructure still takes effort.

Below are some notable service announcements and how they could add value to your platform:

Containers

Docker and containers are increasingly common in cloud architecture designs. Traditional container deployments place containers on one or more nodes in a cluster of servers that you must provision and manage. AWS Fargate removes the need to manage these clusters: you run containers without managing the underlying servers. If you want to orchestrate multiple containers with Kubernetes, consider the new managed Amazon Elastic Container Service for Kubernetes (EKS). Both services make deploying containers on AWS much easier.
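As a rough illustration of the Fargate model, the sketch below starts a container through the Amazon ECS RunTask API with the FARGATE launch type, so no EC2 instances are involved. It assumes an existing ECS cluster, a registered task definition that uses the awsvpc network mode, and VPC subnets; the names analytics, etl-job:1, and subnet-0abc1234 are hypothetical placeholders. The ECS client is passed in (for example, boto3.client("ecs")) so the request can be checked without an AWS account.

```python
from typing import Any, Dict, List


def fargate_run_task_params(cluster: str, task_definition: str,
                            subnets: List[str], public_ip: bool = False) -> Dict[str, Any]:
    """Build keyword arguments for ecs.run_task using the FARGATE launch type."""
    return {
        "cluster": cluster,                  # hypothetical cluster name
        "taskDefinition": task_definition,   # hypothetical "family:revision"
        "launchType": "FARGATE",             # AWS runs the servers, not you
        "count": 1,
        "networkConfiguration": {
            # Fargate tasks use the awsvpc network mode, so subnets are required
            "awsvpcConfiguration": {
                "subnets": subnets,
                "assignPublicIp": "ENABLED" if public_ip else "DISABLED",
            }
        },
    }


def run_on_fargate(ecs_client, **kwargs: Any) -> List[str]:
    """Start the task and return the ARNs of the tasks ECS created."""
    response = ecs_client.run_task(**fargate_run_task_params(**kwargs))
    return [task["taskArn"] for task in response["tasks"]]


# Example with a real account (hypothetical resource names):
#   import boto3
#   run_on_fargate(boto3.client("ecs"), cluster="analytics",
#                  task_definition="etl-job:1", subnets=["subnet-0abc1234"])
```

Keeping parameter construction in a pure function is a choice made here for testability, not something the API requires.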

Serverless

For architectures in which the cloud ecosystem is designed as an application, functions as a service (FaaS) are critical to integrating services successfully. Lambda, AWS’s serverless compute service, now supports up to 3,008 MB of memory per function, roughly double the previous limit, which allows for more complex functions. Cloud9, a cloud-based IDE that, among other features, lets multiple developers contribute to the same code, also supports running and debugging Lambda functions. Having this built into a hosted IDE drastically improves the ability to develop and test with Lambda.
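A minimal sketch of the kind of Python Lambda function you might run against a sample event in an IDE such as Cloud9 before deploying it. The handler(event, context) signature is the standard one for Python Lambda functions; the event shape ({"records": [...]}) and the THRESHOLD rule are hypothetical, since real events depend on the service that invokes the function.

```python
import json
from typing import Any, Dict

THRESHOLD = 100  # hypothetical business rule for this example


def handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    """Keep only records whose 'amount' exceeds THRESHOLD and summarize them."""
    records = event.get("records", [])
    large = [r for r in records if r.get("amount", 0) > THRESHOLD]
    return {
        "statusCode": 200,
        "body": json.dumps({"count": len(large),
                            "total": sum(r["amount"] for r in large)}),
    }


if __name__ == "__main__":
    # Local smoke test: Lambda supplies a context object, but this handler ignores it
    sample = {"records": [{"amount": 50}, {"amount": 150}, {"amount": 300}]}
    print(handler(sample, None))
```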

Graph Database as a Managed Service!

Graph databases aren’t ideal for every situation, but they are useful for understanding relationships between records or events. Analyzing networks and relationships just got a lot easier on AWS with Neptune, its new managed graph database service. Load data into the graph database and query it directly with Gremlin or SPARQL, or use third-party tools such as Graphistry as a reporting and visualization layer.
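To make "relationships between records" concrete, here is a small illustration of a friends-of-friends question, the kind a graph database answers naturally. This is plain Python over an in-memory adjacency list, not Neptune's API; the people and "knows" edges are hypothetical sample data. In Neptune, a comparable two-hop query could be written in Gremlin roughly as shown in the comment.

```python
from typing import Dict, Set

# Hypothetical "knows" edges: person -> people they know
KNOWS: Dict[str, Set[str]] = {
    "alice": {"bob", "carol"},
    "bob": {"dave"},
    "carol": {"dave", "erin"},
    "dave": {"alice"},
    "erin": set(),
}

# Roughly equivalent Gremlin traversal against a graph database:
#   g.V().has('person', 'name', 'alice').as('a')
#        .out('knows').out('knows').where(neq('a')).dedup().values('name')


def two_hops(graph: Dict[str, Set[str]], start: str) -> Set[str]:
    """Return vertices reached by following exactly two outgoing edges, excluding start."""
    result: Set[str] = set()
    for friend in graph.get(start, set()):
        result |= graph.get(friend, set())  # set union also deduplicates
    result.discard(start)
    return result


print(sorted(two_hops(KNOWS, "alice")))  # ['dave', 'erin']
```

On a few vertices a dictionary is enough; the case for a managed graph database is that traversals like this stay fast as the number of vertices and edges grows.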

Query Your Flat Files

There is no longer any reason to load entire flat files and then filter for the data you want. With S3 Select and Glacier Select, you can filter data with SQL expressions where it is stored, before it reaches your application. Amazon claims up to a 400% improvement in performance and up to an 80% reduction in cost from using S3 Select compared with the more traditional practice of downloading, decompressing, and processing multiple files.
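A sketch of filtering a gzipped CSV object in place with S3 Select through the select_object_content call, so only matching rows leave S3. The bucket, key, and column names are hypothetical. The CSV is assumed to have a header row, which FileHeaderInfo "USE" exposes as column names in the SQL. The S3 client is passed in (for example, boto3.client("s3")).

```python
from typing import Any, Dict, Iterable


def select_params(bucket: str, key: str, expression: str) -> Dict[str, Any]:
    """Build arguments for s3.select_object_content over a GZIP-compressed CSV."""
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        "Expression": expression,
        # Header row supplies column names; S3 decompresses server-side
        "InputSerialization": {"CSV": {"FileHeaderInfo": "USE"}, "CompressionType": "GZIP"},
        "OutputSerialization": {"CSV": {}},
    }


def collect_records(event_stream: Iterable[Dict[str, Any]]) -> str:
    """Join the 'Records' payloads from the response event stream.

    Other events (Stats, Progress, End) are skipped. Bytes are joined before
    decoding so a multi-byte character split across chunks stays intact.
    """
    chunks = [event["Records"]["Payload"] for event in event_stream if "Records" in event]
    return b"".join(chunks).decode("utf-8")


def s3_select(s3_client, bucket: str, key: str, expression: str) -> str:
    response = s3_client.select_object_content(**select_params(bucket, key, expression))
    return collect_records(response["Payload"])


# Example with a real account (hypothetical bucket, key, and columns):
#   import boto3
#   s3_select(boto3.client("s3"), "my-bucket", "sales/2017.csv.gz",
#             "SELECT s.region, s.amount FROM S3Object s WHERE s.region = 'EU'")
```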

Machine Learning as a Managed Service

Developers no longer need to be intimidated by machine learning. SageMaker, a fully managed service, lets developers build, train, optimize, and deploy machine learning models at scale. As with most of the other recent AWS announcements, SageMaker dramatically improves the ability to develop, iterate, and deploy. Developers now have a framework for integrating models into complex systems, and data scientists now have a framework for deploying models at scale. AWS has made significant progress in integrating data sources and making machine learning part of your ecosystem. Be aware, though, that analyzing and interpreting ML models still requires advanced training and experience.
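Once a model is deployed, SageMaker serves it behind an endpoint that applications call through the SageMaker runtime API. A minimal sketch, assuming a model is already deployed to an endpoint that accepts one CSV row of numeric features; the endpoint name churn-model and the feature layout are hypothetical, and the response format depends entirely on the model you deployed. The runtime client is passed in (for example, boto3.client("sagemaker-runtime")).

```python
from typing import Sequence


def to_csv_row(features: Sequence[float]) -> str:
    """Serialize one feature vector as a single CSV line with no header."""
    return ",".join(str(x) for x in features)


def predict(runtime_client, endpoint_name: str, features: Sequence[float]) -> str:
    """Send one row to a deployed endpoint and return the model's raw output."""
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,   # hypothetical, e.g. "churn-model"
        ContentType="text/csv",       # assumes the model accepts CSV input
        Body=to_csv_row(features),
    )
    # Body is a streaming object; read it fully and decode the bytes
    return response["Body"].read().decode("utf-8")
```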

Working with Text

Last year AWS introduced Lex and Polly, which let developers and architects treat text as more than blobs of characters. Three new AWS services continue that tradition. Comprehend is a natural language processing (NLP) service that derives sentiment, entities, and key phrases from text documents. Translate makes real-time text translation available as a service on AWS. Transcribe, AWS’s new automatic speech recognition (ASR) service, converts speech to text.
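These services compose naturally. The sketch below translates a piece of text into English with Translate (translate_text) and then scores its sentiment with Comprehend (detect_sentiment). The pairing is an illustrative choice, not a prescribed workflow; the source language and sample text are hypothetical. Both clients are passed in (for example, boto3.client("translate") and boto3.client("comprehend")).

```python
from typing import Any, Dict


def english_sentiment(translate_client, comprehend_client, text: str,
                      source_language: str) -> Dict[str, Any]:
    """Translate text to English, then return its sentiment label and scores."""
    translated = translate_client.translate_text(
        Text=text,
        SourceLanguageCode=source_language,  # hypothetical, e.g. "es"
        TargetLanguageCode="en",
    )["TranslatedText"]
    result = comprehend_client.detect_sentiment(Text=translated, LanguageCode="en")
    return {
        "text": translated,
        "sentiment": result["Sentiment"],   # POSITIVE, NEGATIVE, NEUTRAL, or MIXED
        "scores": result["SentimentScore"], # per-class confidence scores
    }
```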

Do It Yourself Machine Learning

If you prefer to build data science models yourself, AWS has developed and shared a respectable library of Jupyter notebooks and examples that make great templates for building models. Whether you are working with TensorFlow or MXNet to build an image recognition model or a recommendation engine, the landscape is now easier to navigate.

Other Notables

Kinesis Video Streams ingests streaming video and audio. Rekognition, AWS’s image analysis service, got a major boost with Rekognition Video, which analyzes both stored (batch) and streaming video, the latter in real time. Internet of Things (IoT) devices can now receive over-the-air updates, and Greengrass takes machine learning to the edge. Databases also gained multi-master capabilities: Aurora Multi-Master adds resiliency by accepting writes across multiple Availability Zones, and DynamoDB Global Tables replicate a table across AWS Regions with writes accepted in each.
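For the batch side of Rekognition Video, analysis runs as an asynchronous job: start it, wait for it to finish, then page through the results. A sketch using start_label_detection and get_label_detection, assuming the video is already in S3; the bucket and object names are hypothetical. Polling keeps the example short, but Rekognition can also publish job completion to an SNS topic, which is usually preferable in production. Streaming video goes through Kinesis Video Streams and a Rekognition stream processor instead, which this sketch does not cover.

```python
import time
from typing import Dict, List, Tuple


def detect_video_labels(rekognition_client, bucket: str, key: str,
                        poll_seconds: float = 5.0) -> List[Tuple[int, str, float]]:
    """Return (timestamp_ms, label_name, confidence) for every detected label."""
    job_id = rekognition_client.start_label_detection(
        Video={"S3Object": {"Bucket": bucket, "Name": key}}
    )["JobId"]

    labels: List[Tuple[int, str, float]] = []
    next_token = None
    while True:
        kwargs: Dict[str, str] = {"JobId": job_id}
        if next_token:
            kwargs["NextToken"] = next_token
        page = rekognition_client.get_label_detection(**kwargs)
        status = page["JobStatus"]
        if status == "IN_PROGRESS":
            time.sleep(poll_seconds)  # job still running; ask again later
            continue
        if status == "FAILED":
            raise RuntimeError(page.get("StatusMessage", "label detection failed"))
        # SUCCEEDED: collect this page, then follow NextToken if more pages exist
        for item in page["Labels"]:
            labels.append((item["Timestamp"], item["Label"]["Name"],
                           item["Label"]["Confidence"]))
        next_token = page.get("NextToken")
        if not next_token:
            return labels


# Example with a real account (hypothetical bucket and key):
#   import boto3
#   detect_video_labels(boto3.client("rekognition"), "my-videos", "lobby.mp4")
```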

Putting it all Together

With many new specialized managed services, AWS’s suite of tools makes it easier to implement the right solution. Evaluating your current architecture to see where these new tools could fit can help propel your analytics platform forward. Whatever your company’s analytics maturity level, some of the services above likely have a place in your platform. Investing in an upgrade of your analytics infrastructure can help you gain more insights, make better decisions, and derive more value.
