Conference @ Scale: the biggest announcements and highlights from AWS re:Invent 2017

Dave Sanders
The Telegraph Engineering
Dec 8, 2017
Taking the stage: Andy Jassy, the CEO of AWS, at re:Invent

I was lucky enough to attend this year’s AWS re:Invent summit in Las Vegas along with 42,000+ other like-minded technologists. This is a short account of my time at the conference, including interesting breakout sessions, AWS announcements and a little entertainment along the way.

Sunday

Landed at LAS airport after an 11-hour flight from Gatwick. There is nowhere else on the planet with this atmosphere and theatrical skyline.

Welcome to Vegas!

After checking into Caesar’s Palace (and spending 20 minutes trying to find an exit from the casino!) I went over to The Venetian to register and collect my swag. I got a pretty decent hoodie, but I’d have still liked an Echo Dot to go with it.

Monday

Breakout Session : The first session I went to was presented by Dean Perrine from Fox Networks.

Dean presented a use case for National Geographic, where they are looking at AWS Rekognition for auto-tagging their extensive photo library. It’s the first time I’ve come across Step Functions being used to create a multi-stage media processing pipeline and to ensure that certain actions, such as ID management and writing to Elasticsearch and DynamoDB, take place in an ordered and guaranteed way.

Image processing pipeline orchestrated using Step Functions.
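As a rough illustration of the pattern (a sketch, not Fox’s actual pipeline), a Step Functions state machine chains the stages so each one runs only after the previous has succeeded. All ARNs and function names below are hypothetical.

```python
import json

import boto3

# A minimal Amazon States Language definition: tag the image first, then
# persist the results. Each state runs only once the previous one succeeds.
pipeline_definition = {
    "Comment": "Auto-tagging pipeline sketch",
    "StartAt": "TagImage",
    "States": {
        "TagImage": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:tag-with-rekognition",
            "Next": "PersistResults",
        },
        "PersistResults": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:write-to-dynamodb",
            "End": True,
        },
    },
}

sfn = boto3.client("stepfunctions")
sfn.create_state_machine(
    name="image-tagging-pipeline",
    definition=json.dumps(pipeline_definition),
    roleArn="arn:aws:iam::123456789012:role/StepFunctionsExecutionRole",
)
```

States can also declare retry and error-handling rules, which is what gives the ordered, guaranteed execution the speakers valued.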

Dean stressed the importance of asset IDs and the ability to have parent/child relationships for versions, renditions, etc. of the same asset. This resonated with me, since The Telegraph is consolidating search and discovery of multiple media asset types, from multiple repositories, behind a single consistent asset API. The handling of IDs from multiple sources never ceases to be a challenge!

A very valuable point made by Dean and other speakers throughout the week was that machine learning (ML) algorithms such as AWS Rekognition can only get you so far. Inevitably, ML algorithms will make mistakes, but it is critical to recognise …

“Niche deep learning costs definitely outweigh human-aided corrections.”

This is where Mechanical Turk becomes useful, either to create labelled datasets to (re)train an existing algorithm or simply to manually override incorrect outputs from an algorithm. The example cited was the chihuahua vs muffin challenge, which defeats even the world’s most advanced machine learning platforms. The costs of refining the algorithm to cater for this edge case far outweigh the costs of simply relabelling any mistakes.

Announcement : There was a comprehensive set of new media services for video announced today. For The Telegraph this feels late to the party, as we already have established video storage, transcoding and delivery mechanisms in place. For newer companies, though, the offering looks to be a comprehensive toolset.

Breakout Session : The second session of the day was about building data lakes in S3, presented by George Smith, an AWS Solutions Architect, who walked through an interesting architectural approach.

S3 at the heart of the data lake.

The approach centred on keeping a low-cost, structured data lake in S3 using the compressed Parquet format. At query time (i.e. when we need to answer a business question) the most appropriate underlying query store is chosen (e.g. EMR, Redshift, Aurora), the relevant data is loaded into it from S3, and the query is executed. Once the query completes, the query store’s data and infrastructure are torn down.

The obvious concerns over performance seemed to be mitigated by several factors.

  1. The Parquet format is well compressed (75 GB in Oracle compressed down to 3.4 GB in Parquet), so moving this data across network infrastructure is acceptable.
  2. Choosing the ‘right tool for the job’ means that the business query executes in an optimal way rather than having a ‘one-size-fits-all’ query store.
  3. Only the data required to answer the business question is loaded into the query store. For example, if the business question is ‘give me all the customers who bought product X over the last year, broken down by month’, then only one year’s worth of data is loaded from S3 into the query store.
  4. Generally, queries on data lakes are long-running tasks, so the time taken to commission the infrastructure and load the underlying data is still small, relative to the overall query execution time.

The other obvious benefit of this architectural approach is reduced storage and compute cost, since the query infrastructure is only commissioned when queries actually need to run. In the example presented, with no queries running, the only cost incurred was S3 storage at $0.08 per month. Pretty impressive for a data lake!
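As a minimal sketch of point 3 (assuming a hypothetical bucket and a hive-style year/month partitioning), PyArrow can read just the partitions needed from the S3 lake before they are loaded into whichever query store has been chosen:

```python
import pyarrow.dataset as ds

# Hypothetical layout: sales data stored as Parquet in S3, partitioned by year/month.
lake = ds.dataset("s3://my-data-lake/sales/", format="parquet", partitioning="hive")

# Read only one year's worth of data, not the whole lake; the resulting
# table would then be loaded into the chosen query store (EMR, Redshift, ...).
last_year = lake.to_table(filter=(ds.field("year") == 2017))
print(last_year.num_rows)
```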

Breakout Session : Media and Entertainment — State of the Nation

A group of speakers from across the media industry talked about their experiences with AWS.

Steve Kowalski from Imageworks talked about their use of AWS as burst capacity for highly CPU-intensive rendering activities. They managed to scale to 75,000 vCPUs before things started to break! With some additional modifications, such as a bespoke scale-back implementation, they now regularly run this kind of burst capacity.

John Herbert talked about Fox rewriting its entire media platform on AWS and some of the challenges of moving 10 PB of data there. They also hold 200 billion rows of analytics data, for which they have seen a 30% improvement in query performance since moving to AWS.

Fabio Luzzi, from Viacom, presented slides on the democratisation of data science. This will resonate with most companies that have invested significant capital in building data science and data insight capability, but are struggling with how to surface this capability to the business users themselves.

Viacom is working on the ultimate democratisation of data in that it is writing an Alexa skill for business leaders to interact with from their desks. The pilot is a chatbot on Slack that has gone down well. Echo Dots for everyone please!

Rajneel Kumar, from Viacom 18 India, talked about a really interesting challenge facing media businesses in India. With the average cost of a mobile handset in India at less than $100, mobile devices are significantly constrained in the storage and CPU needed to run rich media apps. Rajneel talked about consumers having to decide whether to keep the family photo on the phone or delete it to install an app. Clearly this is not a difficult choice for the consumer!

Viacom 18 have trialled VOOT Lite, an OTT (over-the-top) service presented as a PWA (Progressive Web App). Being a PWA there is no app to ‘install’, but it feels and behaves like a mobile app, and it is built to work within the limited storage and compute of the handset. So far the trial has proved very successful. Well done Viacom 18 for finding a great use case for building a PWA!

A great use case for a PWA.

Tuesday

It was an early start for me as I took part in the 4K charity fun run.

The run started at 6am from The Mirage and we ran an out-and-back circuit down Frank Sinatra Dr., which they closed to traffic. It was a gorgeous run, with the sun rising over the Nevada mountains on the way back in.

Not a bad run to have on my list.

The run was for a couple of worthy causes: Girls Who Code and the American Heart Association.

Keynote : I attended the Global Partner Summit (GPS) keynote by Terry Wise, an interesting look at how the AWS partner and professional services function has grown. Matt Wood highlighted some great work implementing ML for early detection of diabetic complications in eyes and for tumour recognition in medical images. Pinterest is using AWS ML services to let people select areas of images and search for ‘similar’ things, and Expedia is using them to identify the ‘best’ image, in terms of lighting, landscape and so on, for a particular location.

Announcement : AWS PrivateLink can now be used for connecting to services in the AWS Marketplace. This means that when using a SaaS provider from the marketplace, all network traffic stays within the AWS network, improving security and reducing latency.

Andy Jassy presented three key opportunities which he sees in the tech space.

  • Enterprise database opportunity.

I’m yet to find an Oracle customer who is happy about it

Andy pointed out the opportunity for Aurora, where customers are free from long licensing commitments, unexpected price rises and expensive support models.

  • ML opportunity. The vast majority of those using ML find the tools hard to use and the models complex to build. There is an obvious need to simplify them, and AWS is working hard on this (see the SageMaker announcement below).
  • IoT opportunity. For an emerging technology, IoT has exploded in terms of the sheer number of devices in production use today. Since these devices have very limited compute resources of their own, they depend disproportionately on the cloud to deliver their anticipated functionality.

Breakout Session : I attended an afternoon session on blockchain technologies and some of their use cases. There were representatives from T-Mobile, Intel and PwC, who talked about blockchain adoption and specifically the Hyperledger Sawtooth distributed ledger technology.

There were some interesting use cases around auditing and compliance, where a node could be provisioned and accessed directly by the auditors themselves, allowing them complete access to audit trails in a secure and scalable way. This vastly reduces the cost to companies of supporting an audit. From a media perspective, there was a use case for tracking asset ownership and usage using the distributed ledger.

That said, every encounter I’ve had with blockchain technology so far has surfaced a number of themes which really call out the immaturity of this technology. Firstly, virtually no one is running this type of technology in production (apart from the obvious cryptocurrencies). Secondly, the technology itself seems complex to understand and difficult to use. Lastly, it is full of unhelpful jargon which, as an ‘outsider’, is hard to penetrate and obfuscates its usefulness. In order to be adopted universally, it feels like it will need to follow the path of machine learning (confusion matrix?!?), which is now being democratised for the ‘everyday developer’.

Wednesday

Keynote: The Andy Jassy keynote proved to be one of the main events of the conference, where many of the new products and services were announced. Andy started with a summary of cloud computing as a whole, including the fact that AWS market share in cloud computing is up from 39% to 44% this year.

In terms of features offered, Andy presented the following slide, inviting the audience to fill in the vendor names! Very inventive, and it drew a chuckle from the crowd.

Not the hardest game of ‘guess the vendor’!

There were far too many announcements to cover here, but I’ll touch on a few which I found relevant to The Telegraph and some which are just interesting!

Announcement : AWS now supports Kubernetes! This got a really good reception and shows that AWS has listened to its customers. Elastic Kubernetes Service (EKS) is now in preview and provides a managed Kubernetes cluster. AWS has had container support through ECS for some time; however, with the rise of Kubernetes as the developers’ choice for managing container clusters, and with microservices now an established architectural pattern, AWS has recognised the need to support it directly.

Announcement : Fargate, a fully managed container service. This is possibly the easiest way to deploy and manage container clusters: all the scaling, orchestration and management of the cluster is abstracted away from the developer. As a company exploring the possibility of providing an internal PaaS to our dev teams, this is a really appealing option. Since Fargate support for Kubernetes (EKS) is not due until 2018, I suspect this service will see some significant adoption at that point.

Andy talked about the huge adoption of microservices, and therefore container technology, but also the move to serverless architectures. There has been a 300% increase in the use of Lambda in the last year!

The keynote moved on to persistence technologies, with a few more digs at Enterprise database technologies and one vendor in particular.

Another poke at ‘you know who’.

There were announcements of previews of multi-master support for Aurora and of Aurora Serverless. Andy also stated that Aurora is the fastest-growing service in the history of AWS. It looks like relational databases still have a strong presence in the Enterprise stack.

Moving on to document stores, DynamoDB continues to perform well and has a number of new features, such as global tables (replicated across regions) and point-in-time recovery to the second.
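As a quick sketch of how the recovery feature is driven (the table names here are hypothetical), point-in-time recovery is enabled per table and then allows a restore to any second within the retention window:

```python
from datetime import datetime, timezone

import boto3

dynamodb = boto3.client("dynamodb")

# Enable point-in-time recovery on a (hypothetical) table.
dynamodb.update_continuous_backups(
    TableName="assets",
    PointInTimeRecoverySpecification={"PointInTimeRecoveryEnabled": True},
)

# Restore the table, to the second, into a new table.
dynamodb.restore_table_to_point_in_time(
    SourceTableName="assets",
    TargetTableName="assets-restored",
    RestoreDateTime=datetime(2017, 12, 1, 9, 30, 0, tzinfo=timezone.utc),
)
```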

Interestingly, Andy went on to say that at its peak on Amazon Prime Day in July, one of Amazon.com’s DynamoDB tables was handling 12 million transactions per second, with trillions of transactions over the day. It really illustrates the scalability of NoSQL technologies.

Announcement : AWS has a graph database, at least in preview, with the release of Neptune, a fully managed graph database service. It supports both the property graph and RDF data models, with the Gremlin (Apache TinkerPop) and SPARQL query languages respectively.
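As a small sketch of what querying Neptune looks like from Python (the endpoint and the graph model are hypothetical), a Gremlin traversal could walk the kind of parent/child asset relationships discussed on Monday:

```python
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.structure.graph import Graph

# Hypothetical cluster endpoint; Neptune serves Gremlin over WebSockets on 8182.
endpoint = "wss://my-neptune-cluster.cluster-abcdefgh.us-east-1.neptune.amazonaws.com:8182/gremlin"
conn = DriverRemoteConnection(endpoint, "g")
g = Graph().traversal().withRemote(conn)

# Property-graph query: find the renditions derived from a parent asset.
rendition_ids = (
    g.V().has("asset", "assetId", "parent-123")
     .out("hasRendition")
     .values("assetId")
     .toList()
)
conn.close()
```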

As expected, Amazon has put significant resources into machine learning and its associated disciplines. There were many announcements of new features, services and platforms (not all of which I’ll cover here), which is an indicator of how seriously AWS is taking ML as a technology.

Announcement : A potentially revolutionary announcement of an ML platform for data scientists and developers: SageMaker. The platform intends to greatly simplify the development, deployment and management of ML models, and provides a fully managed hosting service.

The platform recognises that most real-world problems require one of a relatively well-known set of algorithms (random forest, naive Bayes, etc.). SageMaker provides pre-built notebooks using these well-established algorithms, allowing developers to get up and running in a very short period of time.

One of the most interesting features of the platform is the ability for SageMaker to run multiple versions of your model with different hyperparameter settings. ML is then used to figure out the optimal values for your hyperparameters. ML to tune ML models! SageMaker also allows for simple deployment onto autoscaling infrastructure.
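A minimal sketch of that tuning flow using the SageMaker Python SDK (the training image, role, metric and data locations are all hypothetical):

```python
from sagemaker.estimator import Estimator
from sagemaker.tuner import ContinuousParameter, HyperparameterTuner

# Hypothetical training container, role and data locations.
estimator = Estimator(
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/my-model:latest",
    role="arn:aws:iam::123456789012:role/SageMakerExecutionRole",
    instance_count=1,
    instance_type="ml.m5.xlarge",
)

# SageMaker launches multiple training jobs with different hyperparameter
# values and searches for the combination that maximises the objective.
tuner = HyperparameterTuner(
    estimator,
    objective_metric_name="validation:accuracy",
    hyperparameter_ranges={"learning_rate": ContinuousParameter(0.001, 0.1)},
    metric_definitions=[
        {"Name": "validation:accuracy", "Regex": "accuracy=([0-9\\.]+)"}
    ],
    max_jobs=20,
    max_parallel_jobs=4,
)
tuner.fit({"train": "s3://my-bucket/training-data/"})
```

The tuner runs a few jobs in parallel and uses the results so far to choose the next hyperparameter values to try.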

Announcement : Matt Wood hosted a great demo of DeepLens, a new piece of hardware and software intended for ‘regular devs’ to build, deploy and run machine learning models, making ML accessible and understandable to the masses.

DeepLens has an HD video camera, integrates into the AWS ecosystem using Kinesis Video Streams, and provides access to the Rekognition APIs as well as the ability to deploy custom models via SageMaker.

Matt’s demo centred on the deployment of two models: one to recognise an album cover held in front of the video camera, and another to detect the sentiment on Matt’s face whilst holding up the album. In other words, a visual album review app! He held up an album cover and the camera detected both the album and his sentiment.

Apologies for the quality of the photo. It doesn’t do the demo justice.

Matt is apparently not too keen on Rick Astley’s Whenever You Need Somebody (although, who is?). An interesting demo and an exciting product which would be great as the basis for a hackathon!

Breakout Session : After the announcement-heavy keynote, I attended a session on edge technologies, specifically CloudFront, AWS Shield and WAF. The speakers articulated the benefits of using CloudFront purely for DDoS (Distributed Denial of Service) protection, even with no caching.

Flux 7 presented a really interesting use case for helping one of their clients meet compliance requirements. The use case centred on blocking access to the client’s site from the ‘dark web’.

Exit nodes from TOR are widely publicised, but they change frequently. It is therefore not possible to come up with a static firewall rule that blocks access from all TOR exit nodes. Flux 7 instead came up with a solution that uses a scheduled CloudWatch event to trigger a Lambda function, which scrapes the latest TOR exit node IPs from a public list. The resulting IPs are used to dynamically configure the WAF (Web Application Firewall) and meet the compliance requirement.
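A minimal sketch of such a Lambda function (the list URL, its line format and the IPSet ID are assumptions, and a real version would also prune stale entries and respect WAF’s batch limits):

```python
import urllib.request

import boto3

waf = boto3.client("waf")

# Published list of TOR exit nodes; the URL and line format are assumptions.
TOR_EXIT_LIST = "https://check.torproject.org/exit-addresses"
IP_SET_ID = "example-waf-ip-set-id"  # hypothetical WAF IPSet


def handler(event, context):
    """Runs on a schedule via a CloudWatch event rule."""
    body = urllib.request.urlopen(TOR_EXIT_LIST).read().decode()
    exit_ips = [
        line.split()[1] for line in body.splitlines() if line.startswith("ExitAddress")
    ]

    # Classic WAF requires a fresh change token for each batch of updates.
    token = waf.get_change_token()["ChangeToken"]
    waf.update_ip_set(
        IPSetId=IP_SET_ID,
        ChangeToken=token,
        Updates=[
            {"Action": "INSERT", "IPSetDescriptor": {"Type": "IPV4", "Value": ip + "/32"}}
            for ip in exit_ips
        ],
    )
```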

Thursday

Keynote : Werner Vogels’ eagerly awaited keynote.

A highlight for me was the DJ leading up to the presentation who managed to mix DevOps mantras into a decent tune. “You build it, you run it” was the highlight track for me! Unfortunately, I was too busy laughing to capture all of the set on my phone, but hopefully the video snippet below captures the tail end of it.

Werner arrived on stage to huge applause, looking his usual casual, confident self. It was a nostalgic look back for him over his five years of presenting at re:Invent. He talked about 21st-century architectures and how the technology landscape has changed.

In particular, he talked about how technology has previously been constrained by its UX. From green screens to typing to swiping, the paradigm has always been that the device drives the human. However, that is about to change, with the human now driving the device.

Alexa of course.

Announcement : “Has anyone ever had a problem with technology in a conference room?” I think we can all answer that with a resounding “yes”. With that, Werner announced Alexa for Business. The intention is to replace complex and expensive video conferencing kit with Echo Dots. The ability to schedule and book meetings, host conference calls, raise IT tickets and access company information sounds like a real productivity opportunity. That said, it wasn’t clear exactly what comes “out of the box” with Alexa for Business and what would need custom skills.

Werner then introduced Nora Jones from Netflix (co-author of Chaos Engineering), who presented an interesting talk on how Netflix do chaos engineering. This included how they intentionally create problems in production, whilst making sure they stay within tolerance for their key business metrics. This ensures Netflix can tolerate outages such as the S3 outage on Feb 28th this year. It was a great talk and one quote stuck out for me …

Chaos doesn’t cause problems, it reveals them.

Well said Nora!

Werner then went on to talk about the AWS Well-Architected Framework, a detailed set of principles, guidelines and best practices for developing and deploying solutions on AWS. As a certified solutions architect myself, I am familiar with the framework and have found it a useful knowledge base, particularly around security practices and architectural patterns for availability and scalability.

Announcement : If you like Lambda, there were several announcements. First was Werner’s announcement of the web-based Cloud9 IDE. Cloud9 is AWS’s IDE, which eases the burden of writing, running and debugging code; specifically, it allows you to develop, test and debug Lambda code! Debugging Lambda functions has always been a challenge, often forcing a resort to debugging via log messages. Cloud9 should help solve this.

Whether Cloud9 is rich enough to draw people away from the likes of Eclipse, IntelliJ and the many other IDEs (or simply Sublime), time will tell.

Announcement : One of the big networking announcements was API Gateway VPC integration. API Gateway can now access non-public resources in your VPC, so there is no need for any public endpoint other than the API Gateway itself! This is a great step forward and potentially gives API Gateway an edge over other, non-AWS API gateway products.
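The integration hangs off a VPC Link pointing at a Network Load Balancer inside the VPC; a quick sketch, with a hypothetical NLB ARN:

```python
import boto3

apigw = boto3.client("apigateway")

# A VPC Link points API Gateway at a Network Load Balancer inside the VPC.
link = apigw.create_vpc_link(
    name="internal-services",
    targetArns=[
        "arn:aws:elasticloadbalancing:us-east-1:123456789012:loadbalancer/net/internal-nlb/abc123def456"
    ],
)
print(link["id"], link["status"])  # the link provisions asynchronously
```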

Breakout Session : Netflix A/B testing using DynamoDB. It wasn’t immediately obvious from the title how A/B testing and DynamoDB relate to each other, but Alex Liu from Netflix presented an interesting use case which allows them to package just the JS assets required for the target runtime.

Netflix is a heavy user of A/B testing, so at any one time there are likely to be hundreds of tests active on the site. Combined with having to support multiple browsers (including IE), they were seeing an ever-growing TTI (time to interactive), which is one of their key business metrics. With personalisation on top of A/B testing, their front-end bundle was around 8 MB of assets! What they needed was conditional dependencies.

The solution was to build a runtime packaging service, codenamed codex. There were several steps to the process:

  1. Build a full dependency graph.
  2. Evaluate dynamic inputs, e.g. browser, location, device type, membership status.
  3. Create ‘truths’, which are essentially the feature flags and personalisation settings for that user, e.g. instantSearch: true.
  4. Apply the dynamic inputs and truths to the dependency graph to get only the dependencies which are required.

Since codex is now critical to the availability of the site, they built it with resilience in mind.

Netflix packaging architecture.

DynamoDB was used as a persistence layer to dereference the ‘truths’ to the location of a dependency descriptor in S3.

{ oldSearch: false } -> { newSearch/v3 }

S3 was used to store a dependency descriptor document for that particular version of that feature.

{ newSearch/v3 } -> { foo-1.0.js, bar-2.2.js }

Codex then packages just the dependencies required and returns them.
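A toy sketch of that two-step dereference (table, bucket and key names are all hypothetical; Netflix’s real schema wasn’t shown):

```python
import json

import boto3

dynamodb = boto3.resource("dynamodb")
s3 = boto3.client("s3")

# Hypothetical table and bucket names illustrating the two-step lookup.
truths_table = dynamodb.Table("feature-truths")
DESCRIPTOR_BUCKET = "dependency-descriptors"


def resolve_assets(truths):
    """Map a user's 'truths' to the concrete JS assets they need."""
    assets = []
    for feature, enabled in truths.items():
        if not enabled:
            continue
        # Step 1: DynamoDB dereferences the truth to a descriptor key in S3,
        # e.g. "newSearch/v3".
        item = truths_table.get_item(Key={"feature": feature})["Item"]
        # Step 2: the S3 descriptor lists the JS dependencies for that version,
        # e.g. ["foo-1.0.js", "bar-2.2.js"].
        obj = s3.get_object(Bucket=DESCRIPTOR_BUCKET, Key=item["descriptorKey"])
        assets.extend(json.loads(obj["Body"].read())["dependencies"])
    return assets
```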

To make this resilient they ended up replicating S3 across three regions (working around the limitation of S3 replicating to only one other region). This meant they were one of the few AWS customers who did not experience any downtime during the S3 outage in us-east-1 on Feb 28th. They credit this to their chaos engineering practice, which simulates the loss of a complete AWS region.

It wasn’t entirely clear what their plans are for open-sourcing codex; however, they did mention that they are likely to release it as a webpack plugin at “some point”.

re:Play Party

This was hosted in the Linq Lot where they had built a whole playground. The essentials for a good party are entertainment, a great crowd, music, food, drink and drink. Luckily re:Play provided all of these in abundance!

Quite a big tent.

The party was partly outside and partly inside, with games like giant Tetris, dodgeball and ‘bubble’ football.

For me personally, the music was the star attraction with a big name DJ headlining in the ‘main tent’. OK, it wasn’t Glastonbury, but it was an excellent night, with a buzzing atmosphere and great music.

re:Play in full swing.

It’s just a shame it ended at midnight!

Friday

Breakout Session : One last session to blast away the cobwebs before my flight home. This was a Deep Dive into EKS presented by Eswar Bala.

The session went into a lot of the networking and orchestration aspects of the new EKS service. Not necessarily relevant to developers, but more interesting to those who want to know what’s happening inside the EKS box.

A few interesting points came through :

  • EKS uses standard open-source Kubernetes, so existing clusters running open-source and custom plugins can be ported to EKS seamlessly.
  • By default EKS runs across three AZs, so it has resilience built in.
  • As expected, EKS integrates with IAM.
  • Kubernetes masters can be accessed directly from a VPC using PrivateLink, i.e. the endpoint does not need to be public.
  • AWS has built an open-source CNI plugin that anyone can use with their Kubernetes clusters on AWS. This allows you to use Amazon VPC networking natively with your Kubernetes pods. (Although there is a note in the summary stating that it should not yet be used for production workloads!)

Homeward Bound

After the heavy session on EKS and with my body-clock in ruins, it was time for me to head to LAS airport and grab a plane home.

As a technologist and user of AWS, the conference was informative, varied, entertaining and well organised. There were some minor gripes around some sessions being oversubscribed and the shuttle buses taking longer to get between venues than walking! But those were minor issues and did not overshadow the quality of the conference.

All the speakers I saw did an excellent job, and I’ll take away some really valuable use cases and technology insights, of which I have covered only a small fraction in this blog.

The content, speakers, venues, people, food, entertainment and, above all, the atmosphere were spectacular. I would urge anyone with the opportunity to go to embrace it fully.

Well done AWS. After organising re:Invent, running one of the world leading cloud computing platforms should be easy!

Dave Sanders is the Head of Architecture at The Telegraph.
