AWS Basics Training for Data Analysts and Data Scientists

Data analysts/scientists at Ro need to have a general grasp of AWS infrastructure. They also need to know how to do things safely and what questions to ask an administrator.

Sami Yabroudi
Dec 2, 2019 · 4 min read

Estimated time for completion: 1.5 days

Before starting, make an account on linuxacademy.com.

AWS Concepts

Start by running through AWS basic concepts:

You can skim the “Conclusion” section

Understand AWS Permissioning

The worst but most important part of AWS is permissions (called IAM, for Identity and Access Management). Do the “Identity and Access Management” section here, including the lab:

The goal of this section is not to make you a master AWS permissions admin, but to leave you with the general ideas of how policies and roles work. Pay special attention to the difference between an IAM user and an IAM role, and to the idea that roles can be assumed.
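To make “roles can be assumed” concrete, here is a minimal sketch using Boto 3, AWS’s SDK for Python. It is an illustration under assumptions, not Ro’s actual setup: the helper name assume_role and the example ARN in the note below are hypothetical, and the STS client is passed in as an argument so the function can be tried without real credentials.

```python
def assume_role(sts_client, role_arn, session_name):
    """Ask STS for temporary credentials that act as the given IAM role.

    sts_client is expected to be boto3.client("sts"), created with the
    caller's own IAM user (or role) credentials.
    """
    response = sts_client.assume_role(
        RoleArn=role_arn,
        RoleSessionName=session_name,
    )
    creds = response["Credentials"]
    # These keyword names are the ones boto3.client(...) accepts, so the
    # returned dict can be unpacked straight into a new client.
    return {
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretAccessKey"],
        "aws_session_token": creds["SessionToken"],
    }
```

With real credentials, usage would look roughly like temp = assume_role(boto3.client("sts"), "arn:aws:iam::123456789012:role/example-analyst-role", "analyst-session") followed by boto3.client("s3", **temp). The second client then acts with the role’s permissions rather than the user’s, and only until the temporary credentials expire. Which roles you are allowed to assume is a question for an administrator.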

EC2

An EC2 instance is basically a virtual computer/server in the cloud. EC2 is the most fundamental piece of infrastructure that Amazon offers, and applications running on variations of EC2 power much of Ro and much of Ro Data’s custom infrastructure. Go through the EC2 section here, including the Linux version of the lab:

S3

S3 is the main file (object) storage service in AWS. When S3 famously went down in one region in 2017, much of the internet “broke”. S3 is integral to any usage of AWS. Go through the S3 section here, though only do the first lab:
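As a small illustration of working with S3 from code, the sketch below lists the object keys under a prefix using Boto 3 (AWS’s SDK for Python). The helper name list_keys and the bucket and prefix in the note are hypothetical; the S3 client is passed in so the function does not depend on any particular credentials setup.

```python
def list_keys(s3_client, bucket, prefix=""):
    """Return every object key in `bucket` that starts with `prefix`.

    s3_client is expected to be boto3.client("s3").
    """
    keys = []
    # list_objects_v2 returns at most 1,000 keys per call, so use the
    # paginator to walk through all pages.
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent when a page has no matching objects.
        for obj in page.get("Contents", []):
            keys.append(obj["Key"])
    return keys
```

A call might look like list_keys(boto3.client("s3"), "example-bucket", "reports/2019/"). Listing is read-only, which makes it a safe first thing to try. Note that S3 has no real folders: a “folder” is just a shared key prefix.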

RDS (And DynamoDB)

RDS (Relational Database Service) is Amazon’s main managed relational database service, and it powers Ro’s main applications (the Data team refers to it as the prod db). DynamoDB is AWS’s competitor to MongoDB: a NoSQL key-value and document database. It has no fixed columns or SQL joins; instead it offers a fast way to store and look up items (such as large blobs of text) by key. Do “Database Services”->“Summary of AWS Database Services” and “Database Services”->“RDS and DynamoDB Basics” here:

Monitoring and Alerting

Ro’s favored tool for monitoring is Datadog. The purpose of monitoring is to make sure that all systems and infrastructure are running smoothly, and to alert when something goes wrong. Monitoring can be done at the level of infrastructure (e.g., EC2 memory usage, RDS transaction rate) but also within applications.

Please make sure to get a Datadog read-level account from IT. You may be able to access it through Okta.

Watch this quick overview video of Datadog: https://www.datadoghq.com/product/ (video “Get full visibility into modern applications”)

Within Datadog:

Other information:

Boto 3

Boto 3 is AWS’s SDK for Python (use it in Python simply via “import boto3”). You can use Boto 3 to do much of what you can do via AWS’s graphical user interface (e.g., access files in S3, manipulate EC2 instances, assume IAM roles).
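For a feel of what Boto 3 code looks like, here is a minimal, read-only sketch that lists the IDs of running EC2 instances. The helper name running_instance_ids is hypothetical, and the EC2 client is passed in as an argument; in practice it would be created with boto3.client("ec2") using whatever credentials and region you have been given.

```python
def running_instance_ids(ec2_client):
    """Return the IDs of all EC2 instances currently in the 'running' state.

    ec2_client is expected to be boto3.client("ec2").
    """
    paginator = ec2_client.get_paginator("describe_instances")
    pages = paginator.paginate(
        Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
    )
    ids = []
    for page in pages:
        # Instances are grouped into "reservations" in the API response.
        for reservation in page["Reservations"]:
            for instance in reservation["Instances"]:
                ids.append(instance["InstanceId"])
    return ids
```

This only describes resources and changes nothing, so it is a low-risk way to check that your permissions work. Calls that start, stop, or terminate instances follow the same client pattern but are worth confirming with an administrator first.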

Lambda

Let’s say you have a Python function that takes inputs from an HTTP request and does something with them (e.g., performs a calculation and returns the result, or performs a calculation and saves the result to RDS). One way to host this function is to put it inside an app and then run the app on an EC2 instance, which is a large amount of work to host a rather trivial function. A far simpler approach is to turn it into a Lambda function. A Lambda function is “serverless”, meaning you just set up the function using AWS’s UI, and it can then receive calls and execute without your ever having to provision an application, an EC2 instance, and so on. Lambda functions are useful well beyond this example: they can also be triggered by other events, such as a file landing in S3 or a schedule.
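The kind of function described above might look like the following sketch. It assumes the function is invoked through an API Gateway proxy integration, so the request body arrives as a JSON string under event["body"] and the return value needs a statusCode and a string body. The input field name "values" and the mean calculation are made up for illustration.

```python
import json


def lambda_handler(event, context):
    """Return the mean of the numbers sent in the request body."""
    try:
        # The body is a JSON string; treat a missing body as an empty object.
        payload = json.loads(event.get("body") or "{}")
        values = [float(v) for v in payload["values"]]
        if not values:
            raise ValueError("empty list")
    except (KeyError, TypeError, ValueError):
        # Malformed JSON, a missing field, or non-numeric input all end up here.
        return {
            "statusCode": 400,
            "body": json.dumps({"error": "expected JSON like {\"values\": [1, 2, 3]}"}),
        }
    mean = sum(values) / len(values)
    return {"statusCode": 200, "body": json.dumps({"mean": mean})}
```

Because the handler is an ordinary Python function, it can be called locally with a hand-built event dictionary before it is ever deployed, for example lambda_handler({"body": "{\"values\": [1, 2, 3]}"}, None).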

Do the “Serverless Compute” section here:

Ro Data Team Blog
