AWS Basics Training for Data Analysts and Data Scientists

Data analysts/scientists at Ro need to have a general grasp of AWS infrastructure. They also need to know how to do things safely and what questions to ask an administrator.

Sami Yabroudi
Dec 2, 2019 · 4 min read

AWS Concepts

Start by running through AWS basic concepts:

Understand AWS Permissioning

The worst but most important part of AWS is permissions, which AWS calls IAM (Identity and Access Management). Do the “Identity and Access Management” section here, including the lab:
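
To make permissions a little more concrete, here is a minimal sketch of what an IAM policy document looks like, built as a Python dict and printed as JSON. The bucket name `example-analytics-bucket` and the helper `read_only_s3_policy` are hypothetical and only for illustration; the policies actually attached to your user or role are set by an administrator.

```python
import json


def read_only_s3_policy(bucket_name):
    """Build an IAM policy document granting read-only access to one S3 bucket."""
    return {
        "Version": "2012-10-17",  # the current IAM policy language version
        "Statement": [
            {
                # Listing keys is a bucket-level action, so it targets the bucket ARN.
                "Sid": "ListBucket",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket_name}"],
            },
            {
                # Reading objects is an object-level action, so it targets bucket/*.
                "Sid": "ReadObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket_name}/*"],
            },
        ],
    }


# Hypothetical bucket name, used only to show the shape of the document.
policy = read_only_s3_policy("example-analytics-bucket")
print(json.dumps(policy, indent=2))
```

A useful question for an administrator is which actions and resources your own policy allows, since anything not explicitly allowed is denied by default.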


EC2

An EC2 instance is basically a big computer/server in the cloud. EC2 is the most fundamental piece of infrastructure that Amazon offers, and applications running on variations of EC2 power much of Ro and much of Ro Data’s custom infrastructure. Go through the EC2 section here, including the Linux version of the lab:
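
As a rough sketch of looking at EC2 from Python, the function below lists running instances with a read-only call, so it cannot change anything. It takes an already-built client as an argument; the helper names and the default region are assumptions for illustration, and the call only succeeds if your credentials allow `ec2:DescribeInstances`.

```python
def list_running_instances(ec2_client):
    """Return the id and type of every running instance visible to the client."""
    # describe_instances is read-only. Accounts with many instances may need
    # pagination, which this sketch leaves out.
    response = ec2_client.describe_instances(
        Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
    )
    instances = []
    # Instances come back grouped into "reservations".
    for reservation in response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            instances.append(
                {"id": instance["InstanceId"], "type": instance["InstanceType"]}
            )
    return instances


def make_ec2_client(region_name="us-east-1"):
    """Build a real EC2 client; the region here is an assumed default."""
    import boto3  # imported lazily so the function above runs without AWS access

    return boto3.client("ec2", region_name=region_name)
```

With valid credentials, `list_running_instances(make_ec2_client())` would return a list of small dicts, one per running instance.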


S3

S3 is the main file (object) storage service in AWS. When S3 famously went down in 2017, much of the internet “broke”. S3 is integral to any usage of AWS. Go through the S3 section here, though only do the first lab:
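
For analysts, the most common S3 task is probably reading a file. A minimal sketch, assuming the object is a UTF-8 CSV with a header row and is small enough to fit in memory; the function name is hypothetical, and the bucket and key are whatever your team actually uses.

```python
import csv
import io


def read_csv_from_s3(s3_client, bucket, key):
    """Download one CSV object from S3 and return its rows as dicts."""
    # get_object is read-only; the file contents arrive as a byte stream in "Body".
    response = s3_client.get_object(Bucket=bucket, Key=key)
    text = response["Body"].read().decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))


def make_s3_client():
    """Build a real S3 client from whatever credentials are configured locally."""
    import boto3  # imported lazily so the function above runs without AWS access

    return boto3.client("s3")
```

Passing the client in as an argument keeps the parsing logic easy to test with a stand-in object, without touching a real bucket.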

RDS (And DynamoDB)

RDS is Amazon’s managed relational database service; it is the main transactional database behind Ro’s main applications (the Data team refers to it as prod db). DynamoDB is AWS’s competitor to MongoDB: a NoSQL key-value and document database (so no fixed columns or schema, but rather a fast way to store and look up large blobs of text by key). Do “Database Services”->“Summary of AWS Database Services” and “Database Services”->“RDS and DynamoDB Basics” here:
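
The difference in data shape can be illustrated without touching AWS at all. In the sketch below, an in-memory SQLite table stands in for a relational database like RDS, and a plain Python dict stands in for a key-based document store like DynamoDB. Neither is the real service or its API, and the `orders` data is made up.

```python
import sqlite3

# Relational style (stand-in for RDS): every row has the same fixed columns,
# and you query with SQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id TEXT PRIMARY KEY, status TEXT, total_cents INTEGER)"
)
conn.execute("INSERT INTO orders VALUES (?, ?, ?)", ("o-1", "shipped", 2500))
row = conn.execute(
    "SELECT status, total_cents FROM orders WHERE order_id = ?", ("o-1",)
).fetchone()

# Document style (stand-in for DynamoDB): items are looked up by key, and each
# item may carry different, possibly nested, attributes.
documents = {
    "o-1": {"order_id": "o-1", "status": "shipped", "items": [{"sku": "abc", "qty": 2}]},
    "o-2": {"order_id": "o-2", "status": "pending", "gift_note": "Happy birthday"},
}
item = documents["o-1"]

print(row)
print(item)
```

The relational form makes joins and aggregate queries straightforward, while the document form favors fast lookups of a whole record by its key.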

Monitoring and Alerting

Ro’s favored tool for monitoring is Datadog. The purpose of monitoring is to make sure that all systems and infrastructure are running smoothly, and to alert when something goes wrong. Monitoring can be done at the level of infrastructure (e.g., EC2 memory usage, RDS transaction rate) but also within applications.

Boto 3

Boto 3 is AWS’s SDK for Python (use it in Python simply via “import boto3”). You can use Boto 3 to do much of what you can do via AWS’s graphical user interface (e.g., access files in S3, manipulate EC2 instances, assume IAM roles).
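
As one example, assuming an IAM role from Python might look like the sketch below. It asks STS for temporary credentials and reshapes them into the keyword arguments that `boto3.client` accepts. The function names and the session name are hypothetical, and the role ARN is something an administrator would need to give you.

```python
def assume_role_credentials(sts_client, role_arn, session_name="analyst-session"):
    """Assume an IAM role and return temporary credentials as boto3 client kwargs."""
    response = sts_client.assume_role(RoleArn=role_arn, RoleSessionName=session_name)
    creds = response["Credentials"]  # temporary; these expire after a limited time
    return {
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretAccessKey"],
        "aws_session_token": creds["SessionToken"],
    }


def s3_client_for_role(role_arn):
    """Build an S3 client that acts with the permissions of the assumed role."""
    import boto3  # imported lazily so the function above runs without AWS access

    sts_client = boto3.client("sts")
    return boto3.client("s3", **assume_role_credentials(sts_client, role_arn))
```

The client returned by `s3_client_for_role` can only do what the assumed role permits, which is a safer default than working with broad personal credentials.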


Lambda

Let’s say you have a Python function that takes inputs from an HTTP request and does something with those inputs (e.g., performs a calculation and returns the result, or performs a calculation and saves the result to RDS). One way to host this function is to put it within an app and then put the app on an EC2 instance, but that’s a large amount of work to host a rather trivial function. A far simpler approach is to turn this function into a Lambda function. A Lambda function is “serverless”, meaning you just set up the function using AWS’s UI, and then the function can receive calls and execute without your ever having to provision an application, an EC2 instance, etc. Lambda functions can accomplish all sorts of stuff.
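
A minimal sketch of such a function is below. It assumes the request reaches Lambda through an HTTP front end such as API Gateway's proxy integration, which passes the request body as a JSON string under `event["body"]`. The `values` field and the mean calculation are made up for illustration.

```python
import json


def lambda_handler(event, context):
    """Return the mean of the numbers sent in the request body."""
    try:
        # The body arrives as a JSON string; treat a missing body as empty JSON.
        payload = json.loads(event.get("body") or "{}")
        values = [float(v) for v in payload["values"]]
    except (KeyError, TypeError, ValueError):
        values = []

    if not values:
        return {
            "statusCode": 400,
            "body": json.dumps({"error": "expected a JSON body with a non-empty 'values' list"}),
        }

    mean = sum(values) / len(values)
    return {"statusCode": 200, "body": json.dumps({"mean": mean})}
```

Because the handler is just a Python function taking a dict, it can be tried locally before deploying, for example with `lambda_handler({"body": '{"values": [1, 2, 3]}'}, None)`.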

Ro Data Team Blog
