Key Learnings from Cloud Academy’s Junior DevOps Job Role Path — Part I
I completed the Junior DevOps Path from Cloud Academy — here’s my review, along with my key takeaways
I was recently selected to become an AWS Community Builder (CB). You can find out all about the program here
There are multiple perks that come with being an AWS CB. My personal favorite was 1 year’s worth of Cloud Academy (CA) membership.
Cloud Academy has tons of courses, which can easily lead to an analysis paralysis situation. This is where CA’s Job Role Paths come to your rescue. I chose to go with Junior DevOps for the reasons below:
- I recently earned my AWS SAA and Azure SA certs. I wanted to learn DevOps as it is crucial to being a well-rounded Cloud TPM/TAM/Solution Architect.
- I am actively pursuing AWS and Azure roles, and I am noticing a surge of Cloud TPM/TAM/SA roles that list DevOps knowledge and experience. This is where CA’s hands-on labs and assessments can help you get a really good grasp of the concepts and connect the dots.
My intended audience for this post is tech folks who are looking to get a taste of the DevOps world. I hope it helps you learn something new. If you are curious to continue these learnings, there are also mid-level and senior pathways available on Cloud Academy’s platform here.
This will be a two-part series. This first part will cover the following topics:
- DevOps Fundamentals
- Understanding Operations
- Operations with AWS — Level 1
- Monitoring and Alerting
- Monitoring and Alerting with AWS — Level 1
Let’s dive in. What follows is a summary from my notes; the entire coursework from Cloud Academy can be found here
DevOps Fundamentals
It is worth repeating,
“DevOps is the philosophy of efficiently developing, deploying and operating high quality, modern software systems”
The 3 tenets of DevOps are:
- Culture: This tenet focuses on collaboration and trust
- Automation: This tenet focuses on reducing human intervention (especially for mundane tasks) in the development -> deployment -> operation pipeline in order to optimize it
- Measurement: Metrics in DevOps should help enhance the quality of discussions. Impactful metrics include deployment frequency, MTTR (mean time to resolve: the average amount of time it takes to resolve a problem in a production environment), MTTD (mean time to detect: how quickly you can identify problems), customer complaints, lead time (the time from feature request to feature release), etc.
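To make two of those metrics concrete, here is a minimal sketch (using entirely hypothetical incident timestamps) of how MTTD and MTTR could be computed from incident records; real teams would typically pull this data from their incident management tooling:

```python
from datetime import datetime

# Hypothetical incident records: when the problem began, when it was
# detected, and when it was resolved.
incidents = [
    {"start": datetime(2023, 1, 5, 10, 0),
     "detected": datetime(2023, 1, 5, 10, 12),
     "resolved": datetime(2023, 1, 5, 11, 30)},
    {"start": datetime(2023, 2, 9, 14, 0),
     "detected": datetime(2023, 2, 9, 14, 3),
     "resolved": datetime(2023, 2, 9, 14, 45)},
]

# MTTD: average time from problem start to detection.
mttd = sum((i["detected"] - i["start"]).total_seconds() for i in incidents) / len(incidents)

# MTTR: average time from detection to resolution (some teams measure
# from problem start instead -- pick one definition and stay consistent).
mttr = sum((i["resolved"] - i["detected"]).total_seconds() for i in incidents) / len(incidents)

print(f"MTTD: {mttd / 60:.1f} minutes, MTTR: {mttr / 60:.1f} minutes")
```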
Understanding Operations (part of DevOps)
Operations engineers need to focus on three important aspects of a system: availability, scalability, and security.
I. Availability — availability is about having systems that are up and running and usable by those who need them. A few reasons that systems can become unavailable include:
- software bugs
- component overload (servers can only handle a finite amount of traffic depending on hardware, the OS, software running, network)
- natural disasters
- hardware failures
- malicious users
This is why targeting 100% uptime isn’t practical.
Helpful tips for building a highly available (HA) system:
- Consider real world constraints
- Avoid single points of failure (with redundancy both at component and environment level)
- Level of availability is a business decision and not a technical one
- Understand percentages (e.g., the difference between 99.95% and 99.99% uptime — the sketch below makes this concrete)
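To see why those decimal places matter, here is a quick sketch converting an uptime percentage into the downtime it allows per year:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

for sla in (99.9, 99.95, 99.99):
    allowed_downtime = MINUTES_PER_YEAR * (1 - sla / 100)
    print(f"{sla}% uptime allows ~{allowed_downtime:.0f} minutes of downtime per year")

# 99.9%  -> ~526 minutes (~8.8 hours) per year
# 99.95% -> ~263 minutes (~4.4 hours) per year
# 99.99% -> ~53 minutes per year
```

That jump from 99.95% to 99.99% cuts your downtime budget by roughly a factor of five, which is exactly why the level of availability is a business decision.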
II. Scalability — scalability has to be supported by the underlying technology stack. To understand whether you can scale out, ask what would happen if you terminated a server running your app. If the answer is something that needs to be addressed, such as losing user-uploaded assets, then your developers need to fix that first. However, if your answer is more along the lines of ‘we’ll need to deploy a new server with the latest version of the app’, then you may be ready to scale. Ensure that if all servers need access to something, it’s centrally located.
Cloud platforms have made this easy. You can scale out based on metrics such as CPU load and scale back when the load goes down.
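As a sketch of what this looks like on AWS with boto3 (the Auto Scaling group name here is hypothetical), a target tracking policy keeps average CPU near a target value and handles both scaling out and scaling back in:

```python
import boto3

autoscaling = boto3.client("autoscaling")

# Target tracking: the group adds instances when average CPU rises above
# the target and removes them when the load drops back down.
autoscaling.put_scaling_policy(
    AutoScalingGroupName="my-app-asg",  # hypothetical group name
    PolicyName="cpu-target-50",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization"
        },
        "TargetValue": 50.0,  # aim to keep average CPU around 50%
    },
)
```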
III. Security — cloud providers understand and know how to mitigate Distributed Denial of Service (DDoS) attacks, as they have had to do so for their own services for years.
Besides DDoS, other concerns to address include SQL injection and cross-site scripting. Having a web application firewall can serve as an additional layer of security. Patch management is also important; if your servers are not kept up to date, you risk ending up with servers that are vulnerable to known exploits.
Operations with AWS — Level 1
If you have prepared for your AWS SAA exam, this next part should be a breeze. It includes an overview of VPC, CloudFront, Route 53, CloudWatch, and CodeCommit.
Virtual Private Cloud (VPC) and some of its network components
- VPC can be defined as an isolated segment of the AWS network infrastructure, allowing you to securely provision your cloud resources.
- Subnets segment your VPC infrastructure into multiple different networks
- To filter traffic at different levels (see the sketch after this list):
NACLs: work at the subnet level — stateless
SGs (security groups): work at the instance level — stateful
- A bastion host is a jump server used to connect to private resources
- Connectivity options for your VPC to connect to remote locations include a VPN connection (allows two networks to securely connect to each other across the internet), VPC peering (connects two VPCs together using the internal AWS infrastructure), and AWS Transit Gateway (supports one-to-many relationships with your VPCs)
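To illustrate the stateful behavior of security groups mentioned above, here is a minimal boto3 sketch (the security group ID is hypothetical) that opens inbound HTTPS; because SGs are stateful, the response traffic is allowed automatically without a matching outbound rule:

```python
import boto3

ec2 = boto3.client("ec2")

# Allow inbound HTTPS from anywhere. Because security groups are
# stateful, return traffic for these connections is permitted
# automatically; a NACL, being stateless, would need an explicit
# outbound rule for the response traffic.
ec2.authorize_security_group_ingress(
    GroupId="sg-0123456789abcdef0",  # hypothetical security group ID
    IpPermissions=[{
        "IpProtocol": "tcp",
        "FromPort": 443,
        "ToPort": 443,
        "IpRanges": [{"CidrIp": "0.0.0.0/0", "Description": "HTTPS from anywhere"}],
    }],
)
```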
Amazon CloudFront
Amazon CloudFront is a web service that speeds up the distribution of your static and dynamic web content, such as .html, .css, .js, and image files. Two common CloudFront design patterns are:
Pattern 1 — CloudFront to cache and secure content when an Application Load Balancer is the origin
Pattern 2 — CloudFront to cache and secure content when an S3 bucket is the origin.
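As a rough boto3 sketch of pattern 2 (the bucket name is hypothetical, and a production setup would also restrict the bucket so it is only reachable through CloudFront):

```python
import time

import boto3

cloudfront = boto3.client("cloudfront")

# A minimal distribution with an S3 bucket as the origin.
cloudfront.create_distribution(DistributionConfig={
    "CallerReference": str(time.time()),  # must be unique per request
    "Comment": "Demo distribution with an S3 origin",
    "Enabled": True,
    "Origins": {
        "Quantity": 1,
        "Items": [{
            "Id": "s3-origin",
            "DomainName": "my-demo-bucket.s3.amazonaws.com",  # hypothetical bucket
            "S3OriginConfig": {"OriginAccessIdentity": ""},
        }],
    },
    "DefaultCacheBehavior": {
        "TargetOriginId": "s3-origin",
        "ViewerProtocolPolicy": "redirect-to-https",
        "ForwardedValues": {"QueryString": False, "Cookies": {"Forward": "none"}},
        "MinTTL": 0,
    },
})
```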
🧪 A hands-on lab is included where, within the AWS console, you:
- Create an Amazon S3 bucket
- Create an Amazon CloudFront distribution
- Upload a demo website to an S3 bucket
- Serve S3 content through a CloudFront distribution
- Disable and delete a CloudFront distribution
Amazon Route 53
Route 53 provides domain name registration and DNS management. It also implements traffic management by routing internet traffic to the resources for your domain, and it ensures the availability of those resources using health checks.
🔎 Did you know: AWS offers a 100% availability SLA for Route 53 because of the distributed nature of the DNS system and the high redundancy of AWS’s implementation.
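To give a flavor of the DNS management side, here is a boto3 sketch (the hosted zone ID, domain, and IP address are all hypothetical) that upserts an A record pointing a name at a resource:

```python
import boto3

route53 = boto3.client("route53")

# UPSERT creates the record if it doesn't exist, or updates it if it does.
route53.change_resource_record_sets(
    HostedZoneId="Z0123456789ABCDEFGHIJ",  # hypothetical hosted zone ID
    ChangeBatch={
        "Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": "www.example.com",
                "Type": "A",
                "TTL": 300,
                "ResourceRecords": [{"Value": "203.0.113.10"}],
            },
        }]
    },
)
```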
Amazon CloudWatch
This service is designed to be your window into the health and operational performance of your applications and infrastructure. Two key functional elements are metrics and alarms:
- CloudWatch Metrics enable you to monitor a specific element of an app or resource over a period of time
- CloudWatch Alarms enable you to implement automatic responses and actions based on custom thresholds defined for your metrics
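Putting the two together, here is a sketch (the instance ID and SNS topic ARN are hypothetical) of an alarm that watches an EC2 instance’s CPU metric and notifies an SNS topic when a threshold is crossed:

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Fire when average CPU stays above 80% for two consecutive
# 5-minute periods, then notify an SNS topic.
cloudwatch.put_metric_alarm(
    AlarmName="high-cpu",
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],  # hypothetical
    Statistic="Average",
    Period=300,
    EvaluationPeriods=2,
    Threshold=80.0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:ops-alerts"],  # hypothetical topic
)
```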
AWS CodeCommit
To appreciate this service, you need to understand how Git repositories work. Git enables multiple developers to work on the same code base without overwriting each other’s code, and it provides versioning so they can roll back changes in case of issues. Git repositories are hosted in source control services such as GitHub, GitLab, Bitbucket, and, you guessed it, AWS CodeCommit.
AWS CodeCommit is a fully managed, Git-based source control service.
Any CI/CD pipeline begins with a service where you commit your code changes. AWS CodeCommit is often the starting point within CI/CD setups; this can kick off other steps of your CI/CD pipeline, such as a build process. CodeCommit is well integrated with AWS CodeBuild, AWS CodeDeploy, and AWS CodePipeline.
Once you create your Git repo, there are three ways you can connect to it: HTTPS, SSH, or HTTPS (GRC).
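As a quick sketch, creating a repository through boto3 returns both of the standard clone URLs (the repository name is hypothetical):

```python
import boto3

codecommit = boto3.client("codecommit")

response = codecommit.create_repository(
    repositoryName="my-demo-repo",  # hypothetical name
    repositoryDescription="Demo repository for the Junior DevOps path",
)

metadata = response["repositoryMetadata"]
print("HTTPS clone URL:", metadata["cloneUrlHttp"])
print("SSH clone URL:  ", metadata["cloneUrlSsh"])
```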
When an event happens in a CodeCommit environment, you can choose to do one of two things:
- You can be notified of the event or
- You can take action
🧪 A hands-on lab is included where, within the AWS console, you:
- Create a repository
- Access a repository using the Git command-line interface
- Add files to a repository using the Git command-line interface
Monitoring and Alerting
Monitoring helps you measure the health of your system.
It’s important because if you don’t know how your system is performing, you have no baseline for improvement.
Start monitoring👇:
- Application performance (with an application performance monitoring tool, also called an APM tool)
- Server performance (e.g., Splunk and Datadog)
- Cloud resource monitoring
The more you know about how your code and services are running, the more informed your decisions will be.
Cloud providers understand this information is important, which is why they provide built-in services to monitor your cloud resources. For instance, AWS has CloudWatch, which allows you to monitor cloud resources and even publish your own custom metrics through its API.
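Publishing a custom metric is a single API call. Here is a boto3 sketch (the namespace and metric name are hypothetical) of what that looks like:

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Publish a custom application-level metric; CloudWatch stores it
# alongside the built-in resource metrics.
cloudwatch.put_metric_data(
    Namespace="MyApp",  # hypothetical namespace
    MetricData=[{
        "MetricName": "OrdersProcessed",  # hypothetical metric
        "Value": 42,
        "Unit": "Count",
    }],
)
```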
There are a lot of hosted options for centralized logging, such as Loggly, Splunk, Papertrail, and Sumo Logic, that let you keep your logs in one centralized place and parse them for key insights such as warnings and errors.
With this data, you will also have the ability to create alerts. To handle the potential flood of alerts, tools like PagerDuty and VictorOps can help.
Monitoring and Alerting with AWS — Level 1
AWS provides a comprehensive set of tools and services for monitoring and alerting. Three key services are CloudWatch, CloudTrail, and AWS Config.
Amazon CloudWatch
We already touched upon this service in the Operations with AWS section. Besides metrics and alarms, a wide range of components make Amazon CloudWatch a powerful service:
- CloudWatch Dashboards
- CloudWatch Anomaly Detection
- Amazon EventBridge (formerly CloudWatch Events): provides a means of connecting your own applications to different targets so you can implement a level of real-time monitoring
- CloudWatch Logs: centralize your logs from different AWS services to monitor them in real time and filter for specific entries
- CloudWatch Insights (three variants: Log Insights, Container Insights, and Lambda Insights)
AWS CloudTrail
AWS CloudTrail is an AWS service that helps you enable operational and risk auditing, governance, and compliance of your AWS account. Its primary function is to record and track events, generated by both API calls and non-API requests made within your account. Actions taken by a user, role, or an AWS service are recorded as events in CloudTrail.
When an event occurs, it is captured by the trail and sent to a log file, which can be stored in Amazon S3 or in a CloudTrail Lake.
There are three types of events that CloudTrail categorizes and tracks:
- Management Events
- Data Events
- CloudTrail Insight Events
CloudTrail can also interact with CloudWatch.
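To give a feel for querying what CloudTrail has recorded, here is a boto3 sketch that looks up recent management events for EC2 instance launches:

```python
import boto3

cloudtrail = boto3.client("cloudtrail")

# Look up recent management events recorded for EC2 instance launches.
response = cloudtrail.lookup_events(
    LookupAttributes=[{"AttributeKey": "EventName", "AttributeValue": "RunInstances"}],
    MaxResults=10,
)

for event in response["Events"]:
    print(event["EventTime"], event["EventName"], event.get("Username", "unknown"))
```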
🧪 A hands-on lab is included where, within the AWS console, you:
- Create a CloudTrail trail to capture management events and deliver log files to an S3 bucket
- Integrate the trail with CloudWatch Logs
- Generate management events by launching an Amazon EC2 instance
- Configure a CloudWatch metric filter and alarm
AWS Config
Managing a small number of machines is easy, but as the number increases, automation helps you avoid human error and guarantees consistency.
AWS Config automatically discovers your AWS resources, providing you with an inventory of configuration details, including the history of how a resource was configured at any point in time.
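Here is a boto3 sketch of pulling that configuration history for a single resource (the resource type and ID are hypothetical, and this assumes the configuration recorder is already running):

```python
import boto3

config = boto3.client("config")

# Retrieve how this security group's configuration changed over time.
response = config.get_resource_config_history(
    resourceType="AWS::EC2::SecurityGroup",
    resourceId="sg-0123456789abcdef0",  # hypothetical resource ID
)

for item in response["configurationItems"]:
    print(item["configurationItemCaptureTime"], item["configurationItemStatus"])
```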
🧪 A hands-on lab is included where, within the AWS console, you:
- Configure the configuration recorder to record AWS resources
- Track and audit security changes using AWS Config
- Explore the integration between AWS Config and CloudTrail
- Use managed and custom rules to check compliance
- Analyze and correct non-compliant resources
Part II continued
Navigate to Part II here, where we continue down our Junior DevOps role path, exploring containers, Docker, Kubernetes, and more along the way!
⭐⭐⭐ It is always a good day to learn something new ⭐⭐⭐