The Must Know Checklist For DevOps & Site Reliability Engineers (Updated)

This list is not exhaustive; it enumerates only basic, must-know technical skills, along with some random thoughts. You may use it as a checklist to evaluate yourself or someone else, or to prepare for your next DevOps/SRE job interview. This list is opinionated.

Aymen El Amri
Sep 23, 2018 · 13 min read

Disclaimer: This work was done by both Sahil Sharma and myself. We already published the first version of it (The Must Know Checklist For DevOps & Site Reliability Engineers). This article is an update of that first version.



We want to hear your feedback and suggestions about:

  • This list, obviously.
  • What kind of technologies would you like to learn? Please share them with us. We are going to launch a DevOps online academy and we want you to be our first contributor, so share your recommendations and don’t be shy!
  • Other stuff that we will be sharing with you in the future. We will be glad to hear your opinion and feedback about it!

In order to get in touch, please subscribe to one or more of our online communities, as it’s a good way to stay up to date and keep in touch with us as well as the community:

DevOpsLinks, a community of aspiring DevOps professionals and practitioners from all over the world.

Shipped, a community focused on technologies like serverless computing and FaaS and other interesting topics.

Kaptain: A Kubernetes community hub, hand-curated newsletter, team chat, training & more (coming soon).


What’s NEXT?

Be more familiar with the DevOps ecosystem:

  • First of all, be sure to understand the importance of the cultural points: read The 15-point DevOps Check List.
  • You should master *nix systems and have a good understanding of how Linux distributions work.
  • Pick one OS for your production set-up. There is no need to master every OS out there; that would only make your job harder down the line. Pick one and get a firm grip on it.
  • Be at ease with the terminal. You may have GUIs to manage your servers, but you have to LOVE the terminal no matter what: it is faster, more secure and, honestly, easier once you master it.
  • How to get the CPU/system info (cat /proc/version, cat /proc/cpuinfo, uptime, et al.)
  • How cron jobs work. Set cron jobs for specific days/times/months.
  • Learn the differences between *nix OSs and how to know which OS you are running on your machine (e.g. cat /etc/os-release, or cat /etc/lsb-release on Ubuntu)
  • Difference between shells: sh/dash/bash/ash/zsh ..
  • How to set and unset ENV variables. Exporting an ENV variable only lasts for the current shell session; how do you make it permanent?
  • What are the shell configuration files (~/.bashrc, ~/.bash_profile, .environment ..)? How do you “source” these initialization files?
  • Knowing Vim, its configuration (.vimrc) and some of its basic tips is a must.
  • How logging works in *nix systems, what are logging levels and how to work with log management tools (rsyslog, logstash, fluentd, logwatch, awslogs ..)
  • How swapping works. What is swappiness. (swapon -s, /proc/sys/vm/swappiness, sysctl vm.swappiness ..)
  • Be at ease with scripting languages. Bash is a must (Other scripting languages are very useful like Python, Perl..).
  • Master useful commands like process monitoring commands (ps, top, htop, atop ..), system performance commands (nmon, iostat, sar, vmstat ..) and network troubleshooting and analysis commands (nmap, tcpdump, ping, traceroute, airmon-ng, airodump-ng ..).
  • What is your backup strategy? How do you test whether a backup is reliable?
  • Do you know ext4, ntfs, fat ? Do you know Union filesystems ?
  • How to view/set network configuration on a system
  • How to set static/dynamic IP address on a machine with different subnets? (Hint: CIDR)
  • Use network packet analysis to analyze and understand how networking works: tcpdump, Wireshark ..
  • Are you familiar with the OSI model and the TCP/IP model specifications? What is the difference between TCP and UDP? Do you know VXLAN?
  • How to set up firewalls (iptables, or at least ufw): set rules, list rules, route traffic, block a protocol/port ..
  • How to view/set/backup your router settings?
  • How does DNS work? How to set up a DNS server (Bind, Unbound, PowerDNS, Dnsmasq ..)? What is the difference between recursive and authoritative DNS? How to troubleshoot DNS (nslookup, dig ..)?
  • Get familiar with DNS and A, AAAA, CNAME, TXT records
  • What exactly happens when you hit google.com in the browser? From your browser’s cache, local DNS cache, local network configuration (hosts file), routing, DNS, network, web protocols and caching systems to web servers (the most basic question, yet a difficult one if you go deep).
  • Get familiar with CDN providers (Fastly, Akamai, et al.)
  • Get familiar with how SSL/TLS works and how digital certificates work (HTTPS)
  • Learn about SSL certs (Let’s Encrypt)
  • Get familiar with more secure protocols and tools: TLS, STARTTLS, SCP, SSH, SFTP, FTPS ..
  • Know the difference between PPTP, OpenVPN, L2TP/IPSec
  • Learn to set up record sets for your domain (you can use managed cloud services like Route53 or CloudFlare ..)
  • How SSH works, how to debug it and how you can generate ssh keys and do passwordless login to other machines
  • What is an init system? Do you know systemd (used by Ubuntu since 15.04), Upstart (developed by Ubuntu), SysV ..?
  • Compiling any software from its source (gcc, make and other related stuff)
  • How to compress/decompress a file in different formats via terminal (mostly: tar/tar.gz)
  • How to set-up a web server (Apache, Nginx ..)
  • Learn to play with Nginx/Apache log files using “awk, sed, sort, uniq”
  • What is the difference between Nginx and Apache? When to use Nginx? When to use Apache? You may use both of them in the same web application: when and how?
  • How to set up a reverse proxy (Nginx ..)
  • How to set up a caching server (Squid, Nginx, Varnish ..)
  • How to set up a load balancer (HAProxy, Nginx ..)
  • How to set-up an API gateway for your microservices (Ambassador, Kong, Traefik, Nginx ..)
  • Get familiar with Systemd and how to analyze and manage services using commands like systemctl and journalctl
  • Get familiar with OAuth and SAML or Auth0 integration
  • Get familiar with RESTful API’s, Webhooks, GraphQL, gRPC
  • Securing an Elasticsearch (ES) cluster (X-Pack (commercial); open source: ReadOnlyREST, Search Guard)
  • Taking ES backups (snapshot and incremental) using the _snapshot API or esdump (caution: requires nodejs/npm)
  • Taking DB backups
  • Learn Python (pip + setup.py) and Bash. Have you started using Golang as a scripting language? Try it.
  • Develop your Cloud Computing skills. Start by choosing a cloud infrastructure provider: Amazon Web Services, Google Cloud Platform, DigitalOcean, Microsoft Azure. Or create your own private cloud using OpenStack.
  • What about staging servers? What is your testing strategy: unit testing? End-to-end? Do you really need staging servers? Google “staging servers must die”.
  • Read about PaaS/IaaS/SaaS/CaaS/FaaS/DaaS and serverless architecture
  • Learn how to use and configure Cloud resources from your CLI using Cloud Shells or from your programs using Cloud SDKs
  • Learn how to use at least one of the configuration management and remote execution tools (Ansible, Puppet, SaltStack, Chef ..). Your choice should be based on criteria like: syntax, performance, templating language, push vs pull model, architecture, integration with other tools, scalability, availability ..
  • Packer for image building
  • Integrating Jenkins for CI/CD
  • Setting up Consul (for service discovery)
  • Start looking into infrastructure as code and infrastructure provisioning automation tools like Terraform and Packer
  • Start looking into containers and Docker, including its underlying architecture (cgroups and namespaces). How does it work?
  • Start getting familiar with basic Docker commands (logs/inspect/top/ps/rm). Also look into Docker Hub (push/pull image)
  • Start looking into container orchestration tools: Docker Swarm, Kubernetes, Mesosphere DC/OS, AWS ECS
  • Read about stateless and stateful applications
  • Learn to build small Docker images for your applications (preferably Alpine-based). Install only the required packages.
  • Learn the most used port numbers on which services run by default (like SSH (22), HTTP (80), HTTPS (443) etc.)
  • Learn networking from a distributed perspective (networking in the container world). Make yourself at ease with the eight fallacies of distributed computing.
  • Get a decent understanding of L4/L7 load balancers.
  • Learn how to secure proxy and reverse proxy servers (Nginx, Traefik, Ambassador ..) and take a look at how their networking works.
  • Get familiar with tools that help you create distributable and portable development environments (Examples: Vagrant & Docker).
  • Managing secrets during application deployment. HashiCorp Vault can help you.
  • Learn AWS SQS, Google PubSub or other alternatives
  • Get familiar with Kafka, AWS Kinesis or other alternatives
  • Understand AWS RDS: Ops teams often find it easier to delegate mundane tasks to a service provider to avoid the extra work, but that comes with a price tag.
  • If you’re on Kubernetes then understand all of its components and their workings.
  • Learn how to handle K8s built-in functionality at first then jump to Helm/Istio
  • Get to know how and what to monitor (from the operating system as well as application perspective)
  • Tracing comes later, once you are at a stage where you can dig deeper and your application supports it natively
  • If you’re dealing with (Big) Data engineering related applications then get familiar with Hadoop, HBase, Zookeeper, Spark and setting-up their clusters
  • Learn to set up and tune Redis for your application’s needs. Add authentication to it
  • Learn the nature of your application: CPU intensive, memory intensive, I/O intensive. Then deal with it accordingly.
  • Learn to choose between different types of databases according to your needs: SQL, NoSQL, TSDB, graph database ..
  • Learn to manage IAM roles/permissions and how to manage keys for different users (AWS IAM, GCP IAM ..).
  • Publish your code to GitHub if you like sharing and helping others with the problems you faced.
  • Learn to benchmark your infrastructure and application to fill in the gaps.
  • Don’t jump directly to execution. Visualize the end goal. Draw diagrams. Discuss in detail with developers. Ask questions without any hesitation, even if they seem downright silly.
  • Do small demos or PoCs from time to time for better understanding.
  • Are you familiar with IDEs and code editors (Sublime Text, Atom, Eclipse ..)?
  • Dive into DB (MySQL or any other which you like)
  • Learn about Redis/Memcached and similar tools
  • Get to know the pro/cons of Microservices architecture and start building similar architectures
  • Learn how to configure and use continuous integration and continuous delivery tools like Jenkins, Travis CI, Buildbot, GoCD. Integrating these tools with other tools (like Selenium, build tools, configuration management software, Docker, Cloud providers’ SDKs ..) is helpful.
  • Learn the distributed version control system Git and its basic commands (pull/push/commit/clone/branch/merge/log ..). Understand Git workflows. Do you know how to revert a Git repository to a previous commit?
  • How to use SSH keys. Try GitHub, Bitbucket or GitLab .. to configure passwordless access to the repo/account
  • Get familiar with the mumbo jumbo of kernel versions and how to patch them.
  • How to generate checksums (md5, SHA ..) to validate the integrity of any file
  • Get to know the difference between Monolithic and Microservices architecture.
  • How do you achieve zero-downtime deployments? What is your strategy for rollbacks, self-healing and auto-scaling?
  • Learn about scalability and highly distributed systems. How to keep them UP & Running all the time?
  • Get familiar with APIs and services: RESTful, RESTful-like, API gateways, Lambda functions, serverless computing, SOA, SOAP, JMS, CRUD ..
  • How to secure your infrastructure, network and running applications ?
  • Do you know what ChatOps is? Have you tried working with one of the known frameworks: Hubot, Lita, Cog?
  • Learn how to set up, configure and use some monitoring systems (Nagios, Zabbix, Sensu, Prometheus ..)
  • Whatever you do “document it”. No matter how rough it is. Do it. Later on, you would thank yourself for it.
  • Write small scripts to make your life easier. Note down commands or snippets (from StackOverflow, GitHub Gists or other online boards) that helped you get what you wanted.
  • Make Google, StackExchange, Quora and other professional forums your friends.
  • Read. Read. Read. Ask questions on Twitter/StackOverflow.
  • Attend meetups. You can join one of our local meetups, like DevOpsLinks Community Meetings (Bangalore), DevOpsLinks Community Meetings (London) and DevOpsLinks Community Meetings (Paris). If you want to organize your local meetup, get in touch, we will help you!
  • Talk to fellows who are in the same domain and discuss your issues. Learn from the community.
  • Join our Slack channel and ask all of your (noob) questions, no problem! We all started from the beginning.
  • Don’t try to solve every problem. Always keep one thing in mind: no man is an island. You can’t do, learn and achieve everything. Learn what is most important for the task at hand.
  • Read a DevOps glossary (Google it)
  • Follow open-source projects (Kubernetes/Docker etc.) or what excites you.
  • Follow like-minded folks from the community and be updated with latest tech trends.
  • Try to establish good development practices and a solid architecture.
  • Learn how to scale at production level.
  • Learn how to live debug and trace running application in production servers.
  • Follow the engineering blogs of some decent tech companies (we follow Google/Uber/Quora/GitHub/Netflix). This is where you can learn straight from the experts and see how they approach and solve problems.
  • Browse a few aggregators like Reddit, Hacker News, Medium ..
  • Follow like-minded developers and tech companies on Twitter. (I am always reading articles and watching talks/conferences; post-mortems are some of my favorite content. I also follow a few GitHub repos to see what’s going on with the technology that I use.)
  • Join DevOpsLinks, Kaptain and Shipped! We are sure you are going to learn many things, even if you’re an expert, you still have to learn more and more.
  • Read various technology related blogs and subscribe to DevOps Newsletters. We have a publication by the way, you can submit your article and share it with the community.
  • Read about open source and how you can contribute to open-source projects.
  • You should be able to run a post-mortem if something bad happens to your systems. Write detailed documentation about what went wrong and how to prevent it from happening again.
  • Try to learn how the experts on StackOverflow approach problems. Always remember: it’s the technology that keeps changing, not the basics. The basics always remain the same.
  • Read books
  • Last but not least… don’t assume anything, never take reality for granted, always experiment and enjoy the journey.
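
One item in the checklist asks how to generate checksums (MD5, SHA ..) to validate the integrity of a file. Here is a minimal Python sketch of that idea, using only the standard hashlib module. The function names file_checksum and verify_checksum, the default algorithm and the chunk size are our own illustrative choices, not part of any tool mentioned in the list.

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of a file, reading it in fixed-size chunks
    so that large files do not have to fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(path, expected_hex, algorithm="sha256"):
    """Compare a file's digest with a published value (case-insensitive,
    surrounding whitespace ignored)."""
    return file_checksum(path, algorithm) == expected_hex.strip().lower()
```

For SHA-256 this should produce the same hex string that the sha256sum command prints for the same file, so you can compare it against the checksum published next to a download.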

If you have the majority of these skills, you can be sure you have the prerequisites for DevOps, SRE and system engineering.

You can’t learn all of this in one shot, but having the right mindset is the main thing. It will surely take time even to get familiar with all of these, but as they say, the fun is in the journey. You will fail many times; learn from your mistakes and don’t repeat them.

Always remember, we are all learners here. We learn by trial and error. Don’t be shy about failing, because that’s how we learn.


Connect Deeper

We will be happy to hear your suggestions to add other points to this list.

Join one or more of our online communities to stay in touch.

Faun

The Must-Read Publication for Aspiring Developers & DevOps Enthusiasts. Medium’s largest DevOps publication.

Written by Aymen El Amri

Building www.faun.dev & www.eralabs.io. DevOps / Kubernetes / Architecture, Maker/Entrepreneur. Author of Painless Docker, SaltStack For DevOps and Practical AWS.
