Devops Week News — Issue #45
This week we start the newsletter with a video from SREcon 17, in which Laura Nolan talks about distributed consensus algorithms: how they work, how they perform, what can go wrong, and when to use them or not.
In the articles section, we start with a piece on load balancing and the many ways to do it, followed by a great article about the Site Reliability Engineer (SRE) role and how to become one. Next comes the systemd journal, a useful tool for collecting and managing system and application logs, along with commands that help you extract information from it. To close, we talk about versioning and releasing products.
Last but not least, we have Zeus, a tool that guides you through hardening best practices on AWS.
Please send us links, events, comic strips, DevOps job postings, etc., via @devopsweeknews.
Video of the week
Processes crash or may need to be restarted. Hard drives fail. Natural disasters can take out several data centers in a region. Site Reliability Engineers need to anticipate these sorts of failures and develop strategies to keep systems running in spite of them.
This usually means running systems across multiple sites, which in turn requires tradeoffs between the availability and the consistency of your system state.
This talk explores distributed consensus algorithms, such as Raft and Paxos, in production: how they work, how they perform, what can go wrong, and when to use them or not.
Laura Nolan has been a Site Reliability Engineer at Google for four years, working on large data infrastructure projects and, most recently, networking. Her background is in software engineering and computer science. She wrote the ‘Managing Critical State’ chapter in the O’Reilly SRE book and is co-chair of SREcon EMEA 2017.
This post explains how load balancing works to handle billions of requests while staying highly available.
So you want to be an SRE? This post gives an excellent explanation of the SRE role and, at the end, many links to guide your journey to becoming one.
The systemd journal is a useful tool for collecting and managing system and application logs, which are usually dispersed throughout the system and handled by different daemons and processes. In this post, you will learn how it works.
Have you ever had trouble creating a meaningful versioning and release scheme that maps to your product? Here you will find a path to make it easier.
Tools that we love
Zeus is a powerful tool for applying hardening best practices to AWS EC2, S3, CloudTrail, CloudWatch, and KMS. It checks security settings against the profiles the user creates and, at the user's request, changes them to the settings recommended by the CIS AWS Benchmark.