Lord of the Patch — Story of the PatchBot
Automated patch management using serverless technologies
“So you have come here for information…”
(Saruman, LOTR — The Two Towers)
If you are reading this, you are probably on the good side of cybersecurity and we share the same interest — fighting the bad guys. This article is for you if you enjoy lots of Tolkien references and a bit of cybersecurity. I appreciate you coming here and opening this article, so sit back and enjoy the material.
The world is changed: I feel it in the packets, I feel it in the network, I smell it in the wi-fi… It all began with the forging of the Great Programming Languages. Object-oriented languages were given to the wisest, fairest of all nerds. C was given to the byte lords, great craftsmen of shell tools. And assembly languages… assembly languages were gifted to those who, above all else, desire power. But they were all of them deceived. For another thing was made. In the land of Sploitdor, in the fires of Cloud Doom, the Dark Lord Pawnuron forged in secret a Master Exploit to control all servers. And into this Exploit he poured his cruelty, his malice and his will to dominate all life. One exploit to rule them all, one exploit to find them, one exploit to bring them all, and in the darkness bind them.
Fortunately, we live in a universe where such a Tolkienised exploit does not exist, and that is the first piece of good news in this article. The bad news is that there are still many scary exploits out there that we are aware of, and even more that we are not. The race between good and bad security guys never ends; security is just a feeling, and anyone can be hacked. But that doesn’t mean we should just give up. As one of the noble cyber knights, what you can do is reduce the number of people who can hack the kingdom you swore to protect, or, in other words, the company you work for.
As you might have guessed already, I am one of the Security Engineers at About You and a big fan of the “Lord of the Rings” series. In this post, I will briefly summarize my presentation at the Code.Talks 2018 conference and introduce you to one of the things we do at About You to secure our infrastructure against known vulnerabilities and exploits.
One of the biggest risks to organizations running web applications is using components with known vulnerabilities. According to the OWASP Top 10, this issue is currently ranked number 9 among the biggest security risks. The current version of the OWASP Top 10 is sorted by the severity of the vulnerabilities, but the picture changes drastically if you sort the issues by the likelihood of them being used against your organization. Let’s look at the figure below for a more detailed comparison (the data on the right side was taken from ):
As you can see, the injection vulnerability shifted to third place, while in first place we see the issue caused by not paying attention to security patches. The reason for this is that modern companies use numerous technologies, and it is nearly impossible to keep all software up to date in fast-growing environments. In the next paragraphs, I will discuss issues with the patching process and how we tackle them at About You.
One general problem with patching software is that the bigger your company, the harder it is to keep everything up to date, so the earlier you start patching, the better off you will be in the long run. The first thing to remember about software updates: they are not a one-time action, but a cyclical process! Therefore, you need to take into consideration all the issues that arise for such processes: automation, visibility of information, reporting, and defining clear action items. Below, we will go through them one by one.
With automation, one can achieve painless delivery of patches on a regular basis. For this part, we developed a weekly process at About You that takes all of our Amazon Machine Images (AMIs), applies all available updates, and replaces our servers in the cloud with the new images. This procedure is only partially automated because, in the end, a human needs to trigger the replacement of the instances in an environment. We have many teams responsible for applying patches, so we had a problem keeping track of how well these teams apply updates, because in the chaos of rapid development it was often forgotten. In order to see the status of this semi-automated process for every team, we developed a tool that collects data and reports it back to the responsible personnel on demand. The data we collect is as follows:
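The weekly AMI refresh described above can be sketched with boto3. This is a minimal illustration, not our actual pipeline (which runs in Jenkins): the instance type, the `AWS-RunPatchBaseline` SSM document, and the naming scheme are assumptions for the sake of the example.

```python
import datetime


def patched_ami_name(base_name: str, today: datetime.date) -> str:
    """Derive a date-stamped name for the refreshed AMI."""
    return f"{base_name}-patched-{today:%Y-%m-%d}"


def refresh_ami(source_ami_id: str, base_name: str) -> str:
    """Launch a throwaway instance from an AMI, patch it, and bake a new image.
    A sketch only; resource names and parameters are illustrative."""
    import boto3  # AWS SDK; imported here so the helper above stays dependency-free

    ec2 = boto3.client("ec2")
    ssm = boto3.client("ssm")

    # 1. Start a temporary instance from the current image.
    instance_id = ec2.run_instances(
        ImageId=source_ami_id, InstanceType="t3.micro",
        MinCount=1, MaxCount=1,
    )["Instances"][0]["InstanceId"]
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])

    # 2. Apply all pending OS updates via SSM Run Command
    #    (a real pipeline would also wait for the command to finish).
    ssm.send_command(
        InstanceIds=[instance_id],
        DocumentName="AWS-RunPatchBaseline",
        Parameters={"Operation": ["Install"]},
    )

    # 3. Bake a fresh, patched AMI and clean up the temporary instance.
    new_ami_id = ec2.create_image(
        InstanceId=instance_id,
        Name=patched_ami_name(base_name, datetime.date.today()),
    )["ImageId"]
    ec2.terminate_instances(InstanceIds=[instance_id])
    return new_ami_id
```

Replacing the running instances with the new AMI is then the manual, human-triggered step mentioned above.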
- Status of the AMI updates (done with Jenkins).
- Status of the AMI replacement (done via pull requests to CloudFormation templates) for staging and production environments.
In the figure below you can see the representation of this data:
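In code, the same per-team summary can be modeled as a small record type. This is a sketch: the team names and field names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class TeamPatchStatus:
    """Patch status tracked for a single team (field names illustrative)."""
    team: str
    ami_updated: bool          # weekly Jenkins AMI build succeeded
    staging_replaced: bool     # CloudFormation PR merged for staging
    production_replaced: bool  # CloudFormation PR merged for production

    @property
    def done(self) -> bool:
        return self.ami_updated and self.staging_replaced and self.production_replaced


def overdue_teams(statuses: list) -> list:
    """Names of teams that still have patching steps open."""
    return [s.team for s in statuses if not s.done]
```

A report tool only needs to collect these records and flag the teams returned by `overdue_teams`.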
Many vulnerability scanning solutions have already introduced a process called “continuous scanning”, where your servers constantly report security-related data to a master node for analysis. Unfortunately, at About You we did not have such a solution, so we had to work with more conventional scheduled scanning using Nessus Professional. One of the challenges when scanning cloud resources was specifying the targets of the scan. Our EC2 instances in AWS are often recycled due to load balancing and auto-scaling, so their IP addresses change as well. Unfortunately, our version of Nessus did not have native integration with AWS, so we had to provide the IP addresses of our servers to be scanned instead of the other identifiers available in AWS. In order to know the IP addresses of all our EC2 instances (around 400 on average) at any given point in time, we used an open-source tool called Security Monkey.
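For illustration, the same target list could also be collected straight from the EC2 API with boto3. We pulled it from Security Monkey instead, so treat this purely as a sketch of the idea; the running-state filter is an assumption.

```python
def extract_private_ips(reservations: list) -> list:
    """Pull private IPs out of a describe_instances-style response."""
    ips = []
    for reservation in reservations:
        for instance in reservation.get("Instances", []):
            ip = instance.get("PrivateIpAddress")
            if ip:
                ips.append(ip)
    return ips


def running_instance_ips() -> list:
    """Collect the private IPs of all running EC2 instances."""
    import boto3  # AWS SDK; only needed for the live API call

    ec2 = boto3.client("ec2")
    pages = ec2.get_paginator("describe_instances").paginate(
        Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
    )
    ips = []
    for page in pages:
        ips.extend(extract_private_ips(page["Reservations"]))
    return ips
```

The resulting list of IPs is exactly what a Nessus scan without cloud integration needs as its target specification.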
Using the tools mentioned above, we built a system based on the AWS Lambda service that automatically launches scans with Nessus, parses the results, and reports them to the responsible teams. At a high level, the system performs the following sequence of actions:
- Get IP addresses from Security Monkey.
- Launch a Nessus scan for those IPs.
- Parse the results of the scan.
- Report the data back.
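Put together, the Lambda handler has roughly this shape. It is a sketch only: `get_target_ips`, the `nessus` client, and the `report_*` helpers are hypothetical names, and the real Nessus REST endpoints depend on the product version.

```python
def count_by_severity(findings: list) -> dict:
    """Aggregate parsed scan findings by severity for the report."""
    counts = {}
    for finding in findings:
        severity = finding.get("severity", "info")
        counts[severity] = counts.get(severity, 0) + 1
    return counts


def handler(event, context):
    """AWS Lambda entry point tying the four steps together.
    get_target_ips, nessus, and the report_* functions are hypothetical helpers."""
    ips = get_target_ips()                      # 1. IPs from Security Monkey
    scan_id = nessus.launch_scan(targets=ips)   # 2. launch a Nessus scan for those IPs
    findings = nessus.wait_and_parse(scan_id)   # 3. parse the results of the scan
    report_to_jira(findings)                    # 4. report the data back
    report_to_chat(count_by_severity(findings))
```

Each numbered comment corresponds to one step of the list above; the aggregation helper feeds the plots and chat summary.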
The report we get back consists of several parts: a Jira ticket for our Operations team to work with, a notification in Rocket.Chat, and some plots. After a report is generated, we analyze the vulnerabilities, decide which ones need more attention than others, and start fixing them. Since we developed and set up the patch management process, we have seen great improvement both in the number of issues and in the overall security awareness of the teams.
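The chat notification part, for example, can be as simple as a POST to a Rocket.Chat incoming webhook. The webhook URL and the exact message format below are assumptions for illustration.

```python
import json
import urllib.request


def build_chat_payload(severity_counts: dict) -> dict:
    """Format a severity summary as a Rocket.Chat message payload."""
    lines = [f"- {sev}: {n}" for sev, n in sorted(severity_counts.items())]
    return {"text": "Weekly vulnerability scan results:\n" + "\n".join(lines)}


def notify_chat(webhook_url: str, severity_counts: dict) -> None:
    """Send the summary to a Rocket.Chat incoming webhook (URL is illustrative)."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(build_chat_payload(severity_counts)).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(request)
```

Keeping the payload builder separate from the HTTP call makes the message format easy to test without touching the network.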
Even though things are generally getting better, there are still several challenges to overcome. One of them is that, as of 2019, Nessus Professional no longer supports the API calls necessary for automated scanning, so we are rebuilding our patch management system from scratch with a modern agent-based vulnerability scanning solution. I see a great future for automated vulnerability scanning and the benefits it brings to modern companies that seek to protect their infrastructure in the endless battle against evil.
If you’d like to know more details about this system, you are more than welcome to watch the recording of my talk. In the next few days, we will keep an eye on the comment section to answer all your questions.
I hope I gave you a good look at our patch management process. In fact, we have many more topics to work on in the company: penetration testing of our infrastructure and applications, extending our intrusion detection system, implementing security measures for AWS, and many more. If you’d like to learn more about what we do here, or bring in new ideas for security improvements, check out our open positions via the links below:
Thank you for reading and stay safe.