I am sharing this story because I think there is a lot to learn from it about security best practices, and about what counts as just enough security. If you use a cloud platform such as AWS, GCP, Azure or another cloud vendor to support your applications and have not really paid attention to security, or feel that you are safe simply because you are using a well-known cloud provider, this article might change how you approach this domain entirely.
Unless you treat security as an important area of work that deserves your time, test it regularly, and maintain and work through a vulnerability assessment checklist, your system is still prone to a breach.
What happened to us was a minor incident, but the way it happened made us wiser and more focused on security. The hacker obtained one of our AWS access key and secret pairs from an environment file that was being served by a web application. One developer had forgotten to add that file to .gitignore, and another had forgotten to check for bad paths and to exclude certain files from being served at all.
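To make the "bad paths" check concrete, here is a minimal sketch of the kind of filter a static file handler could run before serving anything. It is not the code from our application: the function name `is_safe_to_serve` and the blocked suffix list are hypothetical, and it assumes the request path has already been URL-decoded.

```python
from pathlib import PurePosixPath

# Hypothetical list of file types that should never leave the server.
BLOCKED_SUFFIXES = {".pem", ".key"}


def is_safe_to_serve(request_path: str) -> bool:
    """Return False for paths that should never be served as static files.

    Assumes request_path is already URL-decoded.
    """
    path = PurePosixPath(request_path)
    for part in path.parts:
        # Reject traversal attempts such as /static/../.env
        if part == "..":
            return False
        # Reject dotfiles and dot-directories (.env, .git, ...) anywhere in the path.
        if part.startswith("."):
            return False
    # Reject private key material by extension.
    return path.suffix.lower() not in BLOCKED_SUFFIXES
```

A handler would call this first and answer with a 404 whenever it returns False. A deny-list like this is only a safety net; keeping secrets out of the served directory in the first place is still the better fix.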
Still could be worse …
The DevOps team had been lazy and was using a single key for all AWS resource APIs, and this key was associated with an admin role. Worse, this key was also shared across various application services and microservices to give them access to AWS resources and services.
We had One Key to Rule Them All … (AWS resources)
Once he had that access, he waited for the right moment and found the IP of our production instance, which was simply the public IP of our website. To get at our sensitive data he would need to SSH into that instance, but trying to do so would be futile because he did not have the key pair (.pem file) that AWS requires.
But here is the clever and simple trick he used, one that nobody thinks about or expects to happen to them.
The hacker cloned our current EBS storage volume, which was stored as a snapshot for backup purposes. He then created a new VM instance and attached the volume to it. Because he had created the instance from scratch, he had also created a new key pair, which allowed him to SSH into the new instance.
This is where we could have been more active in setting alerts on account activity and making sure we were notified of any new resource creation. That would have helped us notice new resources being created from a foreign location or at a very unusual time. There is a perfect solution for this, which we will discuss soon.
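As a rough illustration of such an alert, the sketch below filters CloudTrail-style event records for resource-creation calls made from addresses we do not recognize. The field names `eventName` and `sourceIPAddress` are the ones CloudTrail records use; the watched event list, the `flag_resource_creation` helper and the idea of a trusted IP allowlist are assumptions made for this example, not what we actually ran.

```python
# API calls that create or attach the kind of resources used in this incident.
WATCHED_EVENTS = {
    "RunInstances",
    "CreateVolume",
    "AttachVolume",
    "CreateSecurityGroup",
    "CreateKeyPair",
    "CreateAccessKey",
}


def flag_resource_creation(records, trusted_ips):
    """Return the records that create resources from an untrusted source IP.

    records: iterable of dicts shaped like CloudTrail event records.
    trusted_ips: set of source addresses we expect (hypothetical allowlist).
    """
    flagged = []
    for record in records:
        if record.get("eventName") not in WATCHED_EVENTS:
            continue  # not a resource-creation call we care about
        if record.get("sourceIPAddress") in trusted_ips:
            continue  # created by a known source
        flagged.append(record)
    return flagged
```

In practice the flagged records would be pushed to a chat channel or pager. An exact-match IP allowlist is deliberately crude; matching on region, time of day or the calling identity would follow the same pattern.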
Now the hacker had access to a virtual machine instance that held our code, its environment variables and all of our sensitive data, including our database credentials. He already had access to the AWS account. With all these credentials, he had every ingredient he needed to reach into our database and do us harm.
These events took place over several days, but no one on the DevOps team spotted the extra resources or keys that were created. To the hacker's credit, the names he chose slipped past the negligent eye. All the new resources, such as security groups, EC2 instances and keys, were named by twisting the names of existing resources so that they would not be spotted at a glance.
For example, if a resource was named launching-wizard5, the hacker named his security group lanuching-wizrad5.
This is where we learned another lesson: we had not given importance to how some resources were named, and we had no clearly defined naming strategy. If we had, we would have had a much better chance of spotting abnormalities.
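Look-alike names of this kind can also be caught mechanically. The following is a minimal sketch, assuming you can export the list of names you expect and the list of names currently present in the account; `find_lookalikes` and the 0.8 similarity threshold are hypothetical choices for illustration, built on the standard library's `difflib`.

```python
from difflib import SequenceMatcher


def find_lookalikes(known_names, observed_names, threshold=0.8):
    """Return (observed, known) pairs where an unknown name closely resembles a known one."""
    known = set(known_names)
    suspects = []
    for name in observed_names:
        if name in known:
            continue  # an exact match is an expected resource
        for original in known_names:
            # ratio() is 1.0 for identical strings and falls towards 0.0 as they diverge.
            score = SequenceMatcher(None, original, name).ratio()
            if score >= threshold:
                suspects.append((name, original))
                break
    return suspects


if __name__ == "__main__":
    expected = ["launching-wizard5"]
    present = ["launching-wizard5", "lanuching-wizrad5", "billing-db"]
    print(find_lookalikes(expected, present))
```

A check like this will also flag legitimate siblings such as a resource that differs only by a trailing number, so its output is a list for a human to review, not a verdict.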
So how did we eventually find him, and what did we do right that let us recognize something was happening, albeit a bit too late? We had developed an in-house fraud detection system. It was a business-specific solution, but it was able to spot abnormalities in the transactions aggregated in the database.
As soon as he started modifying database values, this fraud system alerted us that something was wrong with these transactions, which set off a hunt for the cause of the abnormalities.
This is where we learned another lesson, when we started looking into AWS CloudTrail for logs of the events that took place around the time of the transactions. After hours and hours of looking through these logs, we found the culprit. The hacker either had not finished executing his plan or was sloppy enough to leave the resources he had created in place, and that is how they came to our notice.
After this discovery we urgently had to cut off the hacker's access and change our existing credentials so that the hacker, or group of hackers, could cause no further harm. But we had to be careful not to alert him, since that might provoke a defensive reaction and leave us held hostage over our data under threat of destructive action.
So this is what we did:
- Made a list of affected resources and possibly leaked credentials.
- For each affected resource, worked out the impacted area and the scope of the effect.
- Worked out how we could revoke access or make the creds obsolete without alerting the hacker.
- Gathered enough resources to make sure this was done quickly and within a given time slot.
- Communicated the expected maintenance time and work estimate to the team and stakeholders.
Some of the other actions we took after the flush-out:
- Enabled AWS GuardDuty and attached alerts to GuardDuty events.
- Created alerts for resource creation on our account.
- Established a clear nomenclature POC for AWS.
- Scoped each key to only the access it requires and no more, with separate keys for, say, S3 access, ECR access, etc.
- Enabled two-factor authentication for all users with console login and set strict password renewal and expiry policies.
- Created a vulnerability and security audit schedule to identify active and unresolved security loopholes in our applications.
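To give a feel for what a narrowly scoped key looks like, here is a small sketch that builds an IAM policy document limited to one S3 bucket, plus a helper that flags policies still using wildcard actions. The policy layout follows the standard IAM JSON format; the bucket name `example-app-uploads` and both function names are made up for this example and are not our real configuration.

```python
import json


def s3_read_write_policy(bucket_name):
    """Build a policy that only allows listing, reading and writing one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket_name}"],
            },
            {
                # Reading and writing apply to the objects inside it.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": [f"arn:aws:s3:::{bucket_name}/*"],
            },
        ],
    }


def has_wildcard_action(policy):
    """Return True if any statement grants '*' or 'service:*' actions."""
    for statement in policy["Statement"]:
        actions = statement["Action"]
        if isinstance(actions, str):
            actions = [actions]  # IAM allows a single string or a list
        if any(a == "*" or a.endswith(":*") for a in actions):
            return True
    return False


if __name__ == "__main__":
    # Hypothetical bucket name, for illustration only.
    print(json.dumps(s3_read_write_policy("example-app-uploads"), indent=2))
```

A key attached to a policy like this could not have cloned a volume or launched an instance, which is the whole point of splitting the one admin key into several narrow ones.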
Strangely enough, I am thankful to the hacker for making us better at handling our security and for establishing the importance of enterprise and tech security in our team.
As they say, what doesn’t kill you only makes you stronger.