The reasoning behind this assertion is based on many years as an IT consultant working with small to mid-sized companies (and a few large ones), and on my day job as co-founder and product owner at EINDOM.
Over the years I’ve helped with everything from small blogs on shared hosting to scaling cloud deployments into the hundreds of instances. No, I’m not quite privileged enough to put multi-thousand-instance deployments on my CV yet, but those aren’t the kinds of stories I’m going to tell here. You can read about large-scale deployments on the Uber Engineering blog or the Netflix Tech Blog.
I’m going to be talking about running everyday companies using modern technologies.
This focus on real-life IT sometimes comes into conflict with what the rest of the internet defines as best practices. I frequent various programming discussions daily, and I must admit I take issue with the general hostility towards anything not defined as “best practice”, without stopping for a moment to evaluate your own situation.
That being said, I’m quite the paranoid type myself, and I tend to consume every single publication from the industry people faster than a node_modules folder grows beyond 100 MB.
If you don’t believe me, see this video of my homebuilt fingerprint sensor for my room at home back in 2013. The thing ran RSA-4096/AES-256-CBC encrypted communication between three PCs controlling the room, all in C# — https://youtu.be/XERUphlTxzQ
Every single person on the internet seems to recommend running your EC2 instances in a closed VPC, installing a NAT instance and then setting up a DMZ with at least one bastion server.
Amazon even has its own guides for doing it. They even tell you how to break SSH confidentiality and capture user sessions with homemade scripts!
How to Record SSH Sessions Established Through a Bastion Host | Amazon Web Services
It’s on the AWS security blog, it must be good. Right?
I don’t think so. Who reads these tutorials? Who uses them? Is it the Airbnb security team? Maybe, but they will quickly adopt and improve upon whatever they learn within the hour.
Contractors, startups, CTOs in small to medium-sized companies and everyday tech enthusiasts with side projects are the primary users.
I am on this list too. We actually deployed bastion servers at two companies: one being my partner’s (app-based) snow removal company, the other being our startup EINDOM. We had a peak of 5 bastions running, but quickly dialed it back to 2, because we could live with the wait for a new one to spin up if any of them went down.
We did this because we believe in security first and in not being the next Equifax. EINDOM in particular processes personally identifiable information for all our landlords and their tenants. Any breach would be bad.
What are the benefits of a Bastion?
- Logging. Who accessed what, when and what did they do?
- Protecting against port scanning.
- Hardening one place only. Zero day exploits.
- Prevent rogue SSH access by an additional layer.
- Slow down attackers.
All of these are indisputably true. Bastion hosts do provide all these benefits. What people forget is that you have other ways to obtain the same benefits, cheaper, easier and with less risk for the kind of “set-and-forget” deployments many smaller teams do.
Bastion hosts provide logging, but they break security while doing it. To inspect an SSH session you of course have to “open” the connection, and that requires agent forwarding. Telling people to use agent forwarding sometimes, but not at other times, is not a great way to get them to stop doing it. See Johannes’ post on why agent forwarding must die: https://heipei.github.io/2015/02/26/SSH-Agent-Forwarding-considered-harmful/
The better alternative is to have logging enabled on all your machines, all the time, and ship the logs off to a service like Papertrail. If you can’t trust your own employees, less-privileged accounts are one solution, but at that point you should probably consider hiring better employees anyway. Remember: we are talking small to medium-sized businesses. If you are logging for your developers’ own sake, enforce a policy of not disabling the logging, or just jail them to their own accounts and ship the logs from root.
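Shipping syslog (including SSH auth logs) off-box can be as simple as one rsyslog forwarding rule. A minimal sketch, assuming rsyslog and a Papertrail-style syslog endpoint — the hostname and port are placeholders for whatever your log service assigns you:

```
# /etc/rsyslog.d/90-remote.conf
# Forward everything to the remote syslog endpoint over TCP (@@ = TCP, @ = UDP).
# Replace host and port with the destination assigned by your provider.
*.* @@logsN.papertrailapp.com:12345
```

Because the logs leave the box immediately, an attacker who gains root can stop future shipping but cannot erase what was already sent.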
Bastion hosts prevent port scanning, but so does a firewall.
Bastion hosts are easier to harden. Why? They are machines too. If you firewall off your internal communications from the internet and run them in a private VPC, what is there on your web server that isn’t on a bastion host?
You can run fewer services, but let’s be realistic: we are only opening port 22, and every machine is most likely running some version of OpenSSH.
We had our own magical hardening scripts for our bastion machines as an Ansible playbook, but then we realised that we should be running the same hardening on our app servers anyway. What good is a rock-hard shell if your core is soft as marshmallow?
Zero-day exploits are easier to prevent on bastions? You will most likely be running the same OS and the same OpenSSH server on your bastion and your app servers, meaning a compromise is just a local port scan and one more jump away. And if anybody finds an authentication bypass or remote code execution bug in OpenSSH, they will be both lucky and extremely rich. Chances are people will notice before your project is attacked.
Bastions prevent rogue SSH access. I’ve read this argument before, and always asked: “how?”
If you are using public-key authentication, you will most likely be using the same key for the bastion as for the internal network, so stealing that key is a free ticket anyway. You can add two-factor at the bastion (we did that), but you can do that at the app servers too; with configuration management it’s the same effort either way. Also, you should really be storing your private keys on a YubiKey or equivalent.
Bastions slow down attackers. True, especially against automated tools, but they should not be breaching your servers in the first place. And any non-trivial tool will be able to quickly port scan the local subnets and jump along.
Any targeted attacker will do this within the first five minutes of compromise. Yes, you just bought an hour, maybe even a day, but what does this get you? If you don’t have a security team actively monitoring your bastion servers, you will most likely never know, unless its CPU spikes to 100% mining bitcoin, or it starts sending out spam and you get shut down by your provider.
I’ve seen customers running 5 EC2 instances, plus 1 bastion and 1 NAT gateway, because it’s “best practice”. You pay $38 per NAT instance and $5 per t2.nano bastion. Meanwhile their security groups were way too open, their Network ACL policies were default open, and while they had stopped using the root account, every developer account had full IAM access without two-factor enabled. What did those $43 of false security get them? Complacency.
For people saying $43 is not a lot: no, it isn’t, but at small-business scale it could have bought a Multi-AZ RDS micro instance running MariaDB. Is that not a better investment for a small development shop? If your budget is fixed, your budget is fixed. This article is not about the money anyway; I just wanted to point it out.
In my experience, however, the worst thing about bastions in small teams is the increased attack surface. Now your five-person IT department has another server to take care of. It’s a server that hardly anybody checks, customers won’t notice if it’s down, and when did we last run apt upgrade again? Was it Ubuntu 14.04 or 16.04?
It often leads to complacency in the rest of the system design. We don’t need UFW rules limiting the subnet for port 80 on our app servers, because we are in a closed VPC. We don’t need a security group for our database, because it’s in a closed VPC. I think I’m getting the point across.
Add all this on top of bastions breaking most tools. Sequel Pro’s SSH tunneling only tunnels once. Same for DataGrip. You want to use your favorite deployment tool? You can’t. Not even Ansible likes it much without some dirty hacks in the inventory file.
Don’t use bastion hosts. Use all the free goodies AWS (and other cloud providers) give you.
Modern software-defined networking is really underappreciated. Think about your VPCs, your subnets and the Network ACLs for each subnet. Apply security groups liberally, but be stringent with permissions. Limit the scope of all IAM policies attached via IAM roles (if I had a dime for every time I’ve seen full S3 access attached to an IAM role on all web servers).
We have a default Network ACL on all production environments that limits inbound traffic to ports 80, 443 and 22 plus ICMP. We also allow TCP 32768–61000 inbound, because ACLs are stateless and Linux needs the upper port range for ephemeral ports. Within our private subnet the Network ACL allows all traffic for easy everyday use; it only serves as a last line of defence against the internet, in case our security groups get opened or otherwise disabled. Outbound to the internet we are very specific about what we allow, and generally only allow HTTP/HTTPS plus ICMP, as that’s all we need for updates and API calls.
We further lock down communication between instances by applying Security groups as needed, where we e.g. open port 3306 on members of the “database security group” up to members of the “web app security group”.
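Group-to-group rules like this can be created through the AWS API. A minimal sketch in Python, with hypothetical group IDs; the key idea is that the ingress rule references the web app group instead of a CIDR block, so membership — not IP addresses — decides who may connect:

```python
# Hypothetical security group IDs; substitute your own.
DB_SG_ID = "sg-0db000000example"   # the "database security group"
WEB_SG_ID = "sg-0web00000example"  # the "web app security group"

def mysql_ingress_from(web_sg_id):
    """Build the IpPermissions payload that opens MySQL (TCP 3306)
    on the database group only to members of the web app group."""
    return [{
        "IpProtocol": "tcp",
        "FromPort": 3306,
        "ToPort": 3306,
        # A group reference instead of a CidrIp: any instance in the
        # web app group may connect, regardless of its address.
        "UserIdGroupPairs": [{"GroupId": web_sg_id}],
    }]

# With boto3 installed and credentials configured, the call would be:
# import boto3
# ec2 = boto3.client("ec2")
# ec2.authorize_security_group_ingress(
#     GroupId=DB_SG_ID, IpPermissions=mysql_ingress_from(WEB_SG_ID))
```

The same payload shape works in the AWS console or CloudFormation; the group reference is what keeps the rule valid as instances come and go.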
Our setup differs when it comes to SSH. Apart from the Network ACL, nothing is open for SSH by default. If you try to SSH to our instance IPs (which are in any case hidden behind the ELB), you will be met with absolutely nothing. Zero. You can’t ping them, and you can’t SSH. Nothing is open to the public internet.
We have a single security group that all servers are members of, called “SSH access list”. The group was set up manually, but is managed automatically by a CLI tool we’ve developed in-house, which ships as part of our primary project.
This CLI tool opens up your current IP (or a specific IP / Subnet you provide) for 12 hours. The next time somebody runs this tool, it will scan for old IP allowances and remove them. The tool can also clear the access list.
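Our tool itself isn’t published yet, but the bookkeeping behind “open for 12 hours, sweep old allowances” is simple. A minimal sketch in Python of that expiry logic, under the assumption (mine, for illustration) that each rule’s creation time is stamped into the rule so it can be checked later; the real tool applies this against the security group via the AWS API:

```python
from datetime import datetime, timedelta, timezone

TTL = timedelta(hours=12)  # how long an opened IP stays on the access list

def make_rule(cidr, now):
    """Describe a port-22 ingress allowance for `cidr`, recording the
    creation time so the rule can be expired on a later run."""
    return {"cidr": cidr, "port": 22, "created": now.isoformat()}

def split_expired(rules, now):
    """Partition rules into (keep, revoke) using the 12-hour TTL.
    Everything in `revoke` would then be removed from the group."""
    keep, revoke = [], []
    for rule in rules:
        age = now - datetime.fromisoformat(rule["created"])
        (revoke if age > TTL else keep).append(rule)
    return keep, revoke
```

Each run first sweeps expired rules, then authorizes the caller’s current IP; the sweep is what keeps the access list from growing forever.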
It works via the AWS API, using a static token in our project that is limited to only a few actions, and only on this specific security group. It can’t open port 80, it can’t add another group, it can’t even delete the current group. It can only add and remove rules for port 22 on this one security group. Every action is of course logged in AWS CloudTrail.
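A hypothetical sketch of the IAM policy behind such a token (not our actual policy — region, account ID and group ID are placeholders). The mutating actions are pinned to the one security group’s ARN, while `ec2:DescribeSecurityGroups` needs a wildcard resource because it does not support resource-level permissions. Limiting the token to port 22 specifically is something the tool enforces, not IAM:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "EditSshAccessListOnly",
      "Effect": "Allow",
      "Action": [
        "ec2:AuthorizeSecurityGroupIngress",
        "ec2:RevokeSecurityGroupIngress"
      ],
      "Resource": "arn:aws:ec2:eu-west-1:111122223333:security-group/sg-0123456789abcdef0"
    },
    {
      "Sid": "ScanExistingRules",
      "Effect": "Allow",
      "Action": "ec2:DescribeSecurityGroups",
      "Resource": "*"
    }
  ]
}
```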
As you can see, we are quite happy Laravel users at EINDOM. :) So instead of having a Bastion, our workflow is like this.
- Attempt to access whatever server we need. Did it hang?
- `php artisan ssh:allow` inside our project root.
- Attempt to access server again and succeed.
We have no operational burden of managing a bastion, and our access is always limited to the current dynamic IP we are working from. An attacker still needs our servers’ IPs (which change; they are cattle), still needs to break through SSH, and has only a few hours before we clear the access list or the rule gets purged by another developer.
We think this provides a much better security model for the kind of attacks we are worried about, while also cutting complexity and a little cost (not a primary goal). This API-controlled security group could be combined with a bastion host, but we really don’t see the need unless you are a top tech company and your security team is bored.
I will make the code we use into an open source package in the near future (promise), but until then it will be provided as a Gist. I will also provide an example IAM policy for controlling the security group.
I realise it’s dangerous territory to go against industry practices, but without discussion we aren’t going to get any change. I hope you’ve enjoyed this rather lengthy read and that it provided some food for thought, both about bastion hosts and generally about trusting “standards” blindly.
See all code and IAM policies on: https://gist.github.com/HSPDev/74ad755060880b2c30ae9e9a6ed20eda