But InfoSec Won’t Let Us Use the Cloud

A Tale As Old As Time

Jonny O'Connell
Appsbroker CTS Google Cloud Tech Blog
10 min read · Mar 17, 2022


I ❤️ data security. So much so that as a fresh-faced post-doc, I did a couple of years with the University of Bristol’s cryptography research group working on the SILENT program. However, throughout my entire professional career since then I’ve found myself living a Groundhog Day, having the same conversation on repeat with various Information Security departments — confronting the nervousness, apprehension, or downright hostility around using the cloud, and addressing the fear that your data automatically becomes less safe when it leaves your building.

As a proposition, it sounds reasonable enough at face value, which is exactly why it gets brought up so often. But I have an alternative lemma:

Your data is orders of magnitude safer in the cloud than it is on-premises.

So let’s step through the proof together, as I relive some old conversations:

We encrypt our data on-premises — does the cloud provider?

In a word, yes. And most likely, better than you’re doing so currently 🤷. Does your company have key rotation and key hygiene processes as stringent as theirs, or as rigorously adhered to? Is your choice of encryption algorithms and associated configuration as exacting, and as tightly controlled? Do you have cryptography experts on your engineering teams, pushing the art of the possible? Unless you answered a resounding yes to all of these, your data is probably going to be at least as safe on your cloud provider’s spinning rust as it is on your own.

Of course, if you’re still nervous, feel free to manage your own key, or even bring your own. Just make sure you’re looking after it as well as your cloud provider will.
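
To make that concrete, here’s a minimal sketch of the bring-your-own-key flavour on GCS, using the google-cloud-storage Python client and a customer-supplied encryption key. The bucket and object names are placeholders, and in real life the key would come from your own KMS or HSM rather than os.urandom:

```python
# A minimal sketch of "bring your own key" on GCS via a customer-supplied
# encryption key (CSEK). Requires the google-cloud-storage client and
# application default credentials; names below are made up.
import os
from google.cloud import storage

csek = os.urandom(32)  # 32-byte AES-256 key; in reality, sourced from your own KMS/HSM

client = storage.Client()
bucket = client.bucket("my-sensitive-bucket")            # hypothetical bucket
blob = bucket.blob("records.csv", encryption_key=csek)   # GCS stores only a hash of this key
blob.upload_from_string(b"id,name\n1,alice\n")

# Reads fail unless the same key is supplied again:
print(bucket.blob("records.csv", encryption_key=csek).download_as_bytes())
```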

Talking exclusively from a GCP perspective, the FIPS 140-2 Level 1 validated cryptography library, Tink, is open source, so you can see exactly what’s protecting your data, as can experts the world over; this is not just a tick box you’re taking on trust. If your crypto isn’t open source — run.
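
If you want to kick the tyres yourself, the sketch below uses Tink’s AEAD primitive from Python; module paths can shift between Tink releases, so treat it as illustrative rather than gospel:

```python
# A quick sketch of Tink's AEAD API (pip install tink): the same open-source
# library anyone can inspect, generating a keyset and encrypting locally.
import tink
from tink import aead

aead.register()
keyset_handle = tink.new_keyset_handle(aead.aead_key_templates.AES256_GCM)
primitive = keyset_handle.primitive(aead.Aead)

ciphertext = primitive.encrypt(b"my secret data", b"associated-data")
plaintext = primitive.decrypt(ciphertext, b"associated-data")  # b"my secret data"
```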

We’ve got to send our data across the internet

Encryption on the wire is a solved problem. To access any of your cloud provider’s services, your data will always be wrapped in the warm embrace of TLS. Alternatively, a VPN can be set up between your on-premises and cloud estate to form a permanent encrypted bridge between the two.
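
You don’t have to take that on trust either; a few lines of standard-library Python will show you the TLS version and cipher suite negotiated with, say, the GCS endpoint:

```python
# A small sanity check (standard library only): every Google API endpoint
# negotiates TLS. Here we inspect the connection to storage.googleapis.com.
import socket
import ssl

host = "storage.googleapis.com"
ctx = ssl.create_default_context()           # verifies the cert chain and hostname

with socket.create_connection((host, 443)) as sock:
    with ctx.wrap_socket(sock, server_hostname=host) as tls:
        print(tls.version())                 # e.g. TLSv1.3
        print(tls.cipher())                  # negotiated cipher suite
```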

When I opened with this being a solved problem, I really, really meant it. If public/private key cryptography were ever proven unfit for purpose, the internet (and likely humanity at large) would cease to function. I guarantee that the day your TLS-wrapped data is at risk on the wire, you’ll have bigger fish to fry.

“Our data centre is secure”

Sure. But probably not as secure as your cloud provider’s.

Looking at the average GCP data centre, it has 6 layers of security:

  1. Boundary security — Fences. Really big fences.
  2. Secure perimeter — Including thermal imaging and guard patrols.
  3. Building access — Requiring ID passes and retina scanning.
  4. SOC — Monitoring everything on site 24/7.
  5. Data centre floor — Which < 1% of Googlers will ever visit.
  6. Disk destruction facility — Which even fewer will see, and a hard drive will only visit once.
  7. (Optional) Batman — Certain Google data centres have the hero they need.

And let’s not forget that if you did manage to James Bond your way through all of this, the reward would be a pile of useless encrypted data.

“We need compliance for our data”

I don’t speak legalese. Nor do I know how each region’s regulations differ from the next. But that’s fine — my cloud provider does speak legalese, and has teams of people dedicated to keeping up with compliance standards around the world.

Do I know what ISO 27001 is? Barely. Do I know GKE is compliant? Absolutely ✅. So when a customer asks me for a compliant solution, I just guide them to the appropriate offerings. And as standards evolve and change, my cloud provider evolves with them, with no extra work on my part.

For example, imagine you’re a company holding PII. Drop the data into an (adequately secured) GCS bucket and you’ll meet your resilience requirements. Sit DLP on top and you’ll meet your obfuscation, data ageing, and PII management requirements. Pick your regions appropriately and you’ll meet your sovereignty requirements.
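
As a taste of the DLP piece, here’s a hedged sketch using the google-cloud-dlp client to scan a snippet of text for PII before it goes anywhere it shouldn’t; the project ID is a placeholder and the info types are just examples:

```python
# A sketch of Cloud DLP inspecting content for PII. "my-project" is a
# placeholder; requires the google-cloud-dlp client and credentials.
from google.cloud import dlp_v2

client = dlp_v2.DlpServiceClient()
response = client.inspect_content(
    request={
        "parent": "projects/my-project",
        "inspect_config": {
            "info_types": [{"name": "EMAIL_ADDRESS"}, {"name": "PHONE_NUMBER"}],
        },
        "item": {"value": "Call Jane on 07700 900123 or jane@example.com"},
    }
)

for finding in response.result.findings:
    print(finding.info_type.name, finding.likelihood)
```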

Compliance is a thorny and difficult issue, so let someone else take the burden. Especially if you’re a smaller enterprise, or a startup, spending bandwidth on keeping up with an ever-changing legal landscape can be one of the biggest drains on your velocity.

“They’re going to be spying on our data”

This is one of my favourite points.

Starting with data at rest: we’ve already established you can use your own keys to secure this, perhaps even with your own external key manager. Short of an employee going rogue, circumventing an impossible mountain of safeguards, and grabbing the key as it’s in use, this data’s pretty safe.

Moving on to data in flight: this might require a little more effort to avoid using provider-managed keys, but it’s still feasible for the most part; BigQuery can use CMEKs so everything it writes sits under keys you control, you can build IPsec tunnels between your VMs, self-host Istio in Kubernetes to encrypt traffic between cluster services, and so on. However, if you look hard enough, you’ll undoubtedly find somewhere you must use provider-owned keys — for example, the connection back to your host from a TLS-terminating GFE.
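
For instance, here’s a rough sketch of pinning BigQuery query results to a key you control via CMEK, using the google-cloud-bigquery client; the project, dataset, and key names are all placeholders:

```python
# A sketch of keeping BigQuery results under a customer-managed key (CMEK).
# Project, dataset, table, and key names below are hypothetical.
from google.cloud import bigquery

client = bigquery.Client()
kms_key = (
    "projects/my-project/locations/europe-west2/"
    "keyRings/my-ring/cryptoKeys/my-key"
)

job_config = bigquery.QueryJobConfig(
    destination="my-project.my_dataset.results",
    destination_encryption_configuration=bigquery.EncryptionConfiguration(
        kms_key_name=kms_key
    ),
)
client.query("SELECT 1 AS ok", job_config=job_config).result()
```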

When I attended CHES in 2015 I got chatting over dinner to some representatives from a-government-agency, and we moved on to the hot topic of mass surveillance. They laid out a thought experiment which neatly encapsulates why watching everything that’s being done digitally is a fundamentally impossible proposition.

Let’s explore the premise that, yes, your cloud provider is in fact spying on your data as it meanders around their network.

  • 16TB in a 2.5” form factor is pushing the reasonable limits of current storage density, and a 2.5” drive clocks in at 2.7 x 0.4 x 4” (ish). That’s about 71cm³, which, if my napkin is correct, works out to roughly 226,000 TB per m³.
  • Now let’s home in on a use case: Spotify, a famous GCP customer. If we assume 25% of their estimated active user base streams for 2hrs a day, they’ll be transferring about 61,000 TB a day, and that’s only counting public egress traffic, parking ingress and internal chatter which are probably of a similar order of magnitude.
  • This means that to spy on Spotify alone, using numbers that are unrealistically conservative, GCP would need to acquire, power, and cool an extra cubic metre of storage every 4 days (the napkin maths is reproduced below).
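
Here’s the napkin, in runnable form; every input is an assumption carried over from the bullets above:

```python
# Reproducing the napkin maths. All figures are rough assumptions from the
# bullets above, not measurements.
CM3_PER_CUBIC_INCH = 16.387

drive_tb = 16
drive_volume_cm3 = 2.7 * 0.4 * 4 * CM3_PER_CUBIC_INCH   # ~71 cm^3 per drive
tb_per_m3 = (1_000_000 / drive_volume_cm3) * drive_tb   # ~226,000 TB per m^3

egress_tb_per_day = 61_000                               # assumed Spotify public egress
days_to_fill_one_m3 = tb_per_m3 / egress_tb_per_day      # ~3.7, i.e. a fresh m^3 roughly every 4 days
print(round(tb_per_m3), round(days_to_fill_one_m3, 1))
```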

And we’ve only dived into one customer here. GCP is also home to hundreds of thousands more, including Shopify pushing 30TB/min, Sky powering millions of set-top boxes, and other big names such as Verizon, Twitch, Paypal, and SAP. Given the sheer, eye-watering quantity of data flowing around a hyper-scaler’s network, even filtering and tapping a fraction of one customer’s data isn’t feasible, let alone drilling into what every customer is doing.

Parking ad absurdum arguments momentarily: far more simply, if there were any hint of such wrongdoing from your provider, their business would suffer such an overnight exodus that it simply isn’t worth the risk. No hyper-scaler would jeopardise a multi-billion dollar enterprise just for a glance at your packets (👀). NB: it might take longer than one night to offboard; multi-cloud is hard.

There are of course meta attributes of your infrastructure that your provider needs to see — “How many VMs does this customer have?” and “What IP addresses are they assigned?”. In GCP, if support (or anyone else at Google) needs to access your project, Access Transparency gives you visibility of that access, and Access Approval lets you require explicit approval first.

“What if they lose our data?”

Not impossible… but nearly. GCS, for example, is designed for eleven nines (99.999999999%) of annual durability, meaning that if you store 1 million objects, you should expect to lose roughly one every 100,000 years.
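
The back-of-the-envelope version, if you want to check my working:

```python
# Back-of-the-envelope check on the eleven-nines claim.
annual_loss_probability = 1e-11      # i.e. 99.999999999% annual durability per object
objects = 1_000_000

expected_losses_per_year = objects * annual_loss_probability
print(expected_losses_per_year)      # ~1e-05: roughly one lost object every 100,000 years
```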

Achieving similar on-premises would be a monumental task; as a starter for ten, you’ll need highly redundant arrays, quality disks, stringent failed-disk-replacement policies, demonstrable UPS strategies, geographic redundancy for natural disasters, bit-rot prevention techniques with checksummed data, and supply chain management to spread procured disks across manufacturers and production runs.

If you can achieve this resilience on-prem, I take my hat off to you. Alternatively, GCS is $20/mo/TB.

Isn’t it hard to maintain a firewall across so many components?

Perimeter security is dead. Long live zero-trust 👑

On-premises you knew where you stood; your firewall, your trusted IP ranges, and perhaps a good ol’ fashioned DMZ for good measure. Behind the firewall — Good. Outside the firewall — Bad. Inside the DMZ — Here be dragons. The world made sense.

But in the cloud when you’ve got a database spanning continents, and serverless components scaling up and down dynamically based on demand across a handful of zones — where is your perimeter?

Let’s assume your calling service is a Cloud Function, existing for only a few seconds in any one of 6 different regions. How do you allow it access to another service in a distant project?

  1. We can think in terms of perimeter security and throw some arcane networking runes to force the function to egress through the VPC, route it across transit networks and through firewalls to where it needs to be.
  2. Let the call traverse the scary ‘outside’, and have it authenticate who it is at the other end.

In one of these cases, you can guarantee who’s calling your service, and ensure they’re authorised to do so. In the other you cry yourself to sleep at night dreaming about network route priorities, and when it works all you know is the call came from ‘somewhere’ inside your network.

“But if we allow calls from the ‘outside’, anyone can get in 😱”. True. Although there’s far less likely to be a bug in a hyperscaler’s IAM stack than in a firewall rule you’ve created 🤷.

Cloud providers still, of course, offer firewalls, but they also offer so much more. IAM guarantees mutual authentication between managed services by default — no longer do you trust a service just because it’s attached to your network, you trust it because it can prove who it is. To reach any data, a service must now authenticate its identity and be authorised to access the data, not simply share a CIDR range, and all this complexity is invisibly handled at the infrastructure level.
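
As a flavour of what that looks like in practice, here’s a sketch of one service calling another with a Google-signed ID token instead of relying on its network location; the target URL is hypothetical, and it assumes the google-auth and requests libraries running under a service account identity:

```python
# Zero-trust, service-to-service auth: the caller proves who it is with a
# Google-signed ID token, rather than being trusted for its IP address.
# Assumes this runs as a service account (e.g. on GCP, or via ADC locally).
import requests
import google.auth.transport.requests
from google.oauth2 import id_token

audience = "https://my-service-abc123-ew.a.run.app"        # hypothetical Cloud Run URL

auth_request = google.auth.transport.requests.Request()
token = id_token.fetch_id_token(auth_request, audience)    # minted for the runtime's identity

resp = requests.get(audience, headers={"Authorization": f"Bearer {token}"})
print(resp.status_code)
```

Services like Cloud Run, Cloud Functions, and IAP will verify that token for you, so the callee never has to care which network the call originated from.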

Firewalling et al. is of course still critical to a defence-in-depth approach, but a robust IAM posture will be less brittle, more secure, and something that’s very challenging to replicate in-house.

Our cloud provider knows what they’re doing, but we don’t know how to use the cloud

This is the first point at which there’s a genuine difference between on-prem and the cloud.

We accept that people can press the wrong button — hard problems are hard, and squishy organic matter is fallible. People are just as likely to press the wrong button in vSphere as they are in the GCP console. However, pressing the wrong button in vSphere (usually) doesn’t come with the risk of making all of your data publicly accessible on the internet.

There are two approaches to addressing this, and both need to be used in conjunction: people, and tooling. Tooling is the ‘easy’ side of the problem, and there’s plenty of it out there to help you; restricting what people can do, alerting where people have got it wrong, and detecting security config violations before they’re ever deployed. If someone has the ability to accidentally expose credit_cards.csv on the internet, that’s an organisational problem, not theirs.
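
As one (deliberately tiny, illustrative) example of that tooling, here’s a sketch using the google-cloud-storage client to flag any bucket that’s readable by the whole internet:

```python
# A tiny sketch of the tooling half: flag buckets granting access to the
# public internet before they ever bite you.
from google.cloud import storage

PUBLIC_MEMBERS = {"allUsers", "allAuthenticatedUsers"}

client = storage.Client()
for bucket in client.list_buckets():
    policy = bucket.get_iam_policy(requested_policy_version=3)
    for binding in policy.bindings:
        if PUBLIC_MEMBERS & set(binding["members"]):
            print(f"⚠️  {bucket.name} grants {binding['role']} to the public internet")
```

In reality you’d wire something like this (or an off-the-shelf equivalent such as Security Command Center or organisation policy constraints) into CI and alerting, rather than running it by hand.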

The people half of this dichotomy means embedding developers in the security process far sooner, and is an article in itself. Security in the cloud native world is no longer transactional — it’s not a step post development, it’s a step during development. When security comes last it invariably builds the ‘Wall of Obscurity’ — developers throw code over the wall to security, security says no for a set of reasons unknown to development, then throws the code back whilst building the wall higher to make it harder to throw code over next time, forming a vicious circle. People end up resenting the wall, eventually seeking to go round it or through it, to everyone’s detriment.

Developers need room to experiment, need to be empowered to flow their changes from ideation through to production in the course of a day, and most importantly need space to fail. With the correct organisational guard-rails, none of these concepts are at odds with a secure cloud foundation.

Cloud native tools provide near-instantaneous feedback loops, ensuring safety without causing delay. They can show you why the container you built is unsafe, why you can’t deploy a specific container, guarantee appropriate build processes are used, and ensure your change lands in production with a secure posture. Where humans are still required in the loop, automate the boring stuff, and transform InfoSec from gatekeepers of the cloud into development accelerators.

Good security will always increase velocity, and if security is slowing your development cycles down, you’re probably doing it wrong.

Jonny O’Connell — Cloud Architect

Security worries around using the cloud? Let’s talk!

Like ranting online about tech? Come and join us!

About CTS:

CTS is the largest dedicated Google Cloud practice in Europe and one of the world’s leading Google Cloud experts, winning 2020 Google Partner of the Year Awards for both Workspace and GCP.

We offer a unique full stack Google Cloud solution for businesses, encompassing cloud migration and infrastructure modernisation. Our data practice focuses on analysis and visualisation, providing industry-specific solutions for Retail, Financial Services, and Media and Entertainment.

We’re building talented teams ready to change the world using Google technologies. So if you’re passionate, curious and keen to get stuck in — take a look at our Careers Page and join us for the ride!
