Computers Are Hard: security and cryptography with Anastasiia Voitova

Published in Computers Are Hard · 11 min read · Sep 27, 2020

There’s no single action that will allow your application to be secure. It’s a cycle.

Illustration showing a locked padlock with source code in the background. — Security and cryptography with Anastasiia Voitova. Illustration by Gabi Krakowska.

Cybersecurity is a weird beast. It ranges from using complex mathematical functions to encrypt data, to saying things like ‘you shouldn’t write your password on a sticky note’ and ‘please, for the love of god, enable two-factor authentication’ over and over. Because no matter how sophisticated the protections, we all have a story about a family member who got phished (in my family, that’s me). Not to mention the regular news about breaches and data leaks from major companies.

For all the talk about how crucial security is, how vulnerable our networks and websites and even governments are, I have to admit I don’t really know much beyond the basic precautions. Set strong passwords, don’t click on suspicious links, things like that. But how are passwords stored? How does a website know I typed in the right one? If an app encrypts messages I send and can then decrypt them, why can’t a hacker do that? It all feels esoteric. So I asked Anastasiia Voitova, a Security Software Engineer at Cossack Labs whom the Twitterfolk among you might know better as vixentael, to shed more light on security engineering and her specialty: applied cryptography.

Let’s dive into how engineers protect their products and our data from malicious actors. But first: please, for the love of god, enable two-factor authentication.

Wojtek Borowicz: What’s the most common way for security incidents to happen?

vixentael: If you search Google for .env files, you will find a lot (a lot!) of public environment files in plaintext, with logins and passwords for databases and internal services. Use such a password to log in and you have basically hacked a company.

So the most typical source of data breaches is not some sort of sophisticated attack but putting your password a Google search away from malicious actors?

Yeah, exactly. To exploit a security breach, attackers sometimes don’t need specific knowledge or even a lot of time. There is so much low-hanging fruit: public access, misconfiguration, or sticking with default credentials like admin/admin or root/root. Recently, I was at a conference and I was showing the organizers why their app wasn’t very secure. It took me a couple of hours to hack the app, access some details about attendees, and point out to the organizers how they could improve. It’s not complicated.

A company often wouldn’t even know they’ve been hacked until long after the fact. If they have security and anomaly monitoring systems (SIEM), they can notice someone is reading too much data from their database or accessing resources they shouldn’t. Otherwise, they find out from the news. I read that typically, it takes more than 170 days for companies to realize they’ve been hacked.

Screenshot of Google search results for database credentials. — If you’re not careful, your database password is just a Google search away.
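
The two mistakes described above, secrets sitting in plaintext .env files and default credentials, can be checked for before anything is deployed. Below is a minimal sketch of such a check, not a replacement for a real secret scanner. The function names, the key pattern, and the list of default credentials are all assumptions made for illustration.

```python
import os
import re

# Hypothetical pattern: variable names that usually hold secrets.
SECRET_KEY_PATTERN = re.compile(r"(PASSWORD|SECRET|TOKEN|API_KEY)", re.IGNORECASE)

# Hypothetical list: the default pairs mentioned in the interview.
DEFAULT_CREDENTIALS = {("admin", "admin"), ("root", "root")}


def find_env_secrets(root):
    """Return (path, variable_name) pairs for secret-looking entries in .env files."""
    findings = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.startswith(".env"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as handle:
                for line in handle:
                    line = line.strip()
                    # Skip blanks, comments, and lines that are not KEY=VALUE.
                    if not line or line.startswith("#") or "=" not in line:
                        continue
                    key, _, value = line.partition("=")
                    if value.strip() and SECRET_KEY_PATTERN.search(key):
                        findings.append((path, key.strip()))
    return findings


def is_default_credential(username, password):
    """True if the pair is one of the well-known defaults."""
    return (username.lower(), password.lower()) in DEFAULT_CREDENTIALS
```

Running `find_env_secrets` over a directory that is about to be published, for example a web root or a repository checkout, gives a list of files to remove or move out of public reach.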

Let’s assume we took the basic steps and made sure our password isn’t published anywhere and isn’t just admin1. What other precautions can software engineers take to keep their application secure?

In software development in general, we have cycles. First we try to understand the user’s problem, then we create the prototype, then we code it, and we perform user testing. In security it’s the same. There’s no single action that will allow your application to be secure. It’s a cycle.

First of all, you need to understand what you are trying to protect. In my experience, many companies don’t. In many industries, such as healthcare or finance, there are regulations that explicitly say what data to protect. Now there’s also GDPR. But regulations don’t cover everything. Other data, typically non-regulated, might be sensitive for a specific business. Let’s say your company has an app that collects users’ likes. Thanks to that, you show content and ads based on users’ interests. So for you, the data to protect would be those likes and users’ profiles. Technically speaking, it’s not sensitive data because it’s not regulated. But for your business, it’s critical.

So step number one is to define the data scope. And I don’t mean just binary data, but all the assets, access, and infrastructure points… basically anything that will lead to financial or reputation losses if someone gains access to, modifies, or deletes it. Now, even if you understand the data scope, you still most likely can’t protect everything. There’s not enough time, or the budget is too tight, you know, the real world happens. You need to focus and prioritize. Work out which data would cause the most severe consequences if it were lost. That is step number two, and in security we call it risk management. It’s complicated. I often see software developers putting more effort into obfuscating the source code than into encrypting user data or spending time on proper authentication.
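
One common way to make that prioritization concrete is to score each asset by how likely it is to be compromised and how bad the consequences would be, then sort. The sketch below assumes a simple likelihood times impact score on a 1 to 5 scale. The scale, the class, and the function name are illustrative assumptions, not a method described in the interview.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    name: str
    likelihood: int  # 1 (rare) to 5 (frequent), hypothetical scale
    impact: int      # 1 (minor) to 5 (severe), hypothetical scale

    @property
    def risk(self) -> int:
        # Simple multiplicative score; real risk models are richer.
        return self.likelihood * self.impact


def prioritize(assets):
    """Return assets ordered from highest to lowest risk score."""
    return sorted(assets, key=lambda asset: asset.risk, reverse=True)
```

With scores like these, obfuscated source code (low impact if leaked) would typically rank below user data and authentication, which matches the point made above.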

Does implementing those methods, like obfuscation and encryption, make it more difficult to build and maintain software?

Yes. And this brings us to step number three. When you understand what to protect, you implement ways to do that. That’s what we call security controls or security measures. Usually, you want to have more than one. This is called defense in depth: when you have multiple layers of security measures to protect the same assets. Unfortunately, there is no finish line here. There’s no sign that says ‘hello, you’ve done everything and are 100% secure’. You can take the basic steps and as a company, you will be fine against most threats. But new vulnerabilities are discovered every day, so you need to be updating these layers of defense.

Is it common that your application becomes exposed to a threat because of someone else? Like a vulnerability on the side of a vendor or a library you use?

Of course, it happens all the time. There are companies whose main business it is to keep an eye on dependencies. They follow the libraries you’re using and alert you when you need to update them.

Let’s say I’m running an online store. Which security layers would you recommend I use?

First of all, you operate under some regulations. You gather regulated data from customers and you need to protect it. To be compliant with GDPR and still be able to use the data for the purpose of analytics, you might need encryption, anonymization, and pseudonymization.
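
As an illustration of pseudonymization, one widely used approach is to replace a direct identifier with a keyed hash, so that analytics can still group records by customer without seeing who the customer is. This is a minimal sketch under that assumption. The function name is hypothetical, and whether this alone satisfies a given regulation depends on the rest of the system.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Map an identifier to a stable pseudonym using HMAC-SHA256.

    The same identifier and key always give the same pseudonym, so records
    can still be joined. Without the key, the mapping cannot be recomputed.
    """
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()
```

The key has to be stored separately from the pseudonymized data. If both leak together, an attacker can recompute pseudonyms for known email addresses and re-identify customers.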

Since this is an e-commerce app, you’ll be handling payments. That’s another regulated industry, with PCI DSS and financial regulations. You either implement them yourself and keep an eye on credit card information, or you use a third-party solution. If you do the latter, you need to make sure it’s a trusted vendor. You can’t allow anyone to intercept the data in transit between your application and your vendor’s library.

Your store also has some inventory. And if you lose the database of your items, you can’t sell them. So you want to back up that database and do that every night. And you need to make sure you’re really backing up the data and are able to retrieve from the backups. Because another typical mistake is creating empty backups. And until something happens, no one realizes there was nothing backed up.
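
The empty-backup mistake can be caught by checking every backup right after it is made. The sketch below assumes the inventory lives in an SQLite database, purely to keep the example self-contained. The function names and the idea of comparing row counts are illustrative, and a real check would also try a full restore.

```python
import os
import sqlite3


def count_rows(db_path, table):
    """Count rows in a table. The table name must come from trusted code."""
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
    finally:
        conn.close()


def backup_and_verify(source_path, backup_path, table):
    """Copy the database, then confirm the copy is readable and not empty."""
    source = sqlite3.connect(source_path)
    target = sqlite3.connect(backup_path)
    try:
        source.backup(target)  # sqlite3's online backup API
    finally:
        target.close()
        source.close()

    if os.path.getsize(backup_path) == 0:
        return False
    source_rows = count_rows(source_path, table)
    backup_rows = count_rows(backup_path, table)
    # An inventory backup with zero rows is treated as a failure here.
    return backup_rows == source_rows and backup_rows > 0
```

A nightly job would call `backup_and_verify` and raise an alert when it returns `False`, instead of assuming the backup worked.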

You probably also have different apps for different platforms. Like an iOS app, Android app, a web app, and some backend. You need to protect the infrastructure layer and make sure that data transmitted from the mobile application is transmitted in a secure way to the backend application. To do that, you need to make sure transport encryption is configured properly. Most likely that’s TLS. Bonus points if you create an end-to-end encrypted app, but that’s overkill for online stores.

Then you have authentication, authorization, and access control policies. That leads to a step many companies forget about. Some of your staff has access to user data. Like customer support. It makes sense to keep an eye on staff accounts and monitor their behavior. For example, if someone from tech support is accessing gigabytes of data from the database, that’s most likely a sign of something wrong. It could be a disgruntled employee trying to sell the data. The last thing you want is information leaking from insiders.
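
The staff-monitoring idea can be sketched as a simple threshold check over an access log. The log format (pairs of account name and bytes read), the function name, and the fixed limit are all assumptions for illustration. Real monitoring systems compare against each account's normal behavior rather than one global number.

```python
from collections import defaultdict


def flag_heavy_readers(access_log, limit_bytes):
    """Return accounts whose total bytes read exceed the limit.

    access_log is an iterable of (account, bytes_read) pairs.
    """
    totals = defaultdict(int)
    for account, bytes_read in access_log:
        totals[account] += bytes_read
    return sorted(account for account, total in totals.items() if total > limit_bytes)
```

An account in customer support that normally reads a few megabytes a day and suddenly shows up in this list is worth a closer look.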

TLS
Transport Layer Security is a standard internet security protocol. It has three fundamental components: encryption (the data can’t be read during transmission), authentication (the server, and optionally the client, must prove it is who it claims to be before data is exchanged), and integrity (the data can’t be tampered with). To verify their identity, apps and websites need a TLS certificate (also known as an SSL certificate).

You mentioned that when you’re transmitting data, for example when connecting to your payments vendor, you need to watch out for it to not be picked up during transmission. How is it possible for a third party to listen in on the data you’re sending?

It’s either through the transmission layer (if you set up TLS but allow downgrading to old TLS versions with weak ciphers, the attacker can intercept that connection) or through your logs. Many people believe that TLS is enough but unfortunately, TLS is terminated outside of the application’s code. The data that was encrypted during transmission reaches your application and is translated into plaintext. Here it can be logged. Logging sensitive data is another typical story. It happened with Twitter and Facebook. They realized they were logging plaintext passwords of users. Developers might use application-level encryption to encrypt sensitive data fields before sending them using TLS. This way, the data will be encrypted twice, with different methods.
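
Refusing to downgrade is usually a configuration setting. As a minimal sketch, here is how a Python client could be set up with the standard `ssl` module so that it rejects anything older than TLS 1.2. The function name is hypothetical, and the choice of TLS 1.2 as the floor is an assumption. Stricter deployments may require TLS 1.3.

```python
import ssl


def make_client_context():
    """Build a TLS client context that refuses old protocol versions."""
    # create_default_context() turns on certificate and hostname verification.
    context = ssl.create_default_context()
    # Reject SSLv3, TLS 1.0, and TLS 1.1 instead of silently falling back.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context
```

The returned context can be passed to `http.client.HTTPSConnection(host, context=...)` or used to wrap a socket. A server that only offers older versions will then fail the handshake rather than get a weak connection.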

Does the hacker need to be very technically sophisticated to intercept this data in transit?

If the data is logged and no one protects the logs, then no. Attackers just need to find these logs. They need to either be lucky or to understand where to look.
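
One way to reduce the risk of sensitive values ending up in logs is to redact them before a record is written. The sketch below uses a filter from Python's standard `logging` module. The class name and the list of field names are assumptions, and pattern matching like this only catches values logged in a `name=value` shape, so it complements careful logging rather than replacing it.

```python
import logging
import re

# Hypothetical field names; extend to match what the application handles.
SENSITIVE_FIELD = re.compile(r"(password|token|card_number)=(\S+)", re.IGNORECASE)


class RedactSensitiveFields(logging.Filter):
    """Replace the values of sensitive name=value pairs in log messages."""

    def filter(self, record):
        # Render the final message first so values passed as args are covered.
        message = record.getMessage()
        record.msg = SENSITIVE_FIELD.sub(r"\1=[REDACTED]", message)
        record.args = ()
        return True  # keep the record, now redacted
```

Attaching the filter to a handler with `handler.addFilter(RedactSensitiveFields())` applies it to every record that handler emits.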

We touched upon encryption a few times already. Is that the main method of protection in software engineering?

If you asked security engineers with a different background, they wouldn’t answer in the same way. But for us, cryptographers, yes: encryption is the main security control to protect the data. That’s because if data is properly encrypted, it can’t leak in plaintext. Instead of monitoring the whole data flow, we can monitor only decryption services. In other words, if we use good encryption and we understand what we’re doing, we make defensive security easier.

But that’s the tricky part. Doing encryption correctly is quite a sophisticated job. Which libraries to use? Which ciphers? How to store keys? How to revoke them? How to rotate them? If we ask ourselves what’s simpler: to implement encryption in an app and handle all the difficulties of encryption and key management, or not implement it at all but set up a lot of other security measures, the answer will be — it depends.

If you start from scratch with encryption, then it will be faster and easier. But if you already have the application working, especially if it’s a large application, it’s not easy to build encryption into it.

Okay, but if I’m a malicious actor who can already access your encrypted data, what’s stopping me from also accessing the keys?

Keys are usually stored separately, in key management systems or HSMs (hardware security modules). But storing keys alongside the data is another mistake I’ve seen a lot. In this case, encryption doesn’t make a lot of sense.

Thales hardware security module. — Hardware Security Modules can store and generate encryption keys.

It’s like locking your safe and leaving the combination on your desk.

Or like having your long and secure password written on a piece of paper under your keyboard.

But if the keys are securely stored, does it automatically mean your data is safe?

It means the attacker would need a lot of time to decrypt it.

So they can do it even if they don’t have the keys?

It’s just a question of time. It could be days, months, or a hundred years. What we’re trying to achieve with the use of modern ciphers is making decryption with brute force take so long, it would still take ages even if you rented a whole Amazon cluster to do it.

If you use old ciphers, they can be decrypted in hours or minutes. Same if you use a correct, modern cipher but do it wrong. For example, if the attacker has access to the database, they can send plaintext to your service and see what ciphertext is returned. Then they can guess the nature of the encryption and try many possible attacks: known-plaintext attacks, chosen-plaintext attacks, side-channel attacks, etc.
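
The gap between "hours" and "a hundred years or more" comes from key length. A back-of-the-envelope calculation makes it visible. The sketch below assumes an attacker who can test one trillion keys per second, a made-up figure chosen for illustration, and compares a 56-bit key (the size used by the old DES cipher) with a 128-bit key.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def years_to_search(key_bits, guesses_per_second):
    """Average time, in years, to find a key by brute force.

    On average the attacker has to try half of the 2**key_bits possible keys.
    """
    average_guesses = 2 ** (key_bits - 1)
    return average_guesses / guesses_per_second / SECONDS_PER_YEAR


ASSUMED_RATE = 1e12  # hypothetical: one trillion guesses per second

hours_for_56_bit = years_to_search(56, ASSUMED_RATE) * 365 * 24
years_for_128_bit = years_to_search(128, ASSUMED_RATE)
```

Under this assumption the 56-bit key falls in roughly ten hours, while the 128-bit key takes on the order of 10^18 years. Each extra bit doubles the work, which is why renting more machines does not close the gap.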

If it takes ages to decrypt data encrypted with modern ciphers, why would anyone keep using old ciphers?

Typically, you don’t write the ciphers yourself. You just use a library. Some libraries don’t support new ciphers. But if you’re a developer and you don’t have any cryptographic background, you don’t know which ciphers are modern and good and which are old and bad. It’s a question of expertise. There are still people who use Base64 as encryption.

BASE64
Base64 is an encoding algorithm dating back to the 1980s. It uses an alphabet of 64 characters, hence the name. It can convert files, such as images, videos, or music, into strings of text. Engineers use Base64 when they need to store files somewhere that doesn’t support non-textual data. Base64 can be easily decoded by anyone, without a key, and is not an encryption method.
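
A short demonstration of why Base64 offers no protection: decoding needs no key or secret of any kind. The sample string is made up for illustration.

```python
import base64

secret = "database password: hunter2"

# Encoding changes how the text looks, but involves no key.
encoded = base64.b64encode(secret.encode("utf-8")).decode("ascii")

# Anyone who sees the encoded string can reverse it with one call.
decoded = base64.b64decode(encoded).decode("utf-8")
```

After running this, `decoded` is identical to `secret`, which is exactly what an attacker reading a "Base64-encrypted" database column would get.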

Can you encrypt any type of data? Is there a difference between encrypting videos, images, and audio recordings?

Yeah, there is. There are different types of ciphers (block, stream) and different ways to encrypt (e.g. authenticated encryption). Which one to use depends on how much data you have, which hardware you use, and what performance drawbacks you can handle.

What about securing data online (e.g. website passwords) versus offline (e.g. password protected files)?

There’s no difference on an abstract level. It’s more that they are different types of data. When you protect your password, you don’t encrypt it: you hash it. You use a password hashing function, like scrypt, bcrypt, or PBKDF2, to create a hash from the password. When you encrypt a file, you probably just use a block cipher to encrypt the contents of the file. Encryption and hashing are different mathematical functions and you use them for protecting different types of data.

What’s the difference between encrypting and hashing?

That’s easy. Without digging into details — they are both mathematical functions. Hashing is a one-way function. You have data, you hash it, and you can’t get it back to the original state. Encryption is a two-way function. You have data, you encrypt data, then you decrypt the data and have it back in plaintext.

So how does a website know I entered the correct password if they store it hashed and cannot unhash it?

When you try to log in, it calculates a hash from your password the same way as when you created the password. Then it compares the new hash with the stored one. But password-based authentication is not the only kind that exists.
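
The hash-then-compare flow can be sketched with PBKDF2, one of the password hashing functions named earlier, using only Python's standard library. The function names, the 16-byte salt, and the iteration count are assumptions for illustration. Production systems should take their parameters from current guidance.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # illustrative value, not a recommendation


def hash_password(password, salt=None):
    """Return (salt, digest). A fresh random salt is made at registration."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password, salt, stored_digest):
    """Hash the login attempt the same way and compare with the stored hash."""
    _, candidate = hash_password(password, salt)
    # compare_digest takes the same time whether or not the values match.
    return hmac.compare_digest(candidate, stored_digest)
```

At registration the site stores only the salt and the digest. At login it calls `verify_password` with the submitted password, so the original password never needs to be stored or recovered.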

Is it possible to make a complex application that would be entirely secure and impenetrable?

Nothing is impenetrable. You can protect your application from the most common threats but you cannot protect it against vulnerabilities that will be revealed, say, next month. You can, however, have a security strategy and implement changes continuously. It’s like a roadmap. Month by month you add or improve security properties of your application. We can’t tell the future. What we can do is make our application good enough against threats we know about right now.