Data Security

Security in the world of AI, IoT and Edge Computing- Part 1

Under 10 min introduction to common security concerns in software systems

Saurabh Singh
9 min read · Jun 9, 2023

We live in a data-driven world: whether we talk about AI or IoT, it’s all about the data. We are surrounded by machines that generate and use data to ease our day-to-day lives, but these same machines pose several security risks. From devices as small as smartwatches to installations as big as oil rigs, all of them generate critical data.
AI, IoT and Edge computing are the buzzwords of our time, and all of these modern technologies are driven by data. Because they are advancing at such a rapid pace, the challenge of data security is bigger than ever. A lot of innovation is happening with AI in the healthcare sector, and Edge computing has become critical for traditional industries such as oil & gas, retail and automobile. The data generated by these systems can be stored locally on-premises or sent to a cloud server. Misuse of this data can endanger lives and the environment, or even snowball into a crash of a country’s economy.

We will look at high-level security and design considerations for an AI model running in the cloud, and for an IoT device talking to the cloud.

Before looking at security considerations and design, let’s take a look at a use case where a hospital sends data to the cloud for processing by an AI model.

Let’s divide the data flow for this case into paths:

  1. Scanner generates the data and sends it to the on-premises server.
  2. On-premises server receives the data and stores it locally.
  3. On-premises server uploads the data to the cloud.
  4. Service on cloud receives the data, stores it and invokes the required AI module. AI module generates results and stores it locally.
  5. Result is sent back to the clinician’s phone.
scanner → on-premises server → cloud → mobile app

Looking at the overall data path, we find that data is either moving over the wire or being stored somewhere on a machine. This means we need to secure the data both while it moves over the wire and while it sits on a machine.

Securing data as it moves over the wire is called security of data on the fly (more commonly, data in transit), and securing data stored on a machine is called security of data at rest.

We will look at each data flow path one by one:

Path 1 — Scanner to on-premises server

On every patient scan, the scanner generates output and sends it to the on-premises server. In this case both the scanner (CT/X-ray etc.) and the on-premises server are inside the hospital premises and communicate over a secured private network.

Risk: What if someone hacks into the hospital network and starts sending data to the on-premises server as if it came from the scanner? This data can be malicious and can be used to mount a denial-of-service (DoS/DDoS) attack, which prevents the on-premises server from doing its actual job. This may even lead to loss of life, because the on-premises server won’t be able to receive data for a real patient who needs immediate help, say in case of a brain haemorrhage or heart attack.

Hacker sending malicious data to the on-premises server

Solution: You can mitigate the risk by adding an onboarding step for the scanner. When the scanner connects to the on-premises server, it should start a handshake process and provide its identity to the server, for example in the form of a certificate or unique keys. The on-premises server should then verify the provided identity and decide on the success or failure of the handshake. Only after a successful handshake should the on-premises server accept the scanner’s data.
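To make the handshake idea concrete, here is a minimal sketch (Python, standard library only) of a challenge-response onboarding check. The device id, the secret registry and the key sizes are illustrative assumptions, not a production design:

```python
import hashlib
import hmac
import secrets

# Hypothetical server-side registry: one shared secret per device,
# provisioned during manufacturing or first onboarding.
DEVICE_SECRETS = {"scanner-001": secrets.token_bytes(32)}

def issue_challenge() -> bytes:
    """Server: generate a fresh random challenge for each handshake."""
    return secrets.token_bytes(16)

def device_response(device_id: str, challenge: bytes, device_secret: bytes) -> bytes:
    """Scanner: prove possession of the shared secret without ever sending it."""
    return hmac.new(device_secret, device_id.encode() + challenge, hashlib.sha256).digest()

def verify_device(device_id: str, challenge: bytes, response: bytes) -> bool:
    """Server: recompute the HMAC and compare in constant time."""
    secret = DEVICE_SECRETS.get(device_id)
    if secret is None:
        return False  # unknown device
    expected = hmac.new(secret, device_id.encode() + challenge, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)

# Handshake walk-through
ch = issue_challenge()
resp = device_response("scanner-001", ch, DEVICE_SECRETS["scanner-001"])
assert verify_device("scanner-001", ch, resp)            # legitimate scanner accepted
assert not verify_device("scanner-001", ch, b"\x00" * 32)  # forged response rejected
```

Because the challenge is random and fresh, a hacker who records one handshake on the network cannot replay it later.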

We can look at onboarding and common ways to onboard a device in detail in the upcoming articles.

Risk: What if someone in the middle impersonates the on-premises server, so that the scanner connects to a rogue agent and starts sending all its critical data to it?

Hacker acting as on-premises server and receiving critical data

Solution: Data between the scanner and the on-premises server should be shared over a secured channel. You can use TLS to secure the channel and provide security of data on the fly; this ensures the scanner always connects to the intended on-premises server and not to a rogue agent.
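As a sketch of what this looks like in code, Python’s standard `ssl` module can build a client context that refuses any peer without a valid certificate for the expected hostname. The CA file path in the comment is a hypothetical example:

```python
import ssl

# TLS client context for the scanner. With these settings the scanner
# refuses to talk to any peer that cannot present a certificate chaining
# to a trusted CA for the expected hostname.
context = ssl.create_default_context()            # loads the system trust store
context.minimum_version = ssl.TLSVersion.TLSv1_2  # reject legacy protocol versions

# Defaults worth stating explicitly: the peer must present a valid
# certificate, and its hostname must match.
assert context.verify_mode == ssl.CERT_REQUIRED
assert context.check_hostname is True

# In the hospital, the private CA that signed the on-premises server's
# certificate would typically be loaded as well (hypothetical path):
# context.load_verify_locations(cafile="/etc/hospital/ca.pem")
```

A rogue agent without a certificate signed by a trusted CA fails the handshake before any scan data is sent.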

Path 2 — On-premises server

The on-premises server stores the data locally after receiving it from the scanners.

Risk: What if someone gets hold of the on-premises server? In that case the attacker has full access to the data present on it. How can we secure data at rest?

Hacker gets hold of on-premises server

Solution: All the data on the disk should be encrypted, and the encryption key should be kept in a TPM (Trusted Platform Module) chip. A TPM is a type of hardware security module (HSM). Even if an attacker gets hold of the device, the TPM raises the bar: it makes it very difficult for the attacker to extract the encryption keys, keeping your data safe.
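To illustrate why a TPM raises the bar, here is a toy Python sketch of the handle-based access pattern a TPM/HSM enforces: the application only ever holds an opaque handle, and cryptographic operations happen inside the module. Since Python’s standard library has no AES primitive, the sketch uses HMAC signing to stand in for key usage; a real TPM would be driven through a library such as tpm2-pytss:

```python
import hashlib
import hmac
import secrets

class TpmSim:
    """Toy stand-in for a TPM/HSM. Keys are created and used inside the
    module and are only ever referenced from outside by an opaque handle;
    the raw key material is never returned to the caller."""

    def __init__(self):
        self._keys = {}  # handle -> key material, never exposed

    def create_key(self) -> str:
        handle = "0x" + secrets.token_hex(4)
        self._keys[handle] = secrets.token_bytes(32)
        return handle  # the caller gets a handle, not the key

    def sign(self, handle: str, data: bytes) -> bytes:
        # The cryptographic operation happens "inside" the module.
        return hmac.new(self._keys[handle], data, hashlib.sha256).digest()

    def verify(self, handle: str, data: bytes, tag: bytes) -> bool:
        return hmac.compare_digest(self.sign(handle, data), tag)

tpm = TpmSim()
handle = tpm.create_key()
tag = tpm.sign(handle, b"patient-scan-0042")
assert tpm.verify(handle, b"patient-scan-0042", tag)
# An attacker who steals the disk sees only ciphertext and a handle;
# the key material itself stays inside the module.
```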

Path 3 — On-prem server to cloud

As we saw in the previous step, after a successful handshake the on-premises server receives the data from the scanner and stores it locally in encrypted form. Now, in a very similar fashion, the data needs to flow from the on-premises server to the cloud. But this time the data flows over the internet, which increases the attack surface, unlike the previous case where data flowed over a secured private network.

Risk: How should the on-premises server make sure it is connecting to the intended server in the cloud? What if someone in the middle impersonates the cloud, so that the on-premises server connects to a rogue agent and starts sending all its critical data to it?

Hacker acting as cloud server and receiving critical data

Solution: A simple HTTPS (TLS) connection can prevent this attack. The on-premises server starts a TLS handshake, and the cloud server sends back its certificate and public key. The on-premises server verifies the certificate against a trusted root certification authority (CA) to ensure it is legitimate. If verification passes, it proves that the domain claimed by the cloud server actually belongs to it; upon a successful handshake the on-premises server connects to the correct cloud server and communication continues over a secured channel.

Risk: How should the cloud server make sure it is receiving data from a legitimate on-premises server? If the cloud server accepts data without a handshake process, anyone can send it malicious data, which can lead to different types of attacks.

Solution: You can mitigate the risk by adding an onboarding step for the on-premises server. Once the TLS connection between the on-premises server and the cloud is established, the on-premises server should start a handshake process and provide its identity to the cloud server, for example in the form of a certificate, unique keys or the public part of a TPM endorsement key. The cloud server should then verify the provided identity and decide on the success or failure of the handshake. Only after a successful handshake should the cloud server accept the on-premises server’s data.
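One common way to implement this two-sided check is mutual TLS, where the cloud not only presents its own certificate but also demands one from the connecting on-premises server. A minimal configuration sketch with Python’s `ssl` module (the certificate file names in the comments are hypothetical):

```python
import ssl

# Cloud side: a server context that requires a client certificate.
server_ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
server_ctx.verify_mode = ssl.CERT_REQUIRED        # reject clients without a cert
server_ctx.minimum_version = ssl.TLSVersion.TLSv1_2
# server_ctx.load_cert_chain("cloud.pem", "cloud.key")     # cloud's own identity
# server_ctx.load_verify_locations("device-fleet-ca.pem")  # CA that signed device certs

# On-premises side: a standard verified client context that additionally
# presents its own certificate during the handshake.
client_ctx = ssl.create_default_context()
# client_ctx.load_cert_chain("onprem-001.pem", "onprem-001.key")  # device identity

assert server_ctx.verify_mode == ssl.CERT_REQUIRED
assert client_ctx.check_hostname is True
```

With this setup the same TLS handshake that protects the channel also performs identity verification in both directions.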

Path 4 — Cloud server

The cloud server will store the data in some volume or database after receiving it from the on-premises server.

Risk: What if someone gets hold of the volume or database where the cloud server has stored the data? How will we secure data at rest?

Hacker gets hold of cloud server

Solution: Data should be encrypted, and the encryption/decryption key should be kept very securely. We can design the security measures based on the type of data. If the cloud server stores a huge volume of data in a managed file system, e.g. EFS on AWS (Azure and GCP offer equivalents), then you should use the encryption feature provided by the cloud provider to safeguard the data. If you need to store sensitive data like passwords in a database, then a vault (e.g. HashiCorp Vault) can handle encryption and decryption. You pass sensitive plaintext to the vault, and the vault encrypts it and gives you back ciphertext that you store in your DB; to decrypt, you pass the ciphertext to the vault and it returns the plaintext. The encryption and decryption keys are stored inside the vault and never leave it.
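The calling pattern is easy to see in a toy in-process stand-in for the vault. This mock only tokenizes — a real HashiCorp Vault would encrypt via its HTTP API with keys held server-side — but it shows the essential property: the application stores only an opaque ciphertext/token, never the secret or the key:

```python
import secrets

class VaultSim:
    """Toy stand-in for the vault pattern: the application hands over
    plaintext and stores only the opaque token it gets back. Secrets
    stay inside the 'vault'; a real deployment would call HashiCorp
    Vault's HTTP API instead of this in-process mock."""

    def __init__(self):
        self._store = {}  # token -> plaintext, held only inside the vault

    def encrypt(self, plaintext: str) -> str:
        token = "vault:v1:" + secrets.token_urlsafe(24)  # opaque to the app
        self._store[token] = plaintext
        return token

    def decrypt(self, token: str) -> str:
        return self._store[token]

vault = VaultSim()
cipher = vault.encrypt("s3cret-password")   # this is what the app stores in its DB
assert cipher != "s3cret-password"          # DB never sees the plaintext
assert vault.decrypt(cipher) == "s3cret-password"
```

If the database leaks, the attacker holds only opaque tokens that are useless without access to the vault itself.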
Since most cloud workloads run in virtual machines, a vTPM can be used on the cloud instead of a physical TPM to store the keys required for encryption.

Path 5 — Cloud to mobile app

Once the AI module on the cloud has processed the data, the mobile app can fetch the result and show it to the clinician.

Risk: How should the mobile app make sure it is connecting to the intended cloud server?

Solution: We already discussed this problem above and solved it using an HTTPS connection.

Risk: How should the cloud server verify that it is processing requests received from a valid client?

Solution: This can be solved with standard user and session management techniques. Mobile app users should be created on the cloud, and their requests should be processed only after successful authentication and authorisation on the cloud.
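As an illustration of session management, here is a minimal sketch of HMAC-signed session tokens (a simplified cousin of JWTs) using only Python’s standard library; the field names and TTL are illustrative assumptions:

```python
import base64
import hashlib
import hmac
import json
import secrets
import time

SIGNING_KEY = secrets.token_bytes(32)  # hypothetical server-side signing key

def issue_token(user_id: str, key: bytes, ttl: int = 3600) -> str:
    """Issue a signed session token after the user has authenticated."""
    payload = json.dumps({"sub": user_id, "exp": int(time.time()) + ttl}).encode()
    sig = hmac.new(key, payload, hashlib.sha256).digest()
    return (base64.urlsafe_b64encode(payload).decode()
            + "." + base64.urlsafe_b64encode(sig).decode())

def verify_token(token: str, key: bytes):
    """Return the user id if the token is authentic and unexpired, else None."""
    payload_b64, sig_b64 = token.split(".")
    payload = base64.urlsafe_b64decode(payload_b64)
    sig = base64.urlsafe_b64decode(sig_b64)
    if not hmac.compare_digest(hmac.new(key, payload, hashlib.sha256).digest(), sig):
        return None  # signature mismatch: forged or tampered token
    claims = json.loads(payload)
    if claims["exp"] < time.time():
        return None  # session expired
    return claims["sub"]

token = issue_token("clinician-42", SIGNING_KEY)
assert verify_token(token, SIGNING_KEY) == "clinician-42"      # valid session
assert verify_token(token, secrets.token_bytes(32)) is None    # wrong key rejected
```

The cloud processes a request only when the attached token verifies, so a request forged without the server’s signing key is dropped.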

Summary

As we saw in the article, data needs to be safeguarded both when it is on the fly and when it is at rest.

Identity verification becomes very critical when two parties (the on-premises server and the cloud) are communicating with each other. As we saw above, we can use an onboarding process to verify that the intended on-premises servers are connecting to the cloud, and with HTTPS/TLS we can make sure the on-premises servers are connecting to the authentic cloud server.

Sensitive data on disk and in memory should always be encrypted. There are various ways to encrypt and decrypt data depending on its size and criticality. Keys used for encryption and decryption should be safeguarded with the strongest security measures.

In the next article we will talk about a case where a proxy sits between the on-premises server/device and the cloud: what if the proxy gets compromised? How will we ensure data integrity and authenticity for both the on-premises server and the cloud?
We will also talk about adding security with ECDH on top of HTTPS for critical data.

Throughout the article I have used certificates to mitigate security risks, but I have not talked about how certificates are created and signed, the role of a CA (certificate authority) in signing them, which algorithms can be used to generate the key, key sizes, etc. We will look at these topics in detail in the upcoming articles.

I have also talked about the onboarding process but did not discuss it in detail; we will look at onboarding and common ways to onboard a device in the upcoming articles.

Update: Part 2 is now available:

Security in the world of AI, IoT and Edge Computing- Part 2
https://medium.com/@saurabh.newera/security-in-the-world-of-ai-iot-and-edge-computing-part-2-422393ab02ef
