Prevent your customer data from getting Hacked

Published in Analytics Vidhya, Mar 7, 2020

This article will help you design and develop a simple user login logging system that aids forensic investigation in the event of a breach, and also lets you pre-empt attempts to hack you in the first place.

I can vouch for this setup since it was designed and deployed on a production system that handled close to 3000 users and ran for 2+ years without a single account getting compromised. We were also able to inform a few customers of potential malicious activity on their systems, for which they thanked us.

We also managed to identify certain ranges of IP addresses, some of which were being used for brute-force attacks, and got them blacklisted to prevent future attempts.

You can create a service and release it on the cloud for people to consume, or plug it into your existing user logon process to help prevent misuse of your IT landscape. The concept remains the same.

Most architectural vulnerabilities in web apps and mobile apps are exploited at the end-user login level, since people rarely change passwords and often reuse them. While the industry is attempting to move past passwords altogether, we still live in an era of countless legacy setups, and password-based authentication is not going anywhere for a few years. Admin / root access typically comes after someone has compromised the end-user logon process and has a “free run” of whatever is available to be seen.

There has to be a server-side component in the user logon process, and it is at that end that you should also capture the server variable called “remote_addr”. Most web stacks expose it: in Python's Flask it is available as request.remote_addr, and in classic ASP it is a direct call to Request.ServerVariables("REMOTE_ADDR").
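A minimal sketch of the capture step, assuming a plain WSGI application (the interface that sits underneath Flask and most Python web frameworks). The function names, the in-memory LOGIN_LOG list and the choice of logged fields are illustrative assumptions, not the original production code. REMOTE_ADDR and HTTP_USER_AGENT are the standard WSGI/CGI environ keys.

```python
from datetime import datetime, timezone

# Hypothetical in-memory stand-in for a real log table.
LOGIN_LOG = []


def extract_client_info(environ):
    """Pull the fields worth logging out of a WSGI environ dict."""
    # Note: behind a reverse proxy or load balancer, REMOTE_ADDR is the
    # proxy's address; recovering the client address depends on your setup.
    return {
        "remote_addr": environ.get("REMOTE_ADDR", ""),
        "user_agent": environ.get("HTTP_USER_AGENT", ""),
        "logged_at": datetime.now(timezone.utc).isoformat(),
    }


def login_app(environ, start_response):
    """Minimal WSGI app that records every request reaching the login endpoint."""
    LOGIN_LOG.append(extract_client_info(environ))
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"login attempt recorded\n"]
```

The point of the design is that the record is written on the server for every attempt, successful or not, before any other logic runs.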

User passwords must never be stored as is. Always hash them, if possible together with the username:

storage_value = hash(hash(username) + hash(password))

For example:

hash(username)=FG%^&

hash(password)=12#456

storage value = hash(FG%^& + 12#456)

Thus your storage value will (hypothetically) look something like 89Kl43f, which is even more difficult to crack.

This defeats simple reverse hash lookups (there are services that offer lookups of stored hashes for commonly used passwords), because the stored value no longer matches the hash of the password on its own.

For hashing, use an algorithm with an output of 256 bits or more. If needed, rehash the hash to the n-th level; just make sure you don’t blow up your cloud bill, since each hash is a computationally intensive operation.
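The scheme above can be sketched with Python's standard hashlib module. This is an illustration under assumptions: SHA-256 is my choice of a 256-bit hash, and the function names and the rounds parameter (the "n-th level" rehash) are hypothetical.

```python
import hashlib
import hmac


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of a string as lowercase hex."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def storage_value(username: str, password: str, rounds: int = 1) -> str:
    """Compute hash(hash(username) + hash(password)).

    The outer hash is applied `rounds` times in total (assumed parameter).
    """
    if rounds < 1:
        raise ValueError("rounds must be at least 1")
    value = sha256_hex(sha256_hex(username) + sha256_hex(password))
    for _ in range(rounds - 1):
        value = sha256_hex(value)  # each extra round rehashes the previous result
    return value


def verify(username: str, password: str, stored: str, rounds: int = 1) -> bool:
    """Recompute the value at logon and compare it in constant time."""
    return hmac.compare_digest(storage_value(username, password, rounds), stored)
```

Two users with the same password end up with different stored values, which is what makes a precomputed lookup table useless. If you would rather not hand-roll the iteration, the standard library also ships hashlib.pbkdf2_hmac, which performs salted, iterated hashing for exactly this purpose.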

Let’s look at a simplified example of how user logon / session information can be stored.
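As a rough sketch of such a store, here is one possible table layout using Python's built-in sqlite3 module. The table name, column names and helper functions are my assumptions for illustration; only the remote IP address and the user_agent as the last column are taken from the article.

```python
import sqlite3

# Hypothetical schema: one row per logon attempt, user_agent as the last column.
SCHEMA = """
CREATE TABLE IF NOT EXISTS login_log (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    username    TEXT    NOT NULL,
    logged_at   TEXT    NOT NULL,  -- ISO 8601 timestamp
    success     INTEGER NOT NULL,  -- 1 = logon succeeded, 0 = failed
    remote_addr TEXT    NOT NULL,
    user_agent  TEXT    NOT NULL
)
"""


def open_log(path=":memory:"):
    """Open (or create) the log database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record_attempt(conn, username, logged_at, success, remote_addr, user_agent):
    """Append one logon attempt; parameters are bound, never string-formatted."""
    conn.execute(
        "INSERT INTO login_log (username, logged_at, success, remote_addr, user_agent) "
        "VALUES (?, ?, ?, ?, ?)",
        (username, logged_at, int(success), remote_addr, user_agent),
    )
    conn.commit()
```

Failed attempts are stored alongside successful ones, since the failures are often the more interesting rows during an investigation.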

If we store the remote IP address of each logon attempt, we will be able to identify certain patterns; for example, users “b, e” and “c, f” show some suspicious activity. This can help you launch the next stage of investigation, which could be along the lines of…

Is there any other set of users attempting to access the system through the legitimate logon process, but from a closely related IP address range?
Develop a visual map of all logons. If you run a geographically specific service (such as local e-commerce or local loan processing) operating in Sydney, a logon attempt from a different continent is worth an investigation or a call to the end user.
Is there any user who is trying to access the system at a pre-defined time, from different IP addresses? (Big red flag)
Does the user_agent (last column) differ vastly between logon attempts for a particular set of users? (You are probably being targeted via some sort of weaponised AI)
Has a user suddenly logged on after 100 days of inactivity?
Can you blacklist any specific ranges of IP addresses that are potentially invasive?
You can also, with a certain amount of analysis, identify a Distributed Denial of Service (DDoS) attack being carried out on your customer-facing portal / engine with this type of logging (IoT-based DDoS is now a service for hire).
Is there a user who is systematically changing the user_agent to test which user_agents are allowed and which are not? This is very helpful to someone who wants to deploy an exploit after narrowing down their choices from a larger universe.
Once you have sufficient data, deploy a daemon process or a regular batch job to raise alerts.
Eventually, create a white-list, an orange-list and a black-list of IP addresses, in conjunction with the hosts file, to not only identify but also pre-empt attacks.
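A few of the checks above can be sketched in plain Python. This is a hedged illustration, not the production logic: the record layout (dicts with username, remote_addr, user_agent and a datetime in logged_at), the function names, the /24 grouping and the IPv4-only assumption are all mine.

```python
import ipaddress
from collections import defaultdict
from datetime import timedelta


def users_sharing_network(records, prefix=24):
    """Map each network to the users seen on it, keeping networks with 2+ users."""
    seen = defaultdict(set)
    for r in records:
        # strict=False lets us derive the enclosing network from a host address.
        net = ipaddress.ip_network(f"{r['remote_addr']}/{prefix}", strict=False)
        seen[net].add(r["username"])
    return {net: users for net, users in seen.items() if len(users) > 1}


def dormant_returns(records, max_gap=timedelta(days=100)):
    """List (username, gap) pairs where a user came back after more than max_gap."""
    last_seen = {}
    hits = []
    for r in sorted(records, key=lambda rec: rec["logged_at"]):
        previous = last_seen.get(r["username"])
        if previous is not None and r["logged_at"] - previous > max_gap:
            hits.append((r["username"], r["logged_at"] - previous))
        last_seen[r["username"]] = r["logged_at"]
    return hits


def user_agent_counts(records):
    """Count how many distinct user_agent strings each user has presented."""
    agents = defaultdict(set)
    for r in records:
        agents[r["username"]].add(r["user_agent"])
    return {user: len(found) for user, found in agents.items()}


def is_blacklisted(addr, blacklist):
    """Check an address against a list of ipaddress network objects."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in blacklist)
```

Any of these could be run from the daemon or batch job mentioned in the list, with the thresholds (prefix length, 100 days) tuned to your own traffic.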

Unless you store all the information available, cyber forensics will find it difficult to help in the event of a breach. Once you have the right set of fields ready to be investigated, you can run any number of permutations and combinations and find the outliers using any available charting tool or data analysis package.
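As one hedged illustration of outlier hunting, the sketch below flags IP addresses whose number of failed logons is far above the typical count. The record layout (dicts with remote_addr and success), the function names and the "10 times the median" rule are assumptions chosen for simplicity; a real deployment would pick its own statistic.

```python
from collections import Counter
from statistics import median


def failed_attempts_per_ip(records):
    """Count failed logon attempts for each remote address."""
    return Counter(r["remote_addr"] for r in records if not r["success"])


def outlier_ips(records, factor=10):
    """Return IPs whose failed-logon count exceeds `factor` times the median count."""
    counts = failed_attempts_per_ip(records)
    if not counts:
        return {}
    typical = median(counts.values())
    return {ip: n for ip, n in counts.items() if n > factor * typical}
```

The same counts can be fed straight into a charting tool, where a brute-force source tends to stand out as a single tall bar.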

That brings me to the next point: if you are storing all of this data, what about data toxicity, data retention and user privacy? And what about “we are always on a VPN”? (Yes, VPNs can be hacked too.) I shall try to address these over the next few articles.

Glad to assist if you have any further questions!

Written by Vishesh Bajpai