Author: Auxten Wang
Growing Data Breach
As data breaches grow more frequent, data privacy is getting more attention from the public and from governments. The European Union’s GDPR took effect in 2018, and other countries such as India and Brazil are also introducing data protection acts. In China, the National People’s Congress has listed a personal data protection act in its 2019 legislative work plan. With awareness of personal data protection rising, two questions remain: how do we protect our data as individuals, and how do companies protect users’ data when developing applications?
Data Leaking Caused by System Security Issues
In the second decade of the 21st century, data storage and processing technology has developed faster than ever, and privacy problems have increased dramatically along with it. The following table lists the top 10 data breaches reported by the media before August 2018 (Trendmicro 2018).
Even top tech companies that set many API standards and run penetration tests cannot guarantee that data will never leak. These companies usually control enormous amounts of user data and sensitive information, so a breach at one of them has a large impact on customers and even on society.
To understand why the consequences are so severe, we first need to know how hackers hack.
How Hackers Hack
In general, hackers follow four steps (Trendmicro 2018).
Research: Hackers search for flaws in the target company’s network or systems. Since closing every flaw involves a large amount of work, most companies cannot implement full protection, and most of them improve their security through employee education and periodic checkups.
Attack: Attacks fall into two categories:
1. Most widely known: attacks on the infrastructure. Traditional firewalls protect servers from this type of attack.
2. Less common: social engineering attacks such as phishing, Trojans hidden in pirated software, WiFi phishing, and social media fraud. If the information is really valuable, the hacker may exploit a zero-day vulnerability.
- The Stuxnet worm, discovered in 2010, infiltrated Iran’s nuclear facilities
- APT-C-01 against China National Security lasting for 11 years
- APT-C-12 against China’s nuclear facilities and research institution lasting for 8 years
Drag (dump the database): Once the hacker gets into the company’s office network, or even onto a server, the database is within reach: for historical reasons (I will explain this later), the address and password of the database are usually written in the code. Even if the hacker doesn’t find the password, he could just copy the data files and transmit the valuable data back. Since a DBMS (Database Management System, here meaning a relational database) is simple to develop against and flexible, it is used to store data in most privacy-sensitive scenarios. Therefore, whichever way the attack comes in, widely known or less common, the DBMS is usually where the targeted data is stored.
Hide: The majority of data breaches are not reported to the public, or are reported only long afterwards, when the data has been sold and can no longer be traced. When the data is valuable, hackers often stay hidden and wait for the chance to steal it. Since most IDSs (Intrusion Detection Systems) keep their records in simple files or a database, it’s easy for hackers to erase their trails.
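One common way to make records harder to erase quietly is to chain each log entry to the hash of the previous one. The following is a minimal sketch of that idea, not a description of how any particular IDS works; the function names (`append_event`, `verify_log`) and the sample events are made up for illustration.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, event: dict) -> str:
    """Hash an event together with the hash of the entry before it."""
    payload = json.dumps(event, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("utf-8") + payload).hexdigest()


def append_event(log: list, event: dict) -> None:
    """Append an event, chaining it to the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "hash": entry_hash(prev_hash, event)})


def verify_log(log: list) -> bool:
    """Recompute every hash; an edited or removed entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["hash"] != entry_hash(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True


log = []
append_event(log, {"user": "alice", "action": "login"})
append_event(log, {"user": "mallory", "action": "copy_data_file"})
append_event(log, {"user": "alice", "action": "logout"})
intact = verify_log(log)

# An intruder who deletes their own entry invalidates the entries after it.
tampered = [log[0], log[2]]
tamper_detected = not verify_log(tampered)
```

Note the limits of this sketch: cutting entries off the end of the log goes unnoticed unless the latest hash is also kept somewhere the intruder cannot reach, and someone with full write access could recompute the whole chain.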
Data Breaches due to Improper Management
The most famous data breach so far is the Facebook-Cambridge Analytica data scandal of early 2018. Cambridge Analytica collected the data of up to 87 million users without their consent and used it for political purposes. Meanwhile, Facebook was reported to have been aware of Cambridge Analytica’s actions. The data was used in Donald Trump’s 2016 US election campaign and is believed to have influenced the election result. In the wake of the scandal, Facebook’s share price dropped by about 7% in one day, wiping out about 36 billion dollars of market value.
What Should Service Providers Do?
There is one important principle: in a system architecture, the place that needs the strongest security protection is the bottleneck of the system.
The First Moat of Defense
Most security protection is applied at the API gateway, which follows the principle above.
Modern applications ship on multiple clients, such as web apps, Android, iOS, and applets. To improve development and management efficiency, the backend usually provides a standard API for all types of clients. Since developers use different technology stacks, the APIs usually differ a lot, and insecure APIs have become the most frequent cause of data breaches.
The complex protocol and encoding layers also limit how effective the common defense mechanisms in an API gateway can be.
The Second Moat of Defense
If we reconsider the architecture of the backend development, we find that another bottleneck of the system is the database.
Most DBMSs in use today were developed in the last century. For historical reasons, they lack fine-grained access control at the field level. Since a database is hard to design and develop, developers often rely on patches to fix security vulnerabilities and bugs, or on a DB proxy to improve usability or performance.
- ProxySQL: distributed and supported by the well-known MySQL database service provider Percona
- Vitess: developed by YouTube since 2011.
However, even top tech companies such as Percona and Google cannot make the DB proxy a mainstream solution, for the following reasons.
- A DB proxy hurts system performance and stability because its execution steps overlap with those of the database.
- A DB proxy was originally intended mainly to solve the problem of database horizontal scalability. It solves some problems but introduces others.
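To make the idea of field-level access control concrete, here is a minimal sketch of the kind of filtering a proxy or application layer would have to add when the database does not enforce it. The roles, field names, and the `filter_row` helper are hypothetical and not taken from ProxySQL, Vitess, or any DBMS.

```python
# Hypothetical policy: which fields each role may read from a "users" row.
FIELD_POLICY = {
    "support": {"id", "name"},
    "billing": {"id", "name", "card_last4"},
}


def filter_row(role: str, row: dict) -> dict:
    """Return only the fields the role may read; unknown roles get nothing."""
    allowed = FIELD_POLICY.get(role, set())
    return {field: value for field, value in row.items() if field in allowed}


row = {"id": 7, "name": "Alice", "card_last4": "4242", "password_hash": "x1y2"}
support_view = filter_row("support", row)
billing_view = filter_row("billing", row)
unknown_view = filter_row("intern", row)
```

The sketch defaults to denying access: a field is returned only if the policy names it explicitly, so a newly added sensitive column stays hidden until someone decides who may read it.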
The Moat before First Moat
In the Facebook-Cambridge Analytica scandal, Zuckerberg admitted that Facebook had known about Cambridge Analytica’s collection of user profiles before the news broke, and that the issue had not drawn the attention of Facebook’s management team. In the European Parliament hearing, Zuckerberg admitted that they “haven’t done enough to prevent these tools from being used for harm. And that goes for fake news, foreign interference in elections or developers misusing people’s information. We didn’t take a broad enough view of our responsibility. That was a mistake, and I’m sorry for it.” (Zuckerberg 2018)
To be fair, Facebook is still a world leader in protecting user privacy, both technically and in terms of social awareness. That did not prevent the scandal from happening, and the scandal turned data protection from an internet-industry matter into a hot topic in society worldwide.
Everyone’s Data Right
Overbearing Terms and Conditions of Use
Protecting data rights must start with the terms and conditions. For a long time, society’s neglect of data rights has allowed many vague or even overbearing terms and conditions of use on the internet. At the beginning of 2019, Google was fined 50 million euros for failing to comply with GDPR. Google did not obtain clear consent from users when asking for essential information, and its consent flow did not comply with GDPR. As a result, users were left unclear about how and where their data would be used.
Recognized as the new petroleum of this era, data is attractive to hackers. However, what is more serious than hackers’ attacks is that few people realized the value of their data in the internet’s early days. Internet companies tend to entice users into agreeing to the terms and conditions so that they can use the data for profit. Now the internet giants control most of the data, and they keep silent for their own good.
The situation is different in Europe. For cultural and historical reasons, no internet giant with a huge impact has emerged in Europe. This puts Europe in a neutral position when dealing with user privacy issues, and this is why the toughest online privacy regulation was born in Europe.
GDPR’s Requirements and Limitations
Put simply, GDPR comes down to three points for internet companies.
1. Users have the right to know what purposes their data is being used for.
2. Users have the right to request that processing stop and that all their data be deleted.
3. Internet companies that do not comply with GDPR can be fined up to the higher of 20 million euros or 4% of their worldwide annual revenue for the previous year.
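As a worked example of the ceiling in point 3, the sketch below computes the higher of the two amounts for a given revenue. The function name `max_gdpr_fine` and the revenue figures are made up for illustration, and amounts are whole euros.

```python
FLAT_CAP_EUR = 20_000_000  # the fixed 20 million euro amount
REVENUE_SHARE_PERCENT = 4  # the 4% of last year's revenue


def max_gdpr_fine(annual_revenue_eur: int) -> int:
    """Upper bound of the fine: the higher of the flat cap or 4% of revenue."""
    revenue_based_cap = annual_revenue_eur * REVENUE_SHARE_PERCENT // 100
    return max(FLAT_CAP_EUR, revenue_based_cap)


# A company with 100 million euros of revenue: 4% is 4 million, so the
# flat 20 million euro amount is the higher one.
small_company_cap = max_gdpr_fine(100_000_000)

# A company with 1 billion euros of revenue: 4% is 40 million, which
# exceeds the flat amount.
large_company_cap = max_gdpr_fine(1_000_000_000)
```

The two amounts meet at 500 million euros of revenue; above that, the percentage is what sets the ceiling.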
In general, GDPR is more of a guide than a regulation that specifies detailed actions for companies. For example, how transparently must a company inform users that their data is being used for certain purposes? If the data is stored in an internet service provider’s database, how does the service provider prove that users’ data has been used properly? And how complicated is the process of deleting a user’s data? Facebook announced its new feature “Clear History”. However, months after the Cambridge Analytica scandal, this feature has still not been implemented. Why is this so hard? Besides the technical challenges, one fact is that Facebook makes its profits from advertisements. Ceasing to collect users’ data, or allowing users to delete it, would cost Facebook a big revenue stream.
Hong Kong’s Privacy Ordinance
Reviewing different regulations, we think Hong Kong’s Privacy Ordinance sets a good example for data protection. Its core principle is that data usage shall be consistent with the purpose of collection, and data shall be deleted after use. This is similar to GDPR but more operational. Residents of Hong Kong, an international city, are also concerned with privacy. Back in 2013, Hong Kong developers published an application called “Do No Evil”, which gave users access to public information about companies, including government registrations and lawsuits. However, this application was banned due to a privacy issue.
As Open Data comes to our attention, data rights and privacy protection remain unresolved topics. As a tech team, we truly believe that one day, as technology develops, everyone’s data rights will be guaranteed and protected.
Empowerment of Technology
F (gender, location, mobile model, search keywords, product category, brand, price) = probability of purchase
Then rank all products by their probability of purchase. Put the product with the highest score at the top of the list, and you will see what the system recommends. Artificial intelligence is actually predicting, or to be more precise, controlling what shows up in front of you.
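The scoring and ranking described here can be sketched in a few lines. This is only an illustration under stated assumptions: F is modeled as a logistic function, the features are simplified yes/no signals derived from the inputs listed above, and the weights are invented rather than learned from real data.

```python
import math

# Hypothetical weights; a real system would learn these from behavior data.
WEIGHTS = {
    "matches_search_keyword": 2.0,
    "preferred_brand": 1.0,
    "price_above_budget": -1.5,
}
BIAS = -1.0


def purchase_probability(features: dict) -> float:
    """F(features): squash a weighted sum of the features into (0, 1)."""
    score = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-score))


def rank_products(candidates: dict) -> list:
    """Return product names ordered from most to least likely purchase."""
    return sorted(
        candidates,
        key=lambda name: purchase_probability(candidates[name]),
        reverse=True,
    )


candidates = {
    "phone_case": {"matches_search_keyword": 1, "preferred_brand": 0, "price_above_budget": 0},
    "headphones": {"matches_search_keyword": 1, "preferred_brand": 1, "price_above_budget": 0},
    "laptop": {"matches_search_keyword": 0, "preferred_brand": 1, "price_above_budget": 1},
}
ranking = rank_products(candidates)
```

With these invented weights, the product that matches both the search keyword and the preferred brand ends up at the top of the list, which is the item the user would see first.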
Based on the same principle, Cambridge Analytica was able to infer everyone’s preferences from the labels it assigned to each person after collecting and analyzing their personal data. This is what Cambridge Analytica did in Donald Trump’s campaign: people received advertisements and news tailored to their preferences, and these ads and news would influence people’s thinking and ultimately the election results.
The use of blockchain brings a turning point for data rights.
Data Right Back to User
In the future, all of our personal data could be stored in a decentralized database, like the one behind Bitcoin. We control our data with a private key. To make the data usable, a guideline for the use of data shall be specified. The core of the guideline is to limit companies to using personal data in the designated way and deleting it after use. For example, if we trust Facebook, we grant Facebook access to read our name, age, and list of friends. Meanwhile, every time Facebook uses our data, the use is recorded. If we find Facebook has done something out of scope, we can revoke the access. In this way, we fully control our personal data, and our privacy is protected.
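The grant, record, and revoke cycle in this example can be sketched as a small in-memory model. It is a toy under stated assumptions: the `DataVault` class and its methods are hypothetical, nothing here is decentralized, and the private-key signature that would authorize `grant` and `revoke` in the design described above is left out.

```python
class DataVault:
    """Toy in-memory stand-in for user-controlled data with access grants."""

    def __init__(self, owner: str, data: dict):
        self.owner = owner
        self._data = data
        self._grants = {}    # company -> set of fields it may read
        self.usage_log = []  # every read attempt, allowed or not

    def grant(self, company: str, fields: set) -> None:
        """Allow a company to read the listed fields."""
        self._grants[company] = set(fields)

    def revoke(self, company: str) -> None:
        """Withdraw all access previously granted to a company."""
        self._grants.pop(company, None)

    def read(self, company: str, field: str, purpose: str):
        """Record the attempt, then return the field only if it was granted."""
        allowed = field in self._grants.get(company, set())
        self.usage_log.append((company, field, purpose, allowed))
        if not allowed:
            raise PermissionError(f"{company} may not read {field}")
        return self._data[field]


vault = DataVault(
    "alice",
    {"name": "Alice", "age": 30, "friends": ["Bob"], "location": "Hong Kong"},
)
vault.grant("Facebook", {"name", "age", "friends"})
name = vault.read("Facebook", "name", "show profile")

# An out-of-scope read is refused and logged; the owner reacts by revoking.
try:
    vault.read("Facebook", "location", "ad targeting")
except PermissionError:
    vault.revoke("Facebook")
```

Because every attempt is written to `usage_log` before the permission check, the owner can see out-of-scope requests as well as legitimate reads, which is what makes the decision to revoke possible.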
Circulation of Data
Data is called the new petroleum. The difference between data and petroleum is that data only has huge value when it is circulated or exchanged. Currently, the majority of data is held by governments, companies, and individuals, and is not shared, used, or circulated for greater benefit. There is still a long way to go to maximize the value of data. The biggest headache for big data companies and research institutions is a lack of data. To create a win-win situation, individuals could sell their data by granting institutions access for research purposes. This would boost data circulation and end the dispossession of data from its owners.