Data, the new root of evil?
A counterintuitive use of data that might spread like wildfire as we move into the data-driven era.
While the use of data is not new, it has certainly been in the spotlight in recent years. To some extent, everyone (yes, including you) uses data in their daily lives. Companies make decisions based on data, from projecting their revenue to optimizing their services and costs. Movie ratings and product reviews are great examples of data that consumers can use to decide whether a movie or product is worth their time and money.
Humans are quite good at identifying spam and some fake reviews, and algorithms developed for the same purpose have performed quite well in real-world settings. However, there is still one issue that both humans and algorithms are blind to: interpreting misrepresented data. Data presented in a deceiving manner, stripped of its original meaning or context, or packaged into misleading reports to sway the general public's beliefs for a personal or even political agenda appears to be increasingly common as the practice becomes a trend.
The hard truth
As shown in the graph above, Facebook posts with misrepresented information have increased drastically since 2012. A steep jump can be seen between 2019 and 2020, and it is due to the increase in Coronavirus posts. A forecast indicates that we will have more than 3 million misrepresented posts by 2023. As this trend grows, people should refrain from using Facebook to discourage the algorithms from learning it…
Some of you may find this news alarming, and some of you may already be skeptical and doubting the credentials of this author. It is just a harmless graph I created for this article, and none of the claims I made about it are factually true or proven, yet variants of similar posts are definitely floating around social media. A 2018 study by Hiroko Kanoh on why people believe fake news on the internet found that even when people doubt a piece of fake news, they cannot help wondering whether it might be true after all. The paper is available on ScienceDirect. This is worrying, because this untrue graph and statement could be taken out of context and misused, and a portion of the general public would innocently believe it.
The case of unrealistic comparison
Recently, posts comparing Malaysia’s fresh graduate salary against other countries like this have been appearing on my Facebook feed (original poster name redacted). They are an attempt to sway people’s views politically and to promote the unrealistic view that the grass is always greener on the other side. The post above is true to a certain extent when checked against each country’s Purchasing Power Parity figures from Statista, but Hong Kong’s Big Mac price should be lower than Singapore’s. Broadband pricing, likewise, can be checked against Cable Co, which provides good raw data on worldwide broadband pricing. A post that cites no proper source, misrepresents data by simply converting currencies, and ignores the cost of living in each country sets unrealistic expectations for parents with graduating kids and for fresh graduates themselves.
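To make the point about currency conversion concrete, here is a minimal sketch of how a salary gap can look very different once purchasing power is taken into account. Everything in it is an assumption for illustration: the two countries, the salaries, the exchange rates, and the PPP conversion factors are made-up placeholders, not figures from Statista or from the post. The names `SalaryQuote`, `nominal_usd`, and `ppp_adjusted` are hypothetical helpers, and a PPP conversion factor is taken here to mean local currency units per international dollar.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SalaryQuote:
    """One country's fresh graduate salary. All values used below are made up."""
    country: str
    monthly_salary_local: float         # salary in local currency
    usd_per_local_unit: float           # market exchange rate (assumed)
    local_units_per_intl_dollar: float  # PPP conversion factor (assumed)


def nominal_usd(q: SalaryQuote) -> float:
    """Salary converted at the market exchange rate, as the viral posts do."""
    return q.monthly_salary_local * q.usd_per_local_unit


def ppp_adjusted(q: SalaryQuote) -> float:
    """Salary in international dollars, i.e. adjusted for local price levels."""
    return q.monthly_salary_local / q.local_units_per_intl_dollar


# Hypothetical placeholder countries, not real data.
country_a = SalaryQuote("Country A", 3000.0, 0.25, 1.5)
country_b = SalaryQuote("Country B", 4000.0, 0.75, 1.25)

# How many times larger Country B's salary looks under each method.
nominal_gap = nominal_usd(country_b) / nominal_usd(country_a)
ppp_gap = ppp_adjusted(country_b) / ppp_adjusted(country_a)

print(f"Gap using plain currency conversion: {nominal_gap:.1f}x")
print(f"Gap after PPP adjustment: {ppp_gap:.1f}x")
```

With these invented inputs, the plain currency conversion makes Country B's salary look four times larger, while the PPP-adjusted gap is much smaller. Real comparisons would need sourced salary data and official PPP factors, but the sketch shows why converting currencies alone can paint an unrealistic picture.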
How far can this go?
Examples in this article are already happening on multiple scales, from people misrepresenting data to cause panic, to people taking data out of context for political agendas.
Aner Tal and Brian Wansink studied the persuasiveness of ads that use graphs and formulas. The study found that presenting data in the form of a graph or chart is more convincing than making the statement alone, and that is exactly what is happening on the internet today. In my opinion, data can be taken out of context and misrepresented for many kinds of gain. Companies influencing stakeholders or inexperienced investors with biased revenue data, or rivals using the same information to defame a company or a person, may become a trend too. As the use of data grows, the abuse of data grows.
What can we do?
Publications such as Towards Data Science conform to strict policies that require claims, research, and images to have proper sources and credits. The same is not true of social media, which is unregulated. In the same study by Hiroko Kanoh, a questionnaire asking “how should we deal with fake news?” had literacy education as its most common response. Understanding that we need to fact-check against credible sources before jumping to a conclusion is essential as we move into an era where data surrounds us.
We, as people who work with data, understand that part of training any algorithm is ensuring that the dataset and the results of the trained algorithm are not biased. I believe we should know better when we see misrepresented data on the internet. Educating our families and peers when we see such a pattern will help curb the spread of data that could cause unnecessary panic or misinform others.
As my parents always tell me, do not believe everything you see on TV. The same should apply when browsing the internet in this era.