Can we calculate and predict the future?

Essay on the topic “Positioning of Data Science and Big Data”, written as semester work for the module “EDS — Data Science” at the FFHS university in Zurich, Switzerland, in September 2018 by the student Gökhan Sari.

When it comes to the future of the IT industry, you keep hearing the magic words Big Data, so it is worth thinking about its benefits and challenges. This is not a short-lived hype like Pokemon Go. It is about the huge volumes of data that we generate as users, and just as much about the evaluation of that data. Who knows, maybe in the future only machines will make business decisions, based more on data analysis than on experience. Or is that already reality today?

Why do we even collect the data? (1)

Big data refers to amounts of data so large that they can no longer be handled with standard software or hardware or with conventional data processing methods. But where do such amounts of data come from? We all generate them with every click, every online purchase, every entry into a navigation device, every money transfer, every phone call, every gym visit, every new friend in a social network. As you can see, the mountains of data are growing rapidly. These large amounts of data are searched automatically for patterns and relationships. Everyone knows an example from online shopping: recommendations such as “customers who bought this item also bought …”, which are based on the real-time analysis of millions of purchases by other customers.
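To make the “customers who bought this also bought” idea more concrete, here is a minimal sketch in Python. It assumes the simplest possible approach, counting how often two items appear in the same basket; real shops use far more sophisticated methods on far more data. The basket data and the function names `build_co_purchase_counts` and `also_bought` are made up for illustration.

```python
from collections import Counter, defaultdict
from itertools import permutations

# Hypothetical toy data: each set is one customer's basket.
baskets = [
    {"tent", "sleeping bag", "headlamp"},
    {"tent", "sleeping bag"},
    {"tent", "camping stove"},
    {"headlamp", "batteries"},
]


def build_co_purchase_counts(baskets):
    """Count, for every item, how often each other item was bought with it."""
    counts = defaultdict(Counter)
    for basket in baskets:
        # permutations yields every ordered pair of distinct items in the basket
        for item, other in permutations(basket, 2):
            counts[item][other] += 1
    return counts


def also_bought(counts, item, top_n=2):
    """Return up to top_n items most often bought together with `item`."""
    return [other for other, _ in counts.get(item, Counter()).most_common(top_n)]


counts = build_co_purchase_counts(baskets)
print(also_bought(counts, "tent", top_n=1))  # ['sleeping bag']
```

In this toy data the tent shares a basket with the sleeping bag twice and with every other item at most once, so the sleeping bag is the top recommendation.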

It is a growing thing

(Data) First of all, it’s BIIG … and it’s getting bigger and bigger. Instead of questioning people with surveys on the street, we already have data about all of us that has been gathered somewhere.

(Automatically) If we assume that everyone owns a smartphone today, data is collected automatically every time you visit a restaurant, a hotel or a garage, or even when you use public transport. Collecting data automatically is therefore easier than collecting it manually, and it is already the reality today.

(Time) Time also plays a big role. It is not just a matter of analyzing the collected data to look into the future, but of setting up a system and an algorithm that can run the analysis in real time based on defined criteria.

(Artificial Intelligence) What is new about big data is that the process of collecting, analyzing and evaluating data no longer needs a human to be involved. With machine learning, we can build systems in which people are no longer at the center of decision-making and may not even understand what is going on. Data alone is useless without the ability to analyze it (2). That means we need systems that are reliable and whose decisions we can trust. But who should build such systems? Machines cannot do that; it takes software engineers.

Big Data can be seen everywhere. The way we live has changed. There is a lot to do, a lot to analyze and evaluate. I think there is a large supply of jobs on one side and a skills gap on the other (3). This certainly enhances the status and future prospects of data science and big data for software engineers.

Which skills do you need?

Data scientists are usually highly educated: 88% have at least a master’s degree and 46% have a doctorate. Although there are notable exceptions, a very strong educational background is usually required. To become a data scientist, you could earn a bachelor’s degree in computer science, the social sciences, the natural sciences or statistics. The most common fields of study are mathematics and statistics (32%), followed by computer science (19%) and engineering (16%). A degree in one of these subjects gives you the skills you need to process and analyze Big Data (4).

R and Python are the programming languages that dominate the field of data science. 53% of data scientists report that they “speak” R and/or Python.

In the ever-changing world of data science, these are the tools that professionals currently use in their work. R and Python are the bread-and-butter programming languages that anyone who wants to enter the field should learn.

Although the data suggests that R is the most widely used language, consider putting Python at the top of your to-do list, as it is the fastest-growing programming language according to the Stack Overflow community (5).

Analysis/Decision making

For most companies and authorities, a lack of data is not the problem. In fact, it is the opposite: there is often too much information to make a clear decision. With so much data to sort through, you do not need more data; you need better data analysis. With the right data analysis processes and tools, the once overwhelming volume of disparate information becomes a simple, clear basis for decisions. The process can be broken down into five steps:

  1. Defining the questions,
  2. Deciding what and how to measure,
  3. Collecting data,
  4. Analyzing the collected data,
  5. Interpreting the results.

I would like to take a closer look at the last point.

After analyzing the data and potentially investigating further, it’s finally time to interpret the results. When interpreting the results of the data, you should ask the following key questions:

  • Does the data answer my original question?
  • Does the data help me defend against objections?
  • Are there any limitations on my conclusions, any angles that I did not consider?

If the interpretation of the data holds up under all these questions and considerations, then you have probably come to a productive conclusion. The only remaining step is to use the results of your data analysis to determine the best course of action.
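The five steps listed above can be walked through on a very small scale. The following Python sketch is only an illustration: the question, the order values and the helper `average_value` are invented, and a real analysis would need far more data and a proper statistical test before drawing any conclusion.

```python
from statistics import mean

# Step 1: define the question (hypothetical example).
question = "Do orders with a recommendation shown have a higher average value?"

# Step 2: decide what and how to measure:
# the order value, split by whether a recommendation was shown.

# Step 3: collect data (made-up numbers standing in for real records).
orders = [
    {"recommended": True, "value": 54.0},
    {"recommended": True, "value": 61.0},
    {"recommended": True, "value": 47.0},
    {"recommended": False, "value": 40.0},
    {"recommended": False, "value": 45.0},
    {"recommended": False, "value": 38.0},
]


# Step 4: analyze the collected data.
def average_value(orders, recommended):
    """Average order value for one of the two groups."""
    return mean(o["value"] for o in orders if o["recommended"] == recommended)


with_rec = average_value(orders, True)
without_rec = average_value(orders, False)
difference = with_rec - without_rec

# Step 5: interpret the results, including their limitations.
print(question)
print(f"With recommendation: {with_rec:.2f}, without: {without_rec:.2f}")
print(f"Difference: {difference:.2f} (only {len(orders)} orders, so no firm conclusion)")
```

The last line is the interesting one: the numbers seem to answer the question, but six orders are far too few to defend the result against objections, which is exactly what the key questions for interpretation are meant to catch.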

Big Data will save the world

What humanity has achieved, or will achieve in the future, with Big Data analysis is hugely important to the entire world. In education, media, weather forecasting, information technology, insurance, healthcare … we will hear the magic words Big Data more and more in the future. If, for example, we collect medical data and analyze and evaluate it properly, we could prevent diseases, in many cases serious ones (Prevention (6)). A very different example of Big Data analysis comes from the USA: the spread of a flu epidemic was predicted and contained through the automated evaluation of tweets. With the additional introduction of mobile health, eHealth and wearable technologies, the volume of data will continue to increase. This is where I fear serious problems for everyone’s rights and privacy, even if the data is collected anonymously. That means Big Data also raises questions:

  • Who owns the — often personal — data?
  • Is it dangerous if only a few big companies like Google, Facebook, Microsoft etc. control it?
  • Do we really want an online store to work out, from buying behavior alone, that someone is pregnant?

With Big Data analysis we have opened up a world of endless possibilities. Oceans of data lie ahead of us, but they also carry risks.

It’s a bit like discovering fire. You can burn your fingers or found a civilization on it…

— — — — — — — — — — — — — — — — — — — — — — —


[2] Jeanne Harris, senior executive at Accenture Institute for High Performance, has stressed the significance of analytics professionals by saying, “…data is useless without the skill to analyze it.”

[3] According to a study by the McKinsey Global Institute, by 2018 the US will lack around 190,000 data scientists and 1.5 million managers and analysts who can understand big data and make decisions based on it.