DATA SCIENCE SURVEY 2021 — INSIGHTS

Mehmet Sezer
Published in Data Science Earth, May 20, 2021
View the interactive dashboard here.

Introduction

Data Science Earth was established in 2019 with the purpose of producing high-level data science solutions. The community, whose members are all volunteers, primarily aims to develop data awareness and to ensure that the power of data is used correctly in a globalizing, competitive environment.

The Data Science Survey Team, which is part of the organization, released its first survey in 2020 to gather exploratory insights about Data Scientists around the world. This year, it conducted the second survey, for 2021. The survey aims not only to surface insights but also to serve as a guide both for Data Scientists who are new to the Data Science ecosystem and for those already in the sector who want to improve themselves.

The guide answers many frequently asked questions about Data Science including:

  • Where should new generations start?
  • What more can be added to current skills?
  • Which tools are necessary to learn, and which ones will most probably lead in the future?
  • What are the options to improve our skills?
  • What is the purpose of data science?

As for the results, the dataset includes many perspectives from different regions, covering 20+ countries and many cities around the world. After our Survey Team pre-processed the dataset, they presented interactive dashboards to explore the insights.

Let’s go through the results step by step.

What is the Demography of Data Scientists?

Here are some of our discoveries about the demographics of data scientists all over the world!

Below is some information about the respondents’ country, age, gender, major, education level and business level:

  • Although the majority of respondents come from Turkey, they are followed by respondents from the USA, India, Azerbaijan and Australia. Brazil has also joined the survey with the 2021 edition.
  • Although male respondents clearly outnumber female respondents overall, the numbers of men and women are close within Generation Z, and in 2021 women even slightly outnumber men. This suggests that female participation in the Data Science workforce is likely to increase in the near future.
  • Generations Y and Z together make up almost the entire survey population. One interpretation is that data-related fields and the Data Scientist job title are still new concepts in business life. Among Generation X respondents, however, most are academics, which suggests that academia took up data science before industry did. In 2021, the most common business level for Generation X shifts from academic to senior expert in the private sector, which can be read as a sign that the private sector needs expert data scientists more than ever before.
  • Most respondents are junior professionals, experts or students. This points to the importance of reverse mentoring for companies, because young talents will help companies become data-driven in the near future.
  • The most common majors are, in order, Computer / Software Engineering, Statistics / Statistics and Computer Science (and related fields) and Industrial / Management Engineering, mostly at the bachelor’s level. For the master’s degree, Statistics / Statistics and Computer Science is by far the most preferred major, which suggests that it forms the basis of Data Science. Economy / Economics / Finance / Business Admin has also recently become one of the most important majors for the master’s degree, possibly in response to the pandemic’s negative impact on the global economy.
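The generation breakdown above depends on grouping respondents by birth year. The article does not state which cut-offs the Survey Team used. The minimal Python sketch below therefore assumes the Pew Research Center convention: Generation X for 1965-1980, Generation Y for 1981-1996 and Generation Z for 1997-2012. It approximates birth year as 2021 minus age. The function names and toy respondents are hypothetical and are not survey data.

```python
from collections import Counter

# ASSUMPTION: Pew Research Center birth-year cut-offs; the survey's own
# definitions are not given in the article.
GENERATIONS = [
    ("Generation X", 1965, 1980),
    ("Generation Y", 1981, 1996),
    ("Generation Z", 1997, 2012),
]


def generation_of(birth_year: int) -> str:
    """Return the generation label for a birth year, or "Other" if none matches."""
    for name, first, last in GENERATIONS:
        if first <= birth_year <= last:
            return name
    return "Other"


def gender_by_generation(respondents, survey_year=2021):
    """Count respondents per (generation, gender) pair.

    `respondents` is an iterable of (age, gender) tuples. Birth year is
    approximated as survey_year - age, so people close to a cut-off may be
    assigned to the neighbouring generation.
    """
    return Counter(
        (generation_of(survey_year - age), gender) for age, gender in respondents
    )


# Hypothetical toy respondents (age, gender), not survey data.
toy = [(22, "Female"), (23, "Male"), (21, "Female"), (35, "Male"), (48, "Male")]
counts = gender_by_generation(toy)
print(counts[("Generation Z", "Female")], counts[("Generation Z", "Male")])  # prints: 2 1
```

Keeping the cut-offs in one table makes it easy to swap in the Survey Team's definitions if they differ.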

Data Analysis / Reporting, Data Science, Artificial Intelligence, Business Intelligence / Decision Support and Business Analysis are the dominant business titles in the Data Science area of the private sector.

What is the Business Information of Data Scientists?

Here are some of our discoveries about the business information of data scientists all over the world!

Below are some insights about the respondents’ experience, business field, company size, data size, spoken languages and salary:

  • Most respondents speak Turkish at a proficient level, since most of them come from Turkey. Turkish is followed by English, Portuguese, Spanish, French and Arabic. This is expected, because these languages are widely spoken around the world. However, respondents who speak English, Portuguese or Spanish report upper-intermediate or advanced levels, while for French and Arabic the level is still mostly elementary.
  • The average experience of Data Scientists is approximately 5 years, which supports the view that Data Science is a relatively new field for business. Among experienced leaders, consulting and education stand out as business fields. These leaders are likely to shape the future of business.
  • Most respondents work in companies with fewer than 10 employees, and such companies also topped this ranking last year. These boutique companies mainly employ recent graduates or comparatively less experienced talent, and they mostly handle relatively small datasets.
  • Most respondents work with data between 100 MB and 1 TB in size. Business fields are spread fairly evenly across Computer (Hardware, Software, Hosting), Information Technology, Finance and Insurance, Consulting, Education, Legal Services, Automotive and Textile. However, larger data sizes are strongly associated with Computer (Hardware, Software, Hosting), Information Technology, and Finance and Insurance.
  • Salary deserves special attention. Before pre-processing, the raw dataset contained two salary periods (monthly and yearly) and four currencies (TRY, USD, EUR and GBP). A critical data-handling step was therefore to convert every answer to a single basis: a monthly salary in USD. Only after this conversion could salaries be analyzed. The results clearly show that salary rises with experience. In particular, after 16 years in the sector, salaries rise further as respondents reach titles at manager level and above on the organizational chart. In short, salary increases sharply once a certain level of experience is reached.
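The salary conversion described above reduces two periods and four currencies to a single monthly USD figure. It can be sketched in Python as follows. The exchange rates are hypothetical placeholders and are not the rates the Survey Team used; a real pipeline would use rates for the period in which the survey was collected. The function name `to_monthly_usd` is also illustrative.

```python
# HYPOTHETICAL exchange rates (USD per one unit of each currency), for
# illustration only; these are NOT the rates used by the Survey Team.
USD_PER_UNIT = {"USD": 1.0, "EUR": 1.2, "GBP": 1.4, "TRY": 0.12}

# Number of months covered by each salary period in the raw answers.
MONTHS_IN_PERIOD = {"monthly": 1, "yearly": 12}


def to_monthly_usd(amount: float, currency: str, period: str) -> float:
    """Convert one raw salary answer to a monthly amount in USD."""
    currency, period = currency.upper(), period.lower()
    if currency not in USD_PER_UNIT:
        raise ValueError(f"unsupported currency: {currency}")
    if period not in MONTHS_IN_PERIOD:
        raise ValueError(f"unsupported salary period: {period}")
    # Express the amount in USD, then spread it over the months in the period.
    return amount * USD_PER_UNIT[currency] / MONTHS_IN_PERIOD[period]


# Hypothetical raw answers: (amount, currency, period).
raw_answers = [(60000, "USD", "yearly"), (4000, "EUR", "monthly"), (10000, "TRY", "monthly")]
for amount, currency, period in raw_answers:
    monthly = to_monthly_usd(amount, currency, period)
    print(f"{amount} {currency} {period} -> {monthly:.2f} USD/month")
```

Unknown currencies or periods raise an error rather than being silently skipped, so unexpected raw answers surface during pre-processing.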

What are the commonly used Data Science tools, and for what purposes are they used?

Here are some of our exploratory insights about data scientists all over the world!

In this section, we provide some insight into the tools that Data Scientists use most and the purposes they use them for. One of the most frequently asked questions today is which tools to learn for Data Science, and where and how to start. The results of this survey help answer exactly these questions.

  • The answers show that approximately 30 different tools are used extensively by Data Scientists. Data Science works by combining the most suitable tools for each job, so it would be wrong to set a rule that a particular tool must be used. We should always keep in mind that using the right tool for each job increases quality and reduces costs.
  • The Top 5 Data Science tools are Excel, Python, SQL, Anaconda and R. In the 2021 survey, the tools were divided into four categories so that each could be compared with similar tools: Tools of Database, ML Platforms, Tools of Programming and Software Packages.
  • Much of the world’s data resides in databases. SQL (Structured Query Language) is a powerful language used to communicate with relational databases and extract data from them. The participants’ answers confirm that SQL is one of the tools Data Scientists use daily, and a working knowledge of relational databases and queries is necessary for anyone who wants to become a data scientist. However, NoSQL databases such as MongoDB and big-data tools such as Hadoop, Spark, Hive and Kafka are close on SQL’s heels.
  • AWS, Google Cloud and Microsoft Azure, in that order, are the most common ML platforms, and these three clearly dominate the sector. The participants’ choices also show that the three platforms are in serious competition with one another.
  • Anaconda is a free and open-source distribution of the Python and R programming languages for scientific computing, and it takes its place among the Top 5 tools used by the participants. Another exploratory insight is that Python users prefer the Anaconda platform and the Jupyter Notebook included in it. Participants also use popular libraries such as NumPy, pandas, scikit-learn, Matplotlib and SciPy.
  • We must admit that, although today we can use data directly from many different sources, Excel has remained one of the most frequently used tools for working with data, from the past to the present. MATLAB, SPSS, Power BI and Tableau are the other software packages frequently used by participants.
  • Data Science combines Statistics and Computer Science to improve business processes and develop new products. The survey results show that the most used tools for this work are Python and R. R was preferred more in the past. Today, however, Python is preferred because of its strong interpreter, its adoption by big technology companies, and its many easy-to-use libraries. The difference is especially visible when you filter for junior employees and Generations Y and Z.
  • Although there are many tools in the field of Data Science today, each differs from the others in some feature, and this variety gives Data Scientists great flexibility. In the 2020 survey, 89% of Data Scientists said they were satisfied with these tools, and in 2021 this rate rose to 93%. One reason for this high and rising satisfaction may be that most Data Science tools are open source and backed by very strong communities.
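Ranking tools from a multi-select question can be sketched as below. The sketch ranks tools overall and within the four categories introduced in the 2021 survey. Assigning each tool to a category is an assumption made for illustration, because the article names the categories but does not list their members. The responses are hypothetical toy data, not survey results.

```python
from collections import Counter, defaultdict

# ASSUMPTION: illustrative tool-to-category mapping; the survey's exact
# assignment is not published in the article.
TOOL_CATEGORY = {
    "SQL": "Tools of Database",
    "MongoDB": "Tools of Database",
    "AWS": "ML Platforms",
    "Google Cloud": "ML Platforms",
    "Microsoft Azure": "ML Platforms",
    "Python": "Tools of Programming",
    "R": "Tools of Programming",
    "Excel": "Software Packages",
    "Tableau": "Software Packages",
}


def count_tools(responses):
    """Count how many respondents selected each tool in a multi-select question."""
    counts = Counter()
    for selected in responses:
        counts.update(set(selected))  # a tool counts at most once per respondent
    return counts


def counts_by_category(tool_counts):
    """Group overall tool counts so each tool is compared only within its category."""
    grouped = defaultdict(Counter)
    for tool, n in tool_counts.items():
        grouped[TOOL_CATEGORY.get(tool, "Other")][tool] = n
    return grouped


# Hypothetical toy answers, one list per respondent (duplicates are ignored).
responses = [
    ["Excel", "Python", "SQL"],
    ["Python", "SQL", "AWS", "Python"],
    ["Excel", "R", "Python", "Microsoft Azure"],
]
tool_counts = count_tools(responses)
print(tool_counts.most_common(3))
for category, counter in counts_by_category(tool_counts).items():
    print(category, counter.most_common())
```

Counting each tool once per respondent makes a tool's count equal to the number of people who chose it, which is what a "most used tools" ranking usually reports.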

The purposes for which Data Science is used are not surprising. The Top 5 are Statistical Analysis, Preparing Reports and Reporting Solutions, Data Management, Artificial Intelligence Applications, and Processing / Accumulating Data. Together, they reflect the basic Data Science process: collecting and managing data, building a model by applying statistical analysis to it, and finally making decisions based on the resulting reports.

How long does a Data Scientist work per day?

First, an important distinction: rather than how long we work, how much of that time do we spend working efficiently? This was another of the most frequently asked questions, so we reserved a section of the survey for it. We asked data scientists how many hours a day they work efficiently and how much time they spend on R&D. After handling the outliers, the results are as follows:

  • Data Scientists work efficiently for an average of 5.3 hours a day (5.4 last year), while white-collar workers around the world work an average of 7–9 hours a day. This is a much-debated issue today. In 2019, Microsoft tested a four-day work week in its Japan offices and found that employees were not only happier but also significantly more productive.
  • Data Science is a constantly changing and developing field, so Data Scientists have to set aside time to improve themselves beyond their daily work. The survey shows that Data Scientists spend an average of 2.4 hours a day (2.5 last year) on R&D.
  • As for what Data Scientists do during these 2.4 hours a day, a high percentage (84%) of participants use online channels, especially online training, content and communities. This share seems likely to grow considerably after COVID-19. Apart from online channels, Data Scientists said they improve themselves by attending in-company training. Companies that want to make a difference with data should therefore invest in in-company training to retain and develop their Data Science talent.
  • Adding these two averages (5.3 + 2.4 = 7.7 hours), Data Scientists spend roughly 8 hours a day working. We therefore recommend that all data scientists spend the other 8 hours having fun and the remaining 8 hours resting.
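The article does not say which outlier treatment was applied to the working-hour answers. The Python sketch below shows one common option: it drops answers outside Tukey's fences, that is, more than 1.5 times the interquartile range beyond the quartiles, before averaging. The method, the threshold `k=1.5` and the sample answers are assumptions, not the Survey Team's procedure.

```python
import statistics


def drop_outliers_iqr(values, k=1.5):
    """Keep only values inside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


def robust_mean(values, k=1.5):
    """Mean of the values that survive the IQR outlier filter."""
    return statistics.mean(drop_outliers_iqr(values, k))


# Hypothetical answers to "How many hours a day do you work efficiently?"
# The 24 is an implausible entry of the kind outlier handling is meant to catch.
efficient_hours = [5, 6, 5, 4, 6, 5, 24]
print(drop_outliers_iqr(efficient_hours))      # [5, 6, 5, 4, 6, 5]
print(round(robust_mean(efficient_hours), 2))  # 5.17
```

Without the filter, the single 24-hour answer would pull the mean up to about 7.9 hours. This is why averages like the 5.3 and 2.4 hours above are only meaningful after outliers are handled.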

Are Data Scientists satisfied with their current situation, and what are they planning for the future?

Science deals in facts, but we should not forget that Data Scientists are human, and human beings are strongly affected by their living and working conditions. If scientists are given the best possible living and working conditions, they will carry Science to the place it deserves, with the highest quality. This is one reason why developed countries tend to lead in science.

  • Unfortunately, the numbers are not very encouraging. In the previous survey, some participants (30–40%) preferred not to comment on their work and living conditions, while others (40–50%) stated that they were not satisfied. The 2021 results indicate that both living and working conditions have worsened, with 45–60% of participants now saying they are not satisfied. Problems experienced all over the world, and their recurrence over long periods, affect people’s views. Globalization is another factor. Although it has good sides, its bad effects spread faster and further in difficult times, and every link in the supply chain is affected. For this reason, if we want the world to succeed in Science, everyone should be content, not just a certain segment. Another problem that people all over the world have faced is the COVID-19 pandemic. At its start, people everywhere began working from home. Although most were happy with this at first, longer working hours, long meetings and an intense pace caused work-life balance to deteriorate. COVID-19 may therefore also have contributed to the worsening answers to this question about living and working conditions. We believe that both governments and companies will take the necessary measures in the coming days.
  • In the survey results, 40% of Data Scientists stated that, although they are not satisfied with their job and living conditions, their future plan is to improve themselves in their fields. We see this as positive, because it shows that Data Scientists have high hopes for their jobs and for science. It is especially pleasing that this rate rose to 60% in the 2021 survey. The COVID-19 outbreak has also shown that online training is an opportunity in such situations. In 2021, the share of respondents who want to take part in online training rose from 83% to 88%, especially in Generation Z. Apart from this, a substantial share (30%) are dissatisfied with their current situation and want to change jobs and move abroad. Finally, those who want to become entrepreneurs and run their own business (15%) are among those who have not lost hope. Unfortunately, this share fell to 11% in the 2021 results, probably due in large part to the COVID-19 outbreak. We hope all Data Scientists can pursue whatever they want most.

Conclusion

Overall, the Data Science ecosystem continues to evolve, driven by young talent and its own dynamic structure. Like every technology, the field goes through a hype cycle, and a deeper analysis of the results suggests that we are still in the early stages of that cycle.

In particular, Generations Y and Z seem set to have a strong voice in this field in the coming years and to steer its direction. Companies that want to make a difference through data should support these young talents and invest in the field.

In this area, where the consumption of online content, especially for learning, keeps increasing, solutions are also produced quickly. The diversity of the tools and the strong communities behind them give Data Scientists considerable flexibility and convenience.

Last but not least, many thanks to our researchers Mehmet Sezer and Yunus Emre MIZRAKÇI, who contributed to this work. Data Science is a field where solutions are created enjoyably as knowledge and experience are shared, and we created this study as a guide for Data Scientists. Our request to you: let us all keep contributing to Data Science together and make the world more beautiful with it.
