10 Trends To Follow In Data Science In 2020

Nishchay Shah

Published in Cactus Tech Blog

6 min read · Aug 22, 2020

Artificial intelligence is a hot topic today, and while some groups claim that another AI winter may be coming, a larger group (myself included) strongly feels that this time, summer is here and it’s going to be one big party. In fact, with advances in both hardware and software, there may be no winter in sight for a long time. Below are the top 10 trends I am excited about in 2020.

Quantum Computing

Towards the end of 2019, Google announced a quantum processor that it said outperformed a standard supercomputer by a factor of over a billion on a specific benchmark task, and the announcement caused waves in the media. While there may not be any direct use for it in real-world applications today, research labs at companies such as Google and IBM are focusing extensively on quantum computing. Therefore, in 2020 and beyond, we are likely to see definitive leaps in quantum computing, and soon, it may become viable for practical applications.
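To put the "factor of over a billion" in perspective, here is a quick back-of-the-envelope check. It assumes the figures Google reported at the time: about 200 seconds on its quantum processor versus an estimated 10,000 years on a classical supercomputer. The variable names are mine, and the 10,000-year number is Google's own estimate, not a measured runtime.

```python
# Back-of-the-envelope check of the reported speedup.
# Assumed inputs: ~200 seconds (quantum) vs. an estimated 10,000 years (classical).
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60

quantum_seconds = 200
classical_years = 10_000

classical_seconds = classical_years * SECONDS_PER_YEAR
speedup = classical_seconds / quantum_seconds

print(f"Estimated speedup: {speedup:.2e}x")
```

Under these assumptions the ratio works out to roughly 1.6 billion, which is consistent with the headline claim.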

Advances In Natural Language Processing (NLP)

Natural language processing (NLP) has been an important focus for a while, and with the recent arrival of transformers and attention-based models, things are moving ahead full steam. A few months ago, OpenAI released the GPT-3 model. It is based on the transformer architecture, and its largest version has 175 billion parameters. This changed everything. The model achieved state-of-the-art (SOTA) results on various language modeling tasks and continues to do so on many private tasks.

Many researchers across various universities are doubling down on NLP research. From newer contextualised word representations to sequence-to-sequence modeling, a large number of resources are being devoted to NLP and to enabling machines to understand and respond to language as humans do.
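For readers curious about what "attention" means in practice, here is a minimal sketch of scaled dot-product attention, the core operation inside transformer models. It is pure Python for readability, handles a single query only, and uses made-up toy vectors rather than anything from a trained model. Real implementations work on batched tensors with learned projections.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    # Similarity of the query to every key, scaled by sqrt(dimension).
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    # The output is the weighted average of the value vectors.
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return output, weights

# Toy, hand-picked vectors for three tokens (illustrative only).
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
query = [1.0, 0.0]

output, weights = attention(query, keys, values)
print("weights:", weights)
print("output:", output)
```

The query lines up with the first and third keys, so those tokens receive more weight than the second one. That is the sense in which the model "attends" to the most relevant parts of its input.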

Data Repositories And Marketplaces

One of the biggest showstoppers for any data science project is the lack of relevant training data. Many teams end up spending up to 80% of their time collecting the right training data. Over the past year, many independent teams, open-source projects, and publicly funded projects have opened access to many structured datasets. Organizations are also getting into the business of either monetizing the data they have access to or functioning as data aggregators that collect, normalize, and structure data in formats that other data science teams can use. This new line of business is likely to grow in the coming years.

Annotation As A Business

While data collection and aggregation occur in parallel tracks, a critical piece, which involves getting the same data tagged, annotated, and ready for training, is also picking up steam in a big way. Tools and services like Amazon Mechanical Turk, which enables the crowdsourcing of annotation, already exist, but now there is a growing realisation that this can actually be a viable business. Many developing countries, especially those with cheaper labour, are building businesses around tagging data, with large teams of people selecting, tagging, and labeling input data and making it ready for consumption.
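When several people label the same item, their answers have to be reconciled somehow. One common approach is a majority vote with an agreement threshold, sketched below. The function name, the 0.6 threshold, and the sample data are all hypothetical, chosen only to illustrate the idea. Real annotation pipelines often use more elaborate quality checks.

```python
from collections import Counter

def aggregate_labels(annotations, min_agreement=0.6):
    """Majority vote over crowdsourced labels.

    annotations maps an item id to the list of labels given by different
    annotators. Items whose top label falls below min_agreement are
    flagged for manual review instead of being accepted.
    """
    accepted, needs_review = {}, []
    for item_id, labels in annotations.items():
        label, votes = Counter(labels).most_common(1)[0]
        if votes / len(labels) >= min_agreement:
            accepted[item_id] = label
        else:
            needs_review.append(item_id)
    return accepted, needs_review

# Made-up example: three annotators per image.
annotations = {
    "img_001": ["cat", "cat", "dog"],
    "img_002": ["dog", "dog", "dog"],
    "img_003": ["cat", "dog", "bird"],
}

accepted, needs_review = aggregate_labels(annotations)
print("accepted:", accepted)
print("needs review:", needs_review)
```

Here the first two images are accepted because at least two of three annotators agree, while the third goes back for review because no label has a clear majority.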

Augmented Reality (AR)

Since the release of Google Glass, Microsoft HoloLens, and other devices in the last few years, significant advances have been made in AR. This year, we saw patents and announcements from various companies for AR glasses, which will allow people to interact and work in an environment that overlays simulation on the real world. The smart glasses of 2021 will change the way the world works and communicates.

Data Analytics As A Service

Analyzing data at scale requires a good setup of software and hardware. One has to set up machine learning clusters, install the necessary software (even the ‘plug and play’ kind), and incur a large upfront cost before the first set of data can be analyzed. However, there are now many SaaS and self-service solutions with which one can get started for pennies on the dollar. In addition, with tools and techniques such as AutoML available from almost all providers, high-powered data analytics is now available to anyone.

AI Explainability

AI models, especially those that deal with high-dimensional derived data and data gathered from various touchpoints, are largely deep-learning black boxes. The data goes in and the decision (output) comes out. There is very little visibility into why a certain decision was made. As we move into a future where AI is used in applications such as medical diagnosis, self-driving vehicles, automated trading, and even recruitment and other decision-making functions, it becomes important to ensure transparency and visibility into why a machine-learned model reached a particular decision. There are many open-source tools and frameworks that have yielded good early results in the interpretation of AI models.
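One simple, model-agnostic way to peek inside a black box is permutation importance: shuffle one input feature at a time and measure how much the model's accuracy drops. The sketch below is a minimal pure-Python illustration, not any particular library's API. The `black_box` function and the random data are hypothetical stand-ins for a trained model and its evaluation set.

```python
import random

def accuracy(model, rows, labels):
    """Fraction of rows for which the model's prediction matches the label."""
    correct = sum(1 for row, label in zip(rows, labels) if model(row) == label)
    return correct / len(rows)

def permutation_importance(model, rows, labels, n_features, repeats=20, seed=0):
    """Average accuracy drop when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = []
    for j in range(n_features):
        drops = []
        for _ in range(repeats):
            # Shuffle only column j, leaving the other features untouched.
            column = [row[j] for row in rows]
            rng.shuffle(column)
            shuffled = [row[:j] + [value] + row[j + 1:]
                        for row, value in zip(rows, column)]
            drops.append(baseline - accuracy(model, shuffled, labels))
        importances.append(sum(drops) / repeats)
    return importances

# Hypothetical black box: we can query it, but pretend we cannot read its logic.
def black_box(row):
    return 1 if row[0] > 0.5 else 0

# Made-up evaluation data with two features per row.
data_rng = random.Random(42)
rows = [[data_rng.random(), data_rng.random()] for _ in range(200)]
labels = [black_box(row) for row in rows]

importances = permutation_importance(black_box, rows, labels, n_features=2)
print("importance per feature:", importances)
```

Shuffling the first feature hurts accuracy substantially, while shuffling the second changes nothing, which reveals that this particular model relies only on the first feature.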

Responsible And Ethical AI

If a self-driving car is faced with two choices, both of which result in some harm to a human, which decision should the model make? Should it be based on data, or should there be some override rule?

If a very novel advancement in AI has been made, is it okay for it to be used in a military application that will eventually be used in warfare?

These are some of the questions, along with bias, data protection, discrimination, etc., that responsible and ethical AI attempts to address. There is a large movement around the ethical use of AI, and many companies are creating dedicated task forces and coalitions that deal with this.

Data Warehousing And Data Management Platforms

Data warehousing has been around for a long time, and it has served as the primary step for organizations to collect and structure data so that it starts making sense. The past few years have seen the emergence of many warehousing services and platforms that data engineering teams can use to kickstart their data warehouse and data lake journeys.

Data Science As A Basic Competency For Organisations

Many years ago, statistical and big data analysis were seen as “expert” skills that were farmed out to analytics teams, but this changed a few years ago. At present, many business teams prefer having their team members use analytics tools to analyze data.

Similarly, today, there is a movement where data science skills are being built within business teams. Business teams are learning how to manage data science projects, expectations, and timelines, and how skills and team management are different from those in traditional software development teams.

In sum, 2020 and the next few years are going to be very exciting for the businesses and teams adopting data science and related areas of work.

Originally published at https://inc42.com on August 22, 2020.

— — — — — — — — — — — — — — — — — — — — — — — — — — —

If data science interests you, and you think you have what it takes, send us an email describing why. Find our email address in the TXT record of the first 11 digits of pi of cactusglobal.io. We at CACTUS are on the lookout for new team members.