10 things you should know before heading for AI/ML/Data Science in 2021
Data Science and Machine Learning are not a bowl of cherries
Since Data Science was called the sexiest job of the 21st century in 2012, a lot of people from all kinds of different fields have moved into data science or related machine learning roles. Solving complex problems with fancy artificial intelligence algorithms for good pay sounds attractive. A lot of companies jumped on the hype train and now offer bootcamps that promise to teach data science/AI/ML in less than one year. Here are 10 things to consider before joining such a bootcamp or heading for a career transition into machine learning.
1. Job titles are not yet well defined, required skills vary a lot, even in 2021
After startups noticed that artificial intelligence is a powerful buzzword for getting funded, they started to rename existing job postings from data analyst/statistician to data scientist or something related. The job title sounds sexier, so the postings attract more applications.
But if you read the job postings, you notice that some roles are completely different. Some want business analysts who answer questions with SAS or SPSS. Some want data engineers who build Hadoop-based big data systems, and some want deep learning researchers who use TensorFlow and neural networks, yet they might call all of them data scientists. These roles are very different and require different skills. In recent years, four types have emerged: Data Scientists (Advanced Analytics), Machine Learning Engineers, Data Engineers and Applied Scientists/Researchers. Focus on one.
2. There is no shortage of graduates
As already mentioned, a lot of people want to become data magicians: not only computer scientists, physicists and mathematicians, but also economists, psychologists and natural scientists with a quantitative background. The problem is that most companies are not looking for fresh graduates, and some do not even know what they are looking for. Some expect to hire one data scientist who will solve all their problems. And because they do not really understand the requirements, they hire fresh undergraduates or bootcamp grads who have all the buzzwords on their CVs. Reportedly, 85% of data initiatives fail, and this may be one of the reasons. Furthermore, according to TechRepublic, demand for data scientists was already starting to shrink in 2019. Today you can read a lot of frustration from young data scientists who have trouble finding a job, partly due to COVID.
There might be a skills shortage, but not an applicant shortage. It’s not unusual for entry-level or internship openings in data science to receive hundreds of applicants. When employers talk about shortages, they’re generally talking about a lack of experienced professionals.
Glassdoor senior economist Daniel Zhao
3. Without an academic degree it is difficult
The idea of getting a data job without any academic education is daring. It may be possible if you are a genius or lucky, but in general you will hardly get an interview. Artificial intelligence is about statistics and math, and these two are usually the hardest parts of a degree. You might not need all of it, but you will usually not be the only applicant, and you will compete with people who have PhDs. MOOCs and bootcamps cannot teach you the fundamentals in a few months; you need more time. Read the job postings and you will notice that a master's degree or even a PhD is usually listed as a plus, depending on the role. With that in mind, it's hard, but it's not impossible.
88% have at least a Master’s degree and 46% have PhDs.
kdnuggets
4. Applied machine learning is about building datasets
Kaggle challenges and university courses have one thing in common that does not hold in industry: a data set is available and already prepared. For learning exploration, preprocessing and modelling this makes perfect sense, but a huge part of the real work is getting to that point. Machine learning is rewarding when it delivers value, but it takes a lot of observation and experimentation until you get good results, and even longer until you get the data clean. If you are a perfectionist and your frustration tolerance is low, don't go for applied machine learning; it will drive you mad.
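To give a feeling for what "getting the data clean" means in practice, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the records, the field names (customer_id, signup, revenue) and the two date formats are invented, and a real pipeline would need many more rules.

```python
from datetime import datetime

# Hypothetical raw export; every field name and value here is made up.
RAW_ROWS = [
    {"customer_id": "17", "signup": "2021-01-05", "revenue": "120.50"},
    {"customer_id": "17", "signup": "2021-01-05", "revenue": "120.50"},  # exact duplicate
    {"customer_id": "18", "signup": "05.01.2021", "revenue": "n/a"},     # other date format, missing value
    {"customer_id": "", "signup": "2021-02-11", "revenue": "80"},        # missing key
    {"customer_id": "19", "signup": "2021-13-40", "revenue": "75,20"},   # impossible date
]

# Assumed date formats; real exports often contain more variants.
DATE_FORMATS = ("%Y-%m-%d", "%d.%m.%Y")


def parse_date(text):
    """Return a date for the first matching format, or None."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None


def parse_amount(text):
    """Return a float, accepting a decimal comma, or None if not numeric."""
    try:
        return float(text.replace(",", "."))
    except ValueError:
        return None


def clean(rows):
    """Split rows into usable records and rejects with a reason."""
    seen, cleaned, rejected = set(), [], []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key in seen:
            rejected.append((row, "duplicate"))
            continue
        seen.add(key)
        if not row["customer_id"].strip():
            rejected.append((row, "missing customer_id"))
            continue
        signup = parse_date(row["signup"])
        if signup is None:
            rejected.append((row, "unparseable date"))
            continue
        cleaned.append({
            "customer_id": int(row["customer_id"]),
            "signup": signup,
            "revenue": parse_amount(row["revenue"]),  # None marks a missing value
        })
    return cleaned, rejected


cleaned, rejected = clean(RAW_ROWS)
print(f"kept {len(cleaned)} of {len(RAW_ROWS)} rows")
for row, reason in rejected:
    print("rejected:", reason, row)
```

Even in this toy case, more than half of the rows need a decision before any model sees them, and each decision (drop, repair, keep as missing) is a judgment call that depends on the domain.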
5. Deep Learning is not widely adopted
Neural networks have made artificial intelligence popular in recent years, but they have several drawbacks. They are hard to design and train, they take a lot of time to tune, they are prone to overfitting and they are computationally intensive. Infrastructure is getting better, but it is still not where it should be. If you want to use neural networks, don't head for a career as a data scientist in industry. Very few companies use neural nets, because they feel like too much magic and in many cases traditional methods are good enough. If you want to use deep learning, focus on academia and research or, to some extent, on startups specializing in ANNs.
6. Perception of AI is wrong
Artificial neural networks are inspired by brains, but they are still very far away from them. I don't see any AI competing with humans. The perception of AI in public and in science is quite different. The problem is that it is hard to explain why AIs can play DOTA 2, make deepfakes or compose music and are still not "intelligent". What seems to be forgotten is that AI is still pattern recognition, and it fails pretty fast when a pattern changes. It does not understand, it does not think and it does not dream. You will be asked why your AI system cannot do XYZ, and you will probably not be able to fix it. Now explain why AI can defeat world champions in Go but cannot learn to predict some "easy" business problem.
7. Lots of AI is actually not AI
In 2019 there was a study of European AI startups. It basically found that 40% of AI startups do not use AI at all. Some even just hired humans to fake AI. The reason is quite simple: AI systems require data, time and people to build them, which is expensive. Sometimes it's easier and cheaper to let humans do the work. Don't be that "labeling things" guy who is only there to prove that your startup has AI expertise. Be skeptical about data science job postings, and ask about their data before joining.
8. Lifelong learning
Spark, TensorFlow, PyTorch, Keras, scikit-learn and pandas are tools that make your life easier. These tools change: they are replaced by better tools or they stay forever, who knows. But they are just tools. You should not focus too much on them; focus on techniques and problem solving. If you love Keras but PyTorch solves some problem better, learn PyTorch. You will notice that the ideas behind these tools and frameworks are often very close and that they work similarly. The same goes for programming languages. Don't be that guy who uses C++ to prototype ML models because he was too proud to learn Python, a scripting language. Be open-minded.
9. Domain matters
Machine learning is about data. Data is about the domain. Understanding the domain is necessary to understand the data. The idea that a data team can solve any problem with data alone and without domain expertise is dangerous and will not work. There are so many hints in the data that can only be understood if you know how the domain works and, furthermore, how the processes work: not just the business view, but also the technical view. Playing around with techniques is not enough. To understand a domain you need good communication skills, at least as a data scientist in advanced analytics.
10. Critical thinking personality
Critical thinking is one of the most important skills. A lot of projects are successful only because someone questions the current approach or objective. Is the target variable really what we want to predict? Do we really need machine learning here? Should we spend one more week to get 1% more out of it? Can we really trust that data? Is it a self-fulfilling prophecy? Asking these questions is quite hard, because we often don't like the answers, but it is simply necessary!
Disclaimer: All of this is biased and reflects my personal view, even though I did a lot of research into other opinions.
If you are really interested in machine learning and data science, I am the last one who wants you to stop, but don't believe the promises of consulting companies that offer bootcamps. Don't do it because it is hyped; remember that all hypes end at some point.
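One cheap way to ask "Do we really need machine learning here?" is to compare a model against a trivial baseline before celebrating its score. The following is a minimal sketch under stated assumptions: the labels and predictions are invented numbers for a hypothetical churn problem with 5% positives, and accuracy is used only because it is the metric people quote first.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline(y_true):
    """Predict the most frequent label for every example."""
    most_common_label, _ = Counter(y_true).most_common(1)[0]
    return [most_common_label] * len(y_true)


# Invented data: 95 customers stayed (0), 5 churned (1).
y_true = [0] * 95 + [1] * 5
# Invented model output: 2 false alarms, and only 1 of the 5 churners found.
y_model = [0] * 93 + [1] * 2 + [1] + [0] * 4

model_acc = accuracy(y_true, y_model)
baseline_acc = accuracy(y_true, majority_baseline(y_true))
print(f"model: {model_acc:.2f}, baseline: {baseline_acc:.2f}")
```

In this made-up case the model's 94% accuracy sounds impressive but is worse than always predicting "no churn", which is exactly the kind of uncomfortable answer these questions tend to produce. It also suggests that accuracy may be the wrong thing to measure for such an imbalanced target.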
Originally posted on my personal Blog