EDITOR’S PICK | AI & DATA SCIENCE | KNIME ANALYTICS PLATFORM
Is Data Science dead?
In the last six months I have heard this question thousands of times: “Is data science dead?”
Now that there is AI, is it still worth it to train your own Machine Learning models?
Now that there is AI, is it still worth it to learn Python?
Now that there is AI, is KNIME still in business?
Now that there is AI, is data science still needed? Or shall we declare it dead?
Now that there is AI, do we still need data scientists?
And I would add my personal doubt to this long list of questions: Now that there is AI, do we still need graphic designers?
All very good questions, albeit a bit dramatic.
1. Generative AI has grown up
You can ask Gen AI to generate such and such an image, and it will. You might not like it, and you can refine it, but it will. Or you can ask Gen AI to write a poem about topic X, and it will. Professional poets might not like it, but it is good enough for a poetry contest at a dinner table among friends. You can also ask it to write a letter to Santa or a complaint letter about product Y that does not work. And in both cases, it will.
So, yes. Gen AI can generate text or images that are perfectly acceptable for everyday tasks. Yet, is it professional enough? Can it write a whole meaningful book or the plot of a movie? Maybe, if you tell it exactly what to write and how to spin it. Even then, for truly professional work it might not be good enough yet. Still, it might just suit the layperson.
2. AI can write Python code
AI can write Python code almost perfectly. Isn’t that fantastic? With less time spent coding, we are left with more time to think about what to implement.
Have you read Dennis Ganzaroli’s post on Minard’s chart of Napoleon’s campaign in Russia in 1812? Well, he had the data and he decided to visualize it with Python via the Python nodes that KNIME Analytics Platform offers.
Since version 5.1, KNIME Analytics Platform has integrated some Gen AI features in its framework, also known as KNIME AI, or K-AI for short. All Python nodes, in particular, offer a K-AI chatbot in the configuration dialog, where workflow builders can ask for advice on how to write the particular Python code they need.
Dennis was basically chatting with K-AI, asking it to write the Python code to visualize Minard’s data. While the first attempt was not satisfactory, he kept refining it with further suggestions that K-AI accepted and included in the draft code. Without taking anything away from Dennis’ ability to write Python code, here K-AI, the artificial intelligence agent of KNIME, did all the work.
Note. Remember that K-AI is a KNIME extension and must be installed separately after installing the KNIME Analytics Platform Core. Remember also that to be able to query K-AI you need to be logged in with a free account on the KNIME Community Hub.
3. AI can create KNIME workflows
K-AI can also create KNIME workflows.
Since version 5.1, KNIME Analytics Platform has integrated some Gen AI features in its framework, known as K-AI, which also help the user build workflows. The fourth tab from the top on the left in the KNIME Analytics Platform workbench leads to the K-AI chat area, if the K-AI extension has been installed. Here the user can chat with K-AI for advice (“Q&A” option) or to build the workflow (“Build” option). This article by Vittorio Haardt teaches you what LLMs are and how K-AI can help you save time in assembling workflows.
K-AI is not as expert in building KNIME workflows as in writing Python code, but its workflow building skills are improving fast, release after release.
4. What is left for a data scientist to do?
All these new AI capabilities sound a bit overwhelming and make us wonder what is left for us to do. Especially as data scientists, model trainers, Python programmers, KNIME workflow builders, what is there left for us to do?
First of all, AI does not build things by itself. It does not train models, write Python scripts, or build KNIME workflows just because. It needs to be told what to do and how. In Dennis Ganzaroli’s article, the author had to give the task and then keep refining until the result was what he expected. Even when using AI as a support, the project owner still needs to describe the whole process step by step: what to build, how, from which data, and so on.
Secondly, AI does not check for correctness. AI provides a result. Evaluating whether this is correct is not part of its tasks. AI still needs a check for correctness by an expert user: a check for data science correctness and for business soundness. For that, we need a skillful end user who knows what must be achieved and how.
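As a minimal sketch of what such an expert check can look like in code, suppose an assistant has drafted a small helper and the data scientist verifies it against a trusted reference on cases they understand. Everything here is an illustrative assumption: `ai_generated_mean` is a hypothetical stand-in for AI-drafted code, `check_against_reference` is a made-up helper name, and the test cases are arbitrary. None of it comes from K-AI itself.

```python
import math
import statistics


def ai_generated_mean(values):
    """Hypothetical stand-in for a helper drafted by an AI assistant."""
    return sum(values) / len(values)


def check_against_reference(candidate, cases, tolerance=1e-9):
    """Return the inputs on which `candidate` disagrees with a trusted reference.

    The reference here is statistics.mean from the standard library.
    """
    failures = []
    for values in cases:
        expected = statistics.mean(values)
        actual = candidate(values)
        if not math.isclose(actual, expected, rel_tol=tolerance, abs_tol=tolerance):
            failures.append(values)
    return failures


# Illustrative cases chosen by the reviewer, not real project data.
cases = [[1, 2, 3], [10.0, 20.0], [5], [-1, 1]]
failures = check_against_reference(ai_generated_mean, cases)
print("all checks passed" if not failures else f"failed on: {failures}")
```

The point of the sketch is the division of labor: the AI supplies the draft, while the person supplies the cases and the reference that decide whether the draft is acceptable.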
If the result is not correct or does not correspond to the prompted task, the end user needs either to refine it with better prompting or to manually add the missing parts. This takes us directly to the third point: fine-tuning of AI models. There is now an emerging trend to fine-tune AI models. For this, you definitely need data scientists.
Continuing the parallel between AI generated images and graphic designers, AI can generate all sorts of images. However, in the end only the graphic designer can verify the image quality and help with improvements, if needed. Lately, I have seen too many ugly images, which did not improve when people told me they were AI generated. Being able to generate images with AI does not make you a graphic designer. Being able to correct and improve them does.
5. Do we still need data scientists?
Following all that, we will not need pure implementers anymore. Especially for basic tasks, AI will get better and better and will make pure implementers less necessary. However, we will still need professionals who know about the data science process and its mathematical requirements, know how to correct and redirect the AI efforts, and how to interpret the AI generated results. In practice, we are moving from creating and training models and services towards consuming and refining them.
Long story short, we still need data scientists. The role, though, will probably change in the near future. It will focus more on the algorithms and the data science process than on programming. On top of that, low-code tools will make the implementation of the whole process even faster and more approachable. We will need more general data scientists, well-versed in the mathematics of the algorithms, good communicators, and skilled at guiding and correcting AI towards the desired result.
6. Is Data Science dead?
Data science is probably not dead, but it is surely changing. The best data scientist will not be the one who codes fastest, but the one who can best direct the assembly of the data science project, taking into account data integration, data quality, data history, machine learning algorithms, result interpretation, and correctness of the process.
Will we become more generalist? Probably. In the initial phase of a data science project, we will need more generalists to work on the process. However, we will still need expert data scientists to review and correct the AI output. Just like graphic designers, data scientists will take advantage of the new, faster implementation of solutions via AI but will still need to remain vigilant about the quality of the AI provided solutions.