
Is Data Science Dead? An Interview with David Plummer

My Data Guest — An Interview with David Plummer

Rosaria Silipo
Low Code for Data Science
8 min read · Sep 13, 2024



It was my pleasure to recently interview David Plummer as part of the My Data Guest interview series. Back in March, I published a Medium article titled “Is Data Science Dead?”, questioning whether AI marks the end of data science. The piece sparked diverse reactions, and David offered a particularly insightful critique. I had to invite him to the podcast to hear more.

David is a Senior Consultant at Frazer-Nash Consultancy in Bristol (UK). In this interview, he shared his expertise as we explored whether data science is facing an existential crisis or evolving with AI. We also discussed whether AI is always the best solution for data problems, and the challenges businesses face in adopting and operationalizing it today.

Rosaria: Could you tell us a bit more about your professional self?

David: I work in the data engineering department at Frazer-Nash Consultancy. My role involves engaging with customers to understand their needs, developing solutions with our team, and managing the delivery of these projects. We offer a wide range of services, including data science, software development, and data engineering, to help our clients achieve their goals. I originally trained as an electrical engineer and have worked in the telecommunications and healthcare sectors. I joined Frazer-Nash about a year ago to help expand their business and capabilities in health and social care.

Rosaria: What kind of projects does your company typically work on?

David: Frazer-Nash Consultancy has around 1,500 employees, two-thirds of whom are classical engineers working in electrical, mechanical, and structural engineering. The other third are digital engineers, including data scientists, AI specialists, and software engineers. We work across various sectors such as defense, energy, transport, sustainability, and, increasingly, health and social care. Our focus is on next-generation and future technologies rather than large-scale existing projects. For example, we have co-developed digital models with clients to identify the most cost-effective wind farm deployments, optimize the routing of freight trains using predictive models, and model healthcare pathways to improve efficiency and capacity.

Rosaria: Given your extensive experience with AI, can you share your thoughts on its power and potential?

David: AI is incredibly powerful and is becoming more so over time. We’ve been exploring large language models (LLMs) to improve office productivity, from conducting research to summarizing data and even producing documentation. AI saves us a lot of time and effort in these areas. We also use AI and predictive models across a wide range of customer problems. These include physical systems, where predicting future failures helps reduce maintenance costs and improve operational performance, and real-time control systems, where the system must act preemptively to avoid accidents. AI’s ability to optimize processes and make predictions is becoming indispensable in both the digital and the physical domains.

Rosaria: Could you give us a practical example of a project where generative AI made a significant difference?

David: I was involved in a project where the client needed a way to analyze written requests for information and break them down into a series of intelligence-gathering tasks to be performed by autonomous robots. One such request might be to find where an overhead power cable is damaged along a particular route. We used an embedding model to interpret these requests and translate them into tasks. The approach aims to optimize the scheduling of the robots when there are many robots to schedule and many requests, each with a different priority, which would be difficult to achieve without intelligent systems.
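To make the idea of “translating a request into a task” concrete, here is a minimal Python sketch of matching a free-text request to the closest task type by vector similarity. It is not the project’s implementation: the task catalogue, the task names, and the functions `embed`, `cosine`, and `match_task` are hypothetical, and a toy bag-of-words vector stands in for the learned embedding model David describes.

```python
import math
import re
from collections import Counter

# Hypothetical task catalogue: each task type has a short text description.
# In a real system a learned embedding model would encode these descriptions;
# here simple word counts stand in for that model (an assumption).
TASK_TYPES = {
    "inspect_power_line": "inspect overhead power cable line damage along route",
    "survey_road": "survey road surface damage potholes along route",
    "monitor_site": "monitor site perimeter for intrusion",
}


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def match_task(request: str) -> tuple[str, float]:
    """Return the task type whose description is most similar to the request."""
    query = embed(request)
    scores = {name: cosine(query, embed(desc)) for name, desc in TASK_TYPES.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


task, score = match_task(
    "Find where an overhead power cable is damaged along a particular route"
)
print(task, round(score, 2))
```

Matching against a fixed catalogue, rather than asking a generative model to invent a task, keeps the output restricted to known task types; a scheduler could then order the matched tasks by priority.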

Rosaria: On the other hand, has AI turned out to be useless in other projects?

David: AI is a term that has been around for many years, and the technology has evolved over time. In the 1980s and 1990s, there was a boom in AI focused on expert systems, but that approach died out because it was too costly and complex. Today’s AI, particularly neural networks, has a much bigger impact, and I usually focus on the technology rather than on the concept of AI itself.

AI has failed in the past because the solutions weren’t effective enough to meet people’s expectations. Today, however, with LLMs and neural networks, we have new opportunities. Some solutions work well, while others don’t, often because there aren’t enough experienced people to implement them correctly.

Going back to your question, even when projects fail, they still provide valuable learning experiences that lead to better solutions in the future. For example, instead of using pre-trained LLMs to complete tasks directly, which risks the LLMs hallucinating options, we have blended the LLMs’ embedding models with existing technology, such as knowledge graphs, to provide a solution whose output and performance are more explainable.

Rosaria: So, combining traditional machine learning methods with newer approaches is a winning strategy?

David: That’s right — this probably answers the main question of the interview: “No, data science is not dead”. Another important point to consider is that we’re moving away from small proof-of-concept solutions towards enterprise-scale systems. This shift requires a different set of skills and a more collaborative approach. It’s no longer about one person working in the lab on a small model; it’s about having a whole team of data scientists, data engineers, and data analysts who work closely with the business to interpret the results.

Rosaria: What challenges do businesses face when adopting generative AI?

David: While it may seem like generative AI has been around for a while, it’s really only been a couple of years, and many people are still trying to understand what it can actually do. A key issue we see with customers is figuring out the right use cases — how generative AI can be applied to deliver real benefits within their organizations. Many struggle to articulate these use cases and to define the problems they want AI to solve.

The broader goal is to help customers understand how to use AI effectively and identify the right applications. For example, AI can help write documentation or summarize large amounts of information, which can be a big time-saver. When working with businesses, we help them articulate potential use cases, define the benefits of implementing AI, and then develop a technical solution. The technical aspect is just one part of the process. It’s also crucial to ensure that the AI solution actually works as intended and produces reliable results, avoiding issues like AI “hallucinating” or generating false information.

This requires a combination of technical skills and the ability to engage with people, understand their needs, build the solution, and then provide ongoing support to ensure it’s used effectively.

Rosaria: KNIME Analytics Platform has a GenAI extension. It offers nodes to connect to a GenAI provider and integrate results into workflows, as well as K-AI, an AI assistant for building workflows or writing code. Did you try them out?

David: I’ve tried both. Initially, I was skeptical about KNIME’s generative AI capabilities for creating workflows. It felt like a step too far, something for the distant future. But after trying it, I was pleasantly surprised — it actually produced workflows that made sense. Despite using KNIME for about 10 years and having my own way of doing things, I found this new feature to be quite useful, especially for training newcomers who need to get productive quickly.

As for the LLM nodes, I’ve only used local, open-source models with KNIME to understand how they perform in our development environment. These models run slowly on my computer, so I haven’t pushed their capabilities as far as I’d like. Still, they’re a valuable addition: they let me build workflows that analyze research papers by breaking them into text chunks, as well as retrieval-augmented generation (RAG) workflows. This helps demonstrate complex concepts visually, making them easier for non-technical people to understand.
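For readers who want to see what the chunking and retrieval steps of a RAG workflow do, here is a minimal Python sketch. It only illustrates the general technique and is not David’s KNIME workflow: the function names, chunk sizes, and sample text are hypothetical, word-count vectors stand in for a real embedding model, and the final step of sending the prompt to an LLM is deliberately left out.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, chunk_size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into overlapping chunks of at most chunk_size words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, max(len(words) - overlap, 1), step)]


def vectorize(text: str) -> Counter:
    """Toy stand-in for an embedding model (assumption): lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question."""
    q = vectorize(question)
    return sorted(chunks, key=lambda c: cosine(q, vectorize(c)), reverse=True)[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Assemble the augmented prompt that would be sent to an LLM (not called here)."""
    joined = "\n---\n".join(context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"


# Usage with made-up sample text standing in for a research paper.
paper = ("Transformers use self-attention to relate tokens. " * 5
         + "Retrieval-augmented generation grounds answers in retrieved documents. " * 5)
chunks = chunk_text(paper, chunk_size=20, overlap=5)
question = "How does retrieval-augmented generation ground answers?"
top = retrieve(question, chunks, k=1)
print(build_prompt(question, top))
```

The overlap between consecutive chunks is a common design choice: it reduces the chance that a sentence relevant to the question is split across two chunks and lost at retrieval time.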

One of the aspects of KNIME I really appreciate is its ability to create automated Reproducible Analytical Pipelines (RAP). This ensures that once you’ve implemented a process, you can reuse it consistently, providing a level of auditability that’s hard to achieve with tools like Excel or with manually executed scripts.

Rosaria: What about open-source LLMs? What is your take on them?

David: What we’re seeing is that commercial models are quickly outpacing open-source ones. Commercial providers, like OpenAI, have the resources to invest in larger, more advanced models, which are difficult to replicate in an open-source environment. Open-source models are still great for getting started, for proofs of concept, or for niche applications with limited, curated datasets, but when it comes to production, the commercial options offer more reliability and innovation.

Rosaria: What data science skill do you think is essential for a good data scientist?

David: That’s a great question. To answer it, let me ask you another question: “What makes a good scientist?” At the core, it’s curiosity.

A good data scientist, much like a good scientist, needs to have a deep interest in understanding why things happen the way they do. They should be driven by a desire to uncover the underlying causes of data patterns and trends. Without this sense of curiosity and a genuine drive to solve problems, one might end up just processing data without truly exploring what it can reveal.

Rosaria: How important are soft skills for a data professional?

David: It’s crucial to have a solid foundation in technical skills. While I often emphasize the importance of engaging with customers and understanding their needs, this doesn’t diminish the value of strong technical abilities. In our team, some members focus heavily on these hard skills — coding, mathematical modeling, and implementing solutions — because that’s their strength and interest.

However, beyond technical skills, soft skills like active listening are vital. It’s easy to assume you know the solution before truly understanding the customer’s problem. But the ability to listen, empathize, and grasp the core issue is what allows you to create solutions that genuinely meet the customer’s needs. Resilience and perseverance are also key, especially when dealing with rejection or delays. These soft skills, combined with technical expertise, are what make a well-rounded professional in this field.

Rosaria: What should a data scientist focus on to gain more experience?

David: My advice is to go out and talk to people. Understand their problems, then reflect on how you could deliver a solution to address those issues. The key is to work on real-life applications. Start small, perhaps with a proof of concept, and once you have something tangible, use it to engage with others, gather feedback, and develop it further.

Rosaria: So, considering that generative AI is here to stay, what will its evolution look like in the future? Any forecasts for the next five years?

David: I think we’ll see continued refinement of current technologies in the next five years, but the truly transformative changes will likely happen over a 10- to 15-year horizon. One major shift could be the integration of AI with autonomous machines, leading to breakthroughs in areas like space exploration, healthcare, and social care. The pace of change, however, will be determined by how quickly businesses and users can adapt to and absorb these new technologies. It’s not just about how fast we can develop the tech, but how quickly it can be adopted in everyday practice.

Rosaria: Before we say goodbye, how can people in our audience get in contact with you?

David: Feel free to connect with me on LinkedIn, just be sure to mention that you listened to this podcast.

Check out the original interview on KNIMETV.


Rosaria has been mining data since her master’s degree, through her doctorate and the job positions that followed. She is now a data scientist and KNIME evangelist.