TDS Archive

An archive of data science, data analytics, data engineering, machine learning, and artificial intelligence writing from the former Towards Data Science Medium publication.

When NOT to use AI, and when to use it, based on my experience


Photo by V Babienko

I have a PhD in Machine Learning and 16 years’ experience deploying scalable solutions and data science in product to maximise revenue and user retention, including for 26M users at JustGiving. As a mentor and strategic data science advisor for start-ups and VCs, here is my expert opinion on when you should and shouldn’t use AI. I’m often asked “how do I get started with AI?”, “how do I integrate AI in my product?”, or “how can I get a research grant with AI?” Given the massive page reads on my post How to work in Data Science, AI, or Big Data based on my experience, I thought I would put together a new post specifically on when it is and is not appropriate to use AI.

What is Artificial Intelligence (AI)?

First, AI is a generic term for the study of the human brain and intelligent systems. AI is usually a moving target: you could say your car, washing machine, and phone have some form of AI built in. It is a multidisciplinary field spanning mathematics, psychology, physics, biology, neuroscience, and computer science, that aims to mimic some form of intelligence on a computer. It is implemented using techniques such as machine learning (ML) and natural language processing (NLP). These further subdivide into the likes of neural networks, deep learning, reinforcement learning, ensemble learning etc.

Let’s focus on machine learning. ML is not magic. The main difference between ML and traditional programming is that rather than you writing the rules, the model learns them from the data through training. A model has parameters and weights that are adjusted as it learns, using different algorithms. We then use a held-out test dataset to check the accuracy or performance of the model.
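
To make the difference concrete, here is a minimal sketch in plain Python. It is not a real ML library, and the data, the true boundary of 3.0 and the guessed rule of 5.0 are all hypothetical values chosen for illustration. The "model" has a single parameter (a threshold) that is fitted on training rows and then checked on held-out test rows, alongside a rule a person wrote by hand.

```python
# Hypothetical toy problem: predict a label (1 or 0) from a single number x.
# The true boundary (3.0) is an assumption built into the synthetic data.

def hardcoded_rule(x: float) -> int:
    """Traditional programming: a person guesses the threshold."""
    return 1 if x > 5.0 else 0


def train_threshold(samples: list[tuple[float, int]]) -> float:
    """'Training': pick the threshold with the fewest errors on the training set."""
    best_t, best_errors = 0.0, len(samples) + 1
    for t in sorted(x for x, _ in samples):
        errors = sum((1 if x > t else 0) != y for x, y in samples)
        if errors < best_errors:
            best_t, best_errors = t, errors
    return best_t


def accuracy(predict, samples) -> float:
    """Fraction of samples where the prediction matches the label."""
    return sum(predict(x) == y for x, y in samples) / len(samples)


# Deterministic synthetic data: x = 0.00, 0.05, ..., 9.95
data = [(i / 20, 1 if i / 20 > 3.0 else 0) for i in range(200)]
train, test = data[0::2], data[1::2]  # alternate rows: train vs held-out test

threshold = train_threshold(train)  # the single learned parameter


def learned_rule(x: float) -> int:
    """The 'model': the same shape of rule, but with a learned threshold."""
    return 1 if x > threshold else 0


print("hardcoded rule accuracy:", accuracy(hardcoded_rule, test))
print("learned rule accuracy:  ", accuracy(learned_rule, test))
```

Real data is noisy and real models have many more parameters, but the workflow is the same: fit on one part of the data, then measure on a part the model has not seen.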

Data is everything in ML / NLP in B2B and B2C

Without enough data, your algorithm cannot be trained or tested. The exceptions are, for example, genetic algorithms or reinforcement learning, which optimise a fitness function or a reward/regret. But in my view they have limited application in the business world unless you are running some form of simulation, such as driving, playing video games, weather forecasting, organic chemistry modelling etc. I once asked a high performance computing (HPC) vendor to give me a use case where I would use simulations in e-commerce or B2C, and they struggled to answer. Why would I simulate events or users when, for example, at JustGiving I had 26M real users to run tests on?

I have no data but still want to use AI

If you don’t produce, capture or have historical data, e.g. you have a new use case or are a start-up, then it is still possible to use ML and NLP. Find data for your company by looking for existing open datasets, such as those on Kaggle, AWS, or the U.S. Government’s open data, or license it. This can be used to train and test your initial models. As your company evolves and becomes more mature you will be able to capture more data and improve the models with better training/testing data.

Are Data Scientists the right people?

For those who still think that data science is the sexiest job of the 21st century, think again: you will be spending 80% of your time on data preparation tasks that include cleaning, normalising, standardising, and shaping the data. Only then can you get to the cool AI modelling and visualisation everybody talks about. If you look at it from that perspective, there are many parallels with what data analysts and business intelligence report creators do day in, day out, minus the AI modelling, as they operate on business data and understand the key performance indicators and business metrics. I like to upskill existing staff to be data scientists as they have business and domain knowledge and work with the data already.

In addition, successful data scientists rarely work in isolation. They need to work alongside a wider business intelligence/data engineering team that prepares the data in some queryable form, DevOps/developers who integrate changes in product, and machine learning engineers who help push models into a production environment.

Offline data science vs in-product data science

There are two ways you can use data science in your organisation: offline data science and data science in product.

Offline data science can be done on a laptop and covers tasks such as market research, customer/client analysis or A/B testing. Here the data science models do not get deployed in production; rather, the historical data is analysed. There are huge benefits in doing this, as it complements your existing business intelligence and helps shape products for success.

In my view, to get the most out of data science it needs to be embedded directly in your products and services, rather than being an afterthought where you only analyse the offline data after the event or action. This needs a much larger commitment in terms of budget, team, executive support and company strategy.

Data science in product is much more complex and costly, and you need the right team in place. A lot of the data scientists I have met or interviewed are not strong developers. They need to think beyond notebooks, which are good for exploration but not for deploying models into product. Some of the cloud providers do help with creating batch inference engines and inference APIs. However, it’s important that data scientists understand and learn software engineering best practices, and start to think like developers about testing, deployment pipelines, and scalability. I also don’t think they should do this alone: you need DevOps, developers and data/machine learning engineers working with them. This is essential if you want to have the models deployed in product to make matches, recommendations or predictions directly to your users as they navigate your site or app. Every website and app is competing for screen time, so any improvement can make a huge difference in retention (B2C) or productivity (B2B), for example. You then of course need to measure the performance of the models versus randomised groups and/or hardcoded rules, and take action if the performance deteriorates.
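
One way to measure a model against a randomised group and a hardcoded rule is to compare conversion rates with a two-proportion z-test. The sketch below uses only the Python standard library. The group sizes and conversion counts are made-up numbers, and the choice of test is my assumption for illustration rather than a description of any particular production setup.

```python
import math


def two_proportion_z(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for conversion rate A versus B."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value under the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: users are randomly split into three groups
users_per_group = 10_000
conversions = {"model": 560, "hardcoded_rule": 500, "random": 450}

z_random, p_random = two_proportion_z(
    conversions["model"], users_per_group, conversions["random"], users_per_group
)
z_rule, p_rule = two_proportion_z(
    conversions["model"], users_per_group, conversions["hardcoded_rule"], users_per_group
)

print(f"model vs random: z={z_random:.2f}, p={p_random:.4f}")
print(f"model vs rule:   z={z_rule:.2f}, p={p_rule:.4f}")
```

With these made-up numbers the model clearly beats the random group, but its lead over the hardcoded rule is not significant at the usual 5% level, which is exactly the kind of result that should make you question the extra cost of the model.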

Is it worth adding AI?

In business, we often talk about return on investment (ROI) when deciding to build or buy anything. The same applies to ML and NLP: how much effort, time and money will it cost you, and what will be the return versus having a hardcoded set of rules that a full stack developer can deploy in 1 day?
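
As a back-of-the-envelope sketch, the comparison can be written as ROI = (gain - cost) / cost for each option. Every figure below is hypothetical and only there to show the arithmetic. The `Option` class and its field names are my own illustration, not a standard costing model.

```python
from dataclasses import dataclass


@dataclass
class Option:
    """One way of solving the use case, with hypothetical costs and gains."""
    name: str
    build_cost: float           # one-off cost to build and deploy
    yearly_running_cost: float  # hosting, monitoring, retraining, support
    yearly_gain: float          # extra revenue or savings attributed to it

    def roi(self, years: float = 1.0) -> float:
        """ROI = (gain - cost) / cost over the given number of years."""
        cost = self.build_cost + self.yearly_running_cost * years
        gain = self.yearly_gain * years
        return (gain - cost) / cost


# Made-up numbers: a one-day rules job versus a full ML project
rules = Option("hardcoded rules", build_cost=500,
               yearly_running_cost=1_000, yearly_gain=20_000)
ml = Option("ML model", build_cost=150_000,
            yearly_running_cost=60_000, yearly_gain=300_000)

for option in (rules, ml):
    print(f"{option.name}: ROI over 1 year = {option.roi():.2f}")
```

In this made-up case the ML model brings in far more money in absolute terms, yet the rules return much more per pound spent. Which matters more depends on your budget and on how much of the gain you can realistically attribute to the model.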

Let’s look beyond the AI hype:

  • Use case: where will it be used? I found that in business this is mostly in the areas of matching, recommendations, predictions, or suggestions to your users or customers, in your domain or sector.
  • ROI: will it add value by increasing revenue/productivity, and/or decreasing costs/time taken? Always compare this against using humans for a task, or using normal software development processes without AI.
  • Easy to support: the Netflix Prize is a great example where the model with the best performance was NOT used in production due to the engineering cost and complexity of running it in product. Simple models, pre-trained models or no AI are better if you want to get to market quicker, especially if these are already solved use cases. The exceptions are when you are tackling a new use case, doing true research & development, or of course doing university research where you publish research papers.
  • Monitoring: you need processes to monitor model performance over time, as any degradation in inference could impact ROI. You need to have contingencies so you can quickly roll back or retrain a new model if that happens.
  • Do you have the right data: sufficient and right data from existing internal and/or external sources suitable for the chosen use cases.
  • Dream team: data scientists cannot work in isolation, they need to be surrounded by developers, engineers, product managers, and DevOps teams.

JustGiving Data Science Team Reunion
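
The monitoring point in the list above can be sketched as a rolling accuracy check that flags when a deployed model drops too far below its baseline. This is a minimal illustration using only the standard library. The `ModelMonitor` class, the baseline, the tolerance and the window size are all hypothetical, and a real set-up would feed the flag into alerting and a rollback or retraining process.

```python
from collections import deque


class ModelMonitor:
    """Track recent prediction outcomes and flag degradation (hypothetical helper)."""

    def __init__(self, baseline_accuracy: float, tolerance: float = 0.05, window: int = 100):
        self.baseline = baseline_accuracy
        self.tolerance = tolerance
        self.outcomes = deque(maxlen=window)  # keeps only the most recent results

    def record(self, prediction, actual) -> None:
        """Store whether the latest prediction turned out to be correct."""
        self.outcomes.append(prediction == actual)

    def rolling_accuracy(self):
        """Accuracy over the current window, or None if nothing is recorded yet."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_action(self) -> bool:
        """True once the window is full and accuracy is below baseline - tolerance."""
        if len(self.outcomes) < self.outcomes.maxlen:
            return False
        return self.rolling_accuracy() < self.baseline - self.tolerance


# Usage with made-up outcomes: ten correct predictions, then three wrong ones
monitor = ModelMonitor(baseline_accuracy=0.9, tolerance=0.05, window=10)
for _ in range(10):
    monitor.record(prediction=1, actual=1)
healthy = monitor.needs_action()

for _ in range(3):
    monitor.record(prediction=1, actual=0)
degraded = monitor.needs_action()

print("action needed before drift:", healthy)
print("action needed after drift: ", degraded)
```

In practice the true outcome often arrives with a delay (a donation, a purchase, a click), so the window and tolerance need tuning to how quickly your labels come in.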

Next steps

Most people with a developer or computer science background should be able to understand the concepts of ML and NLP. Do not be tempted to recruit an AI army of data scientists only. They still need the right data, a use case with ROI, and a dream team, which is what I think you should start with when making your data science related business cases or research proposals, or when speaking with investors. Feel free to connect with me on LinkedIn if you have questions/comments, or if you need short term mentoring, strategic consulting or a data science executive advisor.

Written by Dr Richard Freeman

Author, Advisor, Co-founder & CTO @ Vamstar, Series-A funded startup. Tech4good enthusiast.
