When will you use a third-party algorithm and when will you develop your own?

Published in

Enrique Dans

3 min readDec 1, 2023

IMAGE: A chatbot’s robotic head with an empty speech bubble on a blue background — IMAGE: Mohamed Hassan — Pixabay

Here’s a conundrum: given that an algorithm is only as good as the training that has gone into it, and that training is carried out with data, if we’re going to use an algorithm marketed by a third party, which part of its training will come built in, and which part will we want to train ourselves, while ensuring that the data remains under our full control?

It’s an issue I have encountered on many occasions: when I ask my students to come up with a simple algorithm with their data, instead of using their company’s, or that from real repositories, they resort to Kaggle or similar repositories to use third-party data, already conveniently anonymous, and that doesn’t expose them to any risk.

The issue is becoming ever more important: since OpenAI started offering companies the possibility of training their own assistants from the ChatGPT base a few weeks ago via a simple process within the reach of most people, the market has started to experiment and train ChatGPT with all kinds of data from multiple industries. The problem is that these chatbots are leaving a lot of the data used in their training on display.

Using a simple prompt injection, these chatbots easily reveal data that was not intended to be revealed. The concern is not just about what might happen if…

When will you use a third-party algorithm and when will you develop your own?

Written by Enrique Dans