2024 Trends in AI and Machine Learning
The AI and Machine Learning space saw a lot of change in 2023, with the biggest impact coming from Generative AI and Large Language Models (LLMs). As we enter 2024, I have compiled a list of trends I am expecting for the year, broken into Traditional ML trends and Generative AI trends.
Key Term Definitions
Before we get to the trends, let's define and level-set on a few key terms.
Machine Learning (ML)
Machine learning is a subset of artificial intelligence that enables a machine or system to automatically learn and improve from experience. Instead of relying on explicit programming, machine learning uses algorithms to analyze large amounts of data, learn from the insights, and then make informed decisions.
Machine learning algorithms improve their performance over time as they are trained on more data. Machine learning models are the output: what the program learns from running an algorithm on training data. Generally, the more data used, the better the model gets.
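To make that concrete, here is a minimal sketch of the train-then-predict loop described above, using scikit-learn (my choice of example library; the concept applies to any ML framework, and the synthetic dataset is just a stand-in for real training data):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic data stands in for the "experience" the system learns from
X, y = make_classification(n_samples=5_000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# The algorithm (a random forest) plus training data produces the model
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# The model, not the algorithm, is what makes informed decisions on new data
print(f"Held-out accuracy: {model.score(X_test, y_test):.3f}")
```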
Artificial Intelligence (AI)
Artificial intelligence is a broad field that refers to the use of technologies to build machines and computers able to mimic cognitive functions associated with human intelligence, such as seeing, understanding and responding to spoken or written language, analyzing data, and making recommendations.
Although artificial intelligence is often thought of as a system in itself, it is a set of technologies implemented in a system to enable it to reason, learn, and act to solve a complex problem.
Traditional ML Trends
Traditional ML will remain the more commonly used approach in business in 2024. Since Traditional ML techniques are more mature, instead of focusing on specific models or architecture trends I will focus on two key areas that affect Traditional ML (and can also affect Generative AI): Hardware Acceleration and Machine Learning Operations (ML Ops).
Hardware Acceleration
One of the most important elements for training large machine learning models is optimized compute. In most cases that optimization is in the form of GPU Acceleration.
I expect NVIDIA's CUDA to remain the most popular GPU acceleration framework in 2024, especially since NVIDIA holds roughly 75% consumer market share (according to the most recent Steam hardware survey) and a staggering estimated 98% share of the data center market. That said, it will be interesting to watch AMD's ROCm and Intel's oneAPI, which are both open source and improving rapidly but do not yet have significant market share. As training datasets continue to grow, GPUs remain the most effective way to train a new model; I see this trend continuing in 2024 and expanding to include more GPU-accelerated inference.
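Part of why ROCm can compete with CUDA at all is that frameworks abstract the accelerator behind a device handle; ROCm builds of PyTorch even expose themselves through the same `torch.cuda` interface. A minimal sketch of framework-level device selection in PyTorch:

```python
import torch

# ROCm builds of PyTorch report their devices through torch.cuda as well,
# so this one check covers both NVIDIA (CUDA) and AMD (ROCm) accelerators
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Training on: {device}")

# Moving the model and each batch to the accelerator is the only change
# needed; the rest of the training loop is identical to the CPU version
model = torch.nn.Linear(128, 10).to(device)
batch = torch.randn(32, 128, device=device)
logits = model(batch)
```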
Beyond GPUs there are specialty accelerator chips. These chips are usually used for inference and can provide impressive performance per watt compared to CPUs or GPUs, which are by nature more general-purpose processors. In 2024 I expect continued investment in these specialized accelerator chips from Azure, AWS and GCP.
ML Ops
Continuing with trends in Traditional ML, ML Ops has been a big focus area for many companies running ML models in production. ML Ops tools can help simplify the monitoring, management and lifecycle of a machine learning model, and can even help reduce costs.
Model monitoring/insight tools have been getting a lot of attention lately. These are the easiest tools for many business leaders to understand; they generally monitor a model's inputs and outputs while it is in production. This can surface data quality issues, feature and demographic drift on the inputs, and score drift on the outputs. These tools can also provide insights such as when a model should be retrained, or model explainability (which inputs affected the model's score).
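As an illustration, here is a minimal sketch of the kind of input-drift check these tools run under the hood, using a two-sample Kolmogorov-Smirnov test from SciPy (one common approach; real monitoring products combine many such statistics across features):

```python
import numpy as np
from scipy.stats import ks_2samp

def feature_drifted(train_values, live_values, alpha=0.05):
    """Flag drift when live feature values no longer match the training data."""
    statistic, p_value = ks_2samp(train_values, live_values)
    return p_value < alpha  # small p-value: the distributions likely differ

# Simulated example: production traffic has shifted relative to training
train_feature = np.random.normal(loc=0.0, scale=1.0, size=10_000)
live_feature = np.random.normal(loc=0.4, scale=1.0, size=2_000)
print(feature_drifted(train_feature, live_feature))  # likely True
```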
Feature stores were a very popular concept over the last few years, with a few niche startups and hosted versions added by the public cloud providers, but in most cases the overhead of managing a feature store has outweighed the value it brings. Companies whose core data/features are shared by many models will see benefits from feature stores, but most companies, whose models cover a wider variety of datasets and don't have much feature overlap between them, will not see much value.
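For context, the pitch of a feature store is a single definition of each feature served consistently for both training and online inference. A minimal sketch using Feast, a popular open source option (the feature names and `driver_id` entity here are hypothetical examples, and the call assumes an existing Feast repository):

```python
from feast import FeatureStore

# Point at an existing Feast repository, where feature definitions live
store = FeatureStore(repo_path=".")

# Fetch the latest feature values for one entity at inference time;
# the feature references and entity key below are hypothetical
features = store.get_online_features(
    features=["driver_stats:conv_rate", "driver_stats:avg_daily_trips"],
    entity_rows=[{"driver_id": 1001}],
).to_dict()
print(features)
```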
Once a model trained for a specific task has become stale, a company needs to retrain it to regain the predictive power it had when first trained. This has led to a lot of tools that log the model artifacts, hyperparameters, training sets, etc. of each training run, which can help automate parts of the training/retraining lifecycle.
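MLflow is one widely used open source example of this kind of run tracking (my example, not an endorsement of a specific tool). A minimal sketch of logging the pieces a retraining job would need to reproduce a run:

```python
import mlflow
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

with mlflow.start_run():
    # Hyperparameters are logged so a later retraining job can reproduce them
    params = {"C": 0.5, "max_iter": 200}
    mlflow.log_params(params)

    model = LogisticRegression(**params).fit(X, y)

    # Metrics and the model artifact itself are versioned alongside the run
    mlflow.log_metric("train_accuracy", model.score(X, y))
    mlflow.sklearn.log_model(model, "model")
```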
The newest addition to ML Ops is the vector database, which is mostly used in conjunction with LLMs. For example, a company can cache similar queries to a chatbot model and reduce expensive computation by returning the same answer for sufficiently similar queries.
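Here is a minimal sketch of that caching idea using cosine similarity over embeddings. Both `embed()` and `call_llm()` are stand-ins for a real embedding model and a real LLM call, and a production system would use an actual vector database rather than a Python list:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Stand-in for a real embedding model (e.g., a sentence-transformer)."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.random(384)

def call_llm(query: str) -> str:
    """Stand-in for the expensive LLM call being cached."""
    return f"LLM answer to: {query}"

cache: list[tuple[np.ndarray, str]] = []  # (query embedding, cached answer)

def answer(query: str, threshold: float = 0.95) -> str:
    q = embed(query)
    for vec, cached_answer in cache:
        similarity = np.dot(q, vec) / (np.linalg.norm(q) * np.linalg.norm(vec))
        if similarity >= threshold:
            return cached_answer  # cache hit: skip the expensive LLM call
    response = call_llm(query)
    cache.append((q, response))
    return response
```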
Generative AI Trends
Generative AI use cases were the source of a lot of hype in 2023. That hype will continue into 2024, and currently those use cases can be grouped into two main categories: text generation and image generation. While both use models built on similar concepts, the competitors and trends are distinct.
Text Generation
Generative Text models have captivated both businesses and consumers.
Consumers have been using AI chat tools like ChatGPT for over a year now (it was released in November 2022). I expect the percentage of people using AI chat tools to increase in 2024 as more companies roll out their own AI-powered apps.
Businesses see value in making their internal information more easily accessible to employees or customers while also reducing IT helpdesk or customer service agent costs. Most of the business work focuses on connecting internal data to Generative AI chatbots, which generally means fine-tuning an open source model like Llama 2 or using a pretrained model like GPT-4, and in both cases hooking the model up to a vector database that stores a representation of the company's internal documents; a sketch of that retrieval flow follows below.
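Here is a minimal framework-independent sketch of that retrieval-augmented pattern. `embed()`, `vector_db.search()` and `llm()` are all hypothetical stand-ins for your embedding model, vector database client and chat model:

```python
def build_prompt(question: str, documents: list[str]) -> str:
    """Ground the model's answer in retrieved internal documents."""
    context = "\n\n".join(documents)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

def answer_from_internal_docs(question: str) -> str:
    # 1. Embed the question the same way the documents were embedded
    query_vector = embed(question)  # hypothetical embedding model
    # 2. Retrieve the most similar internal documents from the vector DB
    documents = vector_db.search(query_vector, top_k=3)  # hypothetical client
    # 3. Let the (fine-tuned or pretrained) model answer with that context
    return llm(build_prompt(question, documents))  # hypothetical LLM call
```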
Currently, according to user feedback in Chatbot Arena (a crowdsourced A/B evaluation platform with over 200k human votes), the best text generation models are all closed source. I expect this trend to continue, especially since the cost of gathering large training sets and the GPU hours required to train large competitive models make open source projects less feasible than their closed source, for-profit counterparts.
Image Generation
In 2022 the concept of AI art and AI image generation burst onto the scene. DALL-E 2, Midjourney and Stable Diffusion were all released a few months apart in mid-2022 and have been getting consistent updates. That said, while the models have improved year over year, the use cases are more niche and have not had as broad an impact as LLMs and chatbots. I expect that image generation will continue to improve (especially at generating text within images) and expand into AI-generated GIFs/short clips.
Sadly there is not currently a model comparison site similar to Chatbot Arena for image generators. That being said, the best models by reputation are Midjourney, DALL-E 2 and Stable Diffusion.
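Of the three, Stable Diffusion is the one you can run yourself. A minimal sketch using Hugging Face's diffusers library (this assumes a CUDA GPU with enough VRAM; the model ID is the public Stable Diffusion 1.5 checkpoint):

```python
import torch
from diffusers import StableDiffusionPipeline

# Load the public Stable Diffusion 1.5 weights in half precision
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")  # image generation is impractical without a GPU

# One pass through the diffusion process yields an image from a text prompt
image = pipe("an astronaut riding a horse, digital art").images[0]
image.save("astronaut.png")
```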
Spotlight: Building LLM Applications using LangChain
LangChain is a framework for developing applications powered by language models. It enables applications that:
- Are context-aware: connect a language model to sources of context (prompt instructions, few-shot examples, content to ground its response in, etc.)
- Reason: rely on a language model to reason (about how to answer based on provided context, what actions to take, etc.)
The main value props of LangChain are:
- Components: abstractions for working with language models, along with a collection of implementations for each abstraction. Components are modular and easy to use, whether you are using the rest of the LangChain framework or not
- Off-the-shelf chains: a structured assembly of components for accomplishing specific higher-level tasks
Off-the-shelf chains make it easy to get started. For complex applications, components make it easy to customize existing chains and build new ones.
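As a concrete example, here is a minimal retrieval-augmented question-answering chain built from those components (this assumes the pre-0.1 `langchain` package layout, an `OPENAI_API_KEY` in the environment, and the `faiss-cpu` package installed; the documents are placeholders):

```python
from langchain.chat_models import ChatOpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.chains import RetrievalQA

# Placeholder internal documents; in practice these come from your own data
texts = [
    "Our VPN portal is at vpn.example.com and requires SSO login.",
    "IT helpdesk hours are 9am-5pm Eastern, Monday through Friday.",
]

# Component: a vector store that grounds the model in company context
vectorstore = FAISS.from_texts(texts, OpenAIEmbeddings())

# Off-the-shelf chain: wires retrieval into the LLM prompt for us
qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(),
    retriever=vectorstore.as_retriever(),
)
print(qa.run("How do I connect to the VPN?"))
```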
Spotlight: AI Gateway and Workers AI
Disclosure: I work for Cloudflare as a Machine Learning Engineer. I do not work on the team developing AI Gateway or Workers AI and while I do hold Cloudflare stock my opinions on this are not otherwise sponsored or endorsed by the company.
AI Gateway is a plug-and-play service that, most importantly, allows applications to cache model requests and responses, along with controls like rate limiting. It can also provide analytics, logs and other insights. You can use it in front of any API, and it comes preconfigured for a few of the top inference APIs like OpenAI, Amazon's Bedrock, Workers AI and more. I think components like AI Gateway, which make deploying AI-powered apps easier, provide usage insights and protect against spikes in demand, will only become more popular in 2024.
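The integration model, as I understand it at the time of writing (check the current docs for the exact URL scheme), is simply swapping your provider's base URL for a gateway URL so that caching and rate limiting happen transparently. A sketch using the pre-1.0 openai Python SDK style, with placeholder account and gateway names:

```python
import openai

# Point the OpenAI client at the gateway instead of api.openai.com;
# ACCOUNT_ID and my-gateway are placeholders for your own values
openai.api_base = "https://gateway.ai.cloudflare.com/v1/ACCOUNT_ID/my-gateway/openai"
openai.api_key = "sk-..."  # your normal OpenAI key still authenticates you

# Repeated or similar requests can now be served from the gateway's cache
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "What is a vector database?"}],
)
print(response.choices[0].message.content)
```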
Workers AI is an easy way to create AI-powered apps using just an API call. It supports a variety of models and use cases and can be called from LangChain. I expect that in 2024 more platforms will provide hosted model solutions where a developer doesn't need to train and host their own custom LLM, but can instead use Retrieval Augmented Generation or similar techniques to give a more generalized model the proper context for their application. A few competitors offering similar hosted model services are Amazon's Bedrock, Hugging Face's Inference service and, of course, OpenAI.
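A minimal sketch of that "just an API call" experience against Workers AI's REST endpoint (the account ID and token are placeholders, and the model name is one of the Llama 2 variants Workers AI launched with, so check the current catalog):

```python
import requests

ACCOUNT_ID = "YOUR_ACCOUNT_ID"  # placeholder
API_TOKEN = "YOUR_API_TOKEN"    # placeholder

# Run a hosted model with a single HTTP request: no training or serving infra
url = (
    f"https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}"
    "/ai/run/@cf/meta/llama-2-7b-chat-int8"
)
response = requests.post(
    url,
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    json={"prompt": "Explain Retrieval Augmented Generation in one sentence."},
)
print(response.json())
```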
My Data Science Setup
I am a Z by HP Global Data Science Ambassador, which means Z by HP sponsors my content and has provided me with the following hardware, which I used to create this data science project and the others I work on.
Desktop: HP Z8
Having a powerful desktop is very important for iterating quickly on data science projects. Not having to wait as long between code changes and results lets me get more done, and the powerful GPUs in my Z8 are very helpful when training large deep learning models.
Laptop: ZBook Studio
Being able to work on data science projects away from my home office helps me maximize my productivity. My ZBook Studio has a powerful enough CPU and GPU to handle a lot of the iteration and initial models I run.
Monitor: HP Z38c
Having a big, high resolution display like the Z38c improves my productivity by letting me run multiple windows at the same time. I usually have a Python notebook open on one side of the screen and a browser, chat app or video call on the other.
Preloaded Software Stack
Having the software I use for data science come preinstalled and configured to manage package updates helps me start projects quickly and worry less about the compatibility issues that arise from updating packages one by one.