A Year of Insights with the MLOps & LLMOps Dictionary
AI Trends and Lessons in 2024
Over a year ago, our team at Hopsworks began consolidating an extensive dictionary of terms and concepts on MLOps, LLMOps, data modeling, and feature engineering. Our goal was to create a comprehensive terminology guide for building and managing ML solutions, with the primary focus on MLOps and LLMOps.
Many entries from the dictionary have proven invaluable to practitioners, reflecting the community’s growing interest in both foundational and emerging topics. In this blog post, we have selected the 25 most-read dictionary entries from the past year to highlight key trends and lessons learned. In the diagram below, we have placed these 25 entries on a spectrum of three categories: LLMOps, ML, and MLOps.
Download The Big Dictionary of MLOps & LLMOps
Background
The dictionary initiative began in the summer of 2023, but traffic didn’t increase significantly until early 2024. During this period, a surge in frameworks and techniques related to large language models (LLMs) drove up search volume. The sharp rise in traffic resulted from closely tracking emerging keywords and trends, combined with continuous SEO maintenance. The graph below shows the fluctuations in organic traffic to the dictionary and the slow build-up of traffic as keyword density increased.
To create a successful online guide, it’s crucial to balance SEO and keyword optimization with backlinking, while continuously expanding the guide with fresh keywords and content. Staying within the top 10 search results for a keyword ensures a steady flow of web traffic, which requires diligent monitoring and effort. If we refer to the graph above, we see a peak in early 2024 followed by a dip. This decline can be attributed to initially securing top keyword positions for emerging terms, but subsequently losing those positions due to a lack of content updates.
Now that we’ve discussed the technical SEO and content performance lessons, let’s explore what the dictionary has revealed about AI trends and developments.
The Dominance of LLMs in MLOps
Keywords: Retrieval Augmented Generation (RAG) for LLMs, In Context Learning (ICL), Context Window for LLMs and vLLM.
Although the dictionary originally focused on MLOps and machine learning terms, we noticed a growing interest in techniques, frameworks, and concepts related to LLMs. This prompted a stronger focus on LLMOps and a shift toward monitoring new and emerging keywords in this space. The increasing demand for understanding and optimizing the behavior, training, and deployment of LLMs, as well as fine-tuning them for specific tasks, led us to add more LLM-related terms to the guide. This shift isn’t surprising, considering the explosive growth of generative AI, especially following the public release of ChatGPT.
Looking at specific LLM-related keywords such as RAG for LLMs, Context Windows for LLMs, and ICL, we observed a sharp rise in interest in early 2024. As interest in these concepts grew, we added newer terms such as vLLM and LLM temperature in late Q2. These keywords initially had low search volumes, since little information about them was available yet, but they generated significant traffic as interest in the concepts grew.
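To make the RAG concept mentioned above concrete, here is a minimal, hypothetical sketch of the retrieval step: rank documents by similarity to the question and splice the best matches into the prompt as context. All function names, the toy bag-of-words similarity, and the prompt template are our own illustrative choices, not a prescribed implementation; production systems typically use embedding models and a vector database instead.

```python
import math
from collections import Counter

def bag_of_words(text):
    """Tokenize into a lowercase word-count vector (toy stand-in for embeddings)."""
    return Counter(text.lower().split())

def cosine_sim(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def build_rag_prompt(question, documents, top_k=2):
    """Retrieve the top_k most similar documents and place them in the
    prompt as context, so the LLM can ground its answer in them."""
    q_vec = bag_of_words(question)
    ranked = sorted(documents, key=lambda d: cosine_sim(q_vec, bag_of_words(d)), reverse=True)
    context = "\n".join(ranked[:top_k])
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

docs = [
    "A feature store manages storage and querying of feature data.",
    "RoPE scaling extends the context window of an LLM.",
    "Gradient accumulation simulates larger batch sizes.",
]
prompt = build_rag_prompt("What does a feature store manage?", docs, top_k=1)
```

Note how the context window limit from the keywords above shows up here: `top_k` bounds how much retrieved text is packed into the prompt alongside the question.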
Feature Engineering and Management as a Core Focus
Keywords: Feature, Feature Pipeline, Feature Store, Feature Type, Lagged Features
At the core of the Hopsworks AI Lakehouse is the feature store, an operational platform for orchestrating, monitoring, validating, and versioning features and models. Naturally, many of the terms covered in the guide are relevant to feature stores, feature management, and machine learning systems. The interest in terms such as feature, feature pipeline, and feature store suggests that feature engineering and management remain a crucial focus area in MLOps, and that proper handling of features is essential for building effective and efficient machine learning models.
Given the importance of the feature store within the AI Lakehouse, we put extensive effort into maintaining its entry. This entry eventually became a guide of its own, explaining in depth how a feature store supports the development and operation of machine learning systems by managing the storage and efficient querying of feature data. Although it ranks fifth out of the 25 most-read entries rather than first, it is important because many of the techniques, frameworks, and concepts mentioned throughout the dictionary are applicable or related to feature stores.
Even though the rise of keywords in this category suggests that the market is learning more about feature management and data, there is still a knowledge gap in how to tie it all together in ML solutions. The dictionary is meant to be one source of knowledge, and we expand on it through our blog articles. In our blog section we describe how effective feature management, including the use of feature stores and ML pipelines, is vital for scalable and reproducible ML systems. At Hopsworks we refer to the FTI (feature, training, and inference) pipeline architecture, an integral part of our infrastructure and the principal framework we use to explain how to build reliable ML systems. This framework is explored in our blog “From MLOps to ML Systems with FTI pipelines”.
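The FTI split described above can be sketched in a few lines. This is a deliberately simplified, hypothetical illustration: plain dictionaries stand in for the feature store and model registry, the “model” is a trivial click-through-rate threshold, and all names are invented for this sketch rather than taken from the Hopsworks API. The point is the separation of concerns: the feature pipeline writes features, the training pipeline reads them and registers a model, and the inference pipeline computes the same feature online.

```python
feature_store = {}   # stand-in for a feature store (feature group -> rows)
model_registry = {}  # stand-in for a model registry

def feature_pipeline(raw_rows):
    """Transform raw data into features and write them to the feature store."""
    features = [{"x": r["clicks"] / max(r["views"], 1), "label": r["bought"]}
                for r in raw_rows]
    feature_store["ctr_features"] = features

def training_pipeline():
    """Read features, 'train' a trivial threshold model, and register it."""
    rows = feature_store["ctr_features"]
    positives = [r["x"] for r in rows if r["label"]]
    threshold = sum(positives) / len(positives) if positives else 0.5
    model_registry["ctr_model"] = {"threshold": threshold}

def inference_pipeline(raw_row):
    """Compute the same feature online and score it with the registered model."""
    x = raw_row["clicks"] / max(raw_row["views"], 1)
    return x >= model_registry["ctr_model"]["threshold"]

feature_pipeline([
    {"clicks": 9, "views": 10, "bought": True},
    {"clicks": 1, "views": 10, "bought": False},
])
training_pipeline()
prediction = inference_pipeline({"clicks": 10, "views": 10})
```

Because both the feature and inference pipelines apply the same transformation, there is no training/serving skew; in a real system the feature store enforces this by serving the same feature definitions to both sides.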
Interest in Specialized Techniques for Model Optimization
Keywords: Flash Attention, Gradient Accumulation, RoPE Scaling
Popular keywords such as Flash Attention, Gradient Accumulation, and RoPE Scaling revealed growing attention to specialized techniques for optimizing model training and inference, particularly in the context of large-scale models. These techniques aim to improve efficiency, reduce computational costs, and enhance model performance, and their popularity indicates that practitioners are still learning how to get the most out of their models.
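Of the three techniques named above, gradient accumulation is the easiest to show in isolation: gradients from several micro-batches are summed before a single parameter update, simulating a larger effective batch size on limited memory. The sketch below is a hypothetical toy, using a one-parameter least-squares model in plain Python rather than a deep learning framework, so only the accumulation pattern itself is the point.

```python
def gradient(w, batch):
    """Mean gradient of the squared error (w*x - y)^2 over one micro-batch."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

def train_step(w, micro_batches, lr=0.1):
    """Accumulate gradients over all micro-batches, then update once."""
    accumulated = 0.0
    for batch in micro_batches:
        accumulated += gradient(w, batch)
    # Average over micro-batches so the step matches one large-batch update.
    w -= lr * accumulated / len(micro_batches)
    return w

# Two micro-batches drawn from the line y = 2x; w should converge toward 2.
w = 0.0
for _ in range(50):
    w = train_step(w, [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0)]])
```

In a framework such as PyTorch the same pattern appears as running several backward passes before calling the optimizer step, which trades extra forward/backward passes for lower peak memory.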
Conclusions
After a year with the MLOps and LLMOps dictionary, we can conclude that maintaining a popular knowledge guide requires a lot of hard work combined with a watchful eye on industry trends and developments. You have to stay one step ahead of the rest of the industry and almost foresee which new concepts and techniques will become popular in the near future. Throughout this blog, we’ve looked at the 25 most-read entries from the dictionary and reflected on what they tell us.
The insights behind each keyword underscore the increasing complexity and specialization in the field of AI, signaling a shift toward more sophisticated and ethical AI practices. MLOps professionals need to deepen their expertise in LLMs and related techniques, as these models are becoming increasingly central to AI applications. The complexity of managing, fine-tuning, and deploying LLMs requires specialized knowledge and tools, and MLOps teams should also invest in robust feature engineering practices to ensure model performance and consistency.
Learning advanced techniques for LLMOps and MLOps is becoming increasingly important for data science professionals. Taking the methods and techniques described in the dictionary into account from an early stage can lead to more efficient use of resources and better-performing models, especially in resource-constrained environments. The dictionary is a good starting point for learning about frameworks for building ML and LLM systems, but we also highly recommend deepening your knowledge with the insights on our blog page and in our Hopsworks Academy videos.
Read More
On the theme of popular writing, here are the ten most-read articles from our blog page:
- From MLOps to ML Systems
- Introducing the AI Lakehouse
- Feature Types for Machine Learning
- Facebook Prophet for Time-Series Machine Learning
- Common Error Messages in Pandas
- Faster Reading from the Lakehouse to Python with DuckDB/ArrowFlight
- Guide to File Formats for Machine Learning
- 5 Machine Learning Myths Debunked
- Feature Store: The missing Data Layer for Machine Learning Pipelines?
- Modularity and Composability for AI Systems