Custom Language Data Analysis vs LLMs

Alex Tantos
Virtually Every Language
3 min read · Jun 13, 2024

Photo by Etienne Girardet on Unsplash

We are currently in the middle of the hype surrounding large language models (LLMs). Slowly but steadily, however, people will begin to recognize their limitations and reassess their strengths. For instance, while LLMs are highly advanced and capable of generating and “understanding” complex language, they cannot always provide the specific, nuanced insights that custom language data analysis of a particular corpus can, especially when that analysis is carried out in a programming language geared toward data analysis, such as Julia. Even though context windows keep growing and fine-tuning remains an option, it is not practical to expect that your data will fit within an LLM’s context window, nor that you will routinely fine-tune foundation models to interrogate every language variety you have in mind. Furthermore, even fine-tuned models are not reliable enough for researchers to base their conclusions and results on the models’ output. Here are five aspects that support this point:

Tailored Analysis

Custom language data analysis enables highly tailored approaches to specific research questions. In the social and political sciences, researchers often need to focus on specific variables and the relationships…

Alex Tantos

Associate Professor of Text and Computational Linguistics at the Aristotle University of Thessaloniki. Language data analysis and modeling enthusiast.