Commentary: Gzip + kNN Beats Deep Neural Networks in Text Classification

Heinrich Peters
5 min read · Jul 13, 2023

That's right… A 14-line Python script using a 30-year-old data compression algorithm in combination with a 70-year-old classification algorithm beats modern deep learning methods… under certain conditions.

What Just Happened?

In the world of Natural Language Processing (NLP), text classification is a fundamental task that has seen significant improvements with the advent of neural networks. However, these networks can be computationally intensive, requiring millions of parameters and large amounts of labeled data. This makes them expensive to use, optimize, and transfer to out-of-distribution cases in practice.

But what if similar results could be achieved with a simpler, more lightweight method? A team of researchers from the University of Waterloo and AFAIK have proposed just that in their recent paper “Low-Resource Text Classification: A Parameter-Free Classification Method with Compressors” (Jiang et al., 2023).

Their method combines a simple compressor like gzip with a k-nearest-neighbor classifier, requiring no training parameters. This non-parametric alternative to deep neural networks has shown promising results: It was competitive with non-pretrained deep learning methods on six in-distribution datasets and even outperformed BERT on five out-of-distribution benchmark datasets chosen by the authors.

A Simple Trick: Compression

The researchers’ method is based on the intuition that objects from the same class share more regularity than objects from different classes. They use a lossless compressor, such as gzip, to capture this regularity and translate it into a compressor-based distance metric. With the resulting distance matrix, a k-nearest-neighbor classifier performs the classification. The approach is simple and lightweight: it bypasses the need for preprocessing or training and avoids the computational demands often associated with modern NLP methods. Compressors are agnostic to data types, and non-parametric methods impose no distributional assumptions. Moreover, the whole approach fits in only 14 lines of Python code…
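To make the idea concrete, here is a minimal sketch along the lines described in the paper: gzip-based normalized compression distance between a test text and every training text, followed by a k-nearest-neighbor majority vote. The function names and the toy data below are mine, not the authors', and this is an illustration of the idea rather than their exact implementation.

```python
import gzip
import numpy as np

def ncd(x: str, y: str) -> float:
    """Normalized compression distance between two strings, using gzip."""
    cx = len(gzip.compress(x.encode()))
    cy = len(gzip.compress(y.encode()))
    cxy = len(gzip.compress(" ".join([x, y]).encode()))
    return (cxy - min(cx, cy)) / max(cx, cy)

def classify(test_texts, train_texts, train_labels, k=2):
    """Predict a label for each test text via k-NN on compression distance."""
    preds = []
    for x in test_texts:
        distances = [ncd(x, y) for y in train_texts]
        top_k = np.argsort(distances)[:k]          # indices of the k closest training texts
        top_labels = [train_labels[i] for i in top_k]
        preds.append(max(set(top_labels), key=top_labels.count))  # majority vote
    return preds

# Toy usage with made-up data
train_texts = ["the cat sat on the mat", "stocks fell sharply today",
               "the dog chased the cat", "markets rallied after the report"]
train_labels = ["animals", "finance", "animals", "finance"]
print(classify(["a cat and a dog played outside"], train_texts, train_labels, k=2))
```

Because the only "model" is the compressor itself, there is nothing to train or tune beyond the choice of k, which is exactly what makes the method attractive in low-resource settings.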

The researchers tested their method on a variety of datasets to investigate the effects of the number of training samples, the number of classes, the length of the text, and the difference in distribution on classification performance. The results were surprising. Their method achieved results competitive with non-pretrained deep learning methods on six in-distribution datasets. It even outperformed all methods, including pretrained models such as BERT, on all OOD datasets. Furthermore, it excelled in few-shot settings, where labeled data are too scarce to train DNNs effectively.

Commentary

It might be tempting to dismiss this finding as a gimmick, but I think there is more to it. Certainly, this research presents an ironic twist in the narrative about progress in NLP. It suggests that sometimes, revisiting simpler, classic methods can lead to astonishing results. The simplicity and efficiency of the compression method, while not entirely new (Marton et al., 2005), underscore the potential of such approaches in a field often dominated by complexity. The paper serves as a reminder to the academic community that innovation can come from revisiting and rethinking established methods. It is a testament to the enduring value of foundational principles in the face of rapid technological advancement.

While the results of this research are undoubtedly interesting — and also funny — it is important to approach them with a balanced perspective. Does this spell the end for deep learning in NLP? Of course not. Deep learning has proven its worth in a multitude of complex tasks and continues to shape the future of NLP. Just think about the amazing progress we are witnessing in LLMs on a daily basis. Their ability to model subtle patterns and relationships in data and their capacity to learn from vast amounts of information are not easily replicated.

Also, the method proposed by the researchers is unlikely to be a universal solution. Its performance probably depends heavily on specific conditions and constraints. As the title of the paper indicates, the approach is intended for situations where data are scarce or where computational resources are limited. It may also be particularly effective for tasks where the data exhibit strong regularities that a simple compressor can capture. After all, the proposed method seems to pick up on orthographic rather than semantic similarity.

In situations where the data are complex and the relationships between elements are not easily captured, deep learning methods will undoubtedly hold the upper hand. The ability of deep learning models to learn and generalize from large amounts of data is a significant advantage in many scenarios. The latest pretrained models, for instance, can handle a wide variety of tasks without task-specific training data, making them highly versatile, and where that falls short, finetuning remains a viable option.

In conclusion, while this research introduces an entertaining and potentially impactful approach to text classification in low-resource scenarios, it does not diminish the importance of LLMs and other deep-learning approaches. Instead, it enriches our toolkit, offering an alternative method that could be advantageous under specific circumstances. It underscores that in the multifaceted world of NLP, there is room for a variety of approaches, each with its unique strengths and ideal applications. Plus, it may be fun to tease the deep-learning crowd by telling them that their work is blown out of the water by ancient tech…

Sources

For more details, you can access the full research paper:

Jiang, Z., Yang, M., Tsirlin, M., Tang, R., Dai, Y., & Lin, J. (2023). “Low-Resource” Text Classification: A Parameter-Free Classification Method with Compressors. Findings of the Association for Computational Linguistics: ACL 2023, 6810–6828. https://aclanthology.org/2023.findings-acl.426

An earlier source employing a similar approach can be found here:

Marton, Y., Wu, N., & Hellerstein, L. (2005). On Compression-Based Text Classification. In D. E. Losada & J. M. Fernández-Luna (Eds.), Advances in Information Retrieval (Vol. 3408, pp. 300–314). Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-540-31865-1_22
