Multilingual Evaluations in LLMs — a comparison
TL;DR: This is a continuation to my previous blog post exploring the question: What do the Large Language Model (LLM) creators mean when…
Mar 24

Responsible Disclosure and Multilingual LLMs
TL;DR: When Large Language Models claim to be multi-lingual, I think they should clearly write what they mean e.g., what languages they…
Mar 21

Quality issues in LLM Benchmark datasets
tl;dr: This blog post summarizes some of my current thoughts on acknowledging and addressing the quality issues in the datasets we use for…
Jan 213

Linguistics in the era of LLMs
Many people who’ve been working in Natural Language Processing (NLP) since before ChatGPT (actually before 2013 or so, pre-word2vec era)…
Dec 15, 2024

What do multilingual LLM benchmarks really measure?
Each time I see a new multilingual LLM benchmark release that is a machine translated version or crawled from random questionable websites…
Dec 9, 2024

Lessons from a reproducibility study
TL;DR — We reproduced some of the existing research on keyphrase generation from text and published a paper about our findings with…
May 20, 2024

Don’t ignore the relatively older language models!
TLDR; I compared Flan-T5 series with GPT3.5 and GPT4 on a 6-way topic classification dataset for short social media texts, and the results…
Jan 15, 2024

Reality Check — Natural Language Processing (Part 2)
This is a continuation from yesterday’s post summarizing some of the “Reality Check” themed papers from ACL 2023 which is happening next…
Jul 7, 2023

Reality Check — Natural Language Processing (Part 1)
ACL 2023, NLP’s major annual conference, is happening next week, and has a “theme” track this year, called “Reality Check”. Here is how it…
Jul 6, 2023

Making sense of the state of the art in Keyphrase prediction
TL;DR: Research papers don’t often report results consistently and it makes comparisons difficult sometimes. This post reflects my thoughts…
Jun 23, 2023