On State Of Art #1: The LMSYS Chatbot Arena Leaderboard, a Platform for Crowdsourced Evaluation of Large Language Models (LLMs)

Published in AL Game Code, Mar 5, 2024

Two robots fighting, comic-book style. Art by <https://www.instagram.com/carolsalvatoarts/>

The field of Large Language Models (LLMs) is constantly evolving, with new models emerging all the time. In this dynamic landscape, there is a need for robust methods to compare their capabilities and determine which ones excel at specific tasks. This is where the LMSYS Chatbot Arena Leaderboard comes in, an innovative platform hosted on Hugging Face that employs crowdsourced human evaluation to rank LLMs.

The Leaderboard

The leaderboard is based on the Elo rating system, commonly used in competitive games like chess. On the platform, LLMs take on the role of “players,” and their Elo scores reflect their performance in head-to-head comparisons. Users are invited to vote on which of two LLMs they find more engaging, informative, or helpful in a specific conversation. After each vote, the Elo system adjusts both models’ scores: the winner gains points and the loser gives them up, and the shift is larger when the lower-rated model wins. The resulting ranking reflects how users perceive the models relative to one another.
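To make the Elo mechanics concrete, here is a minimal sketch of the textbook update rule applied to a stream of pairwise votes. It is an illustration, not the platform's actual code: the step size K, the starting rating, the function names, and the model names are all assumptions made for this example.

```python
from typing import Dict, Iterable, List, Tuple

K = 32                # assumed step size; a larger K reacts faster to each vote
BASE_RATING = 1000.0  # assumed starting rating for a model with no votes yet


def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, score_a: float,
               k: float = K) -> Tuple[float, float]:
    """Return both new ratings after one vote.

    score_a is 1.0 if A won the vote, 0.0 if B won, and 0.5 for a tie.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # Whatever A gains, B loses, so the total number of points is conserved.
    return rating_a + delta, rating_b - delta


def rank_models(votes: Iterable[Tuple[str, str, float]]) -> List[Tuple[str, float]]:
    """Replay votes in order and return models sorted by rating, best first."""
    ratings: Dict[str, float] = {}
    for model_a, model_b, score_a in votes:
        ra = ratings.setdefault(model_a, BASE_RATING)
        rb = ratings.setdefault(model_b, BASE_RATING)
        ratings[model_a], ratings[model_b] = update_elo(ra, rb, score_a)
    return sorted(ratings.items(), key=lambda item: item[1], reverse=True)


# Hypothetical votes: (model A, model B, score for A).
votes = [
    ("model-x", "model-y", 1.0),  # voter preferred model-x
    ("model-y", "model-z", 0.5),  # voter called it a tie
    ("model-x", "model-z", 1.0),  # voter preferred model-x
]
for name, rating in rank_models(votes):
    print(f"{name}: {rating:.1f}")
```

Because each update depends on the ratings at the time of the vote, replaying the same votes in a different order can give slightly different scores. That sensitivity is one reason to read small gaps between neighbouring models with caution.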

LMSYS Chatbot Arena Scoreboard: See the fierce competition between LLMs

Benefits

The LMSYS Chatbot Arena Leaderboard offers several benefits for different audiences:

Developers: The platform provides real-world insights into how users perceive their LLMs, serving as valuable feedback to guide future development and refinement efforts.
Researchers: The leaderboard serves as a powerful benchmarking tool to compare different LLM approaches and identify areas for future research.
Users: The platform allows users to explore the leaderboard to discover potentially valuable LLMs, tailored to their specific needs and preferences.

Additional Features

The LMSYS Chatbot Arena Leaderboard goes beyond simply presenting a ranking. The platform offers additional features, such as:

Detailed LLM profiles: Provide relevant information about each LLM, such as name, size, developer, and license.
Historical data: Lets users explore how LLM rankings have evolved over time, enabling a deeper understanding of their performance trends.
Open platform: The platform welcomes community participation, allowing anyone to contribute their votes and help shape the LLM landscape.

Limitations

It is important to acknowledge that the LMSYS Chatbot Arena Leaderboard has some limitations. Human judgment is subjective, so any individual vote may reflect personal taste rather than answer quality, and the platform’s effectiveness depends on the quality and diversity of its user base. Additionally, the leaderboard primarily focuses on the conversational aspects of LLMs, so it says little about performance factors that do not show up in a chat comparison.

Conclusion

The LMSYS Chatbot Arena Leaderboard stands out as an innovative tool for LLM evaluation, harnessing the power of crowdsourced human evaluation. Despite its limitations, the platform offers valuable insights for developers, researchers, and users, contributing significantly to the ongoing dialogue and development in the fascinating field of Large Language Models.

LMSYS Chatbot Arena Leaderboard on Hugging Face: <https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard>

See the original in Portuguese (BR) on ALGameCode: <https://algamecode.blogspot.com/2024/03/on-state-of-art-1-placar-da-arena-de.html>