LIDA | Automatically Generate Visualization with LLMs | The Future of Data Visualization

Sunny Bhaveen Chandra
4 min read · Sep 5, 2023


source: LIDA | LIDA: Automated Visualizations with LLMs (microsoft.github.io)

Microsoft recently released LIDA 📊, a library for automatically generating data visualizations and infographics that accurately represent the data, using large language models (LLMs) 🤖. LIDA is grammar-agnostic: it is designed to work with any programming language 💻 and visualization library, such as Matplotlib, Seaborn, Altair, and D3. It also works with multiple LLM providers, including OpenAI, PaLM, Cohere, and Huggingface. More information about LIDA's components is available at microsoft.github.io/lida/ 🔍

LIDA takes advantage of the advanced language modeling and code-writing abilities of leading LLMs to provide automated visualization features, including:

  • Data summarization 📝
  • Goal exploration 🔍
  • Visualization generation 📊
  • Infographics generation 🎨

It also offers operations for existing visualizations, such as:

  • Visualization explanation 💬
  • Self-evaluation 🤔
  • Automatic repair 🔧
  • Recommendation 💡

LIDA has the following features:

Data Summarization:

LIDA can summarize large datasets into a concise yet information-rich natural language representation, somewhat like a richer version of the pandas "pd.DataFrame.describe()" method. This representation serves as the foundation for all subsequent operations. Below is a screenshot of the data summary for the iris dataset:
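As a rough illustration of the idea (separate from LIDA's own output), here is a minimal sketch of a compact, column-level summary. It uses only the Python standard library and is not LIDA's implementation; the helper name `summarize_rows`, the output fields, and the sample rows are all assumptions made for this example.

```python
from statistics import mean


def summarize_rows(rows, n_samples=3):
    """Build a compact per-column summary from a list of dict rows.

    Hypothetical helper for illustration only; not part of LIDA.
    """
    summary = {}
    for column in rows[0].keys():
        values = [row[column] for row in rows]
        info = {
            "dtype": type(values[0]).__name__,
            "num_unique": len(set(values)),
            "samples": values[:n_samples],
        }
        # Numeric columns also get simple range statistics.
        if all(isinstance(v, (int, float)) for v in values):
            info.update(min=min(values), max=max(values), mean=round(mean(values), 2))
        summary[column] = info
    return summary


# A few iris-like rows (made-up values for the example).
rows = [
    {"sepal_length": 5.1, "species": "setosa"},
    {"sepal_length": 7.0, "species": "versicolor"},
    {"sepal_length": 6.3, "species": "virginica"},
]
summary = summarize_rows(rows)
```

A small dictionary like this is cheap to serialize into a prompt, which is the reason for summarizing first instead of sending the raw data to the model.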

Automated Data Exploration:

If you are unfamiliar with a dataset, LIDA offers a fully automated mode that generates meaningful visualization goals based on the data. In effect, you get a first pass of Exploratory Data Analysis (EDA) with no manual effort. Refer to the following screenshot on the iris dataset:
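One way to picture goal exploration is as a prompt built from the data summary, followed by parsing the model's structured reply. The sketch below assumes exactly that and is not LIDA's code: `build_goal_prompt`, `explore_goals`, and the goal keys are hypothetical names, and `fake_llm` is a stand-in that returns a canned reply so the example runs without any API key.

```python
import json


def build_goal_prompt(summary, n=2):
    """Compose a prompt asking a model for n visualization goals."""
    return (
        f"Given this dataset summary:\n{json.dumps(summary)}\n"
        f"Suggest {n} visualization goals as a JSON list of objects "
        'with the keys "question", "visualization", and "rationale".'
    )


def explore_goals(summary, generate, n=2):
    """Ask the text generator for goals and parse its JSON reply."""
    reply = generate(build_goal_prompt(summary, n))
    goals = json.loads(reply)
    return goals[:n]


def fake_llm(prompt):
    """Stand-in for a real LLM call; always returns the same canned reply."""
    return json.dumps([
        {"question": "How is sepal_length distributed?",
         "visualization": "histogram of sepal_length",
         "rationale": "Shows the spread of a numeric field."},
        {"question": "How does sepal_length vary by species?",
         "visualization": "box plot of sepal_length by species",
         "rationale": "Compares a numeric field across categories."},
    ])


summary = {"sepal_length": {"dtype": "float"}, "species": {"dtype": "str"}}
goals = explore_goals(summary, fake_llm, n=2)
```

In a real setup, `generate` would wrap a call to whichever LLM provider you use; everything else stays the same.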

Grammar-Agnostic Visualizations:

Do you want to create visualizations with Python libraries such as Altair, Matplotlib, or Seaborn? Or perhaps with R or C++? LIDA is grammar-agnostic: because each visualization is represented as code, it can generate visualizations in any programming language.
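Because a visualization is just code text, switching the target library can be as simple as switching the code scaffold that the model is asked to complete. The sketch below illustrates that idea under stated assumptions: the `SCAFFOLDS` templates and the `build_visualization_prompt` helper are hypothetical and are not LIDA's own templates.

```python
# Hypothetical code scaffolds, one per target library.
SCAFFOLDS = {
    "matplotlib": (
        "import matplotlib.pyplot as plt\n"
        "def plot(data):\n"
        "    # <model fills in plotting code for: {goal}>\n"
        "    return plt\n"
    ),
    "altair": (
        "import altair as alt\n"
        "def plot(data):\n"
        "    # <model fills in chart spec for: {goal}>\n"
        "    return chart\n"
    ),
}


def build_visualization_prompt(goal, library):
    """Return a prompt asking a model to complete the scaffold for one library."""
    if library not in SCAFFOLDS:
        raise ValueError(f"No scaffold registered for {library!r}")
    scaffold = SCAFFOLDS[library].format(goal=goal)
    return (
        f"Complete the following {library} code so that it addresses the goal.\n\n"
        f"{scaffold}"
    )


prompt = build_visualization_prompt("histogram of sepal_length", "altair")
```

Supporting another grammar (say, an R plotting library) would then mean registering one more scaffold, with no change to the rest of the pipeline.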

Infographics Generation:

LIDA can transform data into visually appealing, detailed, and engaging infographics using image generation models. This allows for the creation of data stories and personalization in terms of branding, style, and marketing.

source: LIDA | LIDA: Automated Visualizations with LLMs (microsoft.github.io)

Visualization Explanation:

LIDA can provide in-depth explanations of visualization code that it generated, which can be useful for improving accessibility, promoting data literacy, aiding in education, and assisting with debugging and understanding visualizations.

Self-Evaluation:

Large language models such as GPT-3.5 and GPT-4 encode best practices for data visualization. LIDA uses these capabilities to generate multi-dimensional evaluation scores for visualizations that are represented as code. Below is a screenshot of the evaluation score and a spider chart:
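To show how multi-dimensional scores might be consumed, here is a small sketch that averages them and finds the weakest dimension. The score records, the 1 to 10 scale, and the dimension names are made up for the example and should not be read as LIDA's exact output format.

```python
def summarize_evaluation(scores):
    """Return the average score and the name of the weakest dimension."""
    average = sum(item["score"] for item in scores) / len(scores)
    weakest = min(scores, key=lambda item: item["score"])
    return round(average, 2), weakest["dimension"]


# Made-up scores on a 1-10 scale; the dimension names are assumptions.
scores = [
    {"dimension": "bugs", "score": 10, "rationale": "Code runs without errors."},
    {"dimension": "encoding", "score": 6, "rationale": "Axis labels are missing."},
    {"dimension": "aesthetics", "score": 8, "rationale": "Readable default styling."},
]
average, weakest = summarize_evaluation(scores)
```

Records like these are also what a spider (radar) chart would plot: one axis per dimension, one point per score.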

Visualization Repair:

LIDA offers techniques to enhance visualizations automatically through self-evaluation feedback or to fix visualizations based on feedback provided by the user.
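The interplay between self-evaluation and repair can be pictured as a short feedback loop. The following sketch assumes that shape; `repair_until_acceptable` is a hypothetical helper, and `fake_evaluate` and `fake_repair` are stand-ins for model calls, so none of this reflects LIDA's internals.

```python
def repair_until_acceptable(code, evaluate, repair, threshold=8, max_rounds=3):
    """Alternate evaluation and repair until every score meets the threshold."""
    for _ in range(max_rounds):
        feedback = [s for s in evaluate(code) if s["score"] < threshold]
        if not feedback:
            break  # nothing left to fix
        code = repair(code, feedback)
    return code


def fake_evaluate(code):
    """Stand-in evaluator: penalizes code that lacks an x-axis label."""
    has_label = "xlabel" in code
    return [{
        "dimension": "encoding",
        "score": 9 if has_label else 5,
        "rationale": "x-axis label present" if has_label else "x-axis label missing",
    }]


def fake_repair(code, feedback):
    """Stand-in repairer: appends the missing label call."""
    return code + "\nplt.xlabel('sepal_length')"


fixed = repair_until_acceptable("plt.hist(data['sepal_length'])", fake_evaluate, fake_repair)
```

User-provided feedback fits the same loop: replace the evaluator's output with the user's comments and run a single repair round.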

Visualization Recommendations:

Based on the context, such as the goals or an existing visualization, LIDA can suggest further visualizations that may be helpful to the user. These additional visualizations can provide comparisons or offer new perspectives which can be helpful in overall data analysis.

System Architecture of LIDA

LIDA has four main components, as shown below:

source: LIDA | LIDA: Automated Visualizations with LLMs (microsoft.github.io)
  • Data Summarization: This stage condenses data into a brief yet information-packed natural language representation that serves as the foundation for all subsequent operations.
  • Automated Data Exploration: This stage offers a fully automated mode that formulates meaningful visualization goals based on the given dataset.
  • Grammar-Agnostic Visualizations: This stage can produce visualizations in any code-based grammar, including Python libraries like Altair, Matplotlib, Seaborn, and others, as well as languages like R and C++.
  • Infographics Generation: This stage can transform data into detailed, stylized, and engaging infographics using image generation models. (This stage is still a work in progress.)
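The first three stages can be chained in a few lines. The sketch below follows the usage shown in the LIDA project's README as I understand it (`Manager`, `llm`, `summarize`, `goals`, `visualize`); treat the exact signatures as assumptions that may differ between versions. The wrapper function `run_lida_pipeline` and its parameters are my own, and running it requires `pip install lida` plus an API key for the chosen provider (for OpenAI, the `OPENAI_API_KEY` environment variable).

```python
def run_lida_pipeline(data_path, library="seaborn", n_goals=2):
    """Summarize a dataset, propose goals, and generate charts for the first goal.

    Hypothetical wrapper; the lida calls are assumed from the project's README.
    """
    # Imported lazily so this module can be loaded without lida installed.
    from lida import Manager, llm

    lida = Manager(text_gen=llm("openai"))   # choose the LLM provider
    summary = lida.summarize(data_path)      # stage 1: data summarization
    goals = lida.goals(summary, n=n_goals)   # stage 2: goal exploration
    charts = lida.visualize(                 # stage 3: visualization generation
        summary=summary, goal=goals[0], library=library
    )
    return summary, goals, charts
```

Calling `run_lida_pipeline("data/iris.csv")` (a hypothetical path) should return the summary, the list of goals, and the generated charts; in the README's examples, the generated code for a chart is read from its `code` attribute.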

So, that was a quick overview of LIDA. For a complete walkthrough and setup, you can refer to the following video:


More blogs are on the way. Till then, keep on learning and keep on exploring!
