Turning Words into Works of Visual Art, No Syntax Strings Attached

Sadaf Davre
Published in VisUMD
5 min read · Nov 1, 2023

A Much-Needed Tool for Navigating the Complex Terrain of Data Visualization, Powered by Your Favourite AI

Image courtesy of Brooke Cagel

Have you ever stared at an unruly new dataset and wondered, “How am I going to sift through all of this? What visuals would best represent the data? This is going to take forever!” Well, worry no more. LIDA, a tool for generating ‘grammar-agnostic visualizations and infographics’, is here to take on those worries and be a helpful co-pilot on your analysis journey.

With its intelligent, multi-stage pipeline, packaged in a neat Python library and powered by Large Language Models, LIDA takes on the heavy lifting. It not only summarizes the data and identifies visualization goals, but also generates the visuals and even stylizes them! It’s as if you were typing a prompt for your analysis and visualisation into ChatGPT online, just much more accurate and a lot more hands-on. Developed by Microsoft researcher Victor Dibia, LIDA is open-sourced on GitHub, making it accessible for our personal projects as well. It consists of 4 key modules, as shown below:

Figure 1: Flowchart summarizing the 4 modules in the LIDA package, taken from the original LIDA site

The first module, Data Summarization, is akin to a first mate who immediately evaluates the lay of the land, or in this case, the dataset. It rapidly scans the data, picking out key metrics, average values, outliers, and other noteworthy points. This summary serves as a roadmap for the subsequent stages, allowing users to grasp the key features of the data without getting lost in the weeds. No longer do you need to execute endless commands to get preliminary insights; LIDA does that for you, serving up the crucial information on a silver platter.
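To make that “roadmap” idea a bit more concrete, here is a minimal sketch of what a compact, per-column data summary could look like, using only Python’s standard library. This is not LIDA’s own code: the function name summarize_csv, the field names in the result, and the tiny sample dataset are all assumptions made up for illustration.

```python
import csv
import io
import statistics


def summarize_csv(text, max_samples=3):
    """Build a small per-column summary from CSV text (illustrative only)."""
    rows = list(csv.DictReader(io.StringIO(text)))
    if not rows:
        return {}
    summary = {}
    for name in rows[0]:
        # Ignore blank or missing cells when profiling a column.
        values = [row[name] for row in rows if row[name] not in ("", None)]
        if not values:
            summary[name] = {"dtype": "empty"}
            continue
        try:
            numbers = [float(v) for v in values]
        except ValueError:
            # At least one value is not numeric: treat the column as categorical.
            summary[name] = {
                "dtype": "category",
                "unique": len(set(values)),
                "samples": sorted(set(values))[:max_samples],
            }
        else:
            summary[name] = {
                "dtype": "number",
                "min": min(numbers),
                "max": max(numbers),
                "mean": statistics.mean(numbers),
            }
    return summary


# Hypothetical three-row dataset, written by hand for this example.
sample = (
    "model,origin,mpg\n"
    "Civic,Japan,33.0\n"
    "Mustang,USA,18.0\n"
    "Golf,Germany,29.0\n"
)
car_summary = summarize_csv(sample)
print(car_summary["mpg"])
```

A summary like this is small enough to paste into a prompt, which is roughly why a summarization step is useful before asking a language model anything about the data.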

The next stop is Goal Identification. Depending on the nature of the dataset and the user’s specific objectives, this module identifies what types of visualizations would be most effective. Whether you’re aiming to identify trends over time, compare categories, or examine relationships between variables, LIDA determines the best methods to achieve those goals in seconds, as opposed to the hours it might take by hand. This is analogous to a navigation system selecting the best route based on current traffic conditions and the driver’s destination.
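One way to picture a “goal” is as a small structured record: a question worth asking, a visualization that could answer it, and a reason why. The sketch below parses such records from a JSON reply. The three field names are assumptions modeled on how goals are commonly described for LIDA, and the Goal class, parse_goals function, and the reply text are hypothetical and written by hand, not real model output.

```python
import json
from dataclasses import dataclass


@dataclass
class Goal:
    question: str
    visualization: str
    rationale: str


REQUIRED_FIELDS = ("question", "visualization", "rationale")


def parse_goals(llm_reply):
    """Turn a JSON array (as a language model might return) into Goal objects."""
    goals = []
    for item in json.loads(llm_reply):
        # Skip entries that are missing any of the three expected fields.
        if all(key in item for key in REQUIRED_FIELDS):
            goals.append(
                Goal(item["question"], item["visualization"], item["rationale"])
            )
    return goals


# Hand-written stand-in for a model reply; the second entry is incomplete.
reply = """[
  {"question": "How does mpg vary by origin?",
   "visualization": "bar chart of mean mpg per origin",
   "rationale": "Compares a numeric measure across categories."},
  {"question": "Is mpg related to weight?",
   "visualization": "scatter plot of mpg against weight"}
]"""

goals = parse_goals(reply)
for goal in goals:
    print(goal.question, "->", goal.visualization)
```

Validating the reply before using it is a cheap safeguard, since a language model will not always return every field you asked for.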

The third module, Visualization Generation, gets to the heart of the matter. Once the best visualization methods have been identified, this module automatically generates the relevant charts, graphs, or plots. It chooses the most suitable types — be it bar charts for categorical comparisons, line graphs for time series data, or scatter plots for relationship analysis — and populates them with the data. What would have taken hours of manual labor and fine-tuning can now be accomplished in a fraction of the time.
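The chart-type pairings mentioned above can be written down as a tiny lookup. To be clear, this is only the rule of thumb from the paragraph, not how LIDA decides: LIDA hands that choice to a language model that writes the plotting code. The function name suggest_chart, the column-kind labels, and the "table" fallback are assumptions made for this sketch.

```python
def suggest_chart(x_kind, y_kind):
    """Map a pair of column kinds to a chart type using simple rules of thumb."""
    if x_kind == "time" and y_kind == "number":
        return "line"      # time series
    if x_kind == "category" and y_kind == "number":
        return "bar"       # categorical comparison
    if x_kind == "number" and y_kind == "number":
        return "scatter"   # relationship between two variables
    return "table"         # fallback when no rule applies (assumption of this sketch)


print(suggest_chart("category", "number"))
```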

Finally, there’s my favourite module, aka the Stylization module — which is essentially the cherry on top. This module takes the generated visuals and refines them to make them more aesthetically pleasing and easier to understand. It adjusts colors, fonts, and layouts according to best design practices. It’s like having a professional designer review and enhance your visuals, ensuring they are not just accurate but also engaging. As a former graphic designer, I approve!

Now of course, I had to take this for a test drive myself to see its capabilities firsthand. I started by going to its GitHub repository and getting an overview of what’s needed to run it, which was simply installing the package and setting a personal API key in the command prompt. I got mine from OpenAI. They also provide an optional bundled UI and web API that you can use, and I did just that. My commands looked something like this:

pip install lida
set OPENAI_API_KEY=yourkeyhere
lida ui --port=8080 --docs

Feel free to copy and paste these into your command prompt, and remember to put your API key in place of ‘yourkeyhere’ [you get a $5 credit for your API key when you first open an account with OpenAI]. Note that set is Windows Command Prompt syntax; on macOS or Linux, the equivalent is export OPENAI_API_KEY=yourkeyhere. Once the server is running, simply navigate to http://localhost:8080/ in your browser. You can also explore LIDA from this tutorial notebook. Your localhost website should look like this:

Screencap of the demo section of the LIDA web API

Upload the data you want to explore or just choose one of their provided datasets! This video gives an overview of using this user-friendly site:

Walkthrough video from the LIDA original site, displaying its features in use
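If you would rather skip the browser, the same summarize, goals, and visualize steps can also be driven from Python. The sketch below follows the calls shown in the project’s README as I understand them (Manager, llm, summarize, goals, visualize); treat the exact signatures as assumptions, since they may differ between versions. The helper names openai_key_is_set and run_lida are my own, and the lida import is deferred so the file still loads when the package is not installed.

```python
import os


def openai_key_is_set(env=None):
    """Return True if an OpenAI API key is present in the given environment."""
    env = os.environ if env is None else env
    return bool(env.get("OPENAI_API_KEY"))


def run_lida(csv_path, n_goals=2, library="seaborn"):
    """Run summarize -> goals -> visualize and return the generated charts.

    Requires `pip install lida`, network access, and OPENAI_API_KEY to be set.
    """
    # Imported here so this file can be loaded even if lida is not installed.
    from lida import Manager, llm

    manager = Manager(text_gen=llm("openai"))
    summary = manager.summarize(csv_path)
    goals = manager.goals(summary, n=n_goals)
    # Visualize only the first goal to keep API usage small.
    return manager.visualize(summary=summary, goal=goals[0], library=library)
```

Calling run_lida with the path to your own CSV file should return a list of chart objects. In the versions I have seen, each one carries the generated plotting code, but check the documentation for the release you install. It is worth calling openai_key_is_set() first, since every step here makes a paid API request.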

These are just some of the beautiful illustrations that LIDA can create from your visualisations:

Figure 2: Output of the LIDA infographer module which supports the generation of these data-faithful infographics. Each infographic is conditioned on a generated visualization, as well as natural language style tags which can be used to customize the appearance of the chart. Image taken from original paper.

LIDA helps you navigate complex data by generating summaries, goals, and visualizations, and it is proficient with certain visualization styles thanks to its training. It still has room for improvement in some areas, though, like handling diverse drawing styles and explaining its output clearly. But hey, no one’s perfect, right? Even LIDA has a few rough edges, and here they are:

Not All Pictures Are Easy: LIDA’s adept at certain drawing styles learned from code online, but struggles with others like those in Tableau or PowerBI. It’s akin to mastering pencil sketching but not watercolors yet.

A Bit Slow on the Draw: LIDA, with its big brain (GPT-3.5), takes its time to process requests. So, quick drawings might not always be its forte.

Not Always Clear Why It Drew What It Drew: At times, LIDA’s output can leave you puzzled, akin to decoding an artist’s abstract work. The explanations provided may not always clarify the intent.

How Good is LIDA, Really?: Assessing LIDA’s efficacy in rendering useful visuals from data is subjective, much like judging a painting. Understanding this can pave the way for refining LIDA further.

In conclusion, LIDA is a powerful tool that significantly simplifies the process of data visualization, acting as a bridge between complex data and understandable visuals. However, its journey towards perfection encounters a few roadblocks, like adapting to various visualization styles and ensuring a swift response. The path ahead for LIDA is filled with promising possibilities for enhancement, making it a subject of intrigue and potential in the evolving landscape of data analysis tools. Through continued refinements, LIDA is poised to become an even more valuable ally for anyone looking to unlock the narratives hidden within their data effortlessly!

Sadaf Davre
Writer for VisUMD

I like making sense out of chaotic data. Current master’s student in information management, with a bachelor’s in finance. https://www.linkedin.com/in/sadafdavre/