Pandas AI — The Future of Data Analysis

Fareed Khan
4 min read · May 4, 2023


Imagine being able to talk to your data like it’s your best friend. That’s what Pandas AI does! This Python library has generative artificial intelligence capabilities that can turn your dataframes into conversationalists. No more endless hours of staring at rows and columns.

But don’t worry, Pandas AI is not here to replace your beloved Pandas. It’s here to enhance it! With Pandas AI, you can take your data analysis and manipulation to the next level. Think of it like a superhero sidekick — it’s there to help you save the day and make your life easier.

The possibilities with Pandas AI are endless. Imagine having a dataframe that can write its own reports, or one that can analyze complex data and provide you with easy-to-understand summaries.

In this quick guide, you’ll get a step-by-step walkthrough of how to use this cutting-edge library, regardless of your level of experience in the field.

Whether you’re an experienced data analyst or a beginner, this guide will equip you with all the tools you need to dive into the world of Pandas AI with confidence. So sit back, relax, and let’s explore the exciting possibilities that Pandas AI has to offer!

Official GitHub Repository — https://github.com/gventuri/pandas-ai

Code: https://colab.research.google.com/drive/1rKz7TudOeCeKGHekw7JFNL4sagN9hon-?usp=sharing

Installing Pandas AI using pip

pip install pandasai

Our DataFrame contains information about various countries, including their GDP (in millions of USD) and happiness index scores. It consists of 10 rows and 3 columns:
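For readers following along without the notebook, here is a minimal sketch of a dataframe with the same shape. The column names (country, gdp, happiness_index) and every value in it are placeholders I made up for illustration; they are not the figures used in this article.

```python
import pandas as pd

# Placeholder data only: names and numbers are invented for illustration.
df = pd.DataFrame({
    "country": [f"Country {letter}" for letter in "ABCDEFGHIJ"],
    # GDP in millions of USD (placeholder values)
    "gdp": [1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000],
    # Happiness index scores (placeholder values)
    "happiness_index": [5.1, 5.4, 5.8, 6.0, 6.3, 6.5, 6.8, 7.0, 7.1, 7.3],
})

print(df.shape)  # (10, 3): 10 rows and 3 columns
print(df.head())
```

Swap in your own data as long as each row is one country and the columns keep consistent names.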

Importing PandasAI with OpenAI

In the next step, we’ll import the pandasai library that we installed earlier, along with its LLM (Large Language Model) wrapper. As of May 2023, pandasai only supports the OpenAI model, which we’ll use to understand the data.

To use the OpenAI API, you must generate your own unique API key. If you haven’t done so already, you can easily create an account on the platform’s official website at platform.openai.com. Once you’ve created your account, you’ll receive an instant $5 credit that can be used to explore and experiment with the API.
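Once you have a key, it is safer to keep it out of the notebook itself. The sketch below reads it from an environment variable; the variable name OPENAI_API_KEY and the helper name load_openai_api_key are my own choices for illustration, not something pandasai requires.

```python
import os


def load_openai_api_key(env_var="OPENAI_API_KEY"):
    """Return the API key stored in the given environment variable.

    The variable name is an assumption; use whatever name you prefer.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your OpenAI API key."
        )
    return key
```

Calling load_openai_api_key() then gives you a string you can hand to the LLM wrapper, without the key ever appearing in shared code.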

Initializing PandasAI and Asking a Question

Afterwards, we’ll provide our OpenAI model to Pandas AI and ask various questions.

When using pandas_ai.run, two parameters are necessary: the dataframe you’re working with and the question you’re seeking an answer to. In this example, it returns the top 5 happiest countries based on the supplied dataframe.
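The steps above can be sketched roughly as follows. This assumes the pandasai interface as it looked around May 2023 (a PandasAI class, an OpenAI wrapper in pandasai.llm.openai, and run(df, prompt=...)); later releases may differ, so treat it as a sketch rather than the definitive API. The helper names ask_pandasai and top_happiest, and the happiness_index column name, are hypothetical. The second helper is a plain-pandas cross-check, handy for confirming that the model’s answer matches the data.

```python
import pandas as pd


def ask_pandasai(df, question, api_token):
    """Send one question about df to Pandas AI and return its answer.

    Assumes the pandasai interface from around May 2023.
    """
    # Imported lazily so the cross-check below works without pandasai installed.
    from pandasai import PandasAI
    from pandasai.llm.openai import OpenAI

    llm = OpenAI(api_token=api_token)           # wrap the OpenAI model
    pandas_ai = PandasAI(llm)                   # hand the model to Pandas AI
    return pandas_ai.run(df, prompt=question)   # dataframe plus question


def top_happiest(df, n=5, column="happiness_index"):
    """Plain-pandas cross-check: the n rows with the highest score."""
    return df.nlargest(n, column)
```

A call would look like ask_pandasai(df, "Which are the 5 happiest countries?", api_token), and top_happiest(df) should list the same countries.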

Asking Complex Questions

Let’s check whether it can draw plots for us.

Yes, it plots the graph based on the question I asked.

Let’s try a more complex task, removing NaN values from the dataset below:

This is the output we get:

However, when I print the df variable again, the NaN values are gone: the original dataframe itself was modified, and the row containing them was removed entirely.
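Since the original dataframe can change under you, it may be worth handing over a copy when you want to keep the raw data. Here is a small plain-pandas sketch of the difference between a non-mutating and an in-place drop; the data is hypothetical and not the dataset from this article.

```python
import pandas as pd

# Hypothetical data with one missing value.
df = pd.DataFrame({
    "country": ["A", "B", "C"],
    "gdp": [1.0, float("nan"), 3.0],
})

# dropna() returns a new dataframe and leaves df untouched.
cleaned = df.dropna()

# Work on a copy when a tool might modify its input in place.
working_copy = df.copy()
working_copy.dropna(inplace=True)  # removes the row with NaN from the copy only

print(len(df), len(cleaned), len(working_copy))  # 3 2 2
```

Passing df.copy() instead of df is a cheap way to make sure your source data survives any cleaning step.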

The pandasai library offers an extensive range of possibilities, and you can explore them all by visiting the official repository page, which I shared earlier.

It’s important to note that working with pandasai incurs OpenAI API charges, and you can find the most up-to-date pricing information on their website. As of May 2023, the price is approximately $0.002 per 1,000 tokens for the GPT-3.5-Turbo model. Keep in mind that the entire dataframe is passed along with the question every time, so it may not be an ideal solution for handling large datasets.
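To get a feel for how the bill grows with the size of the data, here is a rough back-of-the-envelope sketch. It assumes the dataframe is sent as CSV text and uses the common rule of thumb of about 4 characters per token; both are my assumptions, not how pandasai or OpenAI actually count. The function names are hypothetical, and the rate is left as a parameter so you can plug in the current figure from the pricing page.

```python
import math

import pandas as pd

CHARS_PER_TOKEN = 4  # rough rule of thumb for English text, not an exact tokenizer


def estimate_prompt_tokens(df, question):
    """Approximate tokens needed to send the whole dataframe plus the question."""
    payload = df.to_csv(index=False) + question
    return math.ceil(len(payload) / CHARS_PER_TOKEN)


def estimate_cost_usd(tokens, usd_per_1k_tokens):
    """Convert a token count into dollars at the given per-1,000-token rate."""
    return tokens / 1000 * usd_per_1k_tokens


# Example with a placeholder rate (not a real price): 1,000 rows of numbers.
example = pd.DataFrame({"value": range(1000)})
tokens = estimate_prompt_tokens(example, "What is the average value?")
print(tokens, estimate_cost_usd(tokens, usd_per_1k_tokens=0.01))
```

The estimate scales with the number of rows, which is why a question over a large dataset costs noticeably more than the same question over a small one.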

If you have any questions, feel free to ask me!
