Talking to your data, LITERALLY!

Published in

Tensor Labs

10 min read · May 24, 2024

Hi there, fellow AI enthusiast! It’s been some time since I wrote my last Medium article, but I really hope you are doing great and riding the wave of Generative AI. If you’ve come across this article, then you (like me in a few recent projects) probably need libraries that help you describe your data better (or maybe you just want to learn).

As I’ve always said, model training and selection are probably the easier parts of the problem; the difficult ones are cleaning and formatting the data and doing EDA to understand it better. That’s where today’s article comes in. We’ve already explored plenty of libraries, such as bamboolib and Sweetviz, that build dashboards of plots on top of your data to help you understand it, but Vizro takes these dashboards to a whole new level. So without further ado, let’s dive in.

Vizro is a toolkit for creating modular data visualization applications that lets users rapidly assemble customized dashboards with minimal coding. It’s designed to simplify the creation of complex, Python-enabled data visualizations through a few lines of simple configuration. This configuration can be written in multiple formats, such as Pydantic models, JSON, YAML or Python dictionaries, which keeps the implementation flexible.
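To illustrate the "same configuration, different formats" idea without needing Vizro installed, here is a minimal sketch: a hypothetical one-page dashboard written as a Python dictionary, serialized to JSON and parsed back. The "type": "card" key follows the convention used in Vizro's dictionary and YAML examples; the final vm.Dashboard(**...) call appears only as a comment and is an assumption to check against your installed Vizro version.

```python
import json

# Hypothetical one-page dashboard described as plain data. The "type" keys follow
# the convention of Vizro's dictionary/YAML examples for naming each model.
dashboard_config = {
    "pages": [
        {
            "title": "Config-driven page",
            "components": [
                {"type": "card", "text": "### Hello\nThis card is defined in a dictionary."},
            ],
        }
    ]
}

# The same configuration as JSON text; a YAML file would carry the same structure
config_json = json.dumps(dashboard_config, indent=2)

# Parsing the JSON gives back an identical dictionary, so either format can feed
# the same Pydantic model. With vizro installed (not run here) that would be:
#   import vizro.models as vm
#   from vizro import Vizro
#   Vizro().build(vm.Dashboard(**json.loads(config_json))).run()
loaded_config = json.loads(config_json)
print(loaded_config == dashboard_config)  # True
```

Because the formats are just different spellings of one nested structure, you can keep dashboards in version-controlled JSON or YAML files and still validate them through the same models.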

Disclaimer:

The article is pretty long, so if you want to see the really cool stuff, feel free to skip ahead to the Talking to Data section.

Getting Started

Getting started with Vizro is easy. All you need is a simple pip command. Open your terminal or Jupyter notebook and type:

pip install vizro

Once the installation finishes, let’s confirm that it was successful:

import vizro

print(vizro.__version__)
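If you would rather check versions without importing the packages (handy in a setup or CI script), the standard library's importlib.metadata can read the installed distribution metadata. A minimal sketch; installed_version is a hypothetical helper name, and "vizro_ai" is the distribution name used later in this article:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution):
    """Return the installed version of `distribution`, or None if it is missing."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


for name in ("vizro", "vizro_ai"):
    print(name, installed_version(name) or "not installed")
```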

Creating the first Dashboard

Creating the first dashboard in Vizro is as simple as you can imagine. Vizro is built on Dash (which itself runs on Flask), so the dashboard you build is a regular web app that is simple to deploy and can be scaled to multiple users by serving it with Gunicorn.

import vizro.plotly.express as px
from vizro import Vizro
import vizro.models as vm

df = px.data.iris()

page = vm.Page(
    title="My first dashboard",
    components=[
        vm.Graph(id="scatter_chart", figure=px.scatter(df, x="sepal_length", y="petal_width", color="species")),
        vm.Graph(id="hist_chart", figure=px.histogram(df, x="sepal_width", color="species")),
    ],
    controls=[
        vm.Filter(column="species", selector=vm.Dropdown(value=["ALL"])),
    ],
)

dashboard = vm.Dashboard(pages=[page])

Vizro().build(dashboard).run()

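And just like that, we have our very first dashboard. Note that run() starts Dash’s built-in development server; to present the dashboard to multiple users, serve the same app with a production server such as Gunicorn.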

But wait, there’s more!

I mean, Gilmore Girls wasn’t that bad.

Vizro offers tons of customizations, both for you as the builder and for your dashboard’s users, ranging from multiple pages to grid layouts to sliders. Let’s have a look at the options available.

1. Adding Components to Your Dashboard

Components are the building blocks of a Vizro dashboard. You can add various components to a dashboard page, such as:

Graph: Used to display data visually. Common types include bar charts, line graphs, and box plots.
Card: Ideal for displaying text, images, or markdown content.
Button: Can trigger actions or navigate between pages.

When to Use: Use components to present your data in a structured and meaningful way. For example, use a Graph to show trends and a Card to provide explanations or context.

2. Arranging Components with Layouts

Layouts help in organizing components on a dashboard page. You can customize the placement and size of each component using the Layout object.

Grid Layout: Defines the rows and columns where each component will be placed.

When to Use: Use layouts to make your dashboard more readable and visually appealing. For example, position text at the top and charts side by side to balance the space and avoid a cramped look.
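To make the grid idea concrete: in vm.Layout(grid=...), each inner list is a row, each number refers to a component by its position in the page's components list, a component stretches over every cell holding its number (those cells must form a rectangle), and -1 leaves a cell empty. The plain-Python sketch below, built around a hypothetical grid_spans helper rather than Vizro's own validation code, works out which rows and columns each component occupies:

```python
def grid_spans(grid):
    """Map each component index in a Vizro-style grid to the cells it covers.

    Each inner list is a row; each integer is a position in the page's
    `components` list, and -1 marks an empty cell. Returns
    {index: (first_row, last_row + 1, first_col, last_col + 1)}.
    """
    width = len(grid[0])
    cells = {}
    for r, row in enumerate(grid):
        if len(row) != width:
            raise ValueError("every row needs the same number of columns")
        for c, index in enumerate(row):
            if index != -1:
                cells.setdefault(index, []).append((r, c))

    spans = {}
    for index, positions in cells.items():
        rows = [r for r, _ in positions]
        cols = [c for _, c in positions]
        r0, r1 = min(rows), max(rows) + 1
        c0, c1 = min(cols), max(cols) + 1
        # The component must fill its whole bounding rectangle
        if len(positions) != (r1 - r0) * (c1 - c0):
            raise ValueError(f"component {index} does not form a rectangle")
        spans[index] = (r0, r1, c0, c1)
    return spans


# The grid used by "First Page" in the app.py example further down
print(grid_spans([[0, 0], [1, 2], [1, 2], [1, 2]]))
# {0: (0, 1, 0, 2), 1: (1, 4, 0, 1), 2: (1, 4, 1, 2)}
```

So the card (index 0) spans the full top row, while the two graphs share the remaining three rows side by side. A grid like [[0, 1], [1, 0]] is rejected, because neither component forms a rectangle.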

3. Adding Interactivity with Controls

Controls add interactivity, allowing users to filter data or change properties of components. There are two main types:

Filters: Enable users to filter data, such as selecting specific categories or date ranges.
Parameters: Allow users to adjust component properties, such as colors or opacity.

When to Use: Use controls to make your dashboard dynamic and user-friendly. For instance, add a filter to let users view data for specific regions or add a parameter to let them change the color scheme of a chart.
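Conceptually, a filter changes which rows reach a chart, while a parameter changes an argument of the chart call and leaves the data alone. A toy sketch of that difference in plain Python, using made-up rows in place of a DataFrame; apply_filter and apply_parameter are hypothetical names, not Vizro's implementation:

```python
# Made-up rows standing in for a DataFrame
rows = [
    {"region": "North", "sales": 120},
    {"region": "South", "sales": 80},
    {"region": "East", "sales": 95},
]


def apply_filter(data, column, selected):
    """Filter-style control: keep only rows whose `column` value is selected."""
    return [row for row in data if row[column] in selected]


def apply_parameter(chart_kwargs, name, value):
    """Parameter-style control: keep the data, override one chart argument."""
    return {**chart_kwargs, name: value}


chart_kwargs = {"x": "region", "y": "sales", "opacity": 1.0}
visible_rows = apply_filter(rows, "region", {"North", "East"})
new_kwargs = apply_parameter(chart_kwargs, "opacity", 0.5)
print(len(visible_rows), new_kwargs["opacity"])  # 2 0.5
```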

4. Customizing Controls with Selectors

Selectors configure the behavior and appearance of controls. Common selectors include:

Dropdown: Allows users to select from a list of options.
Slider: Lets users adjust a value within a specified range.
Checklist: Enables multi-selection from a list.
RadioItems: Allows single selection from a list.
RangeSlider: Lets users select a range of values.

When to Use: Use selectors to enhance the usability of controls. For example, use a Dropdown with multi=False for selecting a single option (Vizro dropdowns allow multiple selections by default), or a Slider to adjust a numerical value like opacity.
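Relatedly, if you create a vm.Filter without a selector (as the continent filter in the app.py example further down does), Vizro picks one from the column's data type; recent Vizro docs describe categorical columns getting a Dropdown, numerical columns a RangeSlider and temporal columns a DatePicker. The hypothetical default_filter_selector below only approximates that rule on a plain list of values (Vizro itself inspects the DataFrame column's dtype), so treat it as an illustration rather than a specification:

```python
from datetime import date, datetime
from numbers import Number


def default_filter_selector(values):
    """Approximate the selector a Filter picks when none is given:
    dates -> DatePicker, numbers -> RangeSlider, anything else -> Dropdown."""
    present = [v for v in values if v is not None]
    if present and all(isinstance(v, date) for v in present):
        return "DatePicker"  # datetime subclasses date, so it lands here too
    if present and all(isinstance(v, Number) for v in present):
        return "RangeSlider"
    return "Dropdown"


print(default_filter_selector(["Asia", "Europe"]))  # Dropdown
print(default_filter_selector([1.2, 3.4]))  # RangeSlider
print(default_filter_selector([datetime(2024, 5, 24)]))  # DatePicker
```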

5. Creating Navigation

Navigation helps users move between different pages in your dashboard. You can create a homepage with navigation tiles linking to subpages.

When to Use: Use navigation to create a seamless user experience, especially if your dashboard has multiple pages. For example, add navigation tiles on the homepage to direct users to detailed data analysis pages.
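The homepage cards in the app.py example that follows link to their subpages with href values such as /first-page. Vizro treats the first page in pages as the homepage and, unless you set path on vm.Page yourself, derives each page's path from its title by default. The sketch below approximates that title-to-path rule with a hypothetical page_path helper; Vizro's exact handling of special characters may differ:

```python
import re


def page_path(title):
    """Approximate the URL path derived from a page title: lower-case,
    spaces become hyphens, and other punctuation is dropped."""
    slug = title.strip().lower().replace(" ", "-")
    return "/" + re.sub(r"[^a-z0-9_-]", "", slug)


for title in ("First Page", "Second Page"):
    print(title, "->", page_path(title))
# First Page -> /first-page
# Second Page -> /second-page
```

If you rename a page title, the card href pointing at it has to change too; setting path explicitly avoids that coupling.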

Let’s put all of the above together and see how it looks. Paste the following code into your app.py file:

from vizro import Vizro
import vizro.models as vm
import vizro.plotly.express as px

home_page = vm.Page(
    title="Homepage",
    components=[
        vm.Card(
            text="""
            ![](assets/images/icons/content/collections.svg#icon-top)

            ### First Page

            Exemplary first dashboard page.
            """,
            href="/first-page",
        ),
        vm.Card(
            text="""
            ![](assets/images/icons/content/features.svg#icon-top)

            ### Second Page

            Exemplary second dashboard page.
            """,
            href="/second-page",
        ),
    ],
)

df = px.data.gapminder()
gapminder_data = (
    df.groupby(by=["continent", "year"])
    .agg({"lifeExp": "mean", "pop": "sum", "gdpPercap": "mean"})
    .reset_index()
)
first_page = vm.Page(
    title="First Page",
    layout=vm.Layout(grid=[[0, 0], [1, 2], [1, 2], [1, 2]]),
    components=[
        vm.Card(
            text="""
                # First dashboard page
                This page shows the inclusion of markdown text in a page and how components
                can be structured using Layout.
            """,
        ),
        vm.Graph(
            id="box_cont",
            figure=px.box(gapminder_data, x="continent", y="lifeExp", color="continent",
                            labels={"lifeExp": "Life Expectancy", "continent":"Continent"}),
        ),
        vm.Graph(
            id="line_gdp",
            figure=px.line(gapminder_data, x="year", y="gdpPercap", color="continent",
                            labels={"year": "Year", "continent": "Continent",
                            "gdpPercap":"GDP Per Cap"}),
            ),
    ],
    controls=[
        vm.Filter(column="continent", targets=["box_cont", "line_gdp"]),
    ],
)

iris_data = px.data.iris()
second_page = vm.Page(
    title="Second Page",
    components=[
        vm.Graph(
            id="scatter_iris",
            figure=px.scatter(iris_data, x="sepal_width", y="sepal_length", color="species",
                color_discrete_map={"setosa": "#00b4ff", "versicolor": "#ff9222"},
                labels={"sepal_width": "Sepal Width", "sepal_length": "Sepal Length",
                        "species": "Species"},
            ),
        ),
        vm.Graph(
            id="hist_iris",
            figure=px.histogram(iris_data, x="sepal_width", color="species",
                color_discrete_map={"setosa": "#00b4ff", "versicolor": "#ff9222"},
                labels={"sepal_width": "Sepal Width", "count": "Count",
                        "species": "Species"},
            ),
        ),
    ],
    controls=[
        vm.Parameter(
            targets=["scatter_iris.color_discrete_map.virginica",
                        "hist_iris.color_discrete_map.virginica"],
            selector=vm.Dropdown(
                options=["#ff5267", "#3949ab"], multi=False, value="#3949ab", title="Color Virginica"),
            ),
        vm.Parameter(
            targets=["scatter_iris.opacity"],
            selector=vm.Slider(min=0, max=1, value=0.8, title="Opacity"),
        ),
    ],
)

dashboard = vm.Dashboard(pages=[home_page, first_page, second_page])
Vizro().build(dashboard).run()

Run it with python app.py and see the magic for yourself.
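In the second page, each Parameter target has the form component_id.argument, and further dots reach into nested arguments: scatter_iris.color_discrete_map.virginica adds or overrides the virginica entry of that graph's color_discrete_map. A plain-Python sketch of that nested update (set_by_path is a hypothetical helper illustrating the idea, not Vizro's code):

```python
def set_by_path(arguments, dotted_path, value):
    """Return a copy of `arguments` with the (possibly nested) key named by
    `dotted_path`, such as "color_discrete_map.virginica", set to `value`."""
    key, _, rest = dotted_path.partition(".")
    updated = dict(arguments)  # copy this level so the input is left untouched
    if rest:
        updated[key] = set_by_path(arguments.get(key, {}), rest, value)
    else:
        updated[key] = value
    return updated


scatter_arguments = {
    "x": "sepal_width",
    "color_discrete_map": {"setosa": "#00b4ff", "versicolor": "#ff9222"},
}
# "scatter_iris." selects the component; the rest of the target is the argument path
updated = set_by_path(scatter_arguments, "color_discrete_map.virginica", "#3949ab")
print(updated["color_discrete_map"])
# {'setosa': '#00b4ff', 'versicolor': '#ff9222', 'virginica': '#3949ab'}
```

This is why the original color map only lists setosa and versicolor: the dropdown supplies the virginica color at runtime.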

Talking to Data

Now to the point on which I based the title of this article. Ever since ChatGPT came into our lives, we’ve been asking GPT to help us with various tasks, especially coding. More often than not, we’ve wondered (at least I have) what if we could give GPT a dataset, ask it questions in plain text and get plots back. Well, that’s what Vizro-AI does, so we can now talk to our data in the literal sense: plain-text input in, plots out.

Vizro-AI extends Vizro to enable a user to use English or other languages to effortlessly create interactive charts with Plotly. Vizro-AI simplifies the process of creating charts that offer detailed insights about your data. Even if you’re an experienced data practitioner, Vizro-AI optimizes how you create visually appealing charts.

Vizro-AI uses a large language model and Plotly to generate code for an interactive chart that you can add to a Vizro dashboard application. To get started, you can use an OpenAI API key.
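Since the demo below expects the key in the OPENAI_API_KEY environment variable, a small guard can fail fast with a clear message instead of an error deep inside the LLM call. A minimal sketch; require_openai_key is a hypothetical helper, not part of Vizro-AI:

```python
import os


def require_openai_key(env_var="OPENAI_API_KEY"):
    """Return the API key from the environment, or raise with a clear message."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set {env_var} before calling Vizro-AI")
    return key
```

Call it once at startup (for example, before creating VizroAI()) so a missing key surfaces immediately.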

Diving into Vizro-AI

The same way we installed vizro, we can pip-install vizro-ai:

pip install vizro_ai

Then confirm that it installed correctly:

import vizro_ai

print(vizro_ai.__version__)

For a quick demo, we can use the following code (your OpenAI key should be set in the environment as OPENAI_API_KEY='sk-*****'):

from vizro_ai import VizroAI
import vizro.plotly.express as px

df = px.data.gapminder()

vizro_ai = VizroAI()
vizro_ai.plot(df, "create a line graph for GDP per capita since 1950 for each continent. Mark the x axis as Year, y axis as GDP Per Cap and don't include a title", explain=True)

And voilà, the above gives you a plot from your data.

Let’s have a look at another example. The examples above are good, but what if I want a whole dashboard, not just a plot, from my text query?

For demo purposes, I created a simple pipeline that hosts multiple dashboards on my machine, one per user session, behind a FastAPI service. Each dashboard runs in its own process on its own port. The architecture is far from perfect, but it serves to show the power of Vizro-AI. The following code creates dashboards on random ports between 7000 and 8000.

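import json
import os
import random
from multiprocessing import Process

import pandas as pd
import uvicorn
from fastapi import FastAPI, HTTPException

# Project helpers not shown in this article (presumably the SessionRequest and
# DashboardData request models, SPECIAL_INSTRUCTIONS and create_dashboard)
from helperfunctions import *
from modules.logging import logging
from pipeline import generate_dashboard_link

app = FastAPI()
sessions = {}  # session_id -> {"process_id": ..., "port": ...}
logger = logging.LoggerConfig()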

@app.post("/start_dashboard_session/")
async def start_dashboard_session(request: SessionRequest):
    session_id = request.session_id
    if session_id in sessions:
        raise HTTPException(status_code=400, detail="Session already exists")
    port = random.randint(7000, 8000)
    while any(session["port"] == port for session in sessions.values()):
        port = random.randint(7000, 8000)
    process = Process(target=create_dashboard, args=(port,))
    process.start()
    sessions[session_id] = {"process_id": process.pid, "port": port}
    return {"url": f"http://localhost:{port}"}

@app.post("/terminate_dashboard_session/")
async def terminate_dashboard_session(request: SessionRequest):
    session_id = request.session_id
    if session_id not in sessions:
        raise HTTPException(status_code=404, detail="Session not found")
    os.kill(sessions[session_id]["process_id"], 9)
    del sessions[session_id]

    return {"message": "Session terminated successfully"}


@app.post("/start_dashboard_with_data")
def start_dashboard_with_data(payload: DashboardData):

    session_id = payload.session_id
    if session_id in sessions:
        raise HTTPException(status_code=400, detail="Session already exists")

    user_query = payload.user_query
    user_query = user_query + str(SPECIAL_INSTRUCTIONS)
    response_dict = {
        "success": "true",
        "status": "200",
        "message": "Data processed successfully",
        "data": {},
    }
    allowed_conditions = {">", "<", ">=", "<=", "=", "!="}
    DEBUG = False
    try:
        # Parse the JSON strings into Python objects
        polygon_data = payload.polygon
        datasets_data = payload.datasets
        logger.info("JSON data parsed successfully")
    except json.JSONDecodeError as e:
        response_dict.update(
            {
                "success": "false",
                "status": "400",
                "message": f"Invalid JSON format: {str(e)}",
            }
        )
        logger.error("Invalid JSON format", exc_info=True)
        return response_dict

    try:
        response_data = {}
        logger.info("Processing datasets:")
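        # NOTE: the step that builds `property_data` (the data to plot) from
        # polygon_data / datasets_data is not shown in this article
        dashboard_resp = generate_dashboard_link(
            sessions, session_id, property_data, user_query, logger
        )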
        sessions[session_id] = {
            "process_id": dashboard_resp["pid"],
            "port": dashboard_resp["port"],
        }
        response_dict["data"] = dashboard_resp["url"]

    except Exception as e:
        response_dict.update(
            {
                "success": "false",
                "status": "500",
                "message": f"Server error occurred, {e}",
            }
        )
        logger.error("An error occurred while processing data", exc_info=True)
        return response_dict

    return response_dict


if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)
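One weakness of drawing random.randint(7000, 8000) and checking only the sessions dict is that another program, or a dashboard process that outlived its session entry, may already hold that port. A minimal alternative, assuming the dashboards bind to the local loopback address, is to probe ports with the standard socket module; find_free_port is a hypothetical helper, not part of the code above:

```python
import socket


def find_free_port(start=7000, end=8000, host="127.0.0.1"):
    """Return the first port in [start, end] that can currently be bound on host.

    This also skips ports held by other programs. The port can still be taken
    between this check and the dashboard binding it, so treat it as best effort.
    """
    for port in range(start, end + 1):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
            try:
                probe.bind((host, port))
            except OSError:
                continue  # already in use (or not permitted); try the next one
            return port
    raise RuntimeError(f"No free port between {start} and {end}")
```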

And this is what my generate_dashboard_link function looks like in pipeline.py:

import os
import random
from multiprocessing import Process

import vizro.models as vm
import vizro.plotly.express as px  # also visible to the exec'd plot code via globals()
from vizro import Vizro
from vizro_ai import VizroAI


def generate_plot_code(df, user_query):
    vizro_ai = VizroAI()
    code_string = vizro_ai._get_chart_code(df, user_query)  # private Vizro-AI method; may change between versions
    return code_string


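def create_dashboard_with_data(data, user_query, port, logger):
    try:
        plot_code = generate_plot_code(data, user_query)
        logger.info(f"Generated plot code:\n{plot_code}")
        # Run the generated code in a single namespace (a copy of this module's
        # globals plus the data as `df`) so the names it defines can see each other.
        # Caution: this executes LLM-generated code without any sandboxing.
        namespace = {**globals(), "df": data}
        exec(plot_code, namespace)
        fig = namespace.get("fig")
        if fig is None:
            raise ValueError("No figure generated from the plot code.")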
    except Exception as e:
        logger.error("Failed to execute plot generation code: " + str(e))
        return None, None, None

    try:
        # Serve a one-page dashboard around the generated figure; run() blocks
        # for as long as the server is up
        page = vm.Page(title="Dashboard", components=[vm.Graph(id="custom_chart", figure=fig)])
        Vizro().build(vm.Dashboard(pages=[page])).run(host="localhost", port=port)
    except Exception as e:
        logger.error("Failed to start dashboard server: " + str(e))
        return None, None, None

    # Only reached once the server stops. This function runs in a child process,
    # so the parent never receives these return values.
    return f"http://localhost:{port}", port, os.getpid()

def generate_dashboard_link(sessions, session_id, data, user_query, logger):
    port = random.randint(7000, 8000)
    while any(session["port"] == port for session in sessions.values()):
        port = random.randint(7000, 8000)
    logger.info(f"Port Assigned {port}")

    process = Process(target=create_dashboard_with_data, args=(data, user_query, port, logger))
    process.start()
    url = f"http://localhost:{port}"
    pid = process.pid  # Get the PID of the process running the Flask app

    if process.is_alive():  # the child is still running; this does not prove the server is up yet
        logger.info(f"Dashboard created successfully at {url}")
        return {"url": url, "port": port, "pid": pid}
    else:
        logger.error("Dashboard creation failed.")
        return {"url": None, "port": None, "pid": None}

And that’s it: with a simple Postman call, I can now get a dashboard created from whatever text query I send. Rather than rendering the Vizro-AI plot directly, the code above executes the plot code that Vizro-AI generates and adds the resulting figure to a dashboard.
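The pipeline hands the generated code to exec(), which has two subtleties worth knowing. First, when separate globals and locals dicts are passed, names the code assigns at top level land in locals, while any function it defines looks names up in globals, so a helper function that uses a top-level import or constant raises NameError. Second, exec() offers no isolation: LLM-generated code runs with the full permissions of the process. The sketch below shows the single-namespace pattern; run_generated_code is a hypothetical helper, and the generated string is a stand-in for real Vizro-AI output:

```python
def run_generated_code(code, df, result_name="fig"):
    """Execute generated plotting code and return the object it assigns to
    `result_name`. One dict serves as both globals and locals, so top-level
    names (imports, constants, helpers) stay visible inside functions the code
    defines. This is NOT a sandbox: the code can still do anything Python can."""
    namespace = {"df": df}
    exec(code, namespace)
    if result_name not in namespace:
        raise ValueError(f"generated code did not define {result_name!r}")
    return namespace[result_name]


# Stand-in for generated code: a helper function that relies on a top-level name
generated = (
    "SCALE = 2\n"
    "def make(values):\n"
    "    return [v * SCALE for v in values]\n"
    "fig = make(df)\n"
)
result = run_generated_code(generated, [1, 2, 3])
print(result)  # [2, 4, 6]
```

Running the same string with exec(generated, {}, {"df": [1]}) fails with NameError for SCALE, which is exactly the separate-dicts pitfall described above.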

Now with a simple API call

I now have my dashboard presented to me
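The same call can be scripted instead of made from Postman. A minimal sketch with the standard library, assuming the FastAPI service above is running on port 8000 and that the request body carries the four fields the endpoint reads (session_id, user_query, polygon, datasets). build_dashboard_request is a hypothetical helper, and the field values are placeholders, since the real DashboardData model is not shown in this article:

```python
import json
import urllib.request


def build_dashboard_request(base_url, session_id, user_query, polygon, datasets):
    """Build (without sending) a JSON POST to the /start_dashboard_with_data endpoint."""
    payload = {
        "session_id": session_id,
        "user_query": user_query,
        "polygon": polygon,
        "datasets": datasets,
    }
    return urllib.request.Request(
        f"{base_url}/start_dashboard_with_data",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder values; adjust polygon and datasets to whatever DashboardData expects
request = build_dashboard_request(
    "http://localhost:8000",
    session_id="demo-session",
    user_query="line chart of average price per year",
    polygon=[],
    datasets=[],
)
# Sending it returns the JSON response whose "data" field holds the dashboard URL:
#   with urllib.request.urlopen(request) as response:
#       print(json.load(response)["data"])
```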

And that’s it, we did it, folks! We just got hands-on with creating amazing, highly customizable dashboards in a few lines of code using Vizro, and we also learned to ‘talk’ to our data. If this stuff seems interesting, let me know in the comments what you want me to write about next.

Final notes

With Vizro, creating dashboards is simplified, and with Vizro-AI, interacting with your data using natural language becomes a reality. Whether you are a seasoned data scientist or a newcomer, these tools offer powerful capabilities to enhance your workflow.

The same way Vizro-AI makes it easier to explore and navigate complex data, Tensor Labs has embarked on a mission to help you navigate complex problems, products and ideas with ease. Our team prides itself on the AI services we deliver and the happy clients Tensor Labs has gathered along the way.

If you have an idea that you genuinely believe can change things and think AI can help you with it, don’t think twice before pinging us. Until then, keep exploring, keep innovating, and happy coding!