Analyze Structured Data (extracted from Unstructured Text) using LLM Agents

Using LangChain’s CSV Agent

Ingrid Stevens
3 min readJan 4, 2024

This is Part 2 of my “Understanding Unstructured Data” series. Part 1 focused on extracting structured data from unstructured text.

Understanding Unstructured Data | Part 2: Analysis

Confectionary Intelligence Gathering & Data Analysis

Use case: Extracting Unstructured Competitive Intelligence Data with LLMs

Imagine you are a bakery, and you’ve sent out your confectioner intelligence team to gather competitor data. They report back on what the competition is up to, and they have lots of great ideas that you’d like to apply to your business. However, the data is unstructured! How can you analyze this data to understand what’s being asked for the most and best prioritize the next steps for your business? In Part 1 we used `PydanticOutputParser` to analyze our data and add the desired structure. In Part 2, we will create a LangChain agent to do this.

What is an Agent in LangChain?

In LangChain, agents serve as systems that leverage a language model to select a sequence of actions to take. Unlike chains, where actions are hardcoded in code, agents utilize a language model as a “reasoning engine”, determining which actions to take and in what order to take them.

Caution is advised when using this agent, as it internally invokes the Pandas DataFrame agent, which, in turn, calls the Python agent responsible for executing Python code generated by the LLM. This potential chain of execution could pose risks, especially if the Python code generated by the LLM is malicious.

Note: Code Available on Github:

Note: the code to create the dataset is unstructured_pydantic.ipynb and the analysis with LangChain agent is in unstructured_analyze_agent.ipynb

Now, it’s time to analyze our structured data with a CSV agent in LangChain:

Step 1: Create the Agent

Let’s set the code up. We start by loading the necessary libraries:

from langchain.agents.agent_types import AgentType
from langchain_community.llms import OpenAI
from langchain_experimental.agents.agent_toolkits import create_csv_agent

Now, we create the agent

agent = create_csv_agent(
OpenAI(temperature=0),
"data/intel.csv",
verbose=True,
agent_type=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
)

Great work! Now we can test our agent with some questions:

Step 2: Ask Questions

When you ask the LangChain agent questions, you will see it think through its actions.

Asking our CSV Agent Questions

Ask about General Insights

agent.run("What insights can I get from this data?")

‘This dataframe contains information about different companies and their products/services, as well as additional details and potential opportunities for improvement.’

Ask About Competitor Advantages to Focus on

agent.run("What are 3 specific areas of focus that you can obtain through analyzing the advantages offered by the competition?")

‘Three specific areas of focus that can be obtained through analyzing the advantages offered by the competition are: streamlining production processes, incorporating unique and distinctive flavors, and using sustainable and high-quality ingredients.

Ask about Key Competitor Themes

agent.run("What are some key themes that the competitors represented in the data are focusing on providing? Be specific with examples, and talk about the advantages of these")

‘The key themes that the competitors are focusing on providing are efficiency, unique flavors, and high-quality ingredients. For example, Coco candy co is using the 77Tyrbo Choco machine to coat their candy gummies, which streamlines the process and saves time. Cinnamon Bliss Bakery adds a secret touch of cinnamon in their chocolate brownies with the CinnaMagic ingredient, which adds a distinctive flavor. Choco Haven factory uses organic and locally sourced ingredients, including the EcoCocoa brand, to elevate the quality of their chocolates.’

--

--