Build a Simple Chatbot Agent with LangChain to Query OpenBB Market Data

Published in

Alpha Beta Blog

11 min readMay 29, 2024

Today, we dive into the world of Large Language Model (LLM) agents. We build a simple chatbot that can fetch financial market data from the OpenBB platform.

OpenBB is a module that lets us access almost 100 financial market data services, giving us a unified API to access any kind of market data that is available on the Internet.

LangChain is a framework for building LLM applications which abstracts many LLM patterns, including RAG and agents, and lets you use the same code base almost interchangeably between different LLMs such as ChatGPT and Claude and Gemini and locally hosted models.

OpenBB: Your One-Stop Shop for Market Data

Before we dive into agents, let’s get familiar with OpenBB. This amazing module offers a unified API to access a wide range of market data providers: end-of-day and real-time pricing, fundamental data, economic data, or news. It simplifies the process of gathering market data by abstracting the differences between market data APIs, giving you a consistent way to interact with them.

There are a few ways to access OpenBB: The Terminal Pro which is a market data workstation, Excel integration, an AI copilot, and the OpenBB platform, which is an open-source Python module to interact with data providers. You can follow along in GitHub or run this Jupyter notebook yourself.

To get started:

Install Anaconda and git if you don’t have them.
Clone the repo.
Create a new environment:
conda create --name openbb_agent python=3.11 conda activate openbb_agen
Install python dependencies:
pip install -r requirements.txt
For this demo, make a free account with OpenBB, FMP, other services as desired.
Create a .env file with credentials to APIs, similar to dotenv.txt in the repo.
Open the notebook bb_agent_langchain.ipynb in Jupyter.
Run the Streamlit demo.
streamlit run streamlit_bbagent.py

As we go through the notebook, we can see how to

Log in to the OpenBB platform
Add credentials to our account
Search for a symbol, specifying a provider like SEC.
Use Jupyter or IDE completion magic to traverse the API and see what functions are available.
Call a function like obb.equity.price.quote(“MRK”).
We always get back an OBBject, which has some metadata and a results field which is a list of objects, depending on what we queried.
We can convert the results or individual returned objects to python dict, JSON, dataframes and other formats.

OpenBB abstracts a lot of the complexity of interacting with multiple market data services.

We can call many providers using the same API, abstracting the login process and varied calling conventions.
We always get back a similar data structure, which we can easily convert to a useful format for downstream processing.

LLM Agents: The Brains of Our Chatbot

Now, we want to build an LLM agent that can answer questions using the OpenBB platform. An LLM agent is an application that uses an LLM (like ChatGPT) to control its flow and make decisions and take actions.

Our chatbot agent will take user questions about financial markets, figure out which OpenBB tool can answer the question, run that tool, and then either provide the answer or run more tools if needed.

Our agent uses the OpenAI API’s tool calling functionality using LangChain. It has 3 pieces:

Tools, which are Python functions which we can call as needed during the conversation to get data from OpenBB.
A system prompt, which provides instructions to OpenAI for how to use the tools.
A Streamlit UI to accept input from the user and send the stream of messages to OpenAI.

Tools

To create the tools, I made a YAML file obbtools.yaml with a list of tool definitions. A tool definition looks like this:

- name: get_equity_price_quote
  description: Given a stock symbol, get latest market data including last price in
    JSON format.
  openapi_path: /api/v1/equity/price/quote
  args_schema: SymbolSchema
  example_parameter_values:
  - symbol: AAPL
  override_parameters:
    provider: yfinance
  parameters:
    symbol:
      description: The stock symbol to get quote data for.
      type: string

Then I wrote a function factory to generate a function based on the YAML definition, to call the appropriate obb function specified in openapi_path using the specified parameters. The override_parameters field here lets us override a default provider without giving the LLM another parameter to think about.

Then, using LangChain, I can make a tool based on

The function to call when the tool is invoked
The tool name
The description of what the tool does and when the tool should be invoked. The description can contain a usage example to help OpenAI understand the values that are returned.
The function’s argument schema

We can also make a custom tool which does not depend on OpenBB, for instance to access the free Edgar API, by writing a function directly.

Notably, LangChain can create an LLM tool directly from a JSON OpenAPI.json specification for an arbitrary JSON endpoint set up using Swagger / OpenAPI. LangChain can also run tools via other LLMs whose API supports them, such as Claude and Gemini.

OpenBB’s specification is large and complex. The YAML definitions restrict the bot to about 15 commonly used endpoints and simplify the call signature, usually to just a symbol. The OpenAI function calling API doesn’t have a hard limit on the number of tools, but the more complex the API, the harder it is for OpenAI to pick the correct tool and generate the correct call signature. Also, limits worth noting, there is an OpenAI limit on the number of characters than can be returned by a tool, and LangChain has a limit on the size of the description, and the LLM model will have context window limits.

System Prompt

The system prompt looks like this:

Role: You are an AI stock market assistant tasked with providing investors with up-to-date, detailed information on individual stocks.

Objective: Assist data-driven stock market investors by giving accurate, complete, but concise information relevant to their questions about individual stocks.

Capabilities: You are given a number of tools as functions. Use as many tools as needed to ensure all information provided is timely, accurate, concise, relevant, and responsive to the user’s query.

Instructions:

1. Input validation. Determine if the input is asking about a specific company or stock ticker. If not, respond in a friendly, positive, professional tone that you don’t have information to answer and suggest alternative services or approaches.

2. Symbol extraction. If the query is valid, extract the company name or ticker symbol from the question. If a company name is given, look up the ticker symbol using a tool. If the ticker symbol is not found based on the company, try to correct the spelling and try again, like changing “microsfot” to “microsoft”, or broadening the search, like changng “southwest airlines” to a shorter variation like “southwest” and increasing “limit” to 10 or more. If the company or ticker is still unclear based on the question or conversation so far, and the results of the symbol lookup, then ask the user to clarify which company or ticker.

3. Information retrieval. Determine what data the user is seeking on the symbol identified. Use the appropriate tools to fetch the requested information. Only use data obtained from the tools. You may use multiple tools in a sequence. For instance, first determine the company’s symbol, then retrieve company data using the symbol.

4. Compose Response. Provide the answer to the user in a clear and concise format, in a friendly professional tone, emphasizing the data retrieved, without comment or analysis unless specifically requested by the user.

Example Interaction:

Example Interaction: User asks: “What is the PE ratio for Eli Lilly?” Chatbot recognizes ‘Eli Lilly’ as a company name. Chatbot uses symbol lookup to find the ticker for Eli Lilly, returning LLY. Chatbot retrieves the PE ratio using the proper function with symbol LLY. Chatbot responds: “The PE ratio for Eli Lilly (symbol: LLY) as of May 12, 2024 is 30.”

Check carefully and only call the tools which are specifically named below. Only use data obtained from these tools.

Putting Our Streamlit Chatbot to the Test

The top-level chatbot UI is built using Streamlit, a Python module that makes it easy to create web interfaces. Taipy is another notable Python UI toolkit. You could also connect your agent to Slack or Discord. There are also more complex chat UI projects (search GitHub for chatbot-related projects).

Let’s see our agent in action! We can ask it questions like “What is the last price for Delta Air Lines?”. The flow works like this:

We use LangChain to send a list of messages and tools with each prompt. Using a prompt template, an agent definition, and a running agent executor, LangChain takes care of generating appropriately formatted requests to OpenAI.
The request includes our system prompt, which gives OpenAI a workflow to follow when the user asks a question, and tells it to use the provided tools as necessary.
The request includes the conversation so far, including a question, like, “what is the last price for Delta Air Lines?”.
The request includes a list of available tools with names, descriptions, and calling signatures, which are attached to the question, according to the function calling API.
OpenAI attempts to answer the question and follow the instructions in the system prompt. If it thinks a tool would help, the response has finish_reason equal to tool_call, and it asks us to call one or more specific tools with specified parameters, in this case get_equity_search_symbol with "query"="Delta Air Lines" , to find the correct ticker symbol.
We then call the tool(s) with those parameters and send another request with the full conversation so far, including the values returned by the tool calls and referencing the tool_call requests they correspond to, in this case “DAL”.
OpenAI attempts to answer the question, this time with more information. Perhaps it can answer the question, perhaps it comes back with an additional requet to use a tool, such as get_equity_price_quote with symbol “DAL”. We can continue to make OpenAI requests until we get finish_reason equal to stop with the final answer, or until some maximum number of iterations is reached.

We can try more complex questions like “What are the total assets for Merck?” or “Which ETF has the highest weight in Merck?”. The agent will try its best to find the answer. LLMs are pretty good at understanding the question and matching it to the correct tool description, but they might stumble on questions requiring complex logic.

Using the system prompt and giving OpenAI a few ChatGPT tools, we can in some sense use the LLM as a computer and program it to follow a workflow.

LangGraph

Our very simple agent generally follows a simple decision tree for a couple of levels. So it acts as a state machine with an acyclic directed graph, which is a very simple control flow.

LangGraph supports a more complex finite state machine flow, like a state graph with loops, which can allow us to use different prompts and sets of tools based on which node you are at in the state graph.

I didn’t expose a news tool because it would need a more complicated flow in the news tool or in the system prompt. You might want a flow for news where it filters news that is only slightly related to a stock like Merck, because it’s a market roundup about 25 stocks or a story mostly about Eli Lilly that mentions Merck. You might want to read and summarize the news stories. There’s only so much logic you can put in a single system prompt. And if you want a real general purpose programming language or Turing machine then you need loops. LangGraph lets you build more complex LLM control flows.

AutoGPT

Another complex agent flow is the one used by AutoGPT, which you can try in the cloud here. You give AutoGPT a goal, and tools like the ability to search Google and Wikipedia and other sites, and let it loop until it reaches the goal.

For instance, suppose you want to ask AutoGPT to recommend the best espresso grinder under $500. The flow might go like this:

Initialize a task priority queue and push the task onto the queue: “Find the best espresso grinders under $500 and recommend one”
Pop the next task off the queue. Can I reach the task goal based on the conversation so far?

If so, answer it and add the output to the conversation.
If not, can I break the task down into actionable pieces? Then put the sub-tasks into the priority queue.

3. If I can now give the final answer to the overall goal, give it.

4. If not, if there are still tasks in the queue, go to 2 and iterate.

5. If there are no more tasks in the queue, give the best answer based on what you found out.

OpenBB Agents

There are example agents for OpenBB that are more sophisticated than what we have here.

In fact, the OpenBB platform makes creating tools for agents even easier than the wrapper code we wrote here.

PhiData

PhiData is another sophisticated agent framework for financial research.

Limitations of LLMs and the Power of Collaboration

LLMs excel at text-based tasks but struggle with complex logic. ChatGPT is a poet, not a quant; a wordcel, not a shape rotator.
LLMs are good (but not perfect) at remembering facts, especially when given prompts that ground the LLM and tell it to respond only using specific tools and facts, such as in the retrieval-augmented generation pattern.
LLMs are good at generating a first pass at any kind of content creation based on a pattern: text, images, code.
Questions with multiple possible responses might get different responses, and better or worse responses based on small changes in the prompt.
Vague questions or things the LLM knows knothing about give rise to hallucinations. The LLM may answer confidently, even if it is wrong.

AI performs best as part of a toolbox that includes:

Structured and grounded AI workflows.
Tools which the AI can use when it needs to perform workflows and logic that exceed simple rhetorical reasoning. AI can also help code the tools!
Humans in the loop to enrich the responses and redirect the AI when it goes off track.
Multiple prompts with multiple LLMs and combining inputs the best output. (AI can help create the prompts using Prompt Professor and other tools.)

Think of an AI like having ‘infinite interns’. They read everything, but sometimes lose the plot. Smart stoner interns maybe. They can be a big force multiplier for many tasks, but require very specific instruction and close supervision, and still can go astray. If Google can’t dodge reputational risk, what chance do the rest of us have?

AI has a ‘jagged frontier’. It is getting much better and is extremely helpful at many tasks, but sometimes goes off the rails when given similar tasks that slightly exceed its reasoning ability or the tools at its disposal.

But when humans and AI work together using the right tools, the humans get significantly and measurably more productive.

Wrapping Up

We covered a lot of things:

The advantages of OpenBB. You don’t have to use OpenBB, you can use the various APIs it supports directly, but OpenBB abstracts a lot of the complexity.
What LLM agents and tool calling are, the basic workflow, and how to write a simple agent with LangChain. Again, you don’t need to use LangChain, and the magic really happens in ChatGPT, but LangChain provides some excellent structure, examples of the various patterns, and easy ways to swap out different LLMs and agent patterns.
With the ability to make an agent based on a Swagger/OpenAPI JSON specification, we can quickly build agents based on existing REST APIs, if they are not too complex, and good tool descriptions are available.
More advanced agent patterns and some issues to think about when using LLMs and creating LLM flows.

I hope this intro to building a simple agent is helpful!