Stories by Gabriel Cassimiro on Medium

The LangGraph Guide to the Galaxy — part 1

Gabriel Cassimiro — Wed, 06 Nov 2024 14:13:57 GMT


In this guide, I will explain the core concepts of LangGraph and use a real project with a complex agent to illustrate them. My idea is to go deep into each concept, so feel free to skip to the parts that interest you most. Below is the table of contents:

1. Introduction

Why LangGraph?
What is LangGraph?

2. Key Concepts

Understanding Nodes, Edges and State
Cyclic Graphs in LangGraph
Tools integration

1. Introduction

So let’s start from the beginning.

Why do you need LangGraph?

In the current landscape of 2024, companies, startups, and individuals are rushing to create solutions using the mystical power of GenAI. However, very few manage to produce production-ready systems with enough quality and maintainability.

Several frameworks, such as LangChain, Haystack, and LlamaIndex (among many others), were created to help with the challenges that GenAI presents. These frameworks have provided foundational tools for integrating large language models, managing data retrieval, and orchestrating complex AI workflows.

Andrew Ng, one of the most prominent figures in AI, put it this way:

The future of Generative Artificial Intelligence is Agentic

What does this mean?

Ng’s statement reflects the shift towards AI systems that act as autonomous agents, capable of making decisions, performing tasks, and interacting with other systems independently. This goes beyond simple, static models towards dynamic AI that can reason, plan, and adapt on its own.

As these solutions grow more complex, they need more advanced frameworks that can support the new capabilities and tasks these agentic systems require. This is where LangGraph comes in.

What is LangGraph?

From the official documentation:

LangGraph is a library for building stateful, multi-actor applications with LLMs, used to create agent and multi-agent workflows. […] it offers these core benefits: cycles, controllability, and persistence.

In a nutshell, this is a framework for creating LLM agents that can run cyclic workflows within a structured architecture.

LangGraph’s flexible API supports diverse control flows (single-agent, multi-agent, hierarchical, and sequential) and is designed to handle realistic, complex scenarios.

2. Key Concepts

Understanding Nodes, Edges, and State

Nodes and edges are the main building blocks of LangGraph. In short, nodes hold the logic of each step of the system, and edges define the flow between those steps.

Nodes

Starting with nodes: this is where we define the action for each step of our system. Each node wraps a function, a model, or a tool to be executed. In Python, a node can be a plain function or any Runnable object.

We can design nodes for tasks such as connecting to an API, executing a simple task, or interacting with a language model. Some nodes come pre-built with the library, but you can easily create your own tailored to your needs.

Edges

Edges are the connections between nodes, directing the flow of the system. They can be simple directed edges or carry conditional logic that decides the next step.

Before building our first graph, let’s look at one more core building block: the State.

State

The State in LangGraph is the component that lets nodes use context or data generated by other nodes. It is essentially the nodes’ outputs, stored so that other nodes can access them.

A State is usually a TypedDict, with each attribute holding a piece of information we want to persist. For example, to keep a chat history we would create an attribute such as “messages” that stores the list of interactions between the human and the LLM.

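During execution, the State is passed to each node, which can modify it in that step. The input of a node is the State, and the dict a node returns is an update that is merged into the State before it is passed to the next step: by default a returned key overwrites the previous value, while keys annotated with a reducer (such as add_messages for a message list) are combined with it.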
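To make the merge step concrete, here is a minimal pure-Python sketch of how a node’s partial return value can be folded into a TypedDict state. It is not LangGraph’s implementation: the `REDUCERS` table and the `merge_update` helper are hypothetical names. It only assumes the behavior LangGraph documents, where a node returns just the keys it changed, plain keys are overwritten, and keys with a reducer (declared in LangGraph as, for example, `Annotated[list, add_messages]` or `Annotated[list, operator.add]`) are combined instead.

```python
import operator
from typing import Any, Callable, Dict, List, TypedDict, cast


class State(TypedDict):
    messages: List[str]   # chat history: should accumulate across nodes
    number_of_words: int  # plain value: the last writer wins


# Hypothetical reducer table: listed keys are combined, all others are overwritten.
REDUCERS: Dict[str, Callable[[Any, Any], Any]] = {"messages": operator.add}


def merge_update(state: State, update: Dict[str, Any]) -> State:
    """Return a new state with a node's partial update merged in."""
    merged: Dict[str, Any] = dict(state)
    for key, value in update.items():
        reducer = REDUCERS.get(key)
        if reducer is not None and key in merged:
            merged[key] = reducer(merged[key], value)  # e.g. list concatenation
        else:
            merged[key] = value  # no reducer: overwrite
    return cast(State, merged)


state: State = {"messages": ["What is AI?"], "number_of_words": 0}
# A node returns only the keys it changed.
state = merge_update(state, {"messages": ["Machines mimicking cognition."], "number_of_words": 3})
print(state)
```

Here the new message is appended to the history rather than replacing it, while `number_of_words` is simply overwritten.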

So let’s build a simple version to illustrate these concepts:

In this graph, we will call the LLM with a question and count the words in the response. Depending on whether the response is shorter or longer than 10 words, we then convert it entirely to upper case or to lower case.

Image by Author

https://medium.com/media/9bb45c95238adc8e023726987809ac66/href

This graph demonstrates nodes with calls to LLMs and with simple functions, all of them interacting with the state. We also have simple and conditional edges, with a router implemented to direct the flow.
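As an illustration of how nodes, a router, and conditional edges fit together, here is a dependency-free Python sketch that imitates the flow described above. It is not LangGraph’s API: `fake_llm` is a hypothetical stub for the model call, and the tiny `run` loop stands in for a compiled graph. The routing direction (answers under 10 words are upper-cased) is an assumption chosen to match the sample output printed later in this section; swap the mapping to get the opposite behavior. In LangGraph itself, the same wiring is expressed with `StateGraph.add_node`, `add_edge`, and `add_conditional_edges`, followed by `compile()` and `invoke()`.

```python
from typing import Callable, Dict, List, Optional, Tuple, TypedDict


class State(TypedDict):
    messages: List[str]
    number_of_words: int


def fake_llm(question: str) -> str:
    """Hypothetical stand-in for a real chat-model call."""
    return "Artificial intelligence: machines mimicking human cognitive functions."


# Nodes: each reads the state and returns only the keys it updates.
def call_llm(state: State) -> dict:
    return {"messages": state["messages"] + [fake_llm(state["messages"][-1])]}


def count_words(state: State) -> dict:
    return {"number_of_words": len(state["messages"][-1].split())}


def to_upper(state: State) -> dict:
    return {"messages": state["messages"][:-1] + [state["messages"][-1].upper()]}


def to_lower(state: State) -> dict:
    return {"messages": state["messages"][:-1] + [state["messages"][-1].lower()]}


def router(state: State) -> str:
    """Conditional-edge function: returns a routing key, not a node name."""
    return "short" if state["number_of_words"] < 10 else "long"


NODES: Dict[str, Callable[[State], dict]] = {
    "call_llm": call_llm, "count_words": count_words,
    "to_upper": to_upper, "to_lower": to_lower,
}
# Simple edges; None marks the end of the graph.
EDGES: Dict[str, Optional[str]] = {"call_llm": "count_words", "to_upper": None, "to_lower": None}
# Conditional edges: router function plus a key -> next-node mapping.
CONDITIONAL: Dict[str, Tuple[Callable[[State], str], Dict[str, str]]] = {
    "count_words": (router, {"short": "to_upper", "long": "to_lower"}),
}


def run(state: State, entry: str = "call_llm") -> State:
    node: Optional[str] = entry
    while node is not None:
        state = {**state, **NODES[node](state)}  # merge the node's partial update
        if node in CONDITIONAL:
            route, mapping = CONDITIONAL[node]
            node = mapping[route(state)]
        else:
            node = EDGES[node]
    return state


result = run({"messages": ["What is artificial intelligence?"], "number_of_words": 0})
print(result.keys())
print(result["messages"][-1])
print(result["number_of_words"])
```

With the stubbed answer, this prints `dict_keys(['messages', 'number_of_words'])`, the upper-cased sentence, and `7`.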

Ok, so let’s dive into the pieces.

First, we defined all the components: the State class, the nodes as functions, and one function for our conditional edge.

Then, we defined the graph. Here we use a StateGraph as the graph object and the add_node method to name each node and attach the function it runs. After adding the nodes, we define the edges between them: add_edge creates simple directed edges, and add_conditional_edges creates the conditional ones. Pretty self-explanatory, right?

The conditional edges use a function to route the flow of the application. In the example, it is the function “router”, which simply returns a key depending on whether the number of words in the response is above or below 10. This key is then used to route to the next node.

Finally, we compile the graph using the .compile method, transforming the graph into an executable workflow.

To call the graph, we use .invoke() and pass a dict with the keys and values of the state attributes we want to fill.

The response from the graph is the full state. So to access the last message, we take the list stored under the messages key and read its last element. The outputs of the print statements at the end of the code are:

dict_keys(['messages', 'number_of_words'])
ARTIFICIAL INTELLIGENCE: MACHINES MIMICKING HUMAN COGNITIVE FUNCTIONS.
7

This is still a simple implementation, and the main benefits of LangGraph are not yet on full display. For that, let’s look into cyclic workflows and tools.

Cyclic Graphs in LangGraph

The cyclic graph is one of the key features that set LangGraph apart from linear frameworks. It allows nodes not only to process data and pass it forward, but also to loop back to previous nodes or even to themselves.

So a cyclic graph is a graph with edges that can loop back to earlier nodes. One of the most straightforward uses for this is letting an agent call tools, read their outputs, and then generate the final answer. Other use cases are multi-agent systems, planning agents, and reflection agents. All these architectures will be covered in the second part of this article.
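The agent/tool loop can be sketched in plain Python to show where the cycle comes from: the agent node runs, a conditional edge checks whether it asked for a tool, and if so the tool’s result is fed back to the agent. `fake_agent` and `get_weather` are hypothetical stubs, not LangGraph code. In LangGraph the loop would be an edge from the tools node back to the agent node, and the graph’s recursion limit plays a role similar to `max_steps` here.

```python
from typing import Callable, Dict, List, Optional

Message = Dict[str, str]


def get_weather(city: str) -> str:
    """Hypothetical tool with a canned result."""
    return "sunny"


TOOLS: Dict[str, Callable[[str], str]] = {"get_weather": get_weather}


def fake_agent(history: List[Message]) -> Dict[str, Optional[str]]:
    """Hypothetical model stub: requests a tool once, then answers from its result."""
    tool_results = [m["content"] for m in history if m["role"] == "tool"]
    if not tool_results:
        return {"tool": "get_weather", "arg": "Paris", "answer": None}
    return {"tool": None, "arg": None, "answer": f"The weather is {tool_results[-1]}."}


def run_agent(question: str, max_steps: int = 5) -> str:
    history: List[Message] = [{"role": "user", "content": question}]
    for _ in range(max_steps):            # cap the cycle so it cannot loop forever
        decision = fake_agent(history)    # "agent" node
        if decision["tool"] is None:      # conditional edge: no tool call -> finish
            return decision["answer"] or ""
        result = TOOLS[decision["tool"]](decision["arg"] or "")  # "tools" node
        history.append({"role": "tool", "content": result})       # edge back to "agent"
    raise RuntimeError("Agent did not finish within max_steps")


print(run_agent("What's the weather in Paris?"))
```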

Tools integration

Let’s talk now about tools.

Several models can call external tools through features such as Gemini’s function calling. This makes it easier to interact with external APIs and greatly extends what the LLM can do.

In LangGraph, tool nodes encapsulate the logic of calling the tools and handling the required parsing. A ToolNode comes pre-built, but you can create your own if you prefer.
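Conceptually, a tool node receives the tool calls requested by the model, runs the matching functions, and returns their results as messages. The sketch below shows that dispatch step in plain Python. The message shape (`name`, `args`, `id`) is loosely modeled on LangChain tool calls and is an assumption here; the `add` and `word_count` tools are hypothetical, and this is not LangGraph’s ToolNode code.

```python
import json
from typing import Any, Callable, Dict, List


def add(a: int, b: int) -> int:
    """Add two integers."""
    return a + b


def word_count(text: str) -> int:
    """Count whitespace-separated words."""
    return len(text.split())


TOOLS: Dict[str, Callable[..., Any]] = {"add": add, "word_count": word_count}


def tool_node(tool_calls: List[Dict[str, Any]]) -> List[Dict[str, str]]:
    """Run each requested tool and wrap its result as a 'tool' message."""
    messages = []
    for call in tool_calls:
        func = TOOLS.get(call["name"])
        if func is None:
            content = f"Error: unknown tool {call['name']!r}"
        else:
            try:
                content = json.dumps(func(**call["args"]))
            except TypeError as exc:  # wrong or missing arguments from the model
                content = f"Error: {exc}"
        messages.append({"role": "tool", "tool_call_id": call["id"], "content": content})
    return messages


calls = [
    {"name": "add", "args": {"a": 2, "b": 3}, "id": "call_1"},
    {"name": "word_count", "args": {"text": "hello tool world"}, "id": "call_2"},
]
for msg in tool_node(calls):
    print(msg)
```

Returning errors as messages, instead of raising, lets the model see what went wrong and try again on the next pass through the cycle.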

In the next part, we will take a look at a full project implementing everything into a scalable solution.

The LangGraph Guide to the Galaxy — part 1 was originally published in Geek Culture on Medium, where people are continuing the conversation by highlighting and responding to this story.

AI Sommelier built with PaLM API and LangChain

Gabriel Cassimiro — Tue, 26 Sep 2023 12:03:54 GMT


Project walkthrough to build an LLM-powered application using PaLM API, Pinecone as the vector database, and Streamlit as the interface.

Image by Hermes Rivera at Unsplash

In this article, I aim to demonstrate how to use the PaLM API to build solutions using LangChain. I will build a solution using LangChain chains, Google Embeddings, Pinecone as the VectorDB, and Streamlit as the interface to interact with users.

The problem we want to tackle is personalized wine recommendations. I want to build an AI capable of recommending wines based on the user’s preferences, drawing on a database of 130K wines.

The full code can be found here.

The data

The data used in this project is the Wine Reviews dataset (see its license). It contains 130K wine reviews, along with the country, region, variety, and winery of each wine. Here is a sample of the data:

Image by Author

The Problem

To tackle this problem, we first want to understand the user’s taste, then search the database for wines that may be good recommendations, and finally decide which one to recommend.

To get the user input, we will use a simple questionnaire inside Streamlit with the following questions:

Preferred Taste Profile;
Level of experience;
Red or White wine preference;
Favorite Flavours;
Pairing intent;
Open field to add any information about your taste.

All of these (except the last) have pre-selected categories. Since we are using LLMs, this isn't strictly necessary: we could leave every question as free text, because the LLM can interpret the answers without categories. However, I chose categories to guide the user.

The Architecture

The Architecture of the solution is shown below:

Image by Author

For the solution, we will make two calls to the LLM API. In the first, we pass the taste form information and ask the LLM to generate a query string that summarizes the user's taste, which is then used for similarity search on the vector DB. This lets us find the wine descriptions most similar to the person's taste.

However, this is not enough for a final decision, because some characteristics, like red vs. white, are not taken into account by this simple similarity search. That is why we call the LLM a second time, passing the original taste form and the 3 most similar wines, and ask it to select the best match and explain its reasoning.
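The two-call flow can be sketched as plain Python functions to show the shape of the pipeline: taste form, then query string, then top-3 candidates, then a final choice that can also check attributes the similarity search ignored. Everything here is hypothetical: `summarize_taste` and `pick_best` stand in for the two LLM calls, `search` stands in for the vector-database query, and the wine records (including the `color` field) are made up.

```python
from typing import Callable, Dict, List

Wine = Dict[str, str]


def summarize_taste(form: Dict[str, str]) -> str:
    """Stand-in for LLM call 1: turn form answers into a search query."""
    return f"{form['taste']} wine with {form['flavours']} notes"


def pick_best(form: Dict[str, str], candidates: List[Wine]) -> Wine:
    """Stand-in for LLM call 2: choose among candidates using the full form."""
    # A real LLM would weigh everything; this stub only enforces red vs. white.
    matching = [w for w in candidates if w["color"] == form["color"]]
    return (matching or candidates)[0]


def recommend(form: Dict[str, str], search: Callable[[str, int], List[Wine]]) -> Wine:
    query = summarize_taste(form)   # call 1: build the search query
    top3 = search(query, 3)         # similarity search on the vector DB
    return pick_best(form, top3)    # call 2: final decision


# Toy search: pretend the DB already ranked these by similarity.
CATALOG = [
    {"title": "Wine A", "color": "white"},
    {"title": "Wine B", "color": "red"},
    {"title": "Wine C", "color": "red"},
]
form = {"taste": "dry", "flavours": "cherry", "color": "red"}
print(recommend(form, lambda q, k: CATALOG[:k])["title"])
```

The most similar wine in the toy ranking is white, so the second step skips it and returns "Wine B", which is exactly the kind of correction the second LLM call is meant to make.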

The Code

Let's start by looking at the chains used for the calls to the API.

First, in case you do not already know what LangChain is, here is a brief definition:

LangChain is a framework for developing applications powered by language models.

That is the official definition of LangChain. This framework was created recently and is already used as the industry standard for building tools powered by LLMs.

Chains are a core feature of LangChain and enable the integration of various components to form a unified application. They can format user input using a PromptTemplate and then forward it to an LLM. By linking multiple chains or combining them with other elements, we can create more complex chains.
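To show what linking chains means mechanically, here is a dependency-free sketch in which each step fills a prompt template with values from a shared dict, calls a model, and writes its output under a new key so the next step can use it. This mirrors the idea behind LangChain’s SequentialChain, but the `Step` class, `run_sequence`, and the echo model are hypothetical, not LangChain’s classes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Step:
    template: str    # prompt with {placeholders}
    output_key: str  # where this step stores its result


def run_sequence(steps: List[Step], inputs: Dict[str, str],
                 llm: Callable[[str], str]) -> Dict[str, str]:
    values = dict(inputs)
    for step in steps:
        prompt = step.template.format(**values)  # fill placeholders from inputs and earlier outputs
        values[step.output_key] = llm(prompt)
    return values


def echo_llm(prompt: str) -> str:
    """Hypothetical model stub that just echoes the prompt in upper case."""
    return prompt.upper()


steps = [
    Step("Summarize the taste: {taste}", "query"),
    Step("Pick a wine for: {query}", "recommendation"),
]
result = run_sequence(steps, {"taste": "dry red"}, echo_llm)
print(result["recommendation"])
```

The second template can reference `{query}` only because the first step wrote that key, which is the same input/output bookkeeping a sequential chain declares explicitly.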

This is the code for both of the chains:

https://medium.com/media/96180daa25efe93859ecbad9f13ca02b/href

There are a couple of elements I would like to point out. First, we have the Response Schema and the Output Parser. They are used to build a prompt that instructs the LLM to return its output in a fixed structure, and the parser then converts that output into an object we can work with in the code.

Then we have the prompt built using the Chat Prompt Template. This allows us to pass variables into the prompt like the answers of the taste form. Lastly, we put everything back together with the Sequential Chain, defining the inputs and outputs.

Now we have to initialize the DB.

The Database

For the vector database, we are using Pinecone; however, you can easily swap in another one, because we interact with the DB through LangChain.

To add the wines to the database, we first need to create an embedding of each one and then upload them. We can easily do this with the code below:

https://medium.com/media/cfabb083f45f1ec6ad3ba7da48cdc5b4/href

We create an instance of Document from LangChain containing the text and the Metadata. Then we use LangChain and Google PaLM Embeddings to upload to Pinecone. The embeddings have 768 dimensions.
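The idea behind this step (pair each wine’s text with metadata, store its embedding, and later retrieve the nearest ones) can be sketched without any external service. Below, `toy_embed` is a hypothetical bag-of-words embedding used only for illustration, whereas the article uses 768-dimensional Google PaLM embeddings stored in Pinecone. The `Doc` class loosely mirrors the text-plus-metadata shape of LangChain’s Document, and the three records are made up.

```python
import math
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Doc:
    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for a real embedding model: word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyIndex:
    """In-memory list of (embedding, document) pairs."""

    def __init__(self) -> None:
        self.items: List[Tuple[Counter, Doc]] = []

    def add(self, docs: List[Doc]) -> None:
        self.items.extend((toy_embed(d.page_content), d) for d in docs)

    def similarity_search(self, query: str, k: int = 3) -> List[Doc]:
        q = toy_embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[0]), reverse=True)
        return [doc for _, doc in ranked[:k]]


index = ToyIndex()
index.add([
    Doc("crisp citrus white with green apple", {"variety": "Sauvignon Blanc"}),
    Doc("bold dark cherry red with firm tannins", {"variety": "Cabernet Sauvignon"}),
    Doc("light red with strawberry and earthy notes", {"variety": "Pinot Noir"}),
])
for doc in index.similarity_search("cherry red with tannins", k=2):
    print(doc.metadata["variety"])
```

A real vector database replaces the linear scan with an approximate nearest-neighbor index, but the contract is the same: embed the query, rank stored vectors by similarity, and return the top-k documents with their metadata.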

For this implementation I am using the free tier from Pinecone, which allows for 1 free index of the standard resource.

Image by Author

Now we just need to put everything together: pass the input from the application to the chains and connect to the vector database.

The App

The App will be the main control of the flow. It is also where we will create all the resources we need: the form, the PaLM API Connection, the Pinecone connection and the final visualization.

https://medium.com/media/8e8f031d5ad2a6bfa7510852744e03ba/href

This code can be divided into three main parts:

The form to get the user input;
The interaction of the chains and the vector database;
The display of the final recommendation.

These parts all run inside the main script, but in a larger application they should be modularized.

This is an example of using the tool:

Image by Author

So this is the final product. All we have to do now is deploy this application. With this in mind, the first thing we need is to allow the user to choose which LLM to use and have an input for the API key.

The final version can be seen here.

Conclusion

Using LangChain allows us to build quickly and in a modular manner. For the problem of recommending wines from a database, we used chains with two prompts that helped us understand the user’s taste, find some options to suggest, and make the final decision with an explanation.

An easy way to interact with the user is through an interface such as Streamlit, making it fast to develop and deploy.

Thanks for reading.


LLM Output Parsing: Function Calling vs. LangChain

Gabriel Cassimiro — Thu, 21 Sep 2023 19:13:18 GMT


How to consistently parse outputs from LLMs using OpenAI API function calling and LangChain: evaluating each method’s advantages and disadvantages

Image by Victor Barrios at Unsplash

Creating tools with LLMs requires multiple components, such as vector databases, chains, agents, document splitters, and many other new tools.

However, one of the most crucial components is the LLM output parsing. If you cannot receive structured responses from your LLM, you will have a hard time working with the generations. This becomes even more evident when we want a single call to the LLM to output more than one piece of information.

Let’s illustrate the problem with a hypothetical scenario:

We want the LLM to output from a single call the ingredients and the steps to make a certain recipe. But we want to have both of these items separately to use in two different parts of our system.

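# Uses the pre-1.0 openai SDK (openai.ChatCompletion was removed in openai>=1.0)
import openai

recipe = 'Fish and chips'
# Plain prompt: the model decides on its own how to structure the answer
query = f"""What is the recipe for {recipe}? 
Return the ingredients list and steps separately."""

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-0613",
    messages=[{"role": "user", "content": query}])

# The free-text answer lives in the first choice's message
response_message = response["choices"][0]["message"]
print(response_message['content'])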

This returns the following:

Ingredients for fish and chips:
- 1 pound white fish fillets (such as cod or haddock)
- 1 cup all-purpose flour
- 1 teaspoon baking powder
- 1 teaspoon salt
- 1/2 teaspoon black pepper
- 1 cup cold beer
- Vegetable oil, for frying
- 4 large russet potatoes
- Salt, to taste

Steps to make fish and chips:

1. Preheat the oven to 200°C (400°F).
2. Peel the potatoes and cut them into thick, uniform strips. Rinse the potato strips in cold water to remove excess starch. Pat them dry using a clean kitchen towel.
3. In a large pot or deep fryer, heat vegetable oil to 175°C (350°F). Ensure there is enough oil to completely submerge the potatoes and fish.
4. In a mixing bowl, combine the flour, baking powder, salt, and black pepper. Whisk in the cold beer gradually until a smooth batter forms. Set the batter aside.
5. Take the dried potato strips and fry them in batches for about 5-6 minutes or until golden brown. Remove the fries using a slotted spoon and place them on a paper towel-lined dish to drain excess oil. Keep them warm in the preheated oven.
6. Dip each fish fillet into the prepared batter, ensuring it is well coated. Let any excess batter drip off before carefully placing the fillet into the hot oil.
7. Fry the fish fillets for 4-5 minutes on each side or until they turn golden brown and become crispy. Remove them from the oil using a slotted spoon and place them on a paper towel-lined dish to drain excess oil.
8. Season the fish and chips with salt while they are still hot.
9. Serve the fish and chips hot with tartar sauce, malt vinegar, or ketchup as desired.

Enjoy your homemade fish and chips!

This is a long string, and parsing it would be hard because the LLM can return slightly different structures, breaking whatever code you write. You could argue that asking in the prompt to always return “Ingredients:” and “Steps:” would solve this, and you would not be wrong. It could work; however, you would still need to process the string manually and remain exposed to occasional variations and hallucinations.
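For comparison, this is roughly what the manual approach looks like: a hypothetical `split_recipe` helper that looks for the “Ingredients” and “Steps” headers with a regular expression. It works on text shaped like the sample above, but returns None as soon as the model words a header differently, which is exactly the fragility described.

```python
import re
from typing import Optional, Tuple

# Header lines like "Ingredients for fish and chips:" and "Steps to make fish and chips:"
PATTERN = re.compile(
    r"^ingredients\b[^\n]*:\s*\n(?P<ingredients>.*?)^steps\b[^\n]*:\s*\n(?P<steps>.*)",
    re.IGNORECASE | re.MULTILINE | re.DOTALL,
)


def split_recipe(text: str) -> Optional[Tuple[str, str]]:
    """Return (ingredients, steps), or None if the expected headers are missing."""
    match = PATTERN.search(text)
    if match is None:
        return None
    return match.group("ingredients").strip(), match.group("steps").strip()


sample = """Ingredients for fish and chips:
- 1 pound white fish fillets
- 1 cup all-purpose flour

Steps to make fish and chips:

1. Preheat the oven.
2. Fry the fish."""

print(split_recipe(sample))
print(split_recipe("You will need:\n- fish\nMethod:\n1. Fry it."))  # headers reworded -> None
```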

Solution

There are a couple of ways to solve this problem. One was mentioned above, but there are more robust, tested approaches. In this article, I will show two options:

OpenAI function calling;
LangChain Output Parser.

OpenAI Function Calling

This is the method I have been using, and it has given me the most consistent results. We use the function calling capability of the OpenAI API so that the model returns its response as structured JSON.

The goal of this functionality is to let the LLM call an external function by providing the function’s inputs as JSON. The models were fine-tuned to recognize when they need to use a given function. An example is a current-weather function: if you ask GPT for the current weather, it cannot tell you on its own, but you can provide a function that does this and pass it to GPT, so it knows the function exists and what input it takes.

If you want to dive deeper into this functionality, here is the announcement from OpenAI and here is a great article.

So let’s look at what this looks like in code for our problem, piece by piece:

functions = [
    {
        "name": "return_recipe",
        "description": "Return the recipe asked",
        "parameters": {  # JSON Schema describing the function's arguments
            "type": "object",
            "properties": {
                "ingredients": {
                    "type": "string",
                    "description": "The ingredients list."
                },
                "steps": {
                    "type": "string",
                    "description": "The recipe steps."
                },
            },
            "required": ["ingredients", "steps"],  # both fields must be filled
        },
    }
]

The first thing we need to do is declare the functions that will be available to the LLM. We have to give each one a name and a description so that the model understands when it should use it. Here we tell it that this function is used to return the requested recipe.

Then we go into the parameters. First, we say that it is of type object and the properties it can use are ingredients and steps. Both of these also have a description and a type to guide the LLM on the output. Finally, we specify which of those properties are required to call the function (this means we could have optional fields that the LLM would judge if it wanted to use them).

Let’s use that now in a call to the LLM:

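# Uses the pre-1.0 openai SDK (openai.ChatCompletion was removed in openai>=1.0)
import openai

recipe = 'Fish and chips'
query = f"What is the recipe for {recipe}? Return the ingredients list and steps separately."

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-0613",
    messages=[{"role": "user", "content": query}],
    functions=functions,                      # the schema declared above
    function_call={'name': 'return_recipe'}   # force the model to call this function
)
response_message = response["choices"][0]["message"]

print(response_message)
# The structured output arrives as a JSON string in the function call's arguments
print(response_message['function_call']['arguments'])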

Here we start by building our query, formatting a base prompt with what could be a variable input (recipe). Then we make the API call with “gpt-3.5-turbo-0613”, passing our query in the messages argument and, this time, our functions as well.

There are two arguments related to our functions. With the first, functions, we pass the list of function definitions in the format shown above. With the second, function_call, we specify how the model should use those functions. There are three options:

"auto" -> the model decides between responding to the user and calling a function;
"none" -> the model does not call a function and responds to the user directly;
{"name": "my_function_name"} -> specifying a function name forces the model to call it.

You can find the official documentation here.

In our case, since we are using it for output parsing, we chose the last option:

function_call={'name':'return_recipe'}

Now we can look at the response. After indexing into ["choices"][0]["message"], we get:

{
  "role": "assistant",
  "content": null,
  "function_call": {
    "name": "return_recipe",
    "arguments": "{\n  \"ingredients\": \"For the fish:\\n- 1 lb white fish fillets\\n- 1 cup all-purpose flour\\n- 1 tsp baking powder\\n- 1 tsp salt\\n- 1/2 tsp black pepper\\n- 1 cup cold water\\n- Vegetable oil, for frying\\nFor the chips:\\n- 4 large potatoes\\n- Vegetable oil, for frying\\n- Salt, to taste\",\n  \"steps\": \"1. Start by preparing the fish. In a shallow dish, combine the flour, baking powder, salt, and black pepper.\\n2. Gradually whisk in the cold water until the batter is smooth.\\n3. Heat vegetable oil in a large frying pan or deep fryer.\\n4. Dip the fish fillets into the batter, coating them evenly.\\n5. Gently place the coated fillets into the hot oil and fry for 4-5 minutes on each side, or until golden brown and crispy.\\n6. Remove the fried fish from the oil and place them on a paper towel-lined plate to drain any excess oil.\\n7. For the chips, peel the potatoes and cut them into thick chips.\\n8. Heat vegetable oil in a deep fryer or large pan.\\n9. Fry the chips in batches until golden and crisp.\\n10. Remove the chips from the oil and place them on a paper towel-lined plate to drain any excess oil.\\n11. Season the chips with salt.\\n12. Serve the fish and chips together, and enjoy!\"\n}"
  }
}

If we parse the “arguments” string inside “function_call” as JSON, we get our intended structured response:

{
  "ingredients": "For the fish:\n- 1 lb white fish fillets\n- 1 cup all-purpose flour\n- 1 tsp baking powder\n- 1 tsp salt\n- 1/2 tsp black pepper\n- 1 cup cold water\n- Vegetable oil, for frying\nFor the chips:\n- 4 large potatoes\n- Vegetable oil, for frying\n- Salt, to taste",
  "steps": "1. Start by preparing the fish. In a shallow dish, combine the flour, baking powder, salt, and black pepper.\n2. Gradually whisk in the cold water until the batter is smooth.\n3. Heat vegetable oil in a large frying pan or deep fryer.\n4. Dip the fish fillets into the batter, coating them evenly.\n5. Gently place the coated fillets into the hot oil and fry for 4-5 minutes on each side, or until golden brown and crispy.\n6. Remove the fried fish from the oil and place them on a paper towel-lined plate to drain any excess oil.\n7. For the chips, peel the potatoes and cut them into thick chips.\n8. Heat vegetable oil in a deep fryer or large pan.\n9. Fry the chips in batches until golden and crisp.\n10. Remove the chips from the oil and place them on a paper towel-lined plate to drain any excess oil.\n11. Season the chips with salt.\n12. Serve the fish and chips together, and enjoy!"
}
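In code, that parsing step is a `json.loads` on the `arguments` string. The sketch below uses a hard-coded, abbreviated message in the same shape as the response above (no API call is made), plus a hypothetical `parse_function_args` helper that also checks the required keys, since the arguments are model-generated text and can occasionally be malformed or incomplete.

```python
import json
from typing import Any, Dict, List

# Abbreviated message in the same shape as the response above (no API call).
# The API delivers "arguments" as a JSON-encoded string, mimicked here with json.dumps.
response_message: Dict[str, Any] = {
    "role": "assistant",
    "content": None,
    "function_call": {
        "name": "return_recipe",
        "arguments": json.dumps({
            "ingredients": "- 1 lb white fish fillets\n- 4 large potatoes",
            "steps": "1. Make the batter.\n2. Fry.",
        }),
    },
}


def parse_function_args(message: Dict[str, Any], required: List[str]) -> Dict[str, Any]:
    """Decode function_call.arguments and make sure the required keys are present."""
    args = json.loads(message["function_call"]["arguments"])  # may raise json.JSONDecodeError
    missing = [key for key in required if key not in args]
    if missing:
        raise ValueError(f"Model omitted required fields: {missing}")
    return args


recipe = parse_function_args(response_message, ["ingredients", "steps"])
print(recipe["ingredients"].splitlines())
```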

Conclusion for function calling

It is possible to use function calling straight from the OpenAI API. This gives us a dictionary-format response with the same keys every time the LLM is called.

Using it is pretty straightforward: you just declare the functions object, specifying a name, description, and properties focused on your task, and state in the description that this should be the model’s response. Also, when calling the API, we can force the model to use our function, making the output even more consistent.

The main downside of this method is that it is not supported by every LLM and API. So if we wanted to use the Google PaLM API, we would need another method.

LangChain Output Parsers

One alternative we have that is model-agnostic is using LangChain.

First, what is LangChain?

LangChain is a framework for developing applications powered by language models.

That is the official definition of LangChain. This framework was created recently and is already used as the industry standard for building tools powered by LLMs.

It has a functionality that is great for our use case, called “Output Parsers”. This module contains multiple objects for requesting and parsing different output formats from LLM calls. It works by first declaring the format and passing it to the LLM in the prompt, then using the same object to parse the response.

Let’s break down the code:

from langchain.prompts import ChatPromptTemplate
from langchain.output_parsers import ResponseSchema, StructuredOutputParser
# Import path from the 2023 LangChain releases; newer versions moved these integrations to separate packages
from langchain.llms import GooglePalm, OpenAI


# Each ResponseSchema becomes one key of the parsed dictionary
ingredients = ResponseSchema(
    name="ingredients",
    description="The ingredients from recipe, as a unique string.",
)
steps = ResponseSchema(
    name="steps",
    description="The steps to prepare the recipe, as a unique string.",
)

# The parser produces the format instructions and later parses the reply
output_parser = StructuredOutputParser.from_response_schemas(
    [ingredients, steps]
)

response_format = output_parser.get_format_instructions()
print(response_format)

# {format_instructions} will be filled with the text printed above
prompt = ChatPromptTemplate.from_template("What is the recipe for {recipe}? Return the ingredients list and steps separately. \n {format_instructions}")

The first thing we do here is create our Response Schema that will be the input for our parser. We create one for the ingredients and one for the steps, each containing a name that will be the key of the dictionary and a description that will guide the LLM on the response.

Then we create our StructuredOutputParser from those response schemas. There are multiple ways to do this, with different styles of parsers. Look here to learn more about them.

Lastly, we get our format instructions and define our prompt that will have the recipe name and the format instructions as inputs. The format instructions are these:

"""
The output should be a markdown code snippet formatted in the following schema, including the leading and trailing "```json" and "```":

```json
{
 "ingredients": string  // The ingredients from recipe, as a unique string.
 "steps": string  // The steps to prepare the recipe, as a unique string.
}
```
"""
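Under the hood, parsing a reply that follows these instructions amounts to pulling the JSON out of the fenced snippet and decoding it. The following is a simplified sketch of that idea, not LangChain’s actual implementation; `parse_structured` is a hypothetical helper name, and the sample reply is made up.

```python
import json
import re
from typing import Any, Dict, List

FENCE = "`" * 3  # three backticks, built at runtime so this example nests cleanly in Markdown

# Capture everything between an opening json fence and the next closing fence.
JSON_BLOCK = re.compile(re.escape(FENCE) + r"json\s*(.*?)" + re.escape(FENCE), re.DOTALL)


def parse_structured(text: str, keys: List[str]) -> Dict[str, Any]:
    """Extract the JSON snippet from a model reply and check the expected keys."""
    match = JSON_BLOCK.search(text)
    payload = match.group(1) if match else text  # fall back to the raw reply
    data = json.loads(payload)
    missing = [k for k in keys if k not in data]
    if missing:
        raise ValueError(f"Missing keys: {missing}")
    return data


reply = f'Sure!\n{FENCE}json\n{{"ingredients": "fish, flour", "steps": "batter, fry"}}\n{FENCE}'
print(parse_structured(reply, ["ingredients", "steps"]))
```

Because the structure is only requested through the prompt, every failure mode of this approach shows up here as an exception from `json.loads` or a missing key, which is why testing the prompt against real outputs matters.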

Now all that is left is calling the API. Here I will demonstrate it with both the OpenAI API and the Google PaLM API.

# API keys are read from environment variables
llm_openai = OpenAI()
llm_palm = GooglePalm()

recipe = 'Fish and chips'

# The same prompt and format instructions are used for both models
formatted_prompt = prompt.format(**{"recipe": recipe, "format_instructions": output_parser.get_format_instructions()})

response_palm = llm_palm(formatted_prompt)
response_openai = llm_openai(formatted_prompt)

print("PaLM:")
print(response_palm)
print(output_parser.parse(response_palm))  # raw text -> dict

print("Open AI:")
print(response_openai)
print(output_parser.parse(response_openai))

As you can see, it is really easy to switch between models. The whole structure defined before can be used in exactly the same way for any model supported by LangChain, and we also used the same parser for both models.

This generated the following output:

# PaLM:
{
'ingredients': '''- 1 cup all-purpose flour\n
- 1 teaspoon baking powder\n
- 1/2 teaspoon salt\n
- 1/2 cup cold water\n
- 1 egg\n
- 1 pound white fish fillets, such as cod or haddock\n
- Vegetable oil for frying\n- 1 cup tartar sauce\n
- 1/2 cup malt vinegar\n- Lemon wedges''',
'steps': '''1. In a large bowl, whisk together the flour, baking powder, and salt.\n
2. In a separate bowl, whisk together the egg and water.\n
3. Dip the fish fillets into the egg mixture, then coat them in the flour mixture.\n
4. Heat the oil in a deep fryer or large skillet to 375 degrees F (190 degrees C).\n
5. Fry the fish fillets for 3-5 minutes per side, or until golden brown and cooked through.\n
6. Drain the fish fillets on paper towels.\n
7. Serve the fish fillets immediately with tartar sauce, malt vinegar, and lemon wedges.
'''
}

# Open AI
{
'ingredients': '1 ½ pounds cod fillet, cut into 4 pieces,
2 cups all-purpose flour,
2 teaspoons baking powder,
1 teaspoon salt,
1 teaspoon freshly ground black pepper,
½ teaspoon garlic powder,
1 cup beer (or water),
vegetable oil, for frying,
Tartar sauce, for serving',
'steps': '1. Preheat the oven to 400°F (200°C) and line a baking sheet with parchment paper.
2. In a medium bowl, mix together the flour, baking powder, salt, pepper and garlic powder.
3. Pour in the beer and whisk until a thick batter forms.
4. Dip the cod in the batter, coating it on all sides.
5. Heat about 2 inches (5 cm) of oil in a large pot or skillet over medium-high heat.
6. Fry the cod for 3 to 4 minutes per side, or until golden brown.
7. Transfer the cod to the prepared baking sheet and bake for 5 to 7 minutes.
8. Serve warm with tartar sauce.'
}

Conclusion: LangChain Output parsing

This method is also really good, and its main strength is flexibility. We create a few structures, such as Response Schemas, Output Parsers, and Prompt Templates, that can be pieced together easily and used with different models. Another advantage is the support for multiple output formats.

The main disadvantage comes from passing the format instructions via the prompt, which leaves room for random errors and hallucinations. A real example comes from this very case, where I had to add “as a unique string” to the description of each response schema. Without it, the model returned the steps and ingredients as lists of strings, which caused a parsing error in the Output Parser.

Conclusion

There are multiple ways to parse the output of your LLM-powered application, and your choice may depend on the problem at hand. Personally, I follow this rule of thumb:

I always use an output parser, even if I have only one output from the LLM, because it lets me control and specify my outputs. If I am working with OpenAI, function calling is my choice because it gives the most control and avoids random errors in a production application. However, if I am using a different LLM or need a different output format, I choose LangChain, with a lot of testing on the outputs in order to craft the prompt that produces the fewest mistakes.

Thanks for reading.

The full code can be found here.


LLM Output Parsing Function Calling vs. LangChain was originally published in TDS Archive on Medium, where people are continuing the conversation by highlighting and responding to this story.

Async calls for Chains with LangChain

Gabriel Cassimiro — Mon, 10 Jul 2023 15:34:52 GMT

Async for LangChain and LLMs

How to make LangChain chains work with async calls to LLMs, reducing the time it takes to run a long sequential chain

Image by hp koch on Unsplash

In this article, I will cover how to use asynchronous calls to LLMs for long workflows using LangChain. We will go through an example with the full code and compare sequential execution with async calls.

Here is an overview of the content. If you’d like, you can jump to the section that interests you:

Basics: What is LangChain
How to run a Synchronous chain with LangChain
How to run a single Asynchronous chain with LangChain
Real-world tips for long workflows with Async Chains.

So let’s start!

Basics: What is LangChain

LangChain is a framework for developing applications powered by language models. That is the official definition of LangChain. The framework was created recently and has already become a de facto standard for building LLM-powered tools.

It is open-source and well maintained, with new features released at a very fast pace.

The official documentation can be found here and the GitHub repository here.

One downside of this library is that its features are so new that ChatGPT cannot help much with writing code for it, since its training data predates them. This means we have to work the “ancient” way: reading documentation, forums, and tutorials.

The LangChain documentation is really good; however, it has few examples for some specific use cases.

I ran into this problem when using async for long chains.

Here are the main resources I used to learn more about the framework:

Deep Learning AI course: LangChain Chat with your data;
Official Documentation;
YouTube channel.

(ps. They are all free)

How to run a Synchronous chain with LangChain

So let me set up the problem I had: I have a data frame with many rows, and for each row I need to run multiple prompts (chains) against an LLM and write the results back to the data frame.

Suppose you have 10K rows, 3 prompts per row, and each response takes about 3 to 5 seconds (if the server is not overloaded). That is 30K sequential calls, or roughly 25 to 42 hours of waiting before the workflow completes.

Below, I show the main steps and code to build a synchronous chain and time it on a subset of the data.

For this example, I use the Wine Reviews dataset (license). The goal is to extract some information from the written reviews.

I want to extract a summary of each review, its main sentiment, and the top 5 characteristics of each wine.

For that, I created two chains, one for the summary and sentiment and another that takes the summary as input to extract the characteristics.

Here is the code to run it:

https://medium.com/media/9898eb355870bc136b8d78374ff507ec/href

Run time (10 examples):

Summary Chain (Sequential) executed in 22.59 seconds.
Characteristics Chain (Sequential) executed in 22.85 seconds.

If you want to understand more about the components I am using, I really recommend the Deep Learning AI course.

The main takeaways from this code are the building blocks of a chain, how to run it sequentially, and how long the loop took to finish. Remember that it took about 45 seconds for 10 examples, while the full dataset contains 130K rows. At that rate, a sequential run would take roughly a week. So the async implementation is the New Hope for running this in a reasonable time.
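The original code is in the embed above. Below is a rough, hypothetical sketch of the same sequential pattern, in which `time.sleep` stands in for the LLM API call. The names `summary_chain` and `characteristics_chain` are illustrative stand-ins, not LangChain APIs. The sketch also extrapolates the measured 10-row timings (22.59 s + 22.85 s) linearly to the 130K-row dataset.

```python
import time

def summary_chain(review: str, latency: float = 0.01) -> str:
    """Hypothetical stand-in for the summary/sentiment chain; sleeps to mimic API latency."""
    time.sleep(latency)
    return f"summary of: {review[:20]}"

def characteristics_chain(summary: str, latency: float = 0.01) -> str:
    """Hypothetical stand-in for the second chain, which consumes the summary."""
    time.sleep(latency)
    return f"characteristics of: {summary[:20]}"

def run_sequential(reviews: list[str]) -> list[tuple[str, str]]:
    results = []
    for review in reviews:
        summary = summary_chain(review)          # blocks until the "API" responds
        traits = characteristics_chain(summary)  # must wait for the first call
        results.append((summary, traits))
    return results

def extrapolate_hours(seconds_for_sample: float, sample_rows: int, total_rows: int) -> float:
    """Linear extrapolation of sequential run time from a sample to the full dataset."""
    return seconds_for_sample / sample_rows * total_rows / 3600

if __name__ == "__main__":
    start = time.perf_counter()
    out = run_sequential([f"review {i}" for i in range(10)])
    print(f"{len(out)} rows in {time.perf_counter() - start:.2f} s")
    hours = extrapolate_hours(22.59 + 22.85, 10, 130_000)
    print(f"Estimated full run: {hours:.1f} h (~{hours / 24:.1f} days)")  # 164.1 h (~6.8 days)
```

Each row costs the sum of its two call latencies, so total time grows linearly with the number of rows.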

So with the problem set up and the baseline established, let's see how we can optimize this code to run much faster.

How to run a single Asynchronous chain with LangChain

For this, we are going to use asynchronous calls. To explain them, I will first briefly describe what the code is doing and where the time goes.

In our example, we go through each row of the data frame, extract some information from the rows, add them to our prompt, and call the GPT API to get a response. After the response, we just parse it and add it back to the data frame.

Image by Author

The main bottleneck is the call to the GPT API, because our computer sits idle waiting for the response (about 3 seconds). The other steps are fast; they could still be optimized, but that is not the focus of this article.

So instead of idly waiting for each response, what if we sent all the calls to the API at the same time? Then the total wait would be roughly the time of the slowest single response rather than the sum of all of them, after which we process the results. This is what asynchronous calls to the API give us.

Image by Author

This way, pre-processing and post-processing still happen sequentially, but each API call no longer has to wait for the previous response to come back before it is sent.

So here is the code for the Async chains:

https://medium.com/media/bba9cc4c59a1dcf4f1741b90bdc2278c/href

In this code, we use Python's async and await syntax. LangChain also lets us run a chain asynchronously with the arun() function. We first process each row sequentially (this can be optimized), then create multiple “tasks” that await the API responses concurrently, and finally process the responses into the desired format sequentially (this can also be optimized).
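Here is a minimal, self-contained sketch of the structure described above: sequential pre-processing, one task per row, `asyncio.gather`, and sequential post-processing. It is not the author's code. `summary_chain_arun`, `preprocess`, and `postprocess` are hypothetical stand-ins, and `asyncio.sleep` mimics the API latency. In the real workflow, the awaited call would be the chain's `arun()`.

```python
import asyncio
import time

async def summary_chain_arun(review: str, latency: float = 0.2) -> str:
    """Hypothetical async stand-in for awaiting a chain's arun(): yields instead of blocking."""
    await asyncio.sleep(latency)
    return f"summary of: {review}"

def preprocess(row: dict) -> str:
    # Sequential pre-processing: pull the fields the prompt needs out of the row.
    return row["description"].strip()

def postprocess(raw: str) -> dict:
    # Sequential post-processing: turn the raw LLM text into the desired format.
    return {"summary": raw}

async def run_async(rows: list[dict]) -> list[dict]:
    prompts = [preprocess(row) for row in rows]                  # 1. sequential
    tasks = [asyncio.create_task(summary_chain_arun(p)) for p in prompts]  # 2. all requests in flight
    raw_outputs = await asyncio.gather(*tasks)                   # 3. wait once; order is preserved
    return [postprocess(raw) for raw in raw_outputs]             # 4. sequential

if __name__ == "__main__":
    rows = [{"description": f" review {i} "} for i in range(10)]
    start = time.perf_counter()
    results = asyncio.run(run_async(rows))
    # Ten 0.2 s "calls" finish in about 0.2 s total instead of about 2 s.
    print(f"{len(results)} rows in {time.perf_counter() - start:.2f} s")
```

`asyncio.gather` returns results in the order the awaitables were passed, so each output can be written back to the matching data frame row.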

Run time (10 examples):

Summary Chain (Async) executed in 3.35 seconds.
Characteristics Chain (Async) executed in 2.49 seconds.

Compared to the sequential:

Summary Chain (Sequential) executed in 22.59 seconds.
Characteristics Chain (Sequential) executed in 22.85 seconds.

That is roughly a 7x speedup for the summary chain and 9x for the characteristics chain (about 45 seconds down to under 6 seconds in total). So for big workloads, I highly recommend this method. My code also contains many for loops that could be optimized further to improve performance.

The full code for this tutorial can be found in this GitHub repo.

Real-world tips for long workflows with Async Chains.

When I ran this, I hit some limitations and a few roadblocks that I want to share with you.

Notebooks are not Async Friendly

When running async calls in Jupyter notebooks, you may run into issues. The notebook already runs an event loop, so calling asyncio.run() inside a cell raises a RuntimeError; you can instead await a coroutine directly in a cell. The code I built is meant to run big workloads from a .py file, so it may need some changes to run in a notebook.

Too many output keys

The first one was that my chain had multiple output keys, and at the time arun() only accepted chains with a single output key. To fix this, I had to break my chain into two separate ones.

Not all chains can be async

I had logic that used a vector database for examples and comparisons in my prompt, and it required the examples to be compared and added to the database sequentially. That made async infeasible for this link of the full chain.

Lack of content

For this specific topic, the best content I could find was the official documentation for async, which I built on for my use case. So if you run it and discover something new, share it with the world!

Conclusion

LangChain is a very powerful tool to create LLM-based applications. I highly recommend learning this framework and doing the courses cited above.

For running chains with high workloads, we saw the improvement async calls can bring. My recommendation is to take the time to understand what the code is doing, keep a boilerplate class (such as the one provided in my code), and run it asynchronously!

For small workloads or applications that make only one API call, async is not necessary. But if you have a boilerplate class, just add a sync function so you can easily use either one.
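Here is one possible shape for such a boilerplate class. It is a hypothetical sketch, not the class from the author's repo. `BatchRunner`, `sync_fn`, and `async_fn` are illustrative names for a chain's blocking and async entry points. The `max_concurrency` cap, implemented with `asyncio.Semaphore`, is an added assumption meant to keep the number of in-flight requests under an API rate limit.

```python
import asyncio
import time
from typing import Awaitable, Callable

class BatchRunner:
    """Hypothetical wrapper exposing both sync and async batch execution."""

    def __init__(self, sync_fn: Callable[[str], str],
                 async_fn: Callable[[str], Awaitable[str]],
                 max_concurrency: int = 5):
        self.sync_fn = sync_fn
        self.async_fn = async_fn
        # Assumption: cap in-flight requests to stay under API rate limits.
        self.max_concurrency = max_concurrency

    def run(self, inputs: list[str]) -> list[str]:
        # Small workloads: one blocking call at a time.
        return [self.sync_fn(x) for x in inputs]

    async def arun(self, inputs: list[str]) -> list[str]:
        semaphore = asyncio.Semaphore(self.max_concurrency)

        async def limited(x: str) -> str:
            async with semaphore:  # at most max_concurrency calls in flight
                return await self.async_fn(x)

        # gather keeps results in input order.
        return list(await asyncio.gather(*(limited(x) for x in inputs)))

def fake_sync(x: str) -> str:
    time.sleep(0.05)  # stand-in for a blocking API call
    return x.upper()

async def fake_async(x: str) -> str:
    await asyncio.sleep(0.05)  # stand-in for an awaited API call
    return x.upper()

if __name__ == "__main__":
    runner = BatchRunner(fake_sync, fake_async, max_concurrency=5)
    print(runner.run(["a", "b"]))                     # ['A', 'B']
    print(asyncio.run(runner.arun(["a", "b", "c"])))  # ['A', 'B', 'C']
```

Keeping both methods on one object lets the same prompt and parsing logic serve a single interactive call and a 130K-row batch.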

Thanks for reading.

Async calls for Chains with Langchain was originally published in TDS Archive on Medium, where people are continuing the conversation by highlighting and responding to this story.

A Deep Dive into the DDPG Algorithm for Continuous Control

Gabriel Cassimiro — Fri, 14 Apr 2023 19:01:43 GMT

A Deep Dive into Actor-Critic methods with the DDPG Algorithm

Full project walkthrough with the implementation of the DDPG algorithm for the Continuous Control problem of the Reacher environment

Image by Author

Welcome to a fascinating exploration of reinforcement learning in the context of continuous control! In this article, we will dive into a challenging problem: teaching an intelligent agent to control a double-jointed robotic arm in the Reacher environment, a Unity-based simulation developed using the Unity ML-Agents toolkit. Our goal is to reach target locations with high precision, and to accomplish this, we have employed the state-of-the-art Deep Deterministic Policy Gradient (DDPG) algorithm, specifically designed for continuous state and action spaces.

Robots Sharing Experience (Source)

Join me on this journey as we discuss the environment, the algorithm, the neural network architecture, and the training process. That process led the agent to reach an average score of 30 in about 50 episodes and maintain that performance for over 150 episodes. I will also share ideas for future work and potential improvements to the agent’s performance. Let’s dive in!

This article provides a comprehensive project walkthrough and complete code, but you can also access the code in the following GitHub repository:

GitHub - gabrielcassimiro17/rl-robot-movement: This repository contains the implementation of a Deep Deterministic Policy Gradient (DDPG) algorithm applied to solve the Reacher environment in Unity ML-Agents.

Real-world applications

The Reacher environment might be an artificial simulation, but its underlying problem, learning to control a robotic arm to reach target locations, has significant real-world implications, particularly in robotics. Robotic arms play a critical role in manufacturing, production facilities, space exploration, and search and rescue operations. In these contexts, the ability to control robotic arms with high precision and dexterity is essential. Reinforcement learning techniques can enable these robotic systems to learn and adapt their behavior in real time, leading to improved performance and flexibility. As a result, advances in reinforcement learning not only contribute to our understanding of artificial intelligence but also have the potential to transform industries and make a meaningful impact on society.

Training robotic arm to reach target locations in the real world. (Source)

Environment

The Reacher environment is a captivating and complex simulation, offering an excellent opportunity to showcase the power of reinforcement learning techniques in continuous control tasks. In this section, we will dive deeper into the environment’s characteristics and the problem our intelligent agent needs to solve.

A Glimpse into the Reacher Environment

Built using the Unity ML-Agents toolkit, the Reacher environment is a visually engaging simulation that requires our agent to control a double-jointed robotic arm. The objective is to guide the arm toward a target location and maintain its position within the target area for as long as possible. The environment features 20 simultaneous agents, each operating independently, which facilitates an efficient collection of experiences during training.

Image by Author

State and Action Spaces

Understanding the state and action spaces is crucial for designing an effective reinforcement learning algorithm. In the Reacher environment, the state space consists of 33 continuous variables that provide information about the robotic arm, such as its position, rotation, velocity, and angular velocities. The action space is also continuous, with four variables corresponding to the torque applied to the two joints of the robotic arm. Each action variable is a real number ranging between -1 and 1.

Task Type and Success Criterion

The Reacher task is considered episodic, with each episode consisting of a fixed number of time steps. The agent’s goal is to maximize its total reward throughout these steps. A reward of +0.1 is granted for each step the arm’s end effector remains in the target location. The environment is considered solved when the agent achieves an average score of 30 or more over 100 consecutive episodes.
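The success criterion can be tracked with a rolling window. The sketch below is a hedged illustration, not the project's code. `SolvedTracker` is a hypothetical name, and it assumes each episode's score is the mean of the 20 parallel agents' returns, which then feeds a 100-episode moving average.

```python
from collections import deque
from statistics import mean

class SolvedTracker:
    """Checks: mean episode score >= target over `window` consecutive episodes.

    Assumption: an episode's score is the average of the parallel agents' returns.
    """

    def __init__(self, target: float = 30.0, window: int = 100):
        self.target = target
        self.scores = deque(maxlen=window)  # oldest episode drops out automatically

    def add_episode(self, agent_scores: list[float]) -> bool:
        self.scores.append(mean(agent_scores))  # average over the 20 agents
        full = len(self.scores) == self.scores.maxlen
        return full and mean(self.scores) >= self.target

if __name__ == "__main__":
    tracker = SolvedTracker()
    for episode in range(1, 101):
        solved = tracker.add_episode([31.0] * 20)
    print(solved)  # True: 100 episodes averaging 31.0
```

The tracker cannot report success before 100 episodes exist, which matches the "100 consecutive episodes" wording.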

In the next sections, we will explore the DDPG algorithm, its implementation, and how it effectively tackles the continuous control problem in this environment.

Harnessing the Power of DDPG: Algorithm Choice for Continuous Control

When it comes to continuous control tasks like the Reacher problem, the choice of algorithm is crucial for achieving optimal performance. In this project, we opted for the Deep Deterministic Policy Gradient (DDPG) algorithm, an actor-critic method specifically designed to handle continuous state and action spaces. Let’s take a closer look at the DDPG algorithm and why it is well-suited for our task.

Deep Deterministic Policy Gradient (DDPG) Explained

The DDPG algorithm combines the strengths of policy-based and value-based methods by incorporating two neural networks. The Actor network determines the optimal actions given the current state. The Critic network estimates the state-action value function (Q-function). Each network has a target copy that is updated slowly, providing a stable target during learning updates.

By using the Critic network to estimate the Q-function and the Actor network to determine the optimal actions, the DDPG algorithm efficiently merges the benefits of policy gradient methods and deep Q-networks. This hybrid approach allows the agent to learn effectively and efficiently in continuous control environments.

https://medium.com/media/28cfe4b5da0545e244ecd3f8321ac1d9/href

The implementation also uses a Replay Buffer, a crucial component for improving learning efficiency and stability. A replay buffer is a memory data structure that stores a fixed number of past experiences, or transitions, each consisting of state, action, reward, next state, and done information. Its main advantage is that it lets the agent break the correlation between consecutive experiences, reducing the impact of harmful temporal correlations.

By sampling random mini-batches of experiences from the buffer, the agent can learn from a diverse set of transitions, which helps to stabilize and generalize the learning process. Moreover, the replay buffer allows the agent to reuse past experiences multiple times, thereby increasing data efficiency and promoting more effective learning from limited interaction with the environment.
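A minimal replay buffer along these lines is sketched below. It is a generic sketch, not the project's exact class. It uses a bounded `deque` so the oldest transitions are evicted first, and it samples uniformly at random without replacement. The `seed` parameter is an added convenience for reproducibility.

```python
import random
from collections import deque, namedtuple

Experience = namedtuple("Experience", ["state", "action", "reward", "next_state", "done"])

class ReplayBuffer:
    """Fixed-size buffer of transitions with uniform random mini-batch sampling."""

    def __init__(self, buffer_size: int, batch_size: int, seed: int = 0):
        self.memory = deque(maxlen=buffer_size)  # oldest experiences are dropped first
        self.batch_size = batch_size
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done) -> None:
        self.memory.append(Experience(state, action, reward, next_state, done))

    def sample(self) -> list:
        # Random sampling breaks the temporal correlation between consecutive steps.
        return self.rng.sample(self.memory, k=self.batch_size)

    def __len__(self) -> int:
        return len(self.memory)

if __name__ == "__main__":
    buffer = ReplayBuffer(buffer_size=200_000, batch_size=256)
    # With 20 parallel agents, every environment step adds 20 transitions.
    for step in range(20):
        for agent in range(20):
            buffer.add(state=step, action=agent, reward=0.1, next_state=step + 1, done=False)
    if len(buffer) >= buffer.batch_size:
        print(len(buffer.sample()))  # 256
```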

Why DDPG for the Reacher Problem?

The DDPG algorithm is an excellent choice for the Reacher problem due to its ability to effectively handle continuous action spaces, a critical aspect of this environment. Furthermore, the algorithm’s design allows for the efficient use of parallel experiences collected by multiple agents, leading to faster learning and better convergence. In our project, the 20 agents operating simultaneously share experiences and learn collectively, ultimately achieving the desired performance in the Reacher task.

In the following sections, we will discuss the neural network architecture, hyperparameter selection, and the training process that enabled our agent to successfully learn and adapt its behavior within the Reacher environment using the DDPG algorithm.

How the DDPG Algorithm Works in the Reacher Environment

To better understand the effectiveness of the algorithm in the environment, let’s take a closer look at the key components and steps involved in the learning process.

Neural Networks Architecture

The DDPG algorithm employs two neural networks, the Actor and the Critic. Both networks consist of two hidden layers, each containing 400 nodes. The hidden layers use the ReLU (Rectified Linear Unit) activation function, while the output layer of the Actor network employs a tanh activation function to produce actions in the range of -1 to 1. The Critic network’s output layer does not have an activation function, as it directly estimates the Q-function.

This is the code implementing the networks:

https://medium.com/media/86018119f8aa32478c5ffe63acb7fe5f/href
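To make the shapes concrete without a deep-learning dependency, here is a NumPy forward-pass sketch of the architecture described above. It covers only the forward pass (no training) and is not the project's embedded code. The sizes come from the text: 33 state inputs, two 400-unit ReLU hidden layers, 4 tanh outputs for the Actor, and a linear scalar output for the Critic. Two details are assumptions. The first is the fan-in uniform initialization. The second is feeding the concatenated state and action into the Critic's first layer; the original DDPG paper instead injects the action at the second hidden layer.

```python
import numpy as np

STATE_SIZE, ACTION_SIZE, HIDDEN = 33, 4, 400

def init_layer(rng, fan_in, fan_out):
    # Assumption: uniform init scaled by fan-in; the project's init may differ.
    bound = 1.0 / np.sqrt(fan_in)
    return rng.uniform(-bound, bound, (fan_in, fan_out)), np.zeros(fan_out)

def relu(x):
    return np.maximum(x, 0.0)

class Actor:
    """state (33) -> 400 -> 400 -> action (4); tanh keeps actions in [-1, 1]."""

    def __init__(self, rng):
        self.layers = [init_layer(rng, STATE_SIZE, HIDDEN),
                       init_layer(rng, HIDDEN, HIDDEN),
                       init_layer(rng, HIDDEN, ACTION_SIZE)]

    def __call__(self, state):
        (w1, b1), (w2, b2), (w3, b3) = self.layers
        x = relu(state @ w1 + b1)
        x = relu(x @ w2 + b2)
        return np.tanh(x @ w3 + b3)

class Critic:
    """(state, action) -> 400 -> 400 -> Q-value (1); no output activation."""

    def __init__(self, rng):
        # Assumption: state and action are concatenated at the input layer.
        self.layers = [init_layer(rng, STATE_SIZE + ACTION_SIZE, HIDDEN),
                       init_layer(rng, HIDDEN, HIDDEN),
                       init_layer(rng, HIDDEN, 1)]

    def __call__(self, state, action):
        (w1, b1), (w2, b2), (w3, b3) = self.layers
        x = relu(np.concatenate([state, action], axis=-1) @ w1 + b1)
        x = relu(x @ w2 + b2)
        return x @ w3 + b3  # linear output: an unbounded Q estimate

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    actor, critic = Actor(rng), Critic(rng)
    states = rng.standard_normal((20, STATE_SIZE))  # one row per parallel agent
    actions = actor(states)
    print(actions.shape, critic(states, actions).shape)  # (20, 4) (20, 1)
```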

Hyperparameters Selection

Carefully chosen hyperparameters are crucial for efficient learning. In this project, we used a buffer size of 200,000 to store experiences for replay, a batch size of 256 for learning updates, an actor learning rate of 5e-4, a critic learning rate of 1e-3, a soft update parameter (tau) of 5e-3, and a discount factor (gamma) of 0.995. Additionally, we incorporated action noise to facilitate exploration, with an initial noise scale of 0.5 and a noise decay rate of 0.998.
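These values can be collected in a single config object. The sketch below uses a hypothetical `DDPGConfig` dataclass. The text gives a noise decay rate of 0.998 but does not say whether decay is applied per step or per episode, so `noise_at` simply counts how many times the multiplicative decay has been applied.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DDPGConfig:
    buffer_size: int = 200_000  # replay memory capacity
    batch_size: int = 256       # experiences per learning update
    lr_actor: float = 5e-4
    lr_critic: float = 1e-3
    tau: float = 5e-3           # soft-update interpolation factor
    gamma: float = 0.995        # discount factor
    noise_scale: float = 0.5    # initial exploration-noise scale
    noise_decay: float = 0.998  # multiplicative decay per application

    def noise_at(self, n_decays: int) -> float:
        """Noise scale after n multiplicative decays (assumed form: scale * decay**n)."""
        return self.noise_scale * self.noise_decay ** n_decays

if __name__ == "__main__":
    cfg = DDPGConfig()
    for n in (0, 100, 500, 1000):
        print(n, round(cfg.noise_at(n), 4))
```

Under this assumption, the scale falls to about 0.18 after 500 decay applications. A discount of 0.995 corresponds to an effective horizon of roughly 1 / (1 - γ) = 200 steps.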

Training Process

The training process involves continuous interaction between the Actor and Critic networks, with 20 parallel agents sharing the same networks and learning collectively from the experiences gathered by all agents. This setup speeds up the learning process and enhances efficiency.

The code used for training:

https://medium.com/media/bd94609b08876d334e934ef8741c82b7/href

Here we create an agent based on the DDPG class and make it interact with the environment in a loop.

The key steps in the training process are depicted below:

Initialize the networks: The agents initialize the shared Actor and Critic networks and their respective target networks with random weights. The target networks provide a stable learning target during updates.
Interact with the environment: Each agent, using the shared Actor network, interacts with the environment by choosing actions based on its current state. To encourage exploration, a noise term is added to the actions during the initial stages of training. After taking the action, each agent observes the resulting reward and the next state.
Store experiences: Each agent stores the observed experience (state, action, reward, next_state) in a shared replay buffer. This buffer holds a fixed number of recent experiences, enabling the agents to learn from diverse transitions collected by all agents.
Learn from experiences: Periodically, a batch of experiences is sampled from the shared replay buffer. The shared Critic network is updated using the sampled experiences by minimizing the mean squared error between the predicted and target Q-values. The target Q-values are calculated using the shared Critic target network and the shared Actor target network.
Update the Actor network: The shared Actor network is updated with the deterministic policy gradient. The gradient of the shared Critic's Q-value with respect to the Actor's own actions (a = μ(s)) is backpropagated into the Actor's weights. The shared Actor network thus learns to choose actions that maximize the expected Q-values.
Update target networks: The shared Actor and Critic target networks are softly updated using a mix of the current and target network weights. This ensures a stable learning process.
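The Critic target and the soft update from the steps above reduce to a few lines of arithmetic. The sketch below operates on plain Python lists so it stays dependency-free; the function names are hypothetical. It assumes `next_q_values` already hold the target Critic's estimate at the target Actor's action for each next state. The Actor update is omitted because it needs automatic differentiation.

```python
def critic_targets(rewards, next_q_values, dones, gamma=0.995):
    """y = r + gamma * Q'(s', mu'(s')) * (1 - done); no bootstrapping past episode end."""
    return [r + gamma * q * (1.0 - d) for r, q, d in zip(rewards, next_q_values, dones)]

def mse(predictions, targets):
    """Critic loss: mean squared error between predicted and target Q-values."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

def soft_update(local_weights, target_weights, tau=5e-3):
    """theta_target <- tau * theta_local + (1 - tau) * theta_target, element-wise."""
    return [tau * l + (1.0 - tau) * t for l, t in zip(local_weights, target_weights)]

if __name__ == "__main__":
    y = critic_targets(rewards=[0.1, 0.0], next_q_values=[2.0, 5.0], dones=[0.0, 1.0])
    print(y)                                  # approximately [2.09, 0.0]
    print(mse([2.0, 0.5], y))                 # loss the Critic would minimize
    print(soft_update([1.0, 1.0], [0.0, 2.0]))  # target moves 0.5% toward local
```

With tau = 5e-3, each update moves the target weights only 0.5% of the way toward the current weights, which is what keeps the learning target stable.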

The DDPG algorithm’s design, combined with the chosen hyperparameters and neural network architecture, allows the agents to learn and adapt their behavior effectively in the continuous control environment, ultimately achieving the desired performance in the Reacher task.

Results and Future Directions

In this project, our agent successfully learned to control the double-jointed robotic arm in the Reacher environment using the DDPG algorithm. Throughout the training process, we monitored the agent’s performance based on the average score across all 20 agents. As the agent explored the environment and gathered experiences, its ability to predict optimal actions for maximizing rewards improved significantly.

Here we can see the trained agents performing the task:

Image by Author

Training Results

After about 50 episodes, the agent demonstrated a remarkable level of proficiency in the task, achieving an average score that surpassed the threshold required to consider the environment solved (30+) and maintained that level of performance for 150 episodes. Although the agent’s performance varied throughout the training process, the general trend showed an upward trajectory, indicating that the learning process was successful.

This plot shows the average score per episode of the 20 agents:

Image by Author

In conclusion, our implementation of the DDPG algorithm, combined with carefully chosen hyperparameters and neural network architecture, effectively solved the Reacher environment. By sharing experiences and learning collectively, the agents were able to adapt their behavior and achieve the desired performance in the task. This project showcases the potential of reinforcement learning algorithms in continuous control problems and opens up exciting possibilities for future research and development.

Ideas for future work

Despite the success in solving the Reacher environment, there is still room for further improvement and optimization. Here are some ideas for future work:

Hyperparameter tuning: The hyperparameters in this project were chosen based on a combination of recommendations from the literature and empirical testing. Further optimization through systematic hyperparameter tuning could lead to even better performance.
Parallel training with more agents: In this project, we used 20 agents to collect experiences simultaneously. Investigating the impact of using more agents on the overall learning process could potentially lead to faster convergence or improved performance.
Batch normalization: To further enhance the learning process, it is worth exploring the implementation of batch normalization in the neural network architecture. By normalizing the input features at each layer during training, batch normalization can help reduce internal covariate shift, accelerate learning, and potentially improve generalization. Incorporating batch normalization into the Actor and Critic networks may lead to more stable and efficient training, allowing the agent to reach even higher levels of performance in the Reacher environment.

References

Lillicrap, T. P., Hunt, J. J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., & Wierstra, D. (2015). Continuous control with deep reinforcement learning.
Sutton, R. S., & Barto, A. G. (2018). Reinforcement learning: An introduction. MIT Press.
Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., … & Hassabis, D. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540), 529–533.
Udacity. Deep Reinforcement Learning Nanodegree.
Barth-Maron, G., Hoffman, M. W., Budden, D., Dabney, W., Horgan, D., TB, D., & Lillicrap, T. (2018). Distributed Distributional Deterministic Policy Gradients. arXiv preprint arXiv:1804.08617.

Thanks for Reading!

A Deep Dive into the DDPG Algorithm for Continuous Control was originally published in Geek Culture on Medium, where people are continuing the conversation by highlighting and responding to this story.

Solving Unity Environment with Deep Reinforcement Learning

Gabriel Cassimiro — Mon, 20 Feb 2023 18:19:49 GMT

End to End Project with code of a PyTorch implementation of Deep Reinforcement Learning Agent

Image by Arseny Togulev on Unsplash

Unity is a popular game development engine that allows developers to create games with stunning graphics and immersive gameplay. It is widely used for developing games across various platforms, including mobile, PC, and consoles. However, creating intelligent and challenging game environments is difficult for game developers. This is where Deep Reinforcement Learning (DRL) comes into play.

DRL is a subset of machine learning that combines deep learning and reinforcement learning. It is a powerful technique that has been used to solve complex tasks in various domains, including robotics, finance, and gaming. In recent years, DRL has become a popular approach to building intelligent game agents that can learn from experience and adapt to new situations.

In this post, we will explore how DRL can be used to solve Unity game environments. We will go through a DRL implementation for the Unity environment in which the agent collects bananas. We will also explore some of the challenges of using DRL in game development and how they can be overcome.

This was a project for the Deep Reinforcement Learning specialization from Udacity. The full project and code can be found in this GitHub repo.

Objective

The objective of this project is to train an agent using Deep Q-Learning. The agent is trained to collect yellow bananas while avoiding blue bananas in Unity’s Banana Collector environment.

More information about the Unity environment can be found here. The agent was trained with a Deep Q-Learning algorithm and solved the environment in 775 episodes.

Environment & Task

The environment consists of a square world with yellow and blue bananas. The agent's objective is to collect as many yellow bananas as possible while avoiding the blue ones. The agent has 4 possible actions: move forward, move backward, turn left, and turn right.

The state space has 37 dimensions and contains the agent’s velocity, along with ray-based perception of objects around the agent’s forward direction. A reward of +1 is provided for collecting a yellow banana, and a reward of -1 is provided for collecting a blue banana.

The task is episodic, and in order to solve the environment, the agent must get an average score of +13 over 100 consecutive episodes.

The env looks like this:

Image by author

The Agent

To solve the environment, I implemented a Deep Q-Learning algorithm based on DeepMind's paper Human-level control through deep reinforcement learning.

The algorithm uses a neural network to approximate the Q-function. The network receives the state as input and outputs a Q-value for each action, which is then used to select the action the agent takes. The network itself is trained with the Q-learning update rule. A naive implementation, however, suffers from two problems: correlated experiences and correlated targets. The algorithm addresses them with two techniques: Experience Replay and Fixed Q-Targets.
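Two pieces of this loop are sketched below: epsilon-greedy action selection over the 4 actions, and the Q-learning target built from next-state Q-values. This is a minimal sketch that assumes Q-values arrive as plain Python lists; the function names are hypothetical. The article does not state the epsilon schedule, so epsilon is a free parameter here. The default gamma of 0.99 matches the hyperparameter list later in the article.

```python
import random

def epsilon_greedy(q_values: list[float], epsilon: float, rng: random.Random) -> int:
    """With probability epsilon explore (random action), otherwise exploit (argmax Q)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    # Ties resolve to the lowest action index.
    return max(range(len(q_values)), key=lambda a: q_values[a])

def q_learning_target(reward: float, next_q_values: list[float], done: bool,
                      gamma: float = 0.99) -> float:
    """TD target: r + gamma * max_a' Q_target(s', a'), or just r at the end of an episode."""
    return reward if done else reward + gamma * max(next_q_values)

if __name__ == "__main__":
    rng = random.Random(0)
    q = [0.1, 0.9, 0.3, 0.2]  # hypothetical Q-values for forward, backward, left, right
    print(epsilon_greedy(q, epsilon=0.0, rng=rng))  # 1 (pure exploitation)
    print(q_learning_target(1.0, [0.5, 2.0], done=False))  # about 2.98
```

The network is trained to move its Q-value for the action taken toward this target. Computing `next_q_values` with a separate, slowly updated network is what the Fixed Q-Targets technique adds.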

Correlated experiences

Correlated experiences are experiences (or transitions) of an agent that are correlated with each other, meaning they are not independent and identically distributed. Training on consecutive, correlated samples makes the updates high-variance and biased toward recent experience. That can result in poor performance or convergence to suboptimal policies.

To solve this problem, we use a technique called Experience Replay: the agent's experiences are stored in a replay buffer and sampled randomly to train the neural network.

Correlated targets

Correlated targets arise when the target values used for the update are computed with the same network that is being updated, so every update also shifts the targets. This correlation in the learning signal can slow down or prevent convergence to the optimal policy.

To solve this problem, we use a technique called Fixed Q-Targets, which relies on two neural networks: the local network and the target network. The local network selects the action the agent takes, while the target network calculates the target values for the Q-learning update. Every 4 steps, the target network is moved toward the weights of the local network, with the size of the step controlled by the Tau hyperparameter listed below.

This is the implementation in Python:

https://medium.com/media/bb8c697678a3ef4f8ad579b0503d2140/href

Neural network architecture

The neural network architecture used in the algorithm is a simple fully connected neural network with 2 hidden layers. The input layer has 37 neurons, the output layer has 4 neurons and the hidden layers have 64 neurons each. The activation function used in the hidden layers is ReLU and the activation function used in the output layer is the identity function.

The optimizer used for this implementation is Adam with a learning rate of 0.0005.

The library used to implement the neural network was PyTorch.

This was the implementation of the neural network:

https://medium.com/media/a3586476d0309506f9937b9664ec2ea9/href

Training Task

To train the agent, we used a loop that interacts with the environment, collects experiences, and learns from them. The first hyperparameter of the training task was the number of episodes, tuned manually to balance training time against agent performance. The final implementation used 1200 episodes, although the environment was solved in 775.

The second hyperparameter was the number of steps per episode, also tuned manually to balance training time against performance. More steps let the agent explore more of the environment but greatly increase training time. The final implementation used 1000 steps per episode.

Some other hyperparameters used:

Replay buffer size: 1000
Batch size: 32
Update every: 4
Gamma: 0.99
Tau: 1e-3
Learning rate: 0.0005

https://medium.com/media/a221a65a5f5b8c225821646f018b558c/href

With this, we were able to solve the environment in 775 episodes. The plot below shows the progress of the agent in obtaining higher rewards.

Image by Author

Here we can see the rewards increase as the agent improves. The tradeoff between exploration and exploitation is also visible in the plot, where the agent explores more in the first 200 episodes and then starts to exploit the environment and get higher rewards.

The full implementation can be found on this GitHub repo.

While we were able to solve the environment, there are a few improvements that can be applied to speed up the solution.

Future improvements

The algorithm can be improved by using the following techniques:

Dueling DQN — paper
Prioritized Experience Replay — paper

Another possible improvement is to learn directly from the environment's pixel data. These improvements will likely be the topic of a new article, where I intend to go deeper into the core concepts and implement the neural network with TensorFlow.

Thanks for Reading!

Solving Unity Environment with Deep Reinforcement Learning was originally published in TDS Archive on Medium, where people are continuing the conversation by highlighting and responding to this story.

Predicting my Next Workout with Machine Learning- Part 1

Gabriel Cassimiro — Tue, 12 Jul 2022 05:58:35 GMT

Predicting My Next Workout with Machine Learning | Part 1: The Data

End-to-end Python project looking at exercise data from an Apple Watch, from data collection to model deployment

Photo by Blocks Fletcher on Unsplash

In this series of posts, I will go through all the steps of an end-to-end machine learning project. These run from data extraction and preparation, to deploying the model behind an API, and finally to building a front end that actually solves the problem by helping with decisions. The main topics of each post are:

Project setup, Data Processing, and Data Exploration
Model Experimenting
Model Deployment
Data App creation

So let’s begin with the first part:

Project setup, data exploration & data processing

The first and most crucial part of any project is clearly defining the problem you are solving. If you don’t have a clear definition, you should probably go back and brainstorm why you came up with the idea and whether it is really a problem for people other than you. Many product methodologies help with this step, but I won’t go into them in this article. For now, we will focus only on defining the current problem.

The Problem

Just like most people, I hope, I have a hard time staying fully motivated to work out every day. One tool that sometimes helps is a smartwatch that tracks my progress and offers features such as seeing friends' workouts and competitions. However, that is not enough, and my motivation still has ups and downs. That’s why I, as a Data Scientist, want to study my past workouts. I want to figure out which variables motivated me most in the past and to predict my probability of reaching my exercise goals in the future.

Defining it in one sentence we have:

My problem is maintaining motivation to work out for a long period of time

Now that we have defined what problem we want to solve, we can start setting up our project and our solution.

Project Setup

In a regular Data Science project, there are a couple of initial steps that we must follow.

Git repository setup
Infrastructure provisioning
Environment setup

Being able to version your project and your code is very important in any software project. So our first step will be creating a GitHub repository to store and version our code. GitHub also makes it easy to share your code and contribute to other people's projects.

I will not go step by step through creating a repository; just search for “How to create a GitHub repository” and you’ll be good to go. For this project, I am using this repository.

The second part is provisioning cloud infrastructure to develop and later deploy your solution. For now, I will not need any, because the data volume fits nicely on my laptop for this initial analysis. When we dive into running experiments and tuning hyperparameters, I will show you how to do this on Google Cloud Platform, specifically using Vertex AI.

The last part is creating a virtual environment for development. I like to use pyenv for this job; to install pyenv, look here. As for the operating system, there are many options, but I personally prefer a Unix-based one such as macOS; if you use Windows, you can install the Windows Subsystem for Linux. Another part of the environment is keeping track of your libraries and their versions in a requirements.txt file. There is an example in the project's GitHub repository.

The data

To get the data we need, we have to export it from the Health app on an iPhone. This is really easy to do; just look here to see how.

Now we can (finally) start coding.

The export comes as a zip file containing a folder with workout routes, electrocardiograms, and an XML file with all your health data. The code below will unzip the folder, parse the XML, and save it as a CSV.

https://medium.com/media/5fcdffddb4031e5a41b9465280245d37/href

This is the first part of our data processing pipeline. If we want to share this functionality or simply add newer data, having structured code to process the data is essential. Note that the code is written as a function, which gives our pipeline flexibility and modularity.
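The embedded gist is the author's code. Below is a rough, stdlib-only sketch of what such a function can look like: it streams the XML out of the zip and writes one CSV row per health record. It assumes the usual Apple Health export layout, with an apple_health_export/export.xml file whose Record elements carry attributes such as type, value, unit and startDate. The function name health_zip_to_csv and the column subset are hypothetical.

```python
import csv
import zipfile
import xml.etree.ElementTree as ET

# Attributes copied from each <Record> element (an assumed subset of the export schema).
FIELDS = ["type", "sourceName", "unit", "value", "startDate", "endDate", "creationDate"]


def health_zip_to_csv(zip_path, csv_path, xml_name="apple_health_export/export.xml"):
    """Hypothetical helper: write one CSV row per <Record> found in the zipped export.

    Returns the number of records written.
    """
    count = 0
    with zipfile.ZipFile(zip_path) as archive, archive.open(xml_name) as xml_file, \
            open(csv_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.DictWriter(out, fieldnames=FIELDS)
        writer.writeheader()
        # iterparse streams the document instead of loading it whole; exports can be large.
        for _, elem in ET.iterparse(xml_file, events=("end",)):
            if elem.tag == "Record":
                writer.writerow({field: elem.get(field, "") for field in FIELDS})
                count += 1
                elem.clear()  # drop the processed element's children to save memory
    return count
```

The XML path is a parameter because the exact file name inside the zip is worth checking against your own export.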

Now we have the following data frame ready to be modeled.

Image by Author

Haha, just kidding.

In real life, data is almost never as ready to use as a Kaggle dataset. In this case, the data formats are wrong, metadata entries are stored inside lists, and dates have to be converted, to name just a few of the things we have to deal with first.

What was done:

Filter only Exercise minutes data
Transform dates to DateTime format
Transform values to float
Create a date column without time, only with days
Group the value for Exercise minutes for each day

https://medium.com/media/ae91eda68658115f81bbb1fb70eb7bc8/href
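The cleaning steps listed above can be sketched without pandas. This is an illustration, not the gist's code, and it rests on three assumptions:

- Rows are dicts shaped like the CSV produced by the export code, for example read with csv.DictReader.
- Exercise Minutes records have the type HKQuantityTypeIdentifierAppleExerciseTime.
- Timestamps follow the export's "2022-06-27 10:15:00 -0300" pattern.

```python
from collections import defaultdict
from datetime import datetime

EXERCISE_TYPE = "HKQuantityTypeIdentifierAppleExerciseTime"  # assumed record type name


def daily_exercise_minutes(rows):
    """Sum Exercise Minutes per calendar day; returns a date-ordered {date: minutes} dict."""
    totals = defaultdict(float)
    for row in rows:
        if row["type"] != EXERCISE_TYPE:  # keep only Exercise Minutes records
            continue
        # Parse the timestamp, including its UTC offset.
        start = datetime.strptime(row["startDate"], "%Y-%m-%d %H:%M:%S %z")
        # The value arrives as text; the day is taken in the record's own offset.
        totals[start.date()] += float(row["value"])
    return dict(sorted(totals.items()))
```

Days without any exercise record do not appear in the result at all, so it is worth checking for gaps before modeling.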

Now we have a daily time series of Exercise Minutes. I selected Exercise Minutes instead of Burned Calories because it captures whether I actually worked out on a given day rather than how many calories I spent. This is what we call a premise of the project, and it is very important to keep track of these premises by documenting them along with the problem statement.

Ok, we are making some progress now. So now we can begin creating models, right?

We have just a couple more things to do before that. First, we need to check the quality of the data, then we will create some features and do some exploratory plots to generate some insights before the modeling.

Data Quality check

When we talk about data quality, we should go as deep as how the data is collected and think about problems that can happen in the process. Since this data is collected by my Apple Watch, the first thing we should explore is: what happens on the days I did not wear my watch?

This boils down to two things we always have to check in our data:

Missing data
Outliers

There is no missing data in terms of NAs. However, there are 167 observations with 0 exercise minutes, which appears to be how days without the watch are recorded. We can clearly see it in this histogram:

Image by Author

Searching for outliers, we can see that there are a couple of them, but we will keep them because they reflect real workouts rather than anomalies.

Image by Author

There are a lot of other checks we can do to verify data quality, but for this case, we will not go into them because this data source is standardized and pretty reliable.

Some important information we gathered from the data:

There are 1,737 observations (days);
167 observations have 0 as the exercise minutes of that day;
The dates go from 2017-09-25 to 2022-06-27;
There are no missing dates in this range.
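Checks like these are easy to automate. The sketch below takes a {date: minutes} dictionary, such as a daily series built from the export. It reports the date range, missing calendar days, zero-minute days and outliers.

The outlier rule is an assumption, since the article does not say which rule produced its plot. It uses 1.5 times the interquartile range, the usual box-plot convention.

```python
import statistics
from datetime import timedelta


def quality_report(daily):
    """Summarize a {date: minutes} series: date range, gaps, zero days and outliers."""
    days = sorted(daily)
    first, last = days[0], days[-1]
    present = set(days)
    span = (last - first).days + 1
    # Every calendar day between first and last that has no observation.
    missing = [first + timedelta(days=i) for i in range(span)
               if first + timedelta(days=i) not in present]
    values = [daily[d] for d in days]
    # Quartiles with linear interpolation; 1.5 * IQR fences as in a box plot (assumed rule).
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "observations": len(days),
        "first": first,
        "last": last,
        "missing_dates": missing,
        "zero_days": sum(1 for v in values if v == 0),
        "outliers": [d for d in days if not low <= daily[d] <= high],
    }
```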

Feature Engineering

Now we can get to the fun stuff. The feature engineering step is where we form hypotheses about which features can be useful to the model. This is an iterative process, so we will create some features here, validate them later, and add or remove features as needed.

My first guesses come from classic time-series features:

Date attributes (day, weekday, month, year, season)
Lag features (how many exercise minutes were logged in previous days)
Rolling Window features (moving average, standard deviation, max, min)

In the next part, we will add some other data such as sleep quality.

Here's the code:

https://medium.com/media/38212ca8304e9eee7881f96c5febea23/href
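Below is a hedged sketch of these classic features, not the gist's code. The function builds date attributes, lag features and rolling-window statistics from a gap-free daily series.

A few choices are assumptions:

- The lag and window sizes are hypothetical.
- Season is left out because it depends on the hemisphere.
- Every window covers only the days before the target day, so the features do not leak the value being predicted.

```python
import statistics


def make_features(daily, lags=(1, 7), windows=(7, 30)):
    """Build date, lag and rolling-window features from a gap-free {date: minutes} series."""
    days = sorted(daily)
    values = [daily[d] for d in days]
    rows = []
    for i, day in enumerate(days):
        row = {"date": day, "target": values[i], "day": day.day,
               "weekday": day.weekday(),  # Monday = 0 ... Sunday = 6
               "month": day.month, "year": day.year}
        for lag in lags:
            # Value `lag` days earlier; positions equal days because the series has no gaps.
            row[f"lag_{lag}"] = values[i - lag] if i >= lag else None
        for w in windows:
            past = values[max(0, i - w):i]  # the w days before `day`, excluding it
            full = len(past) == w
            row[f"roll_mean_{w}"] = statistics.fmean(past) if full else None
            row[f"roll_std_{w}"] = statistics.stdev(past) if full and w > 1 else None
            row[f"roll_max_{w}"] = max(past) if full else None
            row[f"roll_min_{w}"] = min(past) if full else None
        rows.append(row)
    return rows
```

The first rows get None for features whose history is not yet long enough. Dropping or imputing them is a modeling decision.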

Another important transformation was creating a circular encoding of the month feature, a great trick for encoding time features that have a cycle. It works by mapping each month to an angle on a circle and taking the sine and cosine of that angle. In the end, we have something like this:

Image by Author

We can see that December and January are now close to each other, instead of being encoded as 12 and 1, which are far apart.
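The encoding is a short formula: each month m is placed at the angle 2πm/12 and represented by the sine and cosine of that angle. The helper names below are hypothetical. The distance function just makes the December/January argument checkable.

```python
import math


def encode_month(month):
    """Map month 1..12 onto the unit circle as (sin, cos) of its angle."""
    angle = 2 * math.pi * month / 12
    return math.sin(angle), math.cos(angle)


def circular_distance(m1, m2):
    """Euclidean distance between two encoded months."""
    return math.dist(encode_month(m1), encode_month(m2))
```

With this encoding, circular_distance(12, 1) and circular_distance(1, 2) are both about 0.518. Opposite months such as June and December are 2 apart, the maximum on the unit circle.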

Exploratory Analysis

Now we’ll move on to some simple but powerful analyses. This step can, and often should, be performed before feature engineering, but in this case I needed some of the features for the plots.

Remember: This is an iterative process

In a long project, you'll go into these steps in many cycles before arriving at the final solution.

We already looked at the distribution of our data in a previous step, so now we can see how the minutes of exercise vary with some temporal features.

Let’s start with the years:

Image by Author

We can see that 2020 broke my trend, mainly because of the COVID pandemic lockdowns.

We can compare the distribution of the data by each month.

Image by Author

Here we can clearly see that December is not my best friend. The main causes are easy to identify: end-of-year parties, holidays, Christmas, and my usual vacation.

One more nice thing to look at is how my workouts vary across different days of the week. For this analysis, we consider Monday as label 0 and Sunday as label 6.

Image by Author

On weekends, the median number of exercise minutes is not far from that of the other days of the week; however, big workouts are rarer on the weekend.
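A per-weekday summary like the one behind this plot can be computed directly. In the sketch below, weekday labels come from date.weekday(), which uses the same Monday = 0 to Sunday = 6 convention. The 60-minute threshold for a "big workout" is a hypothetical choice, not something the article defines.

```python
import statistics
from collections import defaultdict


def weekday_summary(daily, big_threshold=60):
    """Per weekday (Monday = 0 ... Sunday = 6): median minutes and share of big workouts."""
    by_weekday = defaultdict(list)
    for day, minutes in daily.items():
        by_weekday[day.weekday()].append(minutes)
    return {
        wd: {
            "median": statistics.median(values),
            # Fraction of days at or above the (hypothetical) big-workout threshold.
            "big_share": sum(m >= big_threshold for m in values) / len(values),
        }
        for wd, values in sorted(by_weekday.items())
    }
```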

There are literally infinite visualizations you can create with your data. For now, we will stop at those above. The important thing here is to understand your data, the distributions, and how it behaves in different aggregations.

You can also make decisions based on these analyses. For example, since workouts tend to be lower on weekends, a possible decision is to create a rule that I can only drink on the weekend if I work out.

The last thing we’ll do before modeling is decompose our time series. Oh, I forgot to mention it, but what we have here is a time series. Here's the definition:

A Time Series is a series of repeated observations considered within a certain time interval that are taken at equal (regular/evenly spaced) time intervals

We can decompose a time series to understand two very useful things:

Trend
Seasonality

A time series can be seen as the combination of those two components plus some residuals. There are a couple of ways to decompose it; here we will use the additive method, in which observed value = trend + seasonality + residual. Time series is a huge subject, so if you want to go deeper, go here.
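The article does not show its decomposition code; a common choice is statsmodels' seasonal_decompose with model="additive". Below is a minimal stdlib sketch of the classical additive method, assuming a weekly period of 7:

- The trend is a centered moving average.
- The seasonal component is the average detrended value at each position of the cycle, centered so that it sums to zero.
- The residual is whatever remains.

```python
def additive_decompose(series, period=7):
    """Classical additive decomposition: series = trend + seasonal + residual.

    The trend is a centered moving average of odd length `period`, so the first
    and last period // 2 points have no trend or residual (None).
    """
    if period % 2 == 0:
        raise ValueError("this sketch only handles odd periods")
    n = len(series)
    if n < 2 * period:
        raise ValueError("need at least two full periods")
    half = period // 2
    trend = [None] * n
    for t in range(half, n - half):
        trend[t] = sum(series[t - half:t + half + 1]) / period
    # Average the detrended values at each position of the cycle.
    buckets = [[] for _ in range(period)]
    for t in range(n):
        if trend[t] is not None:
            buckets[t % period].append(series[t] - trend[t])
    means = [sum(b) / len(b) for b in buckets]
    offset = sum(means) / period  # center the pattern so it sums to zero
    pattern = [m - offset for m in means]
    seasonal = [pattern[t % period] for t in range(n)]
    resid = [None if trend[t] is None else series[t] - trend[t] - seasonal[t]
             for t in range(n)]
    return trend, seasonal, resid
```

The moving average needs period // 2 points on each side, so the first and last three days have no trend or residual. That is also why decomposition plots usually have short gaps at both ends.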

Image by Author

So that's a lot of information, let’s dig into it.

The first plot is the original time series. The second one is the trend component, the third is the seasonality and the last one is the residuals.

The important information we get from here is:

We can identify a seasonality that seems to be related to the week, but not so much to the month;
The residuals appear to be uniformly distributed.

That helps us with two things: it confirms that the features we created in the feature engineering step capture the right seasonality, and, if we wanted to apply classic time series models, it tells us which seasonality to set in their parameters.

That is it for now.

Key Takeaways

The main takeaways of this part are:

DEFINE THE PROBLEM;
Set up your environment, without forgetting to record your package versions;
Record your project premises;
Structure your data processing into functions that can be reused further down the road with new data;
Understand and clean your data;
If possible, create some descriptive analyses that can already generate decisions.

In the next part:

We will set up an experimentation framework with MLflow to record and compare our models
We will create multiple models and compare them
We will choose a model and optimize its hyperparameters

Thanks for reading!

If you liked this article you can take a look at some other projects I’ve done here.

Also, consider subscribing to get all the parts of this series when they come out.

Predicting my Next Workout with Machine Learning- Part 1 was originally published in TDS Archive on Medium, where people are continuing the conversation by highlighting and responding to this story.

How to prepare for the GCP Professional Machine Learning Engineer exam

Gabriel Cassimiro — Mon, 10 Jan 2022 16:58:36 GMT

Photo by Thought Catalog on Unsplash

Courses review, study tips, and how I did it

So I decided to take the GCP Professional Machine Learning Engineer (PMLE) exam, but I had only 2 months to prepare, because my company needed enough certifications to become a GCP partner. I knew this was going to be a hard challenge, but I jumped at it anyway.

In this post, I will share what helped me study and prepare for the exam, and also what you should not waste your time on. Reading feedback like this helps you get different perspectives on the exam, so I will start with this great repository that has posts about many GCP and other certifications. I recommend reading it before beginning your studies.

As is usual in this kind of post, this is my certification:

Test Feedback

The Professional Machine Learning Engineer certification exam is considered one of the hardest GCP certifications for two main reasons: the content is very extensive, and most questions have more than one correct answer but only one best answer.

The test covers how to solve real-world business problems using Machine Learning techniques and how to use the best available solutions (offered by GCP obviously) in the correct context.

Knowing what the test covers is the most important part of studying, because with this information you can focus on what matters when watching courses. So the first thing you should do is carefully read the official GCP certification site. There you’ll find information on what the exam covers, the rules, where to take the exam, and other important details.

Another great starting point is to do these sample questions provided by Google to see how you would perform on the test before studying anything. From there, you can focus your studies on what you don't know.

Previous experience recommendation

The official exam guide doesn't require any prerequisites; however, it recommends:

3+ years of industry experience including 1 or more years designing and managing solutions using Google Cloud.

That is far from my case. By the time I took the exam, I had almost one year of cloud experience (AWS) and less than one month of experience with GCP. So I will give my opinion here about that recommendation:

Years don't dictate how much you know about something, but meaningful experience does. In my opinion, if you have some experience with any cloud and understand its basic concepts and products, you're good to go.

Being a machine learning engineer requires you to solve problems using ML models, serving data to that model, and creating the means to generate value with that solution consistently.

In terms of machine learning, you will have to study a lot less if you have experience building models. If you know how to tell apart problems that need classification, recommendation, or regression models, and know in which cases you need a DNN or a basic linear regression, you will be able to focus your studies on the part about serving data to your model and predictions to users with GCP solutions.

Wrapping up the previous experience part:

You don't need 3+ years of experience, but having some experience with any cloud provider will save you time studying.
Having experience with machine learning is needed, but just enough to be able to solve business problems with ML.
Hands-on experience using GCP is possible to obtain with some courses provided by Google, and is enough for you to take the test.

How to Study

The main source of knowledge for this exam is a group of courses designed by Google and available on Coursera. However, not all courses have the same relevance regarding the exam content. That is why I will rank them and comment on each one below.

First, there are some techniques that I used for my preparation that are worth mentioning before starting on the courses. If you only care about the courses feel free to skip ahead, but this helped me a lot to absorb more of the relevant stuff.

The main thing to keep in mind while doing the courses is:

How to use GCP solutions and ML models to solve real business problems

You need to know all of GCP's ML and data solutions: what they do, what their strengths and weaknesses are, and the use cases for each one.

Remember: a lot of problems can be solved in different ways with good results; however, the test will always ask you for the best solution.

So I have two methods that helped me learn these characteristics while watching the courses:

Flashcards

I used flashcards to remember what each solution does, its characteristics, and its use cases. Then I reviewed them several times until I could explain all of them without looking at the answers.

This is a very rich technique: writing a brief explanation on each flashcard exercises your ability to summarize, reviewing the cards at intervals of several days exercises your long-term memory, and, lastly, trying to explain a concept to someone shows whether you really learned it.

I used and recommend using Anki, a free flashcard app.

Mindmaps

Another great method to organize the main concepts is creating mindmaps. This way you can easily link products and solutions with business problems and advantages.

Personally, I used MindMeister, but there are a lot of great free alternatives.

Courses

Finally, we’ll take a look at the courses offered by Google and their content.

Preparing for Google Cloud Machine Learning Engineer Professional Certificate

This is the main preparation track, and it is of the utmost importance to watch its courses with your full attention.

It starts with some cloud basics in Google Cloud Big Data and Machine Learning Fundamentals, which you can skip if you have already worked with data solutions in GCP; otherwise, you should do it, because it gives a first view of the GCP data solutions. This is also one of the only courses in the bunch that covers data engineering solutions, so if you do not know them, just do it.

The second and third courses show some ML solutions and APIs offered by GCP. It is very important to remember what they do and their use cases.

The fifth, sixth, and seventh courses will dive deeper into ML solutions, Feature Engineering, and modeling products.

The last three courses will cover how to deploy and create effective ML pipelines with all the best practices. In my opinion, these are the most important courses (Production Machine Learning Systems, MLOps Fundamentals, and ML Pipelines on Google Cloud).

All of these courses offer Labs to implement the solutions in a real GCP environment. They are a great way of learning how things work and how to set them up.

Some Labs have big Jupyter Notebooks with tons of code. In these situations, my tip is to focus on what the code is doing and not worry about understanding and learning how to write it yourself. If you ever need to implement it yourself, just go to the open GitHub repository provided by Google and look up the syntax.

Wrapping up courses:

You should use Flashcards, Mindmaps, or other techniques to remember a lot of details about solutions.
The main focus of the test is MLOps and ML pipelines; however, do not neglect data engineering knowledge and questions about specific Machine Learning models.
Do not focus on the code syntax, focus on what it does and its benefits.

Mock Tests

Finally, you HAVE to do mock tests. This is crucial to check your knowledge and to learn how to read the questions.

Answering questions

This last part is what defines whether you'll pass or not. The test is huge, with 60 questions and 120 minutes to answer them, meaning you have 2 minutes per question. You have to read each question looking for characteristics of the problem that will help you find the right solution. Here is an example:

You work for a public transportation company and need to build a model to estimate delay times for multiple transportation routes. Predictions are served directly to users in an app in real-time. Because different seasons and population increases impact the data relevance, you will retrain the model every month. You want to follow Google-recommended best practices. How should you configure the end-to-end architecture of the predictive model?

A. Configure Kubeflow Pipelines to schedule your multi-step workflow from training to deploying your model.
B. Use a model trained and deployed on BigQuery ML, and trigger retraining with the scheduled query feature in BigQuery.
C. Write a Cloud Functions script that launches a training and deploying job on AI Platform that is triggered by Cloud Scheduler.
D. Use Cloud Composer to programmatically schedule a Dataflow job that executes the workflow from training to deploying your model.

In bold are the most relevant parts of the question. You have to pay attention to details like batch or real-time, retraining, deploying, and the architecture. The last one is one of the most important, because they'll often ask for no-code solutions, serverless solutions, or even complete control over the infrastructure. This will define the best offering for that specific request.

In this case, Kubeflow is the only answer with the ability to do end-to-end with deploying and retraining capabilities. So the answer is A.

Another good tip: after finding the most relevant information in the question, eliminate answers that are clearly wrong, so you have fewer options to compare.

Mock tests links

I did a couple of mock tests, but they are not perfect: all of them have a lot of wrong answers. Here are the links and my comments on each one:

Exam Topics: This was the best mock test I did. The website does not always show the correct answer; however, every question has a discussion where people present arguments for each option. This was a great source of new knowledge and helped me a lot.

Google Sample questions: Once you have finished studying, revisit the sample questions you did at the beginning of your studies.

There are other paid preparation exams, but I cannot review them because I only did the free ones. These sites usually offer a couple of free sample questions, but the ones I did had weird answers which I did not agree with. So do them at your own risk.

If I had to do it again I would pay for the rest of the Exam Topics questions and focus only on the discussion when checking the correct answer.

Thanks for reading and good luck on your journey to become a GCP Certified Professional Machine Learning Engineer!

If you want to support my work, you can Buy Me a Coffee:

Gabriel Cassimiro is a Data Scientist sharing free content to the community

How to prepare for the GCP Professional Machine Learning Engineer exam was originally published in TDS Archive on Medium, where people are continuing the conversation by highlighting and responding to this story.

Object detection with Tensorflow model and OpenCV

Gabriel Cassimiro — Thu, 15 Jul 2021 20:44:12 GMT

Using a trained model to identify objects on static images and live video

source

In this article, I’m going to demonstrate how to use a trained model to detect objects in images and videos using two of the best libraries for this kind of problem. For the detection, we need a model capable of predicting multiple classes in an image and returning the location of those objects so that we can place boxes on the image.

The Model

We are going to use a model from TensorFlow Hub, which has many ready-to-deploy models trained on all kinds of datasets to solve all kinds of problems. For our use, I filtered for models trained for object detection and available in the TFLite format. This format is commonly used for IoT applications because of its small size and faster performance compared with bigger models. I chose this format because I intend to use this model on a Raspberry Pi in future projects.

The chosen model was the EfficientDet-Lite2 Object detection model. It was trained on the COCO17 dataset with 91 different labels and optimized for the TFLite application. This model returns:

The box boundaries of the detection;
The detection scores (probabilities of a given class);
The detection classes;
The number of detections.
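To make those outputs concrete, here is a small post-processing sketch that keeps confident detections and attaches text labels. It is an illustration under several assumptions:

- Boxes are in [ymin, xmin, ymax, xmax] order, the usual TensorFlow detection convention.
- The 0.5 score threshold is hypothetical.
- The labels argument is a dict built from the labels CSV.
- Whether boxes come normalized or in pixels should be checked on the model's TensorFlow Hub page.

If the outputs are padded, slice them to the number of detections first.

```python
def filter_detections(boxes, scores, classes, labels, min_score=0.5, image_size=None):
    """Keep detections scoring at least `min_score` and attach text labels.

    boxes: sequence of [ymin, xmin, ymax, xmax]; classes: numeric class ids;
    labels: dict mapping class id -> name. If image_size=(height, width) is
    given, boxes are treated as normalized to [0, 1] and scaled to pixels.
    """
    results = []
    for box, score, cls in zip(boxes, scores, classes):
        if score < min_score:
            continue
        ymin, xmin, ymax, xmax = box
        if image_size is not None:
            h, w = image_size
            ymin, xmin, ymax, xmax = ymin * h, xmin * w, ymax * h, xmax * w
        cls = int(cls)
        results.append({
            "label": labels.get(cls, f"class {cls}"),
            "score": float(score),
            # (x1, y1, x2, y2): the corner order cv2.rectangle expects.
            "box": (int(xmin), int(ymin), int(xmax), int(ymax)),
        })
    return results
```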

Detecting Objects

I’m going to divide this section into two parts: detection on static images and detection on live webcam video.

Static Images

We will start by detecting objects in this image from Unsplash:

source

So the first thing we have to do is load this image and process it to the expected format for the TensorFlow model.

https://medium.com/media/b5c7cb8c0f935cc4036cce99b77b9084/href

Basically, we used OpenCV to load the raw image and apply a couple of transformations to turn it into an RGB tensor in the format the model expects.

Now we can load the model and the labels:

https://medium.com/media/e94bcc8f707bf70a9aa389091a111766/href

The model is loaded directly from the web; however, you can download it to your computer so that it loads faster. The text labels CSV is available in the project repo.

Now we can run the predictions and draw the boxes and labels found on the image:

https://medium.com/media/40b217f00b1287892ccb56ff48d257e0/href

Now if we run plt.imshow(img_boxes) we get the following output:

source with modifications

Live Webcam Video

Now we can move on to detecting objects live using your PC's webcam.

This part is not as hard as it seems; we just have to put the code we used for one image inside a loop:

https://medium.com/media/a39aa1dc870cc4c37024b58fd50bee3c/href

Then we get:

GIF by Author

We used VideoCapture from OpenCV to read the video from the computer's webcam. Then we applied the same processing we used on the static image and predicted the labels and positions. The main difference is that the input is a continuous stream of frames, so we put the code inside a while loop.

All the code and notebooks used are in this repository:

gabrielcassimiro17/raspberry-pi-tensorflow

In the near future, I will load this onto a Raspberry Pi to create some interactions using a model capable of detecting objects, and post the results here.

If you like the content and want to support me, you can buy me a coffee:

Gabriel Cassimiro is a Data Scientist sharing free content to the community

Object detection with Tensorflow model and OpenCV was originally published in TDS Archive on Medium, where people are continuing the conversation by highlighting and responding to this story.

Transfer Learning with VGG16 and Keras

Gabriel Cassimiro — Wed, 16 Jun 2021 21:55:37 GMT

How to use a state-of-the-art trained NN to solve your image classification problem

The main goal of this article is to demonstrate, with code and examples, how you can use an already trained CNN (convolutional neural network) to solve your specific problem.

Convolutional networks are great for image problems; however, they are computationally expensive if you use a big architecture and don’t have a GPU. For that, we have two solutions:

GPUs

GPUs are much more efficient for training NNs, but they are not that common on regular computers. That is where Google Colab comes to save us. It offers virtual machines with GPUs with up to 16 GB of RAM, and the best part of it all: it is free.

But even with those upgraded specs, you can still struggle when training a brand new CNN. That’s where Transfer Learning can help you achieve great results with less expensive computation.

Transfer Learning

So what is transfer learning?

To better explain that we must first understand the basic architecture of a CNN.

Image by Author

A CNN can be divided into two main parts: Feature learning and classification.

Feature Learning

In this part, the main goal of the NN is to find patterns in the pixels of the images that can be useful to identify the targets of the classification. That happens in the convolutional layers of the network, which specialize in those patterns for the problem at hand.

I’m not going deep into how this works underneath the hood, but if you want to dig deeper I highly recommend this article and this amazing video.

Classification

Now we want to use those patterns to classify our images into their correct labels. This part of the network does exactly that: it uses the outputs of the previous layers to find the class that best matches the patterns found in the new image.

Definition

So now we can define Transfer Learning in our context as utilizing the feature learning layers of a trained CNN to classify a different problem than the one it was created for.

In other words, we use the patterns that the NN found to be useful to classify images of a given problem to classify a completely different problem without retraining that part of the network.

Now I am going to demonstrate how you can do that with Keras, and show that in many cases this gives better results than training a new network.

Transfer Learning With Keras

For this demonstration, I will use a famous NN called VGG16. This is its architecture:

Image by Author

This network was trained on the ImageNet dataset, specifically its ILSVRC subset, with about 1.2 million training images belonging to 1000 different classes (the full ImageNet collection has more than 14 million images).

If you want to dig deeper into this specific model you can study this paper.

Dataset

For this demonstration, I will use the tf_flowers dataset. Just as a reminder: The VGG16 network was not trained to classify different kinds of flowers.

This is what the data looks like:

Image by Author

Finally…

The Code

First, we have to load the dataset from TensorFlow:

https://medium.com/media/d1162af5f640b53efb4718309ab44d45/href

Now we can load the VGG16 model.

https://medium.com/media/758810859a173edcc91d1f893573e4d0/href

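We use include_top=False to remove the classification layers that were trained on the ImageNet dataset, and we set the model as not trainable so that its pretrained weights stay frozen. We also used the preprocess_input function from VGG16 to prepare the input data the way the network expects (in Keras, this converts RGB to BGR and subtracts the ImageNet channel means).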

We can run this code to check the model summary.

base_model.summary()

Image by Author

Two main points: the model has over 14 million trained parameters, and it ends with a max-pooling layer that belongs to the feature learning part of the network.
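The "over 14 million" figure can be checked by hand. Each 3x3 convolution has 3 x 3 x input channels x output channels weights, plus one bias per output channel, and the max-pooling layers have no parameters. The sketch below applies that rule to VGG16's 13 convolutional layers.

The 224 x 224 input in the usage note is an assumption. It is the ImageNet default; the article's input size is not stated in the text.

```python
# VGG16 convolutional base: numbers are output channels, "M" is a 2x2 max-pool.
VGG16_CONFIG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
                512, 512, 512, "M", 512, 512, 512, "M"]


def conv_base_params(config, in_channels=3, kernel=3):
    """Count weights + biases of the conv layers (pooling layers have none)."""
    total = 0
    for layer in config:
        if layer == "M":
            continue
        total += kernel * kernel * in_channels * layer + layer  # weights + biases
        in_channels = layer
    return total


def output_shape(config, height, width):
    """'Same'-padded convs keep the spatial size; each pool halves it (floor)."""
    channels = 3
    for layer in config:
        if layer == "M":
            height, width = height // 2, width // 2
        else:
            channels = layer
    return height, width, channels
```

conv_base_params(VGG16_CONFIG) gives 14,714,688. output_shape(VGG16_CONFIG, 224, 224) gives (7, 7, 512), because the five pooling layers shrink each side by a factor of 32.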

Now we add the last layers for our specific problem.

https://medium.com/media/eaf49eb09af8195fc3922be9e96152e1/href

And compile and fit the model.

https://medium.com/media/cbb220cf759aaa352bc1d81bcd9aa8c7/href

Evaluating this model on the test set, we got 96% accuracy!

That’s it!

It is this simple. And it is kind of beautiful, right?

We can find patterns in the world that can be used to identify completely different things.

If you want to check out the complete code and a Jupyter notebook, here’s the GitHub repo:

gabrielcassimiro17/object-detection

Extra: comparing to hand-made model

To confirm that this approach can be better in terms of both computational resources and accuracy, I created a simple hand-made model for this problem.

This is the code:

https://medium.com/media/1f6bb6572315e6f85c84fe61026cdf52/href

I used the same final layers and fit parameters to be able to compare the impact of the convolutions.

The accuracy of the hand-made model was 83%, much worse than the 96% we got from the VGG16 model.

Transfer Learning with VGG16 and Keras was originally published in TDS Archive on Medium, where people are continuing the conversation by highlighting and responding to this story.