Getting Started: LangSmith for JavaScript LLM Apps

Chris McKenzie
8 min read · May 13, 2024


Unlocking the Power of Large Language Models with LangSmith: Optimize Your AI Development from Start to Finish

Imagine you’re building a chatbot on top of a large language model (LLM). As you integrate complex features like sentiment analysis, retrieval-augmented generation (RAG), or contextual understanding, you encounter unexpected errors. Even after you resolve these issues, new ones emerge with every small change to your app or update to the LLM. Working with LLMs can feel like plugging into a magical black box: sometimes the results are amazing, other times not so much, and it’s hard to know why the model behaves the way it does or how that behavior is impacting your app.

LangSmith, a comprehensive DevOps platform covering the entire lifecycle of an LLM application from development to monitoring, solves exactly this problem. It not only simplifies development but also gives developers the confidence to ensure quality, manage costs, and reduce latency, making LLM application development more accessible and efficient.

In this article, I’ll discuss what LangSmith is, why you might want to use it, and how to use it, then run through a quick demo of the run trace feature.

Why might you want to use LangSmith?

  1. Comprehensive Development Tools: LangSmith offers tailored development tools for LLM applications, featuring detailed call sequence visibility, real-time debugging, and performance optimization capabilities.
  2. Advanced Testing and Evaluation: The platform provides robust testing frameworks and AI-assisted evaluations to ensure the quality of responses, covering relevance, correctness, and sensitivity.
  3. Deployment and Scaling: LangSmith simplifies the deployment and scaling of resource-intensive LLM applications, ensuring they manage increased loads without performance loss.
  4. Real-time Monitoring and Analytics: Comprehensive monitoring tracks cost, latency, and quality, offering real-time analytics to swiftly inform decision-making.
  5. Cost Management: LangSmith helps manage the financial aspects of LLM projects, providing insights to optimize spending and maximize application efficiency.
  6. Collaboration Features: The platform enhances teamwork with shared workspaces, version control, and communication tools, supporting seamless collaboration across locations.

Key Features

LangSmith has many features and continues to add new ones. Here are a few key features to note:

Run Tracing

Tracing is a powerful tool for understanding the behavior of your LLM application. Tracing can help you diagnose issues such as an unexpected result, why the agent is looping, why it’s running slowly, how much it’s costing you, or why customers are receiving suboptimal responses.

Annotation Queues

Annotation queues are a user-friendly way to quickly cycle through and annotate data. You can create workflows that allow humans or LLMs to evaluate results. This data can then be used for testing and improvement of your application.

Datasets & Testing

Datasets can be uploaded or derived from real-world runs. You can then run tests with this data using evaluators to measure the performance and accuracy of your application.
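
For example, you can create a dataset and seed it with examples programmatically using the LangSmith client. A minimal sketch, assuming the langsmith JS SDK; the dataset name and example content are illustrative:

import { Client } from "langsmith";

const client = new Client(); // picks up LANGCHAIN_API_KEY from the environment

// Create a dataset and seed it with a single example.
const dataset = await client.createDataset("capital-questions", {
  description: "Questions about world capitals",
});
await client.createExample(
  { question: "What is the capital of France?" }, // inputs
  { answer: "Paris" },                            // reference outputs
  { datasetId: dataset.id }
);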

Hub

The Hub allows you to collaborate, test, and share prompts for your application. Think of it like GitHub for LLM prompts. In addition to being a great way to manage your prompts, it’s also a helpful way to see how others are crafting effective prompts.
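
As a quick illustration, LangChain.js can pull a shared prompt from the Hub at runtime. A minimal sketch, with a made-up prompt handle:

import * as hub from "langchain/hub";
import type { ChatPromptTemplate } from "@langchain/core/prompts";

// Pull a prompt by its Hub handle (this handle is hypothetical).
const prompt = await hub.pull<ChatPromptTemplate>("my-handle/mr-burns-answer");
// prompt can now be dropped straight into a chain.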

In this article, we’ll drill into the real-time debugging and run tracing features.

Integrating Your Project with LangSmith

I have created a simple app using LangChain.js, which will give us a good starting point to set up the integration with LangSmith.

LangChain is not required to use LangSmith, but for the sake of simplicity I will provide instructions on how to set up LangSmith to work with your LangChain app.
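
To illustrate the LangChain-free path, here’s a minimal sketch of tracing an arbitrary function with the traceable wrapper from the standalone langsmith SDK; the wrapped function is a placeholder:

import { traceable } from "langsmith/traceable";

// Wrap any async function; each call is then logged as a run in LangSmith.
const answerQuestion = traceable(
  async (question: string): Promise<string> => {
    // ...call your LLM of choice here...
    return "Paris";
  },
  { name: "answer-question" }
);

await answerQuestion("What is the capital of France?");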

  1. Clone the repo: git clone git@github.com:kenzic/simple-langsmith-demo.git
  2. Install dependencies: yarn
  3. Sign up for a LangSmith account
  4. Get a LangSmith API key
  5. Get an OpenAI API key
  6. Move .env.example to .env and fill in the following values:
LANGCHAIN_PROJECT="langsmith-demo"
LANGCHAIN_TRACING_V2=true
LANGCHAIN_API_KEY=<your-api-key>

# The OpenAI API key is used to call the LLM; it is not required by LangSmith itself
OPENAI_API_KEY=<your-openai-api-key>
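
That’s all the configuration LangChain needs: with LANGCHAIN_TRACING_V2 set, every run is sent to LangSmith automatically, with no code changes. Just make sure the variables are loaded before your chain runs; the usual approach (and an assumption about the demo’s entry point) is a dotenv import:

// Load .env at the top of your entry point so LangChain sees the variables.
import "dotenv/config";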

Before diving in, spend a minute or two with the code to understand what it’s doing. Here’s what it does at a high level (a rough sketch of the chain follows the list):

  • Invokes the app with the input: “What is the capital of France?”
  • The conversationalRetrievalChain (RunnableSequence):
      • Adds context by calling documentRetrievalChain, which uses a custom retriever to fetch data that helps our application answer the question (RunnableMap)
      • Creates a prompt that instructs our LLM to answer the user’s question, using the context from documentRetrievalChain, in the tone of Mr. Burns (ChatPromptTemplate)
      • Calls our LLM (ChatOpenAI)
      • Parses the response and returns it as a string (StrOutputParser)
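
Here is that sketch in LangChain.js. It’s a minimal, illustrative reconstruction, not the repo’s exact code: the step names match what you’ll see in the trace, but the model, prompt wording, and retrieval stand-in are assumptions:

import { ChatOpenAI } from "@langchain/openai";
import { ChatPromptTemplate } from "@langchain/core/prompts";
import { StringOutputParser } from "@langchain/core/output_parsers";
import { RunnableSequence } from "@langchain/core/runnables";

// Stand-in for the demo's retriever-backed context step.
const documentRetrievalChain = async (): Promise<string> =>
  "The capital of France is Springfield."; // the bad document we'll find later

const prompt = ChatPromptTemplate.fromTemplate(
  `Answer the user's question in the tone of Mr. Burns, using this context:
{context}

Question: {question}`
);

const conversationalRetrievalChain = RunnableSequence.from([
  // A plain object here is coerced into a RunnableMap: each key runs in parallel.
  {
    context: documentRetrievalChain,
    question: (input: { question: string }) => input.question,
  },
  prompt,
  new ChatOpenAI({ model: "gpt-3.5-turbo" }),
  new StringOutputParser(),
]);

console.log(
  await conversationalRetrievalChain.invoke({
    question: "What is the capital of France?",
  })
);

Note how the plain object in the first step becomes a RunnableMap; that is exactly the node we’ll inspect in the trace below.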

Hands-On

Here, we’ll focus on arguably the most powerful feature: run tracing and debugging.

Now we’re ready to start testing and debugging our app!

Start by running: yarn start

The script asks the app “What is the capital of France?”

The result we get back is quite unexpected. It seems to think the capital of France is Springfield. That’s odd.

Oh, ex-cu-se me! The capital of France is not Paris, you imbecile. It was changed to Springfield in 2024. And I hope you have learned your lesson. Now, be off with you!

As some of us know, the capital of France is not Springfield but Paris. So why are we getting this result? Also, it seems to be taking a long time for the app to respond with an answer. Can we speed this up?

Let’s go to https://smith.langchain.com/ and click on our project, “langsmith-demo”. Once inside the project, we should see a log of runs. We’ll start with the most recent and see if there are any clues as to what’s going on.

List of runs in LangSmith “langsmith-demo” project

Looking at the top level of the trace (RunnableSequence) we can see that our app takes 3.33s to run, and that the input and output match what we saw, but the output is clearly incorrect.

Stepping down to the next level (RunnableMap) we can examine the part of our app responsible for retrieving the context or knowledge the LLM will use to guide its answer.

We can quickly identify what’s happening here: our retriever is providing factually incorrect information. For this demo, we can fix it by changing the document content to “The capital of France is Paris.”

Note: For this demo, we’re using a custom retriever that returns a hardcoded Document. In your app, you’ll be using something like a vector store to retrieve Documents. So having this level of visibility into the input and output the retriever is generating will help you fine-tune your retriever so it’s providing the most relevant data.
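
If you’d like to see what such a custom retriever looks like, here’s a minimal sketch of subclassing BaseRetriever from @langchain/core; the class name and document content are illustrative:

import { BaseRetriever } from "@langchain/core/retrievers";
import { Document } from "@langchain/core/documents";

class HardcodedRetriever extends BaseRetriever {
  lc_namespace = ["demo", "retrievers"];

  async _getRelevantDocuments(_query: string): Promise<Document[]> {
    // The one-line fix from above: return the corrected document.
    return [new Document({ pageContent: "The capital of France is Paris." })];
  }
}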

That was easy! But our app is still a little slow. Why? Let’s continue to walk down our trace. The ChatPromptTemplate looks good, and it takes 0.00s to run, so it’s not that.

Next, the ChatOpenAI step is a little slow, but we’re making a request over the Internet, so this isn’t unexpected. Note also that our call to OpenAI used 139 tokens, for a total of $0.0001165. As your app becomes more complex, you’ll want to keep an eye on this number to understand your costs.

Finally, we make our way to the StrOutputParser. With a runtime of 0.00s and the correct parsing of the output, everything here seems nominal.

So, where is the bottleneck in our app? Let’s go back to the RunnableMap. We can see it takes 2.01s. Given the trivial nature of the task (returning a single hardcoded Document), this seems high. If you look at the code, you’ll notice a call to a function named slowLookupTask. This is contrived for the sake of the demo, but in a real-world scenario your retriever can be a real source of latency, and it’s often the part you have the most control over.
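
To make that concrete, here’s a hypothetical stand-in for what slowLookupTask might look like. The fix in the demo is simply to drop the artificial delay; in a real app you’d profile and speed up the actual lookup:

// Hypothetical reconstruction of the demo's artificial delay.
const slowLookupTask = (): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, 2000));

// Before: every retrieval waits two seconds.
// await slowLookupTask();
// After: remove the call (or replace it with a faster lookup).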

Finally, let’s run our script again. Huzzah! We are now getting the correct answer to our query and we’ve shaved 2 seconds off!

Next Steps

We’ve only begun to scratch the surface of the power of LangSmith, but I hope this short article helped illuminate some of the possibilities.

With LangSmith’s debugging capabilities under your belt, explore further to unlock its full potential:

  • Leverage advanced testing and evaluation tools to continuously validate response quality.
  • Optimize resource usage and spend with cost management insights.
  • Foster collaboration using shared workspaces, version control, and prompt libraries.
  • Integrate advanced features like sentiment analysis and contextual understanding.
  • Stay tuned for new releases and join the LangSmith community to shape its future.

As your LLM applications evolve, LangSmith provides the tools to streamline development, ensure performance, and drive innovation in AI.

Wrapping Up

Congratulations on completing this introduction to LangSmith and taking your first steps toward mastering the development of large language model applications. In this guide, we’ve outlined how to set up and start using LangSmith alongside LangChain to streamline the development and maintenance of your LLM projects.

Here’s a quick recap of what we covered:

  • Setting up your project to integrate with LangChain and LangSmith.
  • Utilizing real-time debugging and run tracing to understand and optimize your application.

There is so much more you can do with LangSmith, which I will cover in later articles, but if you’re not using LangSmith today I hope this has convinced you it’s worth adding to your tech stack.

To stay connected and share your journey, feel free to reach out through the following channels:

  • 👨‍💼 LinkedIn: Join me for more insights into LLM development and tech innovations.
  • 💻 GitHub: Explore my projects and contribute to ongoing work.
  • 📚 Medium: Follow my articles for more in-depth discussions on LangSmith, LangChain, and other AI technologies.

Your feedback and collaboration are invaluable. Happy building and I look forward to seeing the incredible applications you will create with LangSmith!
