WhyLabs Weekly MLOps: Monitoring LLM metrics at scale.

Monitoring meaningful metrics for LLM applications, combining the power of LLMs with computer vision, WhyLabs recognized in the CB Insights GenAI 50, and more!

Sage Elliott
WhyLabs
4 min read · Aug 11, 2023


A lot happens every week in the WhyLabs Robust & Responsible AI (R2AI) community! This weekly update serves as a recap so you don’t miss a thing!

Start learning about MLOps and ML Monitoring:

💡 MLOps tip of the week:

Extract meaningful metrics from large language models to monitor in production at scale.

The open source LangKit library can be used to gain insights about prompts and responses for LLM applications. These insights can be used to understand usage, set guardrails, compare system prompts, monitor performance over time, and more.

Monitoring LLM metrics

Once you install LangKit in your Python environment using `pip`, you can extract metrics for any LLM by logging the prompt and response as a dictionary.

LangKit includes many out-of-the-box LLM metrics for tracking text quality, text relevance, security, privacy, sentiment, and toxicity. Custom metrics can be added easily with user-defined functions (UDFs).
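As a minimal sketch of what a custom metric could compute (the function name, logic, and example values are illustrative, not part of LangKit's API), a custom metric is just a function over a prompt/response row, which LangKit can then profile as a UDF alongside its built-in metrics:

```python
# Hypothetical custom metric: ratio of response length to prompt length.
# In LangKit this kind of function would be registered as a dataset UDF so
# whylogs profiles it with the built-in metrics; this plain-Python sketch
# only shows the metric computation itself.
def response_to_prompt_ratio(row: dict) -> float:
    prompt = row.get("prompt", "")
    response = row.get("response", "")
    if not prompt:
        return 0.0
    return len(response) / len(prompt)

row = {"prompt": "What is MLOps?", "response": "MLOps is the practice of operationalizing ML."}
print(round(response_to_prompt_ratio(row), 2))
```

Once registered through LangKit's UDF mechanism, a metric like this is tracked on every logged prompt/response pair with no extra logging code.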

from langkit import llm_metrics
import whylogs as why

# Initialize a whylogs schema that includes LangKit's LLM metrics
schema = llm_metrics.init()

# A prompt/response pair logged as a dictionary (example values)
prompt_and_response = {"prompt": "What is MLOps?", "response": "MLOps is..."}

# Profile the pair against the LLM metric schema
profile = why.log(prompt_and_response, schema=schema)

These metrics can be visualized and monitored over time using the WhyLabs Observatory platform.

LLM metrics in WhyLabs

Learn more about monitoring LLMs in production:

🎥 Event recordings

Combining the Power of LLMs with Computer Vision — Jacob Marks, Voxel51

At this event, we spoke with Jacob Marks about combining the power of Large Language Models (LLMs) with computer vision!

Jacob Marks from Voxel51 talks about combining LLMs with computer vision

📝 Latest blog posts:

WhyLabs Recognized by CB Insights GenAI 50 among the Most Innovative Generative AI Startups

CB Insights named WhyLabs to its first annual GenAI 50 ranking, a list of the world’s 50 most innovative companies developing generative AI applications and infrastructure across industries. What makes this particularly notable is that model observability is already recognized as a category essential to the success of LLM applications. Read more on WhyLabs.AI

Ensuring AI Success in Healthcare: The Vital Role of ML Monitoring

Artificial intelligence (AI) is revolutionizing healthcare, with significant advancements in disease diagnosis, patient outcome predictions, and overall patient care and safety. According to the Future Health Index 2023 report by Philips, 83% of healthcare leaders plan to invest in AI in the next three years, up from 74% in 2021. Read more on WhyLabs.AI

📅 Upcoming R2AI & WhyLabs Events:

Join this workshop on August 17th — RSVP on Eventbrite

💻 WhyLabs open source updates:

whylogs v1.2.8 has been released!

whylogs is the open standard for data logging & AI telemetry. This week’s update includes:

  • Make the HTTP_PROXY env var accessible in WhyLabsWriter
  • Fix serializing confusion matrices from older whylogs versions

See the full whylogs release notes on GitHub.
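Per the HTTP_PROXY change above, the whylogs uploader can pick up a proxy from the environment. A minimal stdlib-only sketch of setting the variable before WhyLabsWriter is constructed (the proxy URL is a placeholder):

```python
import os

# Hypothetical proxy URL; set it before constructing WhyLabsWriter so the
# whylogs uploader routes its HTTP requests through the proxy.
os.environ["HTTP_PROXY"] = "http://proxy.example.com:8080"
print(os.environ["HTTP_PROXY"])
```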

LangKit 0.0.14 has been released!

LangKit is an open-source text metrics toolkit for monitoring language models.

  • Remove pkg_resources reference
  • Make theme groups customizable
  • Fix CI tests to run on the specified OS
  • Make everything dataset UDFs

See the full LangKit release notes on GitHub.

🤝 Stay connected with the WhyLabs Community:

Join the thousands of machine learning engineers and data scientists already using WhyLabs to solve some of the most challenging ML monitoring cases!

Request a demo to learn how ML monitoring can benefit your company.

See you next time! — Sage Elliott
