Easier development of LLM applications with motleycache
As any experienced developer knows, for any reasonably mature system the time spent writing new code is tiny compared to the time spent debugging it, writing tests for it, and making sure these tests pass — and especially in a duck-typed language like Python, debugging often means literally stepping through the code with a debugger and inspecting the local state.
Not an easy task at the best of times, debugging gets even harder when calls to large language models (LLMs) are a key part of your application. Not only are these calls not guaranteed to return the same output given the same inputs, making your bugs harder to reproduce, but even when they do, each call costs money and takes a while to complete, breaking the developer’s concentration as they wait.
A similar problem arises when writing tests: often, the functionality being tested depends on the LLM’s output in ways too complex for a simple mock. But do you really want to make actual LLM calls in your testing suite, which can quickly grow to hundreds of tests, with all the costs, non-determinism, and waiting time that brings?
If the above problems are familiar to you, we have a solution!
We also faced all of them when developing MotleyCrew: since we aim to support all the major agent frameworks, and to take agent orchestration and the agents’ interaction with knowledge graphs to the next level, sound testing is essential, and convenient debugging is a crucial time-saver.
We made life easy for ourselves (and for you) in two ways: firstly, a thorough integration of Lunary into MotleyCrew for observability; and secondly, a new package, motleycache, for disk-based caching of HTTP calls. The Lunary integration deserves a separate story; here, let us just mention that it plays well with motleycache, so you can see in the Lunary traces which replies to LLM calls actually came from the LLM, and which came from the cache.
The cache itself is designed to be as easy to use as possible: you just call motleycache.enable_cache() to start caching all the HTTP calls your application makes. If you want to be more selective, you can specify either a whitelist (a list of URL patterns to be cached) or a blacklist (a list of URL patterns NOT to be cached) before calling enable_cache().
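For example, a selective setup might look like the following sketch; set_cache_whitelist (and its set_cache_blacklist counterpart) takes a list of URL patterns, and the OpenAI pattern and the my_agent call below are placeholders for your own code:

```python
from motleycache import enable_cache, disable_cache, set_cache_whitelist

# Only cache calls to the LLM provider; all other HTTP traffic goes through as usual.
# The whitelist is a list of URL patterns; this OpenAI pattern is just an example.
set_cache_whitelist(["*//api.openai.com/*"])

enable_cache()  # from here on, matching HTTP calls are cached on disk

# ... run your LLM-powered code as usual ...
response = my_agent.invoke("Summarize the latest test report")  # placeholder call

disable_cache()  # stop caching when you no longer need it
```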
Note that the cache is kept on disk rather than in memory, so you can add it to your testing suite: your unit tests can exercise even advanced LLM-driven functionality, as long as all the calls it makes are contained in the cache. There is even a set_strong_cache option that raises an exception whenever your application tries to make a call that is not in the cache, for more robust testing.
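In a pytest suite, for instance, this could be wired up as a fixture along the following lines; the fixture itself and the tests/llm_cache path are our own illustration, using motleycache’s set_cache_location to point the cache at a directory you can check into your repo:

```python
import pytest
from motleycache import (
    disable_cache,
    enable_cache,
    set_cache_location,
    set_strong_cache,
)

@pytest.fixture(autouse=True)
def llm_cache():
    # Use a cache directory committed to the repo, so CI replays
    # the same recorded LLM responses on every run (illustrative path)
    set_cache_location("tests/llm_cache")
    # Fail loudly if a test tries to make a call that is not in the cache
    set_strong_cache(True)
    enable_cache()
    yield
    disable_cache()
```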
While on-disk caching of HTTP calls may not sound too sexy at first glance, we’ve found it a major time-saver when debugging LLM-intensive applications, and absolutely indispensable when writing test suites for them. Give motleycache a try, and we’re sure you’ll have the same experience.