Fence 🤺 A homegrown LLM interaction framework for Python
Let's start with a big disclaimer: this post practically radiates this xkcd. Another LLM package? For real?
Yes sir: introducing Fence, a minimalistic LLM interaction library!
👉 https://github.com/WouterDurnez/fence 👈
Let me take a step back and talk about how this came about.
If you want something done right…
Generative AI is still a relatively new technology, especially in production environments. When LLMs first started gaining popularity and proofs of concept (PoCs) were sprouting up everywhere, LangChain quickly became the go-to framework for building LLM-based workflows (it kind of helped that it was one of the only options available at the time). Today, however, there are plenty of Python frameworks to choose from, with several alternatives emerging over the past year.
LangChain, in turn, seems to have fallen victim to its own success. Partially as a result of the community-driven nature of the beast, the package quickly became bloated and relatively unstable, and developers started complaining about unnecessary layers of abstraction. In my job as an AI engineer, we went through the same motions. We depended on LangChain for a number of features, which ballooned our lambdas with superfluous dependencies until we had to turn them into Docker-based functions. At the same time, all we really depended on LangChain for was our basic Chain flows and the use of the Document class.
In addition, we started building a bunch of extra code that would allow us to quickly swap out models while retaining all of our custom-made extras (e.g., monitoring our token expenditure). So we ended up with an hourglass structure: we were building a ton of code (wide), which, through a minimal set of touch points (narrow), depended on LangChain (very wide). Why?
…you have to do it yourself
One day, we decided to cut all of that stuff out and replace it with our own components. Or maybe I gradually snuck it into our codebase. Who knows.
Sounds kinda sketchy though, right? In replacing a third-party framework, we might end up having to maintain a beast of our own making. Worst case, that would be even more cumbersome than before. Smells like hubris. On the other hand, we'll be in full control, and you know what that means: silly log messa… I mean tailor-made solutions to our own needs.
Obviously, looking for an alternative, production-oriented framework (Griptape) would be a good solution as well. But is that fun? Not nearly as much. Also, I figured building something myself would be a great exercise. Long story short: what started out as replacing a few core components with some classes of our own turned into a miniature package called Fence.
In the next paragraphs, I'll give a quick intro to the framework. But before I do that, a final disclaimer: I did not make this package because I expect it to overtake the LLM lib landscape. In fact, I'm fully aware that, to a significant extent, I just reinvented the wheel in a different color.
Still, these are the main advantages of building Fence, as I see them:
- Focus on the ABCs of LLMs (cut down on bloat)
- Keep it lean: trim down the dependencies
- Learn peripheral software development stuff
- Gain some deeper understanding of LLM-based patterns (e.g., Agents!)
- Add silly logging formatti… no, not you
So up, up and away I went! Now let's get into the package itself.
From the ground up!
Core components
When working with a large language model, there are a few key components we need.
- First, we must establish a method to invoke the model: this could involve sending requests to OpenAI's API, or querying it locally using tools like Ollama.
- Second, we require a prompt. Anyone who has developed even a basic LLM application knows the importance of parameterizing that prompt. Depending on the model, we can provide a system message along with one or more user and assistant messages. Rather than merely passing along raw user input, we encapsulate it within a template, allowing us to incorporate additional directives, roles, guardrails, and other essential elements.
- Third, it would be very convenient if those things could be chained (🤔) together somehow.
Pretty basic stuff, right? Let's start with what this is all about: the Large Language Models themselves.
Models
Oy vey, so many models, so little time! Maybe first, let's make sure we can call an OpenAI model, since they're still the top dawg in the genAI service landscape. But before we do that, let's think ahead.
Consistency! We're building an LLM wrapper class. What should this class do that all other models should implement as well?
- We need a `model_id` attribute, so we can point at the right endpoint/service.
- We need some kind of `invoke` method that actually passes input to the model.
- (While we're at it, let's just tie the `invoke` method to the `__call__` dunder method, shall we.)
- Maybe we need an optional `model_name` as well, to have a readable identifier. Why? Because services like AWS Bedrock use identifiers like `anthropic.claude-3-sonnet-20240229-v1:0`, which isn't very readable.
The class, let's call it `LLM`, should look something like this:
from abc import ABC, abstractmethod
from typing import Any

# Messages is Fence's pydantic container for role-based prompts (more on that later)
from fence.templates.messages import Messages


class LLM(ABC):
    """
    Base class for LLMs
    """

    model_id: str | None = None
    model_name: str | None = None

    def __call__(self, prompt: str | Any, **kwargs) -> str:
        return self.invoke(prompt, **kwargs)

    @abstractmethod
    def invoke(self, prompt: str | Messages, **kwargs) -> str:
        raise NotImplementedError
Great, now we have… well, we don't really have anything. But we can build something on top of this. Let's move on to an actual model. Here's my implementation of a GPT base class, which I use for 🥁 OpenAI's GPT family (simplified for blog purposes).
import os

class GPTBase(LLM, MessagesMixin):
"""Base class for GPT models"""
model_id = None
model_name = None
inference_type = "openai"
def __init__(
self,
api_key: str | None = None,
**kwargs,
):
"""
Initialize a GPT model
:param str|None api_key: OpenAI API key
:param **kwargs: Additional keyword arguments
"""
super().__init__()
# Find API key
self.api_key = api_key or os.environ.get("OPENAI_API_KEY", None)
# Base URL
self.url = "https://api.openai.com/v1/chat/completions"
... # Some extra stuff
def invoke(self, prompt: str | Messages, **kwargs) -> str:
"""
Call the model with the given prompt
:param prompt: text to feed the model
:return: response
"""
# Call the API
response = self._invoke(prompt=prompt)
# Get response completion
completion = response["choices"][0]["message"]["content"]
return completion
def _invoke(self, prompt: str | Messages) -> dict:
"""
Handle the API request to the service
:param prompt: text to feed the model
:return: response completion
"""
        # Format prompt, using OpenAI's messaging format
if isinstance(prompt, Messages):
# Get messages as dicts (using pydantic under the hood)
messages = [
message.model_dump(exclude_none=True) for message in prompt.messages
]
system_message = prompt.system
# Extract system message
if system_message:
messages.insert(0, {"role": "system", "content": system_message})
# If we receive a string, we will format it as a single user
# message for ease of use.
elif isinstance(prompt, str):
messages = [
{
"role": "user",
"content": prompt,
}
]
else:
raise ValueError("Prompt must be a string or a list of messages")
# Build request body
request_body = {
"messages": messages,
"model": self.model_id,
}
# Send request
try:
... # The actual API call.
except Exception as e:
raise ValueError(f"Something went wrong: {e}")
While this is a tiny bit beefier, it still leaves out many things, such as the model kwargs and the logging of metrics (token counts and the like; see this link if you're curious). There are also some hints at other functionality (like the `MessagesMixin`), but you get the gist.
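As an aside, the API call that's elided above is little more than a plain HTTP POST. Here's a rough sketch of what it could look like using requests. To be clear: this is illustrative, not Fence's actual code, and it simply assumes the chat completions endpoint stored in self.url earlier:

import requests

def post_chat_completion(url: str, api_key: str, request_body: dict) -> dict:
    """Send a chat completions request and return the parsed JSON response."""
    response = requests.post(
        url,  # e.g. "https://api.openai.com/v1/chat/completions"
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        json=request_body,  # e.g. {"model": ..., "messages": [...]}
        timeout=30,
    )
    response.raise_for_status()
    return response.json()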
The main idea is that all models of this class call the API in a similar manner, since the request body is always similarly structured. By capturing that commonality in a base class, we can speed things up in the long run. For instance, if I now want to call a specific model, say GPT-4o, I can do this:
class GPT4o(GPTBase):
    """
    GPT-4o model
    """

    def __init__(self, **kwargs):
        """
        Initialize a GPT-4o model

        :param **kwargs: Additional keyword arguments
        """
        super().__init__(**kwargs)

        self.model_id = self.model_name = 'gpt-4o'
EZ. You're probably thinking: I could have just used OpenAI's own `openai` package for this, right? All it takes is swapping out that same `model_id` in the arguments?
Sure you can!
You'll import a slew of dependencies for what is essentially a simple request, but that's not the worst thing in the world. What you don't get, however, is this:
- Stuff like that extra code I mentioned (again, see here) to handle metrics in a way you can define yourself, through a logging callback. Better yet, you only need to define this callback once: metrics will be logged uniformly for all models, so you can easily swap out models in your flow. Cool, right?
- Did you notice the little `isinstance(prompt, Messages)` bit? Remember when we said LLMs accept multiple types of messages? Well, Fence has some pydantic models to deal with those (I know, reinventing the wheel again 🤷‍♂️). Why do those models exist? Because they provide an interface between prompt templates and the eventual model invocation. See, various models assign roles to their messages (typically system, user, and assistant), yet this distinction is translated into slightly different request formats. Read: Anthropic expects you to pass those messages in a different way than OpenAI does. You shouldn't be bothered by that inconsistency, though. By handling the payload creation inside our model classes, those wrinkles are ironed out for you. Just stick to the predefined `Message` classes, and swap out models as you please (there's a quick sketch of this just below).
Getting a little complicated/too in-depth, perhaps, but just remember: the goal is to be able to drop-in replace models at any time in your code, regardless of where they're served from!
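To make that a bit more concrete, here's a small sketch. The Messages and Message imports are the real ones (the same ones used in the hello world example further down); the payload assembly, however, is only meant to illustrate the provider differences, and is not a copy of Fence's internals:

from fence.templates.messages import Messages, Message

# One role-based prompt, defined once
prompt = Messages(
    system="You are a helpful assistant",
    messages=[Message(role="user", content="Why is the sky blue?")],
)

# OpenAI-style body: the system message simply becomes the first
# entry in the messages list (this mirrors the _invoke method above)
openai_body = {
    "messages": [{"role": "system", "content": prompt.system}]
    + [m.model_dump(exclude_none=True) for m in prompt.messages],
}

# Anthropic-style body: the system prompt travels as a separate,
# top-level field next to the messages list
anthropic_body = {
    "system": prompt.system,
    "messages": [m.model_dump(exclude_none=True) for m in prompt.messages],
}

Same prompt object, two different payloads. That translation is exactly the kind of wrinkle the model classes iron out for you.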
Let's move on to templating.
Templates
One of the criticisms I heard a lot about LangChain was that it overcomplicates things. The `PromptTemplate` was often used as an example: many people considered it to be a glorified f-string. In other words, it's just a string with some placeholders that you can format with input parameters, so you can pass the result to a model. Who needs a class for that?
True, it was just an f-string (you can go the `jinja` route as well, but let's not go there, 'tis a silly place). Here's a spoiler though: Fence has f-strings too! Wrapped in a class and everything! But why? What else could you need? Well, what about:
- Finding all input placeholders, and matching them with input arguments. Did you provide all the required input? Did you provide too many fields? An f-string will just shut you down if you don't provide all the required arguments, and silently ignore any superfluous ones (see the quick demo after this list). Let's be better than that, shall we?
- If you're using messages with multiple roles (i.e., system, user, etc.), you may have some placeholders spread across them. For instance, maybe you want to change the `system` message dynamically, based on some variable, and you also have some placeholder fields in a `user` message. It would be nice if you could just render the template with input arguments in one go, rather than trying to find the right input fields across these messages, right?
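To make the first point concrete, here's the plain-Python behaviour we're up against (nothing Fence-specific yet): a bare format string either blows up with a terse KeyError on a missing field, or silently swallows superfluous ones.

template = "Respond in a {tone} tone. Why is the sky {color}?"

# Missing a required field: .format() just throws a bare KeyError
try:
    template.format(tone="rude")
except KeyError as missing:
    print(f"Missing placeholder: {missing}")  # Missing placeholder: 'color'

# Superfluous fields are silently ignored, without any warning
print(template.format(tone="rude", color="blue", mood="grumpy"))

A template class can check the provided input against the placeholders it found, and complain properly in both cases.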
Fence has two classes built just for this purpose: `StringTemplate` and `MessagesTemplate`. The former exists for models that accept nothing but string input, such as the older Anthropic generation (e.g., Claude Instant). If role-based messaging is supported, on the other hand, the latter will be of service. That includes OpenAI's GPT models, the most recent Claude models, and more.
🤫 As mentioned earlier, these templates rely on a bunch of pydantic under the hood. There's also some additional functionality in there (e.g., automatic base64 encoding of image messages), but that's out of scope.
Still with me? Let's wrap up with some final core components: the `Link` and `Chain` classes!
Links + Chains = 🤺
If you hadn't guessed it yet, this is where the package name comes from. LangChain has its `Chain` class, where you combine a model and a prompt template to get an LLM completion. Being the overgrown child I am, I named a single LLM invocation a `Link`, and multiple links then became a `Chain`. Makes sense, right? A chain made of links? 😬
It gets bett… worse. What does "chainlink" remind you of? A chain-link fence, right? Et voilà, we have arrived at the most convoluted pun of all time. To make it even more confusing, I gave it a fencer logo (with fabulous hair). But I digress…
Putting stuff together! What should a `Link` do? In essence, it's a simple combination of these things:
- An LLM
- A prompt (template)
- A way to run it with input arguments
- Optionally a way to post-process (i.e., parse) the output
Well hey, that's exactly what it does! Rather than dive into the code, let's consider a hello world example:
# If you haven't installed the package yet,
# run `pip install fence-llm`!
# Imports
from fence.models.openai import GPT4omini
from fence.templates.messages import MessagesTemplate, Messages, Message
from fence.links import Link
# Let's define the model (pass an `api_key` arg,
# or make sure one is set in your environment)
model = GPT4omini()
# Create a message template
messages = Messages(
system='Respond in a {tone} tone',
messages= [
Message(role="user", content="Why is the sky {color}?"),
# Equivalent to Message(role='user', content=Content(type='text', text='Why is the sky blue?'))
# But Content can also be an image, etc.
]
)
messages_template = MessagesTemplate(
source=messages
)
# Create the Link
link = Link(
template=messages_template,
model=model,
# you can pass additional arguments, such as
# a `parser`, an `output_key`, or `name` for logging purposes
)
response = link.run(input_dict={'tone': 'rude', 'color': 'blue'})
In your output, you'll see something like this (depending on log settings):
> [2024-10-02 12:37:38] [ℹ️ INFO] [links.run:...] Executing unnamed Link
If you print the response, you'll get something like this:
> {'state': "Wow, really? You're asking about the color of the sky? It's because of Rayleigh scattering, alright? Shorter blue wavelengths scatter more than other colors when sunlight hits the atmosphere. So, yeah, that's why it looks blue! Get with the program!"}
As you can see, the response is in a dictionary, under the `state` key. This is by design. You can have the output stored under a custom output key as well, but the `state` key is special: it allows us to propagate the output of different `Link` objects with minimal effort.
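As a quick illustration of that custom output key (the name and output_key arguments are the ones mentioned above; the exact contents of the returned dictionary aren't spelled out here, so treat the final comment as an assumption based on the LinearChain example further down):

# Same link as before, but with a name and a custom output key
link = Link(
    template=messages_template,
    model=model,
    name="sky",
    output_key="explanation",
)

response = link.run(input_dict={"tone": "rude", "color": "blue"})

# The completion should now be available under the custom key,
# much like the 'opposite' key in the LinearChain example below
print(response["explanation"])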
Now, you can easily swap the model out for any other model, even models that don't support the "messages" format, such as Claude Instant! If you change the model to this:
from fence.models.claude import ClaudeInstant
model = ClaudeInstant()
# Note: this is currently the Bedrock instance,
# and requires you to be logged in! Will update
# the models to also use Anthropic as a provider
# eventually...
and run the same code, you get:
> {'state': ' I apologize, but I do not feel comfortable responding in a rude tone.'}
Err… yeah… #JustClaudeThings. Well, at least the code worked. How did it work? Under the hood, the `Link` object detects that the "messages" format is not supported by the older Claude models, and the template is converted to a `StringTemplate` instead. Shazam!
One more thing! And then we'll wrap up…
Combining Links
Quite often, we'll want to invoke a model more than once. Could be we're splitting up a first step into parallelisable components, followed by a synthesis step (a pattern called MapReduce). Or maybe we want to run a verification step after answering a question. In any case, it would be nice to have an easy tool to combine multiple `Link` objects into a single flow.
Fence calls these combination objects a `BaseChain` (again, 🤔). Currently, there are two types of chains: a `LinearChain` and a `Chain`. The first simply propagates input through a set of `Link` objects, as long as all of them (except the first one) take a `state` input key. Then you can build something like this:
# Build some links
link_opposite = Link(
template=StringTemplate(
"What's the opposite of {A}? Reply with a few words max."
),
name = 'opposite',
output_key = 'opposite',
)
link_poem = Link(
template=StringTemplate(
"Write a poem about {state}. Return only the poem, beginning with the title."
),
name='poem',
)
# Now build a LinearChain
linear_chain = LinearChain(model=model, links=[link_opposite, link_poem])
# Run it
result = linear_chain.run(input_dict={"A": "Hopeful"})
Here, I'm asking the model to write a poem about the "opposite of hopeful". Note that I gave the first link a specific `output_key`. This is what ends up in the result:
> {'A': 'Hopeful',
'state': " Hopeless\n\nThe darkness surrounds, [...]",
'opposite': ' Hopeless.'}
As you can see, the input is retained in the response, as are any output keys that differ from `'state'`. The `'state'` key itself is overwritten as the output is passed from one link to another, after which it contains the end result. Setting custom output keys allows you to monitor intermediate steps, or possibly use them in different parts of your flow.
The second type of chain, aptly named `Chain`, is a little niftier. Here, we don't use the `'state'` key to pass information around, so you have to be specific with your input and output keys. However, a `Chain` object allows for some more complex logic. Take this, for example:
# Set up some links
link_a = Link(
template=StringTemplate(
"Capitalize this word: {A}. Only respond with the capitalized version",
),
name = 'opposite',
output_key="X",
)
link_b = Link(
template=StringTemplate(
"What's a synonym for {B}. Reply with one word.",
),
name='superlative',
output_key="Y",
)
link_c = Link(
template=StringTemplate(
"Combine {X} and {Y} and {C} in a meaningful sentence.",
),
name='sentence',
output_key="Z",
)
# Wrap em all up!
chain = Chain(model=model, links=[link_c, link_a, link_b]) # Note that we can pass the links in any order
As you can see, these links depend on one another: `link_c` actually needs `X` and `Y`, but these are created by `link_a` and `link_b`. However, they're fed to the `Chain` object in the wrong order. Panik.
This is where the `Chain` object shines. When you `.run()` it, it will first go through a `._topological_sort()` step, which determines the order in which each link needs to be run to satisfy the dependencies. We can see it in action if we just call it directly:
# Order the Link objects according to the dependency graph
chain._topological_sort()
> [Link: superlative <['B']> -> <Y>,
Link: opposite <['A']> -> <X>,
Link: sentence <['X', 'C', 'Y']> -> <Z>]
That makes sense, right? But wait, there's more! What if we don't give it all of the required input keys?
# Now we can run it
try:
result = chain.run(input_dict={"A": "A police officer", "B": "Hopeful"})
except Exception as e:
print(e)
> The following input keys are required: {'B', 'A', 'C'}. Missing: {'C'}
Nice, we catch that! Let's try to trip it up though. What if we add a cycle? Will it pass calls back and forth forever? Can we amass the gnarliest AWS bill in the world? (Don't try this at home.)
# Make some links that reference one another's output
link_up = Link(
template=StringTemplate(
"Capitalize this word: {up}. Only respond with the capitalized version",
),
name = 'up',
output_key="down",
)
link_down = Link(
template=StringTemplate(
"What's a synonym for {down}. Reply with one word.",
),
name='down',
output_key="up",
)
# Chain them together
chain = Chain(model=model, links=[link_up, link_down])
# 🫣
try:
chain.run(input_dict={"up": "happy"})
except Exception as e:
print(e)
> Cycle detected in the dependency graph.
Not today, Satan!
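In case you're wondering how a check like that can work: below is a generic sketch of dependency ordering with cycle detection (Kahn's algorithm). It's only an illustration of the idea, not Fence's actual _topological_sort code; the links are reduced to (input keys, output key) pairs:

from collections import deque

def topological_sort(links: dict[str, tuple[set[str], str]]) -> list[str]:
    """
    Order links so that every input key is produced before it is consumed.
    `links` maps a link name to (input_keys, output_key).
    """
    # A link depends on every other link whose output key it consumes
    producers = {output: name for name, (_, output) in links.items()}
    dependencies = {
        name: {producers[key] for key in inputs if key in producers}
        for name, (inputs, _) in links.items()
    }

    # Kahn's algorithm: repeatedly schedule links with no unresolved dependencies
    in_degree = {name: len(deps) for name, deps in dependencies.items()}
    queue = deque(name for name, degree in in_degree.items() if degree == 0)
    ordered = []
    while queue:
        current = queue.popleft()
        ordered.append(current)
        for other, deps in dependencies.items():
            if current in deps:
                in_degree[other] -= 1
                if in_degree[other] == 0:
                    queue.append(other)

    # If we couldn't schedule every link, some of them depend on each other
    if len(ordered) < len(links):
        raise ValueError("Cycle detected in the dependency graph.")
    return ordered

# The links from the Chain example: 'sentence' needs X and Y,
# which 'opposite' and 'superlative' produce
print(topological_sort({
    "sentence": ({"X", "Y", "C"}, "Z"),
    "opposite": ({"A"}, "X"),
    "superlative": ({"B"}, "Y"),
}))

# The 'up'/'down' links above reference each other's output
try:
    topological_sort({"up": ({"up"}, "down"), "down": ({"down"}, "up")})
except ValueError as e:
    print(e)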
Wrap up!
I've talked about the basics of building a homemade LLM framework. I've talked about why I made it. And more specifically, I've talked about the main promise of what turned out to be Fence:
- Keep it simple, keep it lightweight
- Keep it uniform (swapping out models/templates should not break anything)
There's more to talk about! Fence has some neat little extras, such as:
- Silly loggi… shut up with that already
- Parsers
- Logging model usage metrics via custom callback
- Onboard utils for retry and parallelisation
- Agents!!! 😮
And finally, a closing message. My main goal with all of this is to become a better developer and/or AI engineer. Any feedback is welcome. Contributions are welcome too! I can't speak to how much time I'll be able to invest in this little hobby project, but I'm most certainly an active user myself. Next time, agents!