Fence 🤺 A homegrown LLM interaction framework for Python
Let's start with a big disclaimer: this post practically radiates this xkcd. Another LLM package? For real?
Yes sir: introducing Fence, a minimalistic LLM interaction library!
👉 https://github.com/WouterDurnez/fence 👈
Let me take a step back and talk about how this came about.
If you want something done right…
Generative AI is still a relatively new technology, especially in production environments. When LLMs first started gaining popularity and proofs of concept (PoCs) were sprouting up everywhere, LangChain quickly became the go-to framework for building LLM-based workflows (it kind of helped that it was one of the only options available at the time). Today, however, there are plenty of Python frameworks to choose from, with several alternatives emerging over the past year.
LangChain, in turn, seems to have fallen victim to its own success. Partially as a result of the community-driven nature of the beast, the package quickly became bloated and relatively unstable, and developers started complaining about unnecessary layers of abstraction. In my job as an AI engineer, we went through the same motions. We depended on LangChain for a number of features, which ballooned our lambdas with superfluous dependencies until we had to turn them into Docker-based functions. At the same time, all we really depended on LangChain for was our basic Chain flows and the use of the Document class.
In addition, we started building a bunch of extra code that would allow us to quickly swap out models while retaining all of our custom-made extras (e.g., monitoring our token expenditure). So we ended up with an hourglass structure: we were building a ton of code (wide), which, through a minimal set of touch points (narrow), depended on LangChain (very wide). Why?
…you have to do it yourself
One day, we decided to cut all of that stuff out and replace it with our own components. Or maybe I gradually snuck it into our codebase. Who knows.
Sounds kinda sketchy though, right? In replacing a third-party framework, we might end up having to maintain a beast of our own making. Worst case, that would be even more cumbersome than before. Smells like hubris. On the other hand, we'll be in full control, and you know what that means: silly log messa… I mean tailor-made solutions to our own needs.
Obviously, looking for an alternative, production-oriented framework (Griptape) would be a good solution as well. But is that fun? Not nearly as much. Also, I figured building something myself would be a great exercise. Long story short: what started out as replacing a few core components with some classes of our own turned into a miniature package called Fence.
In the next paragraphs, I'll give a quick intro to the framework. But before I do that, a final disclaimer: I did not make this package because I expect it to overtake the LLM lib landscape. In fact, I'm fully aware that, to a significant extent, I just reinvented the wheel in a different color.
Still, these are the main advantages of building Fence, as I see them:
- Focus on the ABCs of LLMs (cut down on bloat)
- Keep it lean: trim down the dependencies
- Learn peripheral software development stuff
- Gain some deeper understanding of LLM-based patterns (e.g., Agents!)
- Add silly logging formatti… no, not you
So up, up and away I went! Now let's get into the package itself.
From the ground up!
Core components
When working with a large language model, there are a few key components we need.
- First, we must establish a method to invoke the model: this could involve sending requests to OpenAI's API, or querying it locally using tools like Ollama.
- Second, we require a prompt. Anyone who has developed even a basic LLM application knows the importance of parameterizing that prompt. Depending on the model, we can provide a system message along with one or more user and assistant messages. Rather than merely passing along raw user input, we encapsulate it within a template, allowing us to incorporate additional directives, roles, guardrails, and other essential elements.
- Third, it would be very convenient if those things could be chained (🤔) together somehow.
Pretty basic stuff, right? Let's start with what this is all about: the Large Language Models themselves.
Models
Oy vey, so many models, so little time! Maybe first, let's make sure we can call an OpenAI model, since they're still the top dawg in the genAI service landscape. But before we do that, let's think ahead.
Consistency! We're building an LLM wrapper class. What should this class do that all other models should implement as well?
- We need a `model_id` attribute, so we can point at the right endpoint/service.
- We need some kind of `invoke` method that actually passes input to the model.
- (While we're at it, let's just tie the `invoke` method to the `__call__` dunder method, shall we.)
- Maybe we need an optional `model_name` as well, to have a readable identifier. Why? Because services like AWS Bedrock use identifiers like `anthropic.claude-3-sonnet-20240229-v1:0`, which isn't very readable.
The class, let's call it `LLM`, should look something like this:
from abc import ABC, abstractmethod
from typing import Any

# Messages is Fence's pydantic container for role-based prompts (more on that later)
from fence.templates.messages import Messages


class LLM(ABC):
    """
    Base class for LLMs
    """

    model_id: str | None = None
    model_name: str | None = None

    def __call__(self, prompt: str | Any, **kwargs) -> str:
        return self.invoke(prompt, **kwargs)

    @abstractmethod
    def invoke(self, prompt: str | Messages, **kwargs) -> str:
        raise NotImplementedError
Great, now we have… well, we don't really have anything. But we can build something on top of this. Let's move on to an actual model. Here's my implementation of a GPT base class, which I use for 🥁 OpenAI's GPT family (simplified for blog purposes).
import os

class GPTBase(LLM, MessagesMixin):
"""Base class for GPT models"""
model_id = None
model_name = None
inference_type = "openai"
def __init__(
self,
api_key: str | None = None,
**kwargs,
):
"""
Initialize a GPT model
:param str|None api_key: OpenAI API key
:param **kwargs: Additional keyword arguments
"""
super().__init__()
# Find API key
self.api_key = api_key or os.environ.get("OPENAI_API_KEY", None)
# Base URL
self.url = "https://api.openai.com/v1/chat/completions"
... # Some extra stuff
def invoke(self, prompt: str | Messages, **kwargs) -> str:
"""
Call the model with the given prompt
:param prompt: text to feed the model
:return: response
"""
# Call the API
response = self._invoke(prompt=prompt)
# Get response completion
completion = response["choices"][0]["message"]["content"]
return completion
def _invoke(self, prompt: str | Messages) -> dict:
"""
Handle the API request to the service
:param prompt: text to feed the model
:return: response completion
"""
        # Format prompt, using OpenAI's messaging format
if isinstance(prompt, Messages):
# Get messages as dicts (using pydantic under the hood)
messages = [
message.model_dump(exclude_none=True) for message in prompt.messages
]
system_message = prompt.system
# Extract system message
if system_message:
messages.insert(0, {"role": "system", "content": system_message})
# If we receive a string, we will format it as a single user
# message for ease of use.
elif isinstance(prompt, str):
messages = [
{
"role": "user",
"content": prompt,
}
]
else:
raise ValueError("Prompt must be a string or a list of messages")
# Build request body
request_body = {
"messages": messages,
"model": self.model_id,
}
# Send request
try:
... # The actual API call.
except Exception as e:
raise ValueError(f"Something went wrong: {e}")
While this is a tiny bit beefier, it still leaves out many things, such as the model kwargs and the logging of metrics (token counts and the like; see this link if you're curious). There are also some hints at other functionality (like the `MessagesMixin`), but you get the gist.
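As an aside, the API call that's elided above is little more than a plain HTTP POST. Here's a rough sketch of what it could look like using requests. To be clear: this is illustrative, not Fence's actual code, and it simply assumes the chat completions endpoint stored in self.url earlier:

import requests

def post_chat_completion(url: str, api_key: str, request_body: dict) -> dict:
    """Send a chat completions request and return the parsed JSON response."""
    response = requests.post(
        url,  # e.g. "https://api.openai.com/v1/chat/completions"
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        json=request_body,  # e.g. {"model": ..., "messages": [...]}
        timeout=30,
    )
    response.raise_for_status()
    return response.json()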
The main idea is that all models of this class call the API in a similar manner, since the request body is always similarly structured. By capturing that commonality in a base class, we can speed things up in the long run. For instance, if I now want to call a specific model, say GPT-4o, I can do this:
class GPT4o(GPTBase):
    """
    GPT-4o model
    """

    def __init__(self, **kwargs):
        """
        Initialize a GPT-4o model

        :param **kwargs: Additional keyword arguments
        """
        super().__init__(**kwargs)

        self.model_id = self.model_name = 'gpt-4o'
EZ. You're probably thinking: I could have just used OpenAI's own `openai` package for this, right? All it takes is swapping out that same `model_id` in the arguments?
Sure you can!
You'll import a slew of dependencies for what is essentially a simple request, but that's not the worst thing in the world. What you don't get, however, is this:
- Stuff like that extra code I mentioned (again, see here) to handle metrics in a way you can define yourself, through a logging callback. Better yet, you only need to define this callback once: metrics will be logged uniformly for all models, so you can easily swap out models in your flow. Cool, right?
- Did you notice the little `isinstance(prompt, Messages)` bit? Remember when we said LLMs accept multiple types of messages? Well, Fence has some pydantic models to deal with those (I know, reinventing the wheel again 🤷‍♂️). Why do those models exist? Because they provide an interface between prompt templates and the eventual model invocation. See, various models assign roles to their messages (typically system, user, and assistant), yet this distinction is translated into slightly different request formats. Read: Anthropic expects you to pass those messages in a different way than OpenAI does. You shouldn't be bothered by that inconsistency, though. By handling the payload creation inside our model classes, those wrinkles are ironed out for you. Just stick to the predefined `Message` classes, and swap out models as you please (there's a quick sketch of this just below).
Getting a little complicated/too in-depth, perhaps, but just remember: the goal is to be able to drop-in replace models at any time in your code, regardless of where they're served from!
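To make that a bit more concrete, here's a small sketch. The Messages and Message imports are the real ones (the same ones used in the hello world example further down); the payload assembly, however, is only meant to illustrate the provider differences, and is not a copy of Fence's internals:

from fence.templates.messages import Messages, Message

# One role-based prompt, defined once
prompt = Messages(
    system="You are a helpful assistant",
    messages=[Message(role="user", content="Why is the sky blue?")],
)

# OpenAI-style body: the system message simply becomes the first
# entry in the messages list (this mirrors the _invoke method above)
openai_body = {
    "messages": [{"role": "system", "content": prompt.system}]
    + [m.model_dump(exclude_none=True) for m in prompt.messages],
}

# Anthropic-style body: the system prompt travels as a separate,
# top-level field next to the messages list
anthropic_body = {
    "system": prompt.system,
    "messages": [m.model_dump(exclude_none=True) for m in prompt.messages],
}

Same prompt object, two different payloads. That translation is exactly the kind of wrinkle the model classes iron out for you.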
Let's move on to templating.
Templates
One of the criticisms I heard a lot about LangChain was that it overcomplicates things. The `PromptTemplate` was often used as an example: many people considered it to be a glorified f-string. In other words, it's just a string with some placeholders that you can format with input parameters, so you can pass the result to a model. Who needs a class for that?
True, it was just an f-string (you can go the `jinja` route as well, but let's not go there, 'tis a silly place). Here's a spoiler though: Fence has f-strings too! Wrapped in a class and everything! But why? What else could you need? Well, what about:
- Finding all input placeholders, and matching them with input arguments. Did you provide all the required input? Did you provide too many fields? An f-string will just shut you down if you don't provide all the required arguments, and silently ignore any superfluous ones (see the quick demo after this list). Let's be better than that, shall we?
- If you're using messages with multiple roles (i.e., system, user, etc.), you may have some placeholders spread across them. For instance, maybe you want to change the `system` message dynamically, based on some variable, and you also have some placeholder fields in a `user` message. It would be nice if you could just render the template with input arguments in one go, rather than trying to find the right input fields across these messages, right?
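To make the first point concrete, here's the plain-Python behaviour we're up against (nothing Fence-specific yet): a bare format string either blows up with a terse KeyError on a missing field, or silently swallows superfluous ones.

template = "Respond in a {tone} tone. Why is the sky {color}?"

# Missing a required field: .format() just throws a bare KeyError
try:
    template.format(tone="rude")
except KeyError as missing:
    print(f"Missing placeholder: {missing}")  # Missing placeholder: 'color'

# Superfluous fields are silently ignored, without any warning
print(template.format(tone="rude", color="blue", mood="grumpy"))

A template class can check the provided input against the placeholders it found, and complain properly in both cases.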
Fence has two classes built just for this purpose: `StringTemplate` and `MessagesTemplate`. The former exists for models that accept nothing but string input, such as the older Anthropic generation (e.g., Claude Instant). If role-based messaging is supported, on the other hand, the latter will be of service. That includes OpenAI's GPT models, the most recent Claude models, and more.
🤫 As mentioned earlier, these templates rely on a bunch of pydantic under the hood. There's also some additional functionality in there (e.g., automatic base64 encoding of image messages), but that's out of scope.
Still with me? Let's wrap up with some final core components: the `Link` and `Chain` classes!
Links + Chains = 🤺
If you hadn't guessed it yet, this is where the package name comes from. LangChain has its `Chain` class, where you combine a model and a prompt template to get an LLM completion. Being the overgrown child I am, I named a single LLM invocation a `Link`, and multiple links then became a `Chain`. Makes sense, right? A chain made of links? 😬
It gets bett… worse. What does "chainlink" remind you of? A chain-link fence, right? Et voilà, we have arrived at the most convoluted pun of all time. To make it even more confusing, I gave it a fencer logo (with fabulous hair). But I digress…
Putting stuff together! What should a `Link` do? In essence, it's a simple combination of these things:
- An LLM
- A prompt (template)
- A way to run it with input arguments
- Optionally a way to post-process (i.e., parse) the output
Well hey, that's exactly what it does! Rather than dive into the code, let's consider a hello world example:
# If you haven't installed the package yet,
# run `pip install fence-llm`!
# Imports
from fence.models.openai import GPT4omini
from fence.templates.messages import MessagesTemplate, Messages, Message
from fence.links import Link
# Let's define the model (pass an `api_key` arg,
# or make sure one is set in your environment)
model = GPT4omini()
# Create a message template
messages = Messages(
system='Respond in a {tone} tone',
messages= [
Message(role="user", content="Why is the sky {color}?"),
# Equivalent to Message(role='user', content=Content(type='text', text='Why is the sky blue?'))
# But Content can also be an image, etc.
]
)
messages_template = MessagesTemplate(
source=messages
)
# Create the Link
link = Link(
template=messages_template,
model=model,
# you can pass additional arguments, such as
# a `parser`, an `output_key`, or `name` for logging purposes
)
response = link.run(input_dict={'tone': 'rude', 'color': 'blue'})
In your output, you'll see something like this (depending on log settings):
> [2024-10-02 12:37:38] [ℹ️ INFO] [links.run:...] Executing unnamed Link
If you print the response, you'll get something like this:
> {'state': "Wow, really? You're asking about the color of the sky? It's because of Rayleigh scattering, alright? Shorter blue wavelengths scatter more than other colors when sunlight hits the atmosphere. So, yeah, that's why it looks blue! Get with the program!"}
As you can see, the response is in a dictionary, under the `state` key. This is by design. You can have the output stored under a custom output key as well, but the `state` key is special: it allows us to propagate the output of different `Link` objects with minimal effort.
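As a quick illustration of that custom output key (the name and output_key arguments are the ones mentioned above; the exact contents of the returned dictionary aren't spelled out here, so treat the final comment as an assumption based on the LinearChain example further down):

# Same link as before, but with a name and a custom output key
link = Link(
    template=messages_template,
    model=model,
    name="sky",
    output_key="explanation",
)

response = link.run(input_dict={"tone": "rude", "color": "blue"})

# The completion should now be available under the custom key,
# much like the 'opposite' key in the LinearChain example below
print(response["explanation"])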
Now, you can easily swap the model out for any other model, even models that don't support the "messages" format, such as Claude Instant! If you change the model to this:
from fence.models.claude import ClaudeInstant
model = ClaudeInstant()
# Note: this is currently the Bedrock instance,
# and requires you to be logged in! Will update
# the models to also use Anthropic as a provider
# eventually...
and run the same code, you get:
> {'state': ' I apologize, but I do not feel comfortable responding in a rude tone.'}
Err… yeah… #JustClaudeThings. Well, at least the code worked. How did it work? Under the hood, the `Link` object detects that the "messages" format is not supported by the older Claude models, and the template is converted to a `StringTemplate` instead. Shazam!
One more thing! And then we'll wrap up…
Combining Links
Quite often, we'll want to invoke a model more than once. Could be we're splitting up a first step into parallelisable components, followed by a synthesis step (a pattern called MapReduce). Or maybe we want to run a verification step after answering a question. In any case, it would be nice to have an easy tool to combine multiple `Link` objects into a single flow.
Fence calls these combination objects a `BaseChain` (again, 🤔). Currently, there are two types of chains: a `LinearChain` and a `Chain`. The first simply propagates input through a set of `Link` objects, as long as all of them (except the first one) take a `state` input key. Then you can build something like this:
# Build some links
link_opposite = Link(
template=StringTemplate(
"What's the opposite of {A}? Reply with a few words max."
),
name = 'opposite',
output_key = 'opposite',
)
link_poem = Link(
template=StringTemplate(
"Write a poem about {state}. Return only the poem, beginning with the title."
),
name='poem',
)
# Now build a LinearChain
linear_chain = LinearChain(model=model, links=[link_opposite, link_poem])
# Run it
result = linear_chain.run(input_dict={"A": "Hopeful"})
Here, I'm asking the model to write a poem about the "opposite of hopeful". Note that I gave the first link a specific `output_key`. This is what ends up in the result:
> {'A': 'Hopeful',
'state': " Hopeless\n\nThe darkness surrounds, [...]",
'opposite': ' Hopeless.'}
As you can see, the input is retained in the response, as are any output keys that differ from `'state'`. The `'state'` key itself is overwritten as the output is passed from one link to another, after which it contains the end result. Setting custom output keys allows you to monitor intermediate steps, or possibly use them in different parts of your flow.
The second type of chain, aptly named `Chain`, is a little niftier. Here, we don't use the `'state'` key to pass information around, so you have to be specific with your input and output keys. However, a `Chain` object allows for some more complex logic. Take this, for example:
# Set up some links
link_a = Link(
template=StringTemplate(
"Capitalize this word: {A}. Only respond with the capitalized version",
),
name = 'opposite',
output_key="X",
)
link_b = Link(
template=StringTemplate(
"What's a synonym for {B}. Reply with one word.",
),
name='superlative',
output_key="Y",
)
link_c = Link(
template=StringTemplate(
"Combine {X} and {Y} and {C} in a meaningful sentence.",
),
name='sentence',
output_key="Z",
)
# Wrap em all up!
chain = Chain(model=model, links=[link_c, link_a, link_b]) # Note that we can pass the links in any order
As you can see, these links depend on one another: `link_c` actually needs `X` and `Y`, but these are created by `link_a` and `link_b`. However, they're fed to the `Chain` object in the wrong order. Panik.
This is where the `Chain` object shines. When you `.run()` it, it will first go through a `._topological_sort()` step, which determines the order in which each link needs to be run to satisfy the dependencies. We can see it in action if we just call it directly:
# Order the Link objects according to the dependency graph
chain._topological_sort()
> [Link: superlative <['B']> -> <Y>,
Link: opposite <['A']> -> <X>,
Link: sentence <['X', 'C', 'Y']> -> <Z>]
That makes sense, right? But wait, there's more! What if we don't give it all of the required input keys?
# Now we can run it
try:
result = chain.run(input_dict={"A": "A police officer", "B": "Hopeful"})
except Exception as e:
print(e)
> The following input keys are required: {'B', 'A', 'C'}. Missing: {'C'}
Nice, we catch that! Let's try to trip it up though. What if we add a cycle? Will it pass calls back and forth forever? Can we amass the gnarliest AWS bill in the world? (Don't try this at home.)
# Make some links that reference one another's output
link_up = Link(
template=StringTemplate(
"Capitalize this word: {up}. Only respond with the capitalized version",
),
name = 'up',
output_key="down",
)
link_down = Link(
template=StringTemplate(
"What's a synonym for {down}. Reply with one word.",
),
name='down',
output_key="up",
)
# Chain them together
chain = Chain(model=model, links=[link_up, link_down])
# 🫣
try:
chain.run(input_dict={"up": "happy"})
except Exception as e:
print(e)
> Cycle detected in the dependency graph.
Not today, Satan!
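In case you're wondering how a check like that can work: below is a generic sketch of dependency ordering with cycle detection (Kahn's algorithm). It's only an illustration of the idea, not Fence's actual _topological_sort code; the links are reduced to (input keys, output key) pairs:

from collections import deque

def topological_sort(links: dict[str, tuple[set[str], str]]) -> list[str]:
    """
    Order links so that every input key is produced before it is consumed.
    `links` maps a link name to (input_keys, output_key).
    """
    # A link depends on every other link whose output key it consumes
    producers = {output: name for name, (_, output) in links.items()}
    dependencies = {
        name: {producers[key] for key in inputs if key in producers}
        for name, (inputs, _) in links.items()
    }

    # Kahn's algorithm: repeatedly schedule links with no unresolved dependencies
    in_degree = {name: len(deps) for name, deps in dependencies.items()}
    queue = deque(name for name, degree in in_degree.items() if degree == 0)
    ordered = []
    while queue:
        current = queue.popleft()
        ordered.append(current)
        for other, deps in dependencies.items():
            if current in deps:
                in_degree[other] -= 1
                if in_degree[other] == 0:
                    queue.append(other)

    # If we couldn't schedule every link, some of them depend on each other
    if len(ordered) < len(links):
        raise ValueError("Cycle detected in the dependency graph.")
    return ordered

# The links from the Chain example: 'sentence' needs X and Y,
# which 'opposite' and 'superlative' produce
print(topological_sort({
    "sentence": ({"X", "Y", "C"}, "Z"),
    "opposite": ({"A"}, "X"),
    "superlative": ({"B"}, "Y"),
}))

# The 'up'/'down' links above reference each other's output
try:
    topological_sort({"up": ({"up"}, "down"), "down": ({"down"}, "up")})
except ValueError as e:
    print(e)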
Wrap up!
I've talked about the basics of building a homemade LLM framework. I've talked about why I made it. And more specifically, I've talked about the main promise of what turned out to be Fence:
- Keep it simple, keep it lightweight
- Keep it uniform (swapping out models/templates should not break anything)
There's more to talk about! Fence has some neat little extras, such as:
- Silly loggi… shut up with that already
- Parsers
- Logging model usage metrics via custom callback
- Onboard utils for retry and parallelisation
- Agents!!! 😮
And finally, a closing message. My main goal with all of this is to become a better developer and/or AI engineer. Any feedback is welcome. Contributions are welcome too! I can't speak to how much time I'll be able to invest in this little hobby project, but I'm most certainly an active user myself. Next time, agents!