For devs, by devs: The story of the Mercado Libre Go toolkit

Sebastian Orfino
Mercado Libre Tech

--

We love making software

Making software must be one of the most creative activities there is. You’re constantly faced with new problems or variations of existing problems in a world that is always changing. While there are some tools and techniques that can help us, you still need to come up with that idea — usually elusive — the one that solves the problem in both an effective and elegant way.

Perhaps that’s the hardest part of all: coming up with such a solution. Obviously, we don’t have the answer; it would be ideal if we did. In general, it doesn’t happen very often. But we’ll tell you about a time when it did.

Applications and libraries

In addition to building apps that solve specific business problems, we developers also create libraries. Libraries make us more productive because they allow us to reuse solutions for specific, highly transferable problems, saving us from having to rewrite code every time — from one domain to another. An example could be generating a 128-bit cryptographic key or calculating the CRC32 of a file. Another benefit is that libraries have many more “flight hours” than our apps, meaning they are less likely to contain bugs compared to code we write from scratch. And last but not least, perhaps most importantly, libraries are written by experts in their respective domains, such as networking or cryptography, who have undoubtedly spent much more time on the subject and have already answered all the questions we haven’t even thought to ask yet.

The problem with libraries is that neither how widely nor how exactly they will be used can be fully predicted. An issue in a library (e.g., a bug or a vulnerability) affects an indeterminate number of apps. Plus, fixing a bug can break a client that, in some mysterious way, was dependent on that buggy behavior (see Hyrum’s Law).

Another topic is versioning. In libraries, versioning is crucial because it determines how package managers resolve conflicts between dependencies. In Go, this means adhering strictly to semantic versioning (semver). The library still needs room to evolve, but breaking changes must be rare, because they disrupt the library’s clients and hinder adoption, which ultimately determines a library’s success.

Lastly, it’s essential to remember that libraries must be idiomatic. That is, they should naturally blend with the programming language and its conventions. In Go, for example, a custom library shouldn’t differ much from the standard library. In fact, developers shouldn’t even notice that they aren’t using the standard library; it should flow seamlessly, as if there were no boundary.

Running applications at MELI

Image 1: Example of interconnected apps in our Fury ecosystem.

In a company as large as MELI, where millions of requests per minute flow constantly across payments, sales, shipping, and searches, thousands of interconnected apps form an endless and ever-changing river (we could quote Heraclitus, but we won’t). Approximately half of our production traffic flows through Go applications, and this proportion is increasing. These thousands of apps, created from our internal platform Fury, run on a highly standardized and secure infrastructure, where certain rules must be followed to ensure the protection and experience of our users. The platform offers services that apps can consume if needed, such as SQL databases, key-value stores, document databases, free text search, messaging, secrets storage, object storage, audits, and more. You name it, we have it. We even have design pattern implementations as services, like event-sourcing.

The typical backend apps at MELI essentially perform the following tasks: receive requests, make requests, transform (or aggregate) data, and then respond. In other words, most of the processing is I/O bound. Therefore, it’s crucial for apps to handle I/O carefully if we want the platform to scale to millions of requests per minute (RPM) while also being cost-effective.

Each of these apps requires some form of observability so that we can monitor it. Apps also need their incoming traffic to be authenticated, logging to be enabled, and a way for developers to query those logs for troubleshooting, among other things. All of these features are provided by the platform they run on. Learn more about Mercado Libre’s observability ecosystem in this post.

For devs, by devs

Image 2: This room hosted the first meetings of the Go toolkit.

In the beginning, there wasn’t much. We weren’t Go natives, and all we had were some SDKs that allowed us to use Fury’s services and a somewhat opinionated router based on Gin. You could say that was enough. However, as our expertise in Go grew and the number of teams continued to expand, we realized that what we had wasn’t the best, and it was time to create something new from scratch: idiomatic, robust, easy to use, and… versioned (unlike the legacy SDKs).

This initiative began with a team of four people at Mercado Pago (MP), which later grew to five or six, but no more. The idea was for teams in one sector of MP to adopt this new toolkit. We wanted to create something we enjoyed using — something comfortable, versatile, and at the same time, elegant. Because we know there is beauty in those proportions where nothing is missing and nothing is superfluous.

So, what did we want? The list isn’t exhaustive, but it gives an idea. Of course, it emerged after quite a bit of iteration:

  • Developers can create a web application and register their endpoints with their respective handlers.
  • Then, start it up and begin receiving requests.
  • It should remove repetitive code when handling requests (marshaling, unmarshaling, error encoding, parameter parsing, etc.).
  • It should be extendable with custom middleware.
  • It should be able to stop receiving requests. That is, listen for operating system signals to shut down the server and terminate the app.
  • It should know how to start receiving requests. That is, emit (or more accurately, react to) a liveness signal.

That covers the downstream side, the traffic the app receives. When the app turns around and looks upstream, at the services it calls, we also wanted:

  • HTTP clients that allow setting retry policies, backoff, and timeouts.
  • Perhaps a circuit breaker.
  • And all of this should be observable, meaning it emits metrics and traces at all levels so that developers can see and understand what their app is doing.

Since MP must comply with certain regulations typical of a FinTech, some computations must be performed in a segmented network. Although this network has different features than Fury, we still program in Go there. Therefore, the toolkit also needed to support this type of application.

The closer we are to the platform, the more opinions we have to adopt because we are less abstracted from it. Thus, it became clear that our toolkit actually needed to be split into two:

  • go-core: This is for all the common components that any app might need (even outside of MELI). It includes circuit breakers, HTTP clients with retries, round-trippers to customize request/response, tracing and metrics, a web server with some opinions, logging, etc.
  • go-platform: This is for components specific to Fury and the other network, with opinions formed only within MELI, given our knowledge of the platform. It includes bootstrapping the web server and initializing observability components according to the platform.

We often joke about spending hours debating the name of an API, comparing those meetings to “The Chamber of Ents”. Naming things is a fundamental problem: we can’t fully understand what we can’t name, and we’re aware that this isn’t a trivial issue. It always starts with the question, “What is the public API?” Afterward, we can discuss implementations. The public API requires much more work and involves many more iterations.

Adopting the toolkit

The success of a library or SDK is often tied to the adoption it receives, and adoption can only be measured when there are alternatives. If there are no alternatives, the measure doesn’t make sense. In this case, there were other options; in fact, there were many. Fury is a platform flexible enough that no one is forced to use any particular library, or anything at all. The only requirement is that your app complies with certain rules while running on Fury, but if a team prefers to write everything from scratch, they can do so. Therefore, we can assume that a lack of alternatives wasn’t the reason, at least not initially, why developers across MELI slowly began using this toolkit and started abandoning the previous one.

It’s important to emphasize that no one — no team — was forced to adopt the new SDKs and other tools. There was no directive or order to enforce or push this adoption. In fact, from the beginning, the idea wasn’t to create an “official” toolkit, just something for our closest teams to use.

The shift toward something better — focused not just on performance, but also on API ergonomics and stability — is what led MELI developers to adopt these new tools. Simply because we could do better at what we do, improving the quality of our work, and no special justification was needed for that. That’s why the adoption of the toolkit eventually crossed the boundaries of MP and spread throughout the company. Its capabilities evolved until it ultimately became a community asset, which it is today.

Moving forward

At the time of writing this article, the latest Go version is 1.22.2. Several things have changed since we started, prompting us to work on a new version of the toolkit. In addition to the language and its standard library gaining new features (for example, generics didn’t exist when we started), we’ve also learned from some mistakes we made in the design. Plus, the observability platform is transitioning to OpenTelemetry, so the toolkit no longer needs to provide its own abstraction over tracing and metrics.

Another important reason is the growing demand to use the toolkit across different types of apps. It’s no longer limited to classic web apps, as described above, but is also extending to certain infrastructure components that address platform issues rather than business cases (e.g., a sidecar). Different types of apps require different considerations.

It’s an ongoing process, and perhaps it always will be. Saying, “This is finished,” is difficult because software development is constantly evolving and adapting to an ever-changing world. With this in mind, we embrace the challenge, thoughtfully planning and then programming — because that’s what we love to do most.
