Refactoring a Python codebase using the Single Responsibility Principle

Marcelo Lebre
May 3, 2017

One of the most important takeaways of McCabe’s cyclomatic complexity is that functions and methods that have the highest complexity tend to also contain the most defects. This effect tends to manifest itself with an increase in the number of refactoring cycles in your development process/pipeline.

The Single Responsibility Principle (SRP) is an effective strategy against this sort of problem: it reduces the amount of code in each layer of your codebase, focuses each layer on specific objectives and separates the layers by logical domain.

"The single responsibility principle states that every module or class should have responsibility over a single part of the functionality provided by the software, and that responsibility should be entirely encapsulated by the class. All its services should be narrowly aligned with that responsibility." (Robert Cecil Martin)

I used a structural design pattern to turn SRP into code, drawing on ideas from past codebases and languages (Java, Ruby, Elixir and Python). If you search a bit you'll find several good blog posts on these concepts, all with their own nuances and specificities.

Our structural design pattern relies on 4 building blocks (Handlers, Services, Finders and Values) and the relations between them.

Handlers

The handler objects are the entry point of our structure; they exist only to dispatch and compose. The handler is an orchestrator: it orders the execution of tasks and/or fetches data to put a response together.

It does not perform any direct action other than dispatching to other objects and composing answers together.

In this example we have a Flask view responsible for creating an app with a name and a description.

Our handler here abstracts all the hurdles of the main action and returns what we expect it to, an app.

As you can see, we call up our handler like so:

>>> AppHandler.create("a new app", "small description")

The handler object has only class methods and holds no state. In this example, AppHandler complies with the SRP by handling only App-related things and dispatching everything else to services and finders.
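As a rough sketch of what such a handler could look like (the App class and the body of CreateAppService are hypothetical stand-ins, not the original implementation), the handler only dispatches and hands the result back. In a Flask view you would call AppHandler.create and turn the result into a response.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class App:
    """Hypothetical stand-in for the persisted App model."""
    name: str
    description: str


class CreateAppService:
    """Hypothetical service: state is set once at init, work happens in call()."""

    def __init__(self, name, description):
        self._name = name
        self._description = description

    def call(self):
        # A real service would persist the app; this sketch just builds it.
        return App(name=self._name, description=self._description)


class AppHandler:
    """Entry point for App-related actions: dispatches and composes, nothing else."""

    @classmethod
    def create(cls, name, description):
        # Delegate the actual work to a service and hand the result back upstream.
        return CreateAppService(name, description).call()


app = AppHandler.create("a new app", "small description")
print(app.name)
```

Note that the handler has no `__init__` and no attributes, so there is nothing for it to remember between calls.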

Handler relation rules:

A Handler can call Services and Finders to execute a task or fetch data, and Values for data representation. A Handler can only be called from upstream, in this case a view (a Django view plays the same role, as does a controller action in Rails).

Services

A service is our go-to pattern for getting sh..tuff done. Due to the nature of this object, the initialisation and execution moments are performed separately.

It may be used to do something with an external API, our project's database or any other type of processing. Also, if you contemplate the possibility of migrating your code to a micro-service pattern, the cool thing is, you'll be able to maintain your interface/method signature as it is and just change the object's internal call method.

After initialisation, these objects hold a temporary, immutable state. Once created, they can be passed around and called at a later stage, but must never be changed. If the object were mutable, a developer mistake between creation and execution could make it do something different from what was intended.

Here's an example:

>>> name = "an-awesome-app"
>>> description = "an-app-that-is-awesome"
>>> creation_service = CreateAppService(name, description)
>>> app = creation_service.call()

The CreateAppService looks like a pretty basic scenario; however, this is the most common setup around. When a task turns out to need another layer of processing, why not invoke another Service to do it?! Remember, a Service should only do what it is supposed to, nothing else.
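Here is a minimal sketch of a Service along those lines. It assumes frozen dataclasses as one way to enforce immutability, and SlugifyNameService is a made-up helper used only to show one Service calling another; neither is taken from the original code.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class SlugifyNameService:
    """Hypothetical helper service: does one thing, normalising a name."""
    name: str

    def call(self):
        return "-".join(self.name.lower().split())


@dataclass(frozen=True)
class CreateAppService:
    """State is fixed at initialisation; the work only happens in call()."""
    name: str
    description: str

    def call(self):
        # Extra processing lives in its own Service instead of growing this one.
        slug = SlugifyNameService(self.name).call()
        # A real implementation would persist the app; a dict stands in here.
        return {"name": slug, "description": self.description}


creation_service = CreateAppService("An Awesome App", "an app that is awesome")
app = creation_service.call()
print(app)

try:
    creation_service.name = "something else"
except FrozenInstanceError:
    print("services cannot be changed after initialisation")
```

Because the dataclass is frozen, an accidental reassignment fails loudly instead of silently changing what call() will do later.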

Service relation rules:

A Service can call other Services, Finders or Values. In turn, only Handlers and other Services can call a Service (sorry for the redundancy).

Finders

A Finder does exactly what the name suggests, it finds things. It is usually associated with retrieving information from data sources, such as a database, files or external APIs.

The Finder object only defines class methods and holds no state. Since a finder is responsible for retrieving a specific type of information object, such as a Role model, it takes only a small number of simple arguments (usually 1 or 2).

A Finder is bound by SRP to retrieve only what its name states. A RoleFinder only retrieves roles, a UserFinder only retrieves users, and so on and so forth. However, since in most databases you can retrieve the same info by building different queries, each classmethod defines a unique way to retrieve the info you need.

The signature of a finder is pretty straightforward:

>>> role_id = "A0EEBC99-9C0B-4EF8-BB6D-6BB9BD380A11"
>>> role = RoleFinder.get_from_id(role_id)
>>> app_id = "A8FFB99-1CDB-5AC7-DD5C-4AB8AC211B32"
>>> role = RoleFinder.get_from_app(app_id)

And so is the implementation. In this specific example we used SQLAlchemy to query the database.
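To give an idea of the shape of such a Finder, here is a sketch that swaps SQLAlchemy for the standard library's sqlite3 module so it runs without extra dependencies. The roles table, its columns and the "admin" row are assumptions made up for the illustration.

```python
import sqlite3

# In-memory database standing in for the project's real data source.
connection = sqlite3.connect(":memory:")
connection.row_factory = sqlite3.Row
connection.execute("CREATE TABLE roles (id TEXT PRIMARY KEY, app_id TEXT, name TEXT)")
connection.execute(
    "INSERT INTO roles VALUES (?, ?, ?)",
    ("A0EEBC99-9C0B-4EF8-BB6D-6BB9BD380A11", "A8FFB99-1CDB-5AC7-DD5C-4AB8AC211B32", "admin"),
)


class RoleFinder:
    """Only retrieves roles; each classmethod is one way of querying them."""

    @classmethod
    def get_from_id(cls, role_id):
        cursor = connection.execute("SELECT * FROM roles WHERE id = ?", (role_id,))
        return cursor.fetchone()  # None when no role matches

    @classmethod
    def get_from_app(cls, app_id):
        cursor = connection.execute("SELECT * FROM roles WHERE app_id = ?", (app_id,))
        return cursor.fetchone()


role = RoleFinder.get_from_id("A0EEBC99-9C0B-4EF8-BB6D-6BB9BD380A11")
print(role["name"])
```

With an ORM the two methods would wrap model queries instead of raw SQL, but the contract stays the same: a stateless class with one classmethod per way of finding a role.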

Finder relation rules:

A Finder can be called by a Handler or a Service, and can call a Value to return complex information.

Values

The value object, in complete contrast to Handlers, Services and Finders, is designed to hold state. This pattern is used to 1) convey data from one point to another and 2) beautifully compose complex objects in a super simple and fast manner.

Value objects are either a singular representation of an entity or a list of entity representations. In practice, it's either a dict or an array (a list, in Python terms).

Of all the patterns, this is usually the one that raises the most questions. Before we go into implementation, let's define a very simple use case: building a REST API for web and mobile. The thing with building a REST API for mobile is that it may differ quite a bit from a web version. In most cases it involves tweaking the representation of data fetched from the database, which leads to several duplicate or very similar lines of code in the way JSON responses are built.

With value objects you can create an abstraction and simply compose responses according to your need. A UserValue represents a user, with their basic info such as username, email, etc. In the same way, the AddressValue represents the address of a user, with street, zipcode and so on. If you want to build a JSON response that represents a user profile, you can just compose the UserValue and AddressValue together. Should you need to build a JSON response to list basic info from Users, you’d just (re)use the UserValue.

Let's get down to some examples.

>>> user = UserFinder.get_from_email("marcelo@unbabel.com")
>>> marcelo = UserValue(user)
>>> marcelo.raw
{'username': 'M', 'email': 'marcelo@unbabel.com', 'first_name': 'Marcelo', 'last_name': 'L.'}
>>> marcelo.json()
'{"username": "M", "email": "marcelo@unbabel.com", "first_name": "Marcelo", "last_name": "L."}'

First we use our UserFinder to get a user object directly from the database, then we pass it on to the UserValue, which uses the necessary fields to create a representation of the User.

We use the raw property to access the value's raw representation, which, in this case, is a dict. This is useful for composing or accessing data inside a value object.

The json method is exposed as a means to illustrate the API scenario. It's not mandatory; you can expose the data however suits your purpose.

To achieve composition using this raw and json protocol, our value objects inherit from a base class called ValueComposite.

Value Composite

The ValueComposite object is a base object that you only need to define once. It is an abstraction that you can port from project to project.

The ValueComposite allows your ValueObjects to be initialised as a dict or array by using an abstract constructor.

As explained earlier, this is where we define the raw property (which always exists and is used to output the value's raw content) and the json() method (which is used to translate the raw value to JSON). This last method is not mandatory; format the data in whatever way makes sense in your project.

There are two types of serialisers: the ones that add to the dict, for single values, and the ones that append values to the array, for multiple values.
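A minimal sketch of such a base class could look like the following. The method names serialize_with and append_with are hypothetical, picked here to mirror the two serialiser types; only raw and json() come from the description above.

```python
import json


class ValueComposite:
    """Base class for value objects backed by either a dict or a list."""

    def __init__(self, initial_value=None):
        # Subclasses pass {} for a single entity or [] for a collection.
        self._raw = {} if initial_value is None else initial_value

    @property
    def raw(self):
        """The value's raw content (a dict or a list)."""
        return self._raw

    def json(self):
        """Optional convenience: the raw content as a JSON string."""
        return json.dumps(self._raw)

    def serialize_with(self, **fields):
        """Dict serialiser: add the given fields to a dict-based value."""
        self._raw.update(fields)

    def append_with(self, item):
        """List serialiser: append one more entry to a list-based value."""
        self._raw.append(item)


single = ValueComposite({})
single.serialize_with(username="M")
many = ValueComposite([])
many.append_with(single.raw)
print(single.json())
print(many.raw)
```

Defaulting to None instead of {} in the constructor avoids sharing one mutable default between instances.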

At this point we already have a good understanding of the ValueObject concept and logic. There are 3 generic scenarios in which to use this pattern: single, nesting and multiple values.

Single Value

This was the example we analysed before. But let's go at it again and see how to build it from scratch.

>>> user = UserFinder.get_from_email("marcelo@unbabel.com")
>>> marcelo = UserValue(user)
>>> marcelo.raw
{'username': 'M', 'email': 'marcelo@unbabel.com', 'first_name': 'Marcelo', 'last_name': 'L.'}
>>> marcelo.json()
'{"username": "M", "email": "marcelo@unbabel.com", "first_name": "Marcelo", "last_name": "L."}'

Building this takes very little code.

We start by calling super to initialise our value, stating that it will be a dict-based object.

After initialisation, all that's left is for us to serialize whichever fields we need. Simple and effective.
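Those two steps could be sketched like this. The compact ValueComposite, the serialize_with name and the SimpleNamespace standing in for the database model are all assumptions made so the example runs on its own.

```python
import json
from types import SimpleNamespace


class ValueComposite:
    """Compact stand-in for the base class so this sketch runs on its own."""

    def __init__(self, initial_value):
        self._raw = initial_value

    @property
    def raw(self):
        return self._raw

    def json(self):
        return json.dumps(self._raw)

    def serialize_with(self, **fields):
        self._raw.update(fields)


class UserValue(ValueComposite):
    def __init__(self, user):
        # Step 1: initialise the base class as a dict-based value.
        super().__init__({})
        # Step 2: serialise only the fields this representation needs.
        self.serialize_with(
            username=user.username,
            email=user.email,
            first_name=user.first_name,
            last_name=user.last_name,
        )


# SimpleNamespace stands in for the model a UserFinder would return.
user = SimpleNamespace(username="M", email="marcelo@unbabel.com",
                       first_name="Marcelo", last_name="L.", password_hash="...")
marcelo = UserValue(user)
print(marcelo.raw)
```

Fields that are not serialised, such as the made-up password_hash here, never reach the representation.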

Nesting Values

So far we've been talking a lot about composition but never put it in practice. So let's do it.

In this scenario we need to create a Profile based on an Editor's info and its basic user data.

So we can create something like this:

>>> marcelo = UserValue(user)
>>> editor_marcelo = EditorValue(user)
>>> profile = ProfileValue(marcelo, editor_marcelo)
>>> profile.json()
'{"username": "M", "email": "marcelo@unbabel.com", "first_name": "Marcelo", "last_name": "L.", "number_of_lps": 4, "general_rating": 4.5}'

Very little code is needed: our ProfileValue's serializer handles the serialization of the existing ValueObjects passed as arguments.
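One way the nesting could be written, as a sketch: the composed value merges the raw dicts of the values it receives. The compact ValueComposite and the serialize_with name are assumptions, and plain ValueComposite instances stand in for UserValue and EditorValue to keep it short.

```python
import json


class ValueComposite:
    """Compact stand-in for the base class so this sketch runs on its own."""

    def __init__(self, initial_value):
        self._raw = initial_value

    @property
    def raw(self):
        return self._raw

    def json(self):
        return json.dumps(self._raw)

    def serialize_with(self, **fields):
        self._raw.update(fields)


class ProfileValue(ValueComposite):
    def __init__(self, user_value, editor_value):
        super().__init__({})
        # Compose by merging the raw content of each value, in order.
        self.serialize_with(**user_value.raw)
        self.serialize_with(**editor_value.raw)


# Stand-ins for UserValue(user) and EditorValue(user).
marcelo = ValueComposite({"username": "M", "email": "marcelo@unbabel.com"})
editor_marcelo = ValueComposite({"number_of_lps": 4, "general_rating": 4.5})

profile = ProfileValue(marcelo, editor_marcelo)
print(profile.json())
```

In this sketch, if two values share a key, the one merged last wins, so the order of the arguments matters.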

Multiple Values

In several cases we need to build an array instead of an individual object. In the next scenario we build 2 value objects and compose a higher-level one.

>>> user1 = UserFinder.get_from_email("marcelo@unbabel.com")
>>> user2 = UserFinder.get_from_email("rick@unbabel.com")
>>> marcelo = UserValue(user1)
>>> rick = UserValue(user2)
>>> users = [marcelo, rick]
>>> user_values = UserValues(users)
>>> user_values.raw
[{'username': 'M', 'email': 'marcelo@unbabel.com', 'first_name': 'Marcelo', 'last_name': 'Lebre'}, {'username': 'R', 'email': 'rick@unbabel.com', 'first_name': 'Rick', 'last_name': 'Sanchez'}]

The UserValues object is just as simple.
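A sketch of a list-based value, under the same assumptions as before (a compact ValueComposite with a hypothetical append_with serialiser, and stand-ins for the two UserValue objects):

```python
class ValueComposite:
    """Compact stand-in for the base class so this sketch runs on its own."""

    def __init__(self, initial_value):
        self._raw = initial_value

    @property
    def raw(self):
        return self._raw

    def append_with(self, item):
        self._raw.append(item)


class UserValues(ValueComposite):
    def __init__(self, user_values):
        # Initialise the base class as a list-based value.
        super().__init__([])
        for user_value in user_values:
            # Each entry is the raw dict of an already-built value object.
            self.append_with(user_value.raw)


# Stand-ins for UserValue(user1) and UserValue(user2).
marcelo = ValueComposite({"username": "M", "email": "marcelo@unbabel.com"})
rick = ValueComposite({"username": "R", "email": "rick@unbabel.com"})

user_values = UserValues([marcelo, rick])
print(user_values.raw)
```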

As you may have realised by now, our ValueComposite does all the heavy lifting for the ValueObjects. Regardless of how complex you need to go, if you’re dealing with single or multiple values, the effort is exactly the same due to the magic of our composite pattern.

Value relation rules:

A Value object can only call other Value objects for composition. It can, however, be called by all other types of objects.

Conclusion / TL;DR

Applying the SRP using Handlers, Services, Finders and Values keeps your code clean and your mind sane.

Instead of having blown up views or incredibly fat models, you have multiple layers of thin and simple objects.

You may also ask: why do I need to create a service/finder/handler, with this kind of code overhead, to do simple things? Several reasons actually but, SRP aside, if you don't follow any kind of rule, you'll see things degrade incredibly fast as you scale. The time overhead of writing something like a quick service is small enough that you can do this from the get-go; it's not like you're building a micro-service in a different language.

The only downside to this approach is that the number of existing files in your project will increase considerably.

In short, where does SRP actually pay off:

  • Readability: Though I'm a big fan of pasta, I really don't enjoy spaghetti code. Having simple objects defined based on what they do makes our lives a lot easier when coming back to code we wrote months ago.
  • Testability: Since the signatures of the objects are well-defined and very much contained, creating unit and integration tests is super straightforward and fast.
  • Robustness: Simple objects allow you to focus on the specificities of each task individually and reduce the number of input/output variables you need to consider at any given time, making the whole process less error-prone.
  • Onboarding: This approach has proven itself very helpful when handing down knowledge, as the thought process is like a standard line protocol instead of a wibbly wobbly mix of instructions.
  • Caching Layers: For scaling scenarios, you can cache objects using solutions such as Redis just by adding two or three lines of code to an object. As such, you don't need to interfere with the rest of the codebase.
  • Reusability: Given all the examples we've seen, I think this speaks for itself.
  • Fewer Issues: Considerably reduces cyclomatic complexity and, with it, the number of defects.
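To illustrate the caching point, here is a sketch that uses the standard library's functools.lru_cache as an in-process stand-in for Redis. The lookups counter and the returned dict are made up for the demonstration; with Redis the idea is the same, only the storage differs.

```python
from functools import lru_cache


class RoleFinder:
    lookups = 0  # counts how often the (slow) data source is actually hit

    @classmethod
    @lru_cache(maxsize=128)  # the only line added to get caching
    def get_from_id(cls, role_id):
        cls.lookups += 1
        # Stand-in for a real database query.
        return {"id": role_id, "name": "admin"}


first = RoleFinder.get_from_id("A0EEBC99-9C0B-4EF8-BB6D-6BB9BD380A11")
second = RoleFinder.get_from_id("A0EEBC99-9C0B-4EF8-BB6D-6BB9BD380A11")
print(RoleFinder.lookups)
```

Callers of the Finder are untouched by the change. One caveat of this particular stand-in: the cached object is shared between callers, so it should be treated as read-only.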

Whether you're starting a new codebase from scratch or slowly refactoring a legacy system, the Single Responsibility Principle and the framework detailed above can save you a world of trouble. Your future self will thank you! If you liked this piece, please hit the little heart button below, share it with those who might find it useful and leave some comments below.

Marcelo Lebre

CTO at Remote.com. Passionate about building products and scaling architectures. Love all things sci-fi and martial arts.