Transforming data with Django Serializers
Django REST Framework (DRF) is an incredible add-on for Django developers, helping us set up rich RESTful APIs for our apps using the familiar class-based view (CBV) constructs. Overall, it makes managing REST APIs a breeze.
One of the ways it does so is by letting us serialize and deserialize, a.k.a. “transform”, data using serializer classes. In this article, I’ll be diving into just that: how to transform data into a normalized format and back out again.
The Models
As always with these articles, I’ll start by setting the stage with the models and views we’ll be working with. We’ll start with a model used for tracking software releases, below:
from django.db import models


class Release(models.Model):
    name = models.CharField(max_length=128)
    release_date = models.DateField(auto_now_add=True)
    version = models.CharField(max_length=12)
Let’s create the corresponding DRF ModelSerializer class and view set for this new model. To keep things simple, we’ll serialize all of the fields from our model:
from rest_framework import serializers, viewsets
from .models import Release


class ReleaseSerializer(serializers.ModelSerializer):
    class Meta:
        model = Release
        fields = '__all__'


class ReleaseViewSet(viewsets.ModelViewSet):
    queryset = Release.objects.all()
    serializer_class = ReleaseSerializer
Finally, we register the new REST endpoint in our urls.py, for example:
from django.urls import include, path
from rest_framework.routers import DefaultRouter
# The view set lives next to the serializer, assumed here to be views.py
from .views import ReleaseViewSet

router = DefaultRouter()
router.register(r'releases', ReleaseViewSet)

urlpatterns = [
    path('api/', include(router.urls)),
]
Everything’s in place. Let’s launch the server and then test out our new REST endpoints from the client side using the Python requests library. Note the trailing slash in the URL, which DefaultRouter expects by default:
>>> import requests
>>> data = {
...     'name': 'Untitled',
...     'version': '1.2.3',
... }
>>>
>>> r = requests.post('http://localhost:8000/api/releases/', data=data)
>>> r.status_code
201
>>> r.text
'{"id":4,"version":"1.2.3","name":"Untitled","release_date":"2024-11-26"}'
Thanks to DRF, our new Release model now has a fully-featured REST API endpoint. Easy-peasy. Now we can get down to business.
For the purposes of this exercise, let’s make a simple change to our Release model. Currently, our versions are stored as strings. The drawback to this approach is that they cannot be sorted easily: for example, “10.9.9” will sort alphabetically before “2.0.0”, since the character 1 is less than 2. If we want to sort our version numbers correctly, we’ll need to store them in a different format: an integer. To accomplish this, we’ll need to normalize the version string into an integer (i.e. “10.9.9” becomes 10009009), but we want to do this in a way that preserves the integrity of the information for the end client. In this article, I’ll show how to do just that using serializers.
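To see the sorting problem in isolation, here is a minimal sketch in plain Python, no Django required. The helper name as_integer and the sample versions are my own assumptions for illustration, and the helper assumes every component is below 1,000.

```python
versions = ["2.0.0", "10.9.9", "1.12.0"]

# A plain string sort compares character by character,
# so "10.9.9" ends up ahead of "2.0.0".
string_sorted = sorted(versions)


def as_integer(version: str) -> int:
    """Hypothetical helper: pack X.Y.Z into one integer (each part < 1000)."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major * 1_000_000 + minor * 1_000 + patch


# Sorting on the packed integer gives the order we actually want.
numeric_sorted = sorted(versions, key=as_integer)

print(string_sorted)   # ['1.12.0', '10.9.9', '2.0.0']
print(numeric_sorted)  # ['1.12.0', '2.0.0', '10.9.9']
```

Storing the packed integer in the database lets the database do this ordering for us instead of Python.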
Creating a custom serializer field
DRF comes with the ability to create custom serializer fields. This can be extremely useful when dealing with non-standard data types, as in our case.
To solve this problem, we’ll create a new serializer field which can convert version strings to integers and back!
Thankfully, DRF makes the conversion really simple by providing us with two methods we can override:
to_internal_value(self, data) is used for data coming in and determines how to convert the client’s input into the value that will be stored in the model.
to_representation(self, value) is used for data going out and determines how to convert the stored value into a human-readable format to be sent back to the REST client.
To convert our version string into a single integer, we’ll take a simple approach: split the string on the . separators, then stitch the components back together as digits, padding with zeroes where necessary. We’ll assume each version component is less than 1,000 for the sake of this example. In our previous example, the version string 10.9.9 would become the integer 10,009,009 and the version string 2.0.0 would become the integer 2,000,000. Since 2 million is less than 10 million, the versions will now sort correctly.
Of course, there are a number of alternative encodings that would also work here (octal or hex, for example), but for the purposes of this example I’ll keep it simple and stick to decimal.
With that in mind, let’s begin with the inbound conversion. We can convert the version string by multiplying the major component by 1,000,000 and the minor component by 1,000, then adding them to the patch component to get a single integer:
def to_internal_value(self, data: str) -> int:
    # Convert to integer using decimal representation
    parts = data.split('.')
    major, minor, patch = map(int, parts)
    return (major * 1_000_000) + (minor * 1_000) + patch
For the outbound conversion, we do the reverse operation: isolate the millions column for the first component, the thousands column for the second component, and the remainder for the last component. Finally, we join the three numbers back together into a dotted string.
def to_representation(self, value: int) -> str:
    major = value // 1_000_000
    minor = value // 1_000 % 1_000
    patch = value % 1_000
    return f"{major}.{minor}.{patch}"
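Before wiring these into DRF, the arithmetic can be checked as plain functions. This is a minimal sketch under the same assumption as above (each component below 1,000); the names encode_version and decode_version are hypothetical stand-ins for the two methods.

```python
def encode_version(version: str) -> int:
    """Pack "X.Y.Z" into a single integer, assuming each part is below 1000."""
    major, minor, patch = map(int, version.split("."))
    return major * 1_000_000 + minor * 1_000 + patch


def decode_version(value: int) -> str:
    """Unpack the integer produced by encode_version back into "X.Y.Z"."""
    major = value // 1_000_000
    minor = value // 1_000 % 1_000
    patch = value % 1_000
    return f"{major}.{minor}.{patch}"


# Round trip: what goes in comes back out unchanged.
print(decode_version(encode_version("10.9.9")))  # 10.9.9

# If a component reaches 1000, it spills into the next column
# and two different versions collide on the same integer.
print(encode_version("1.1000.0") == encode_version("2.0.0"))  # True
```

The collision in the last line is why the under-1,000 assumption matters: any input that breaks it cannot be recovered on the way back out.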
Let’s not forget to add some validation on the version strings to make sure they follow the anticipated format. We can raise DRF’s ValidationError to send back an HTTP 400 to the client to let them know the value they passed in was not formatted correctly.
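import re

from rest_framework import serializers


def validate_version(data: str):
    # Reject anything that is not three dot-separated groups of digits
    if not re.match(r'^\d+\.\d+\.\d+$', data):
        raise serializers.ValidationError("Version number must follow the format X.Y.Z.")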
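Since the encoding assumes each component is below 1,000, a stricter check could enforce that limit as well. The following is a sketch rather than part of the field built in this article: the name validate_version_strict is hypothetical, and it raises ValueError as a stand-in for serializers.ValidationError so that it runs without DRF installed.

```python
import re

# One to three ASCII digits per component keeps every part in the range 0-999.
VERSION_PATTERN = re.compile(r"\d{1,3}\.\d{1,3}\.\d{1,3}", re.ASCII)


def validate_version_strict(data) -> None:
    """Raise ValueError unless data is a string of the form X.Y.Z with parts < 1000."""
    # fullmatch requires the entire string to match, including no trailing newline.
    if not isinstance(data, str) or not VERSION_PATTERN.fullmatch(data):
        raise ValueError(
            "Version number must follow the format X.Y.Z, "
            "with each part between 0 and 999."
        )


validate_version_strict("1.2.3")  # passes silently

try:
    validate_version_strict("1.1000.0")
except ValueError as exc:
    print(f"rejected: {exc}")
```

Inside a DRF field, the same check would raise serializers.ValidationError instead, so the client still receives an HTTP 400.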
Putting it all together
Let’s take our two conversion methods and our validator function and put them all together into a custom serializer field class:
import re

from rest_framework import serializers


def validate_version(data: str):
    if not re.match(r'^\d+\.\d+\.\d+$', data):
        raise serializers.ValidationError("Version number must follow the format X.Y.Z.")


class VersionNumberField(serializers.Field):
    def to_internal_value(self, data: str) -> int:
        # Validate input
        validate_version(data)
        # Convert to integer using decimal representation
        parts = data.split('.')
        major, minor, patch = map(int, parts)
        return (major * 1_000_000) + (minor * 1_000) + patch

    def to_representation(self, value: int) -> str:
        # Convert normalized value back to X.Y.Z format
        major = value // 1_000_000
        minor = value // 1_000 % 1_000
        patch = value % 1_000
        return f"{major}.{minor}.{patch}"
Note that I’ve modified the to_internal_value method to call the validate_version function before the conversion starts.
With this new serializer field, we update our existing serializer by overriding the version field to make use of our new class:
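from rest_framework import serializers
from .models import Release

# VersionNumberField is the class defined above; import it here
# if it lives in a different module from this serializer.


class ReleaseSerializer(serializers.ModelSerializer):
    version = VersionNumberField()

    class Meta:
        model = Release
        fields = '__all__'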
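The view class remains unchanged. The model’s version field, however, needs to become an integer field (for example, models.PositiveIntegerField(), followed by a migration) so that the normalized value is stored and sorted as a number.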
Let’s try out our new REST API from a Python client:
>>> import requests
>>> data = {
...     'name': 'Untitled 2',
...     'version': '1.2.3',
... }
>>>
>>> r = requests.post('http://localhost:8000/api/releases/', data=data)
>>> r.status_code
201
>>> r.text
'{"id":5,"version":"1.2.3","name":"Untitled 2","release_date":"2024-11-26"}'
Notice that the version string returned is in the human-readable format, so the client is none the wiser about how the data is stored on the back-end. From the Django shell, we can verify that the internal representation of the version for the newest release we created via the API is indeed an integer:
>>> from myapp.models import Release  # "myapp" is a placeholder for your app's package name
>>> r = Release.objects.last()
>>> r.version
1002003