Mastering File Handling in Python: A Comprehensive Guide Part 9

Mr Stucknet
Published in Python’s Gurus
5 min read · May 27, 2024

Custom encoding/decoding with JSON

In the JSON world, we can consider terms like encoding/decoding as synonyms for serializing/deserializing. They basically mean transforming to and back from JSON. In the following example, we are going to learn how to encode complex numbers, which aren’t serializable to JSON by default, by writing a custom encoder:

# json_examples/json_cplx.py
import json


class ComplexEncoder(json.JSONEncoder):
    def default(self, obj):
        # Called only for objects the base encoder cannot serialize.
        print(f"ComplexEncoder.default: {obj=}")
        if isinstance(obj, complex):
            return {
                '_meta': '_complex',
                'num': [obj.real, obj.imag],
            }
        return super().default(obj)


data = {
    'an_int': 42,
    'a_float': 3.14159265,
    'a_complex': 3 + 4j,
}

json_data = json.dumps(data, cls=ComplexEncoder)
print(json_data)


def object_hook(obj):
    print(f"object_hook: {obj=}")
    try:
        if obj['_meta'] == '_complex':
            return complex(*obj['num'])
    except KeyError:
        pass
    # No '_meta' key, or a '_meta' value we do not handle: leave as is.
    return obj


data_out = json.loads(json_data, object_hook=object_hook)
print(data_out)

We start by defining a ComplexEncoder class as a subclass of the JSONEncoder class. Our class overrides the default method. This method is called whenever the encoder encounters an object that it cannot encode and is expected to return an encodable representation of that object.
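As a quick illustration of that contract, the sketch below first shows json.dumps() failing on a complex number when no custom handling is supplied, then passes a plain function through the default parameter of json.dumps(), which is an alternative to subclassing. The function name encode_complex is our own, chosen only for this illustration.

```python
import json

data = {'a_complex': 3 + 4j}

# Without custom handling, the stock encoder raises TypeError.
try:
    json.dumps(data)
except TypeError as err:
    error_message = str(err)
    print(f"TypeError: {error_message}")


def encode_complex(obj):
    """Hypothetical helper that plays the role of a default() override."""
    if isinstance(obj, complex):
        return {'_meta': '_complex', 'num': [obj.real, obj.imag]}
    # Mimic the parent class behaviour for any other unsupported type.
    raise TypeError(
        f"Object of type {type(obj).__name__} is not JSON serializable"
    )


json_data = json.dumps(data, default=encode_complex)
print(json_data)
# {"a_complex": {"_meta": "_complex", "num": [3.0, 4.0]}}
```

A function is handy for one-off calls, while a JSONEncoder subclass is easier to reuse and extend.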

Our default() method checks whether its argument is a complex object, in which case it returns a dictionary with some custom meta information, and a list that contains both the real and the imaginary part of the number. That is all we need to do to avoid losing information for a complex number. If we receive anything other than an instance of complex, we call the default() method from the parent class, which just raises a TypeError. We then call json.dumps(), but this time we use the cls argument to specify our custom encoder. The result is printed:

$ python json_cplx.py
ComplexEncoder.default: obj=(3+4j)
{"an_int": 42, "a_float": 3.14159265,
"a_complex": {"_meta": "_complex", "num": [3.0, 4.0]}}

Half the job is done. For the deserialization part, we could have written another class that would inherit from JSONDecoder, but instead we have chosen to use a different technique that is simpler and uses a small function: object_hook.

Within the body of object_hook(), we find a try block. The important part is the two lines within the body of the try block itself. The function receives an object (notice that the function is only called when obj is a dictionary), and if the metadata matches our convention for complex numbers, we pass the real and imaginary parts to the complex() function. The try/except block is there because our function will be called for every dictionary object that is decoded, so we need to handle the case where our _meta key is not present.

The decoding part of the example outputs:

object_hook: obj={'_meta': '_complex', 'num': [3.0, 4.0]}
object_hook: obj={'an_int': 42, 'a_float': 3.14159265, 'a_complex':
(3+4j)}
{'an_int': 42, 'a_float': 3.14159265, 'a_complex': (3+4j)}

You can see that a_complex has been correctly deserialized.
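For comparison, the JSONDecoder subclass route mentioned earlier might look roughly like the following. This is only a sketch: the class name ComplexDecoder and its decode_complex method are our own inventions, and the subclass simply wires an object hook into the parent constructor.

```python
import json


class ComplexDecoder(json.JSONDecoder):
    """Hypothetical decoder that installs its own object hook."""

    def __init__(self, *args, **kwargs):
        # JSONDecoder accepts object_hook as a keyword argument.
        kwargs.setdefault('object_hook', self.decode_complex)
        super().__init__(*args, **kwargs)

    @staticmethod
    def decode_complex(obj):
        if obj.get('_meta') == '_complex':
            return complex(*obj['num'])
        return obj


json_data = (
    '{"an_int": 42, '
    '"a_complex": {"_meta": "_complex", "num": [3.0, 4.0]}}'
)
decoded = json.loads(json_data, cls=ComplexDecoder)
print(decoded)
# {'an_int': 42, 'a_complex': (3+4j)}
```

The subclass ends up delegating to a hook function anyway, which is one reason the plain object_hook approach is the simpler of the two.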

Let’s now consider a slightly more complex (no pun intended) example: dealing with datetime objects. We are going to split the code into two blocks, first the serializing part, and then the deserializing one:

# json_examples/json_datetime.py
import json
from datetime import datetime, timedelta, timezone

now = datetime.now()
now_tz = datetime.now(tz=timezone(timedelta(hours=1)))


class DatetimeEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, datetime):
            try:
                # total_seconds() keeps the sign of negative offsets,
                # unlike the .seconds attribute.
                off = int(obj.utcoffset().total_seconds())
            except AttributeError:
                # Naive datetime: utcoffset() returned None.
                off = None
            return {
                '_meta': '_datetime',
                'data': obj.timetuple()[:6] + (obj.microsecond, ),
                'utcoffset': off,
            }
        return super().default(obj)


data = {
    'an_int': 42,
    'a_float': 3.14159265,
    'a_datetime': now,
    'a_datetime_tz': now_tz,
}

json_data = json.dumps(data, cls=DatetimeEncoder)
print(json_data)

The reason why this example is slightly more complex lies in the fact that datetime objects in Python can be time zone-aware or not; therefore, we need to be more careful. The flow is the same as before, only we are dealing with a different data type. We start by getting the current date and time information, and we do it both without (now) and with (now_tz) time zone awareness, just to make sure our script works. We then proceed to define a custom encoder as before, overriding the default() method. The important bits in that method are how we get the time zone offset (off) information, in seconds, and how we structure the dictionary that returns the data. This time, the metadata says it’s datetime information. We save the first six items in the time tuple (year, month, day, hour, minute, and second), plus the microseconds, in the data key, and the offset after that. Could you tell that the value of 'data' is a concatenation of tuples? Good job if you could!
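One detail is worth checking when extracting an offset in seconds: timedelta stores a negative duration as a negative number of days plus a non-negative number of seconds, so the .seconds attribute on its own only equals the whole offset when the offset is not negative. The sketch below uses two made-up offsets to compare .seconds with total_seconds().

```python
from datetime import timedelta

east = timedelta(hours=1)    # for example, UTC+01:00
west = timedelta(hours=-5)   # for example, UTC-05:00

# Negative timedeltas are normalised to negative days plus positive seconds.
print(west.days, west.seconds)
# -1 68400

east_total = int(east.total_seconds())
west_total = int(west.total_seconds())
print(east.seconds, east_total)
# 3600 3600
print(west_total)
# -18000
```

For offsets east of UTC the two values agree, which is why a script tested only with a positive offset would not reveal the difference.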

When we have our custom encoder, we proceed to create some data, and then we serialize. The print statement outputs the following (we have reformatted the output to make it more readable):

{
    "an_int": 42,
    "a_float": 3.14159265,
    "a_datetime": {
        "_meta": "_datetime",
        "data": [2021, 5, 17, 23, 1, 58, 75097],
        "utcoffset": null
    },
    "a_datetime_tz": {
        "_meta": "_datetime",
        "data": [2021, 5, 17, 23, 1, 58, 75112],
        "utcoffset": 3600
    }
}
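Note that the tuple we stored under 'data' shows up as a JSON array, since JSON has no tuple type. The short sketch below checks a few of these standard conversions performed by the json module; the variable names are ours.

```python
import json

encoded_none = json.dumps(None)
encoded_bools = json.dumps([True, False])
encoded_tuple = json.dumps((2021, 5, 17))
print(encoded_none)    # null
print(encoded_bools)   # [true, false]
print(encoded_tuple)   # [2021, 5, 17]

# Decoding a JSON array always yields a list, never a tuple.
round_tripped = json.loads(encoded_tuple)
print(round_tripped)   # [2021, 5, 17]
```

This matters when decoding: anything that must be a tuple again has to be converted explicitly, or unpacked as positional arguments.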

Interestingly, we find out that None is translated to null, its JavaScript equivalent. Moreover, we can see that our data seems to have been encoded properly. Let’s proceed with the second part of the script:

# json_examples/json_datetime.py
def object_hook(obj):
    try:
        if obj['_meta'] == '_datetime':
            if obj['utcoffset'] is None:
                tz = None
            else:
                tz = timezone(timedelta(seconds=obj['utcoffset']))
            return datetime(*obj['data'], tzinfo=tz)
    except KeyError:
        pass
    # Not one of our datetime dictionaries: return it unchanged.
    return obj


data_out = json.loads(json_data, object_hook=object_hook)

Once again, we first verify that the metadata is telling us it’s a datetime, and then we proceed to fetch the time zone information. Once we have that, we pass the 7-tuple (using * to unpack its values in the call) and the time zone information to the datetime() call, getting back our original object. Let’s verify it by printing data_out:

{
    'a_datetime': datetime.datetime(
        2021, 5, 17, 23, 10, 2, 830913
    ),
    'a_datetime_tz': datetime.datetime(
        2021, 5, 17, 23, 10, 2, 830927,
        tzinfo=datetime.timezone(datetime.timedelta(seconds=3600))
    ),
    'a_float': 3.14159265,
    'an_int': 42
}

As you can see, we got everything back correctly. As an exercise, we would like to challenge you to write the same logic but for a date object, which should be simpler.
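If you want to compare notes after attempting the exercise, here is one possible sketch. The names DateEncoder and date_hook and the '_date' marker are our own choices, not part of any library. The one subtlety is that datetime is a subclass of date, so the isinstance check has to exclude it.

```python
import json
from datetime import date, datetime


class DateEncoder(json.JSONEncoder):
    """Hypothetical encoder for plain date objects."""

    def default(self, obj):
        # datetime is a subclass of date, so exclude it explicitly.
        if isinstance(obj, date) and not isinstance(obj, datetime):
            return {
                '_meta': '_date',
                'data': [obj.year, obj.month, obj.day],
            }
        return super().default(obj)


def date_hook(obj):
    if obj.get('_meta') == '_date':
        return date(*obj['data'])
    return obj


original = {'a_date': date(2021, 5, 17)}
json_data = json.dumps(original, cls=DateEncoder)
restored = json.loads(json_data, object_hook=date_hook)
print(json_data)
# {"a_date": {"_meta": "_date", "data": [2021, 5, 17]}}
print(restored == original)
# True
```

There is no time zone to worry about here, which is what makes the date case simpler.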

Before we move on to the next topic, a word of caution. Perhaps it is counter-intuitive, but working with datetime objects can be one of the trickiest things to do, so although we are pretty sure this code is doing what it is supposed to do, we want to stress that we only tested it lightly. So, if you intend to grab it and use it, please do test it thoroughly. Test for different time zones, test for daylight saving time being on and off, test for dates before the epoch, and so on. You might find that the code in this section needs some modifications to suit your cases.
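As a starting point for that kind of testing, a round-trip check could look like the sketch below. It is not the script from this section: encode_dt and decode_dt are compact stand-ins written for this illustration, the sample values are made up, and the encoder assumes total_seconds() is used for the offset so that negative offsets survive the trip.

```python
import json
from datetime import datetime, timedelta, timezone


def encode_dt(obj):
    """Illustrative stand-in for a datetime-aware default() method."""
    if isinstance(obj, datetime):
        offset = obj.utcoffset()
        return {
            '_meta': '_datetime',
            'data': list(obj.timetuple()[:6]) + [obj.microsecond],
            # Assumption: total_seconds() keeps the sign of the offset.
            'utcoffset': (
                None if offset is None else int(offset.total_seconds())
            ),
        }
    raise TypeError(f"Cannot serialize {type(obj).__name__}")


def decode_dt(obj):
    if obj.get('_meta') == '_datetime':
        off = obj['utcoffset']
        tz = None if off is None else timezone(timedelta(seconds=off))
        return datetime(*obj['data'], tzinfo=tz)
    return obj


samples = [
    # Naive datetime
    datetime(2021, 5, 17, 23, 1, 58, 75097),
    # UTC
    datetime(2021, 5, 17, 23, 1, 58, tzinfo=timezone.utc),
    # West of UTC (negative offset)
    datetime(2021, 1, 1, tzinfo=timezone(timedelta(hours=-5))),
    # Before the epoch, with a half-hour offset
    datetime(1969, 7, 20, 20, 17,
             tzinfo=timezone(timedelta(hours=5, minutes=30))),
]

results = []
for sample in samples:
    encoded = json.dumps(sample, default=encode_dt)
    restored = json.loads(encoded, object_hook=decode_dt)
    results.append(restored == sample)
    status = 'ok' if restored == sample else 'MISMATCH'
    print(sample.isoformat(), '->', status)
```

Fixed-offset time zones like these do not exercise daylight saving time transitions, so a fuller test suite would also need named zones.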

That’s it for today. See you tomorrow.
