Understanding Custom Encoders and Decoders in Python’s JSON Module

Ayush Thakur
11 min read · Nov 16, 2023


Hi there, welcome to my blog! Today, I’m going to talk about encoders and decoders in Python, and how you can use them to manipulate data in different formats. Whether you want to convert text to bytes, bytes to text, or even custom objects to JSON, Python has some built-in functions that can help you achieve your goals. Let’s dive in and learn more about these powerful tools!


What are encoders and decoders?

Encoders and decoders are fundamental components in computer science and information technology used to convert data from one format to another.

Encoder:

An encoder is a device, software, or algorithm that transforms data from one representation to another. It takes input in a specific format and converts it into a different format suitable for transmission, storage, or processing. Encoders are used in various fields, including communications, multimedia, and computing.

In the context of programming and data interchange, an encoder typically refers to a function or module that converts data structures or objects from one form to another. For example:

  • Data Encoding: In data communication, encoders convert raw data or analog signals into digital format suitable for transmission over communication channels. Examples include encoding schemes like ASCII, Unicode, Base64, etc.
  • Serialization: In programming, encoders convert complex data structures or objects into a specific format (like JSON, XML, or binary format) for storage, transmission over networks, or interprocess communication.

Decoder:

A decoder, on the other hand, performs the reverse operation of an encoder. It takes encoded data and converts it back into its original format or a usable format for consumption by a system, device, or application.

In various domains:

  • Data Decoding: In data communication, decoders receive transmitted data in encoded form and convert it back to its original format for interpretation or processing by the receiving device.
  • Deserialization: In programming, decoders are used to convert data from serialized formats (like JSON, XML, or binary) back into native data structures or objects within a program.

Relationship:

Encoders and decoders often work in pairs, with an encoder converting data from one form to another for transmission or storage and a decoder reversing the process to restore the data to its original or usable form. This pairing ensures that information can be efficiently transmitted, stored, and processed across different systems and applications that might use different formats or representations.

Encoders convert data from one representation to another, while decoders reverse the process by converting encoded data back into its original or usable format. They are essential components in various domains, facilitating data interchange, communication, and compatibility across different systems and technologies.
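To make the pairing concrete, here is a minimal sketch using Base64, one of the encoding schemes mentioned above. It relies on Python's standard base64 module; the sample payload is made up for illustration.

```python
import base64

# Arbitrary binary data, including bytes that are not printable text
original = b"binary \x00\xff payload"

# Encoder: raw bytes -> ASCII-safe Base64 bytes
encoded = base64.b64encode(original)

# Decoder: Base64 bytes -> the original raw bytes
decoded = base64.b64decode(encoded)

print(encoded)
print(decoded == original)  # True
```

The decoder restores exactly what the encoder was given, which is the property that makes the pair useful for transmission and storage.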

How to use encode() and decode() functions in Python?

Python’s str.encode() and bytes.decode() methods convert between text and bytes using a given encoding. Let us look at both methods in detail.

Encode a given string

We use the encode() method on the input string, which every string object has. The format is:

input_string.encode(encoding, errors)

This encodes input_string using encoding, where errors determines what happens if a character cannot be represented in that encoding. encode() returns a sequence of bytes. For example:

inp_string = 'Hello'
bytes_encoded = inp_string.encode()
print(type(bytes_encoded))

This results in an object of <class 'bytes'>, as expected:

<class 'bytes'>

The encoding parameter specifies which character encoding to use. There are many character encoding schemes; encode() uses UTF-8 by default. Let us look at an example that relies on this default.

a = 'This is a simple sentence.'
print('Original string:', a)
# Encodes to utf-8 by default
a_utf = a.encode()
print('Encoded string:', a_utf)

Output

Original string: This is a simple sentence.
Encoded string: b'This is a simple sentence.'

NOTE: The input string has been encoded as UTF-8. The output looks almost identical, but it is prefixed with a b, which marks it as a bytes object rather than a string. Python displays bytes in the printable ASCII range as their characters for readability; the underlying data is a sequence of integer byte values, which is how text is stored on any computer.
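A short sketch can make the difference between a string and its bytes visible. The sample word below is only an illustration; it includes one non-ASCII character so that the byte count differs from the character count.

```python
text = 'héllo'
data = text.encode('utf-8')

# Non-ASCII characters show up as escaped byte values
print(data)        # b'h\xc3\xa9llo'

# Iterating over bytes yields integers in the range 0-255
print(list(data))  # [104, 195, 169, 108, 108, 111]

# Five characters, but six bytes: 'é' takes two bytes in UTF-8
print(len(text), len(data))  # 5 6
```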

Handling errors

The errors parameter controls what happens when a character cannot be encoded. The most common values are:

  • strict: The default behavior. It raises a UnicodeEncodeError when a character cannot be encoded (decode() raises a UnicodeDecodeError in the equivalent situation).
  • ignore: It drops any un-encodable characters from the result.
  • replace: Each un-encodable character is substituted with a question mark (?).
  • backslashreplace: Each un-encodable character is replaced with a backslash escape sequence (\xNN, \uNNNN, or \UNNNNNNNN, depending on the code point).

Let us look at the above concepts using a simple example. We will consider an input string in which not all characters can be encoded as ASCII (such as ö):

a = 'This is a bit möre cömplex sentence.'
print('Original string:', a)
print('Encoding with errors=ignore:', a.encode(encoding='ascii', errors='ignore'))
print('Encoding with errors=replace:', a.encode(encoding='ascii', errors='replace'))

Output

Original string: This is a bit möre cömplex sentence.
Encoding with errors=ignore: b'This is a bit mre cmplex sentence.'
Encoding with errors=replace: b'This is a bit m?re c?mplex sentence.'
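The example above covers ignore and replace. As a complementary sketch, the snippet below shows the default strict behavior and backslashreplace, plus xmlcharrefreplace, another built-in handler not listed above. The variable names are chosen only for illustration.

```python
word = 'möre'

# strict (the default) raises an exception for un-encodable characters
try:
    word.encode('ascii')
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# backslashreplace keeps the information as an escape sequence
escaped = word.encode('ascii', errors='backslashreplace')

# xmlcharrefreplace emits an XML/HTML numeric character reference
xml_escaped = word.encode('ascii', errors='xmlcharrefreplace')

print(strict_failed)  # True
print(escaped)        # b'm\\xf6re'
print(xml_escaped)    # b'm&#246;re'
```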

Decoding a stream of bytes

Similar to encoding a string, we can decode a stream of bytes to a string object, using the decode() method. The format is:

encoded = input_string.encode()
# Using decode()
decoded = encoded.decode(encoding, errors)

Since encode() converts a string to bytes, decode() simply does the reverse. For example:

a = 'This is a simple sentence.'
# Encodes to utf-8 by default
a_utf = a.encode()
print('Encoded string:', a_utf)
# Decodes from utf-8 by default
a_decoded = a_utf.decode()
print('Decoded string:', a_decoded)

Output

Encoded string: b'This is a simple sentence.'
Decoded string: This is a simple sentence.
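Decoding only reverses encoding when both sides use the same encoding. The sketch below, with made-up variable names, shows what can happen when UTF-8 bytes are decoded with a different one.

```python
data = 'ö'.encode('utf-8')  # b'\xc3\xb6', two bytes

# ASCII cannot interpret these bytes, so strict decoding fails
try:
    data.decode('ascii')
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

# errors='replace' substitutes U+FFFD for each undecodable byte
replaced = data.decode('ascii', errors='replace')

# A wrong but permissive encoding silently produces garbled text
mojibake = data.decode('latin-1')

print(decode_failed)         # True
print(replaced == '\ufffd\ufffd')  # True
print(mojibake)              # Ã¶
```

The last case is the most dangerous of the three, because no error is raised and the text is quietly corrupted.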

How to use JSON encoder and decoder in Python?

JSON (JavaScript Object Notation) is a popular data format for exchanging information between applications. It is based on a subset of JavaScript syntax and can represent simple data structures and objects. JSON is widely used in web development, APIs, and databases.

Python has a built-in module called json that provides functions for encoding and decoding JSON data. You can use json.dumps() to convert a Python object to a JSON string, and json.loads() to convert a JSON string to a Python object. Let us see how these functions work with some examples.

Encoding a Python object to JSON

Encoding a Python object to JSON is a common task when you want to serialize data into a format that can be easily stored, transmitted, or shared across different systems. JSON is a lightweight, text-based data interchange format that is easy for humans to read and for machines to parse.

Python provides a straightforward way to encode Python objects into JSON using the built-in `json` module. This module offers two main functions: `json.dumps()` and `json.dump()`.

Using `json.dumps()`

The `json.dumps()` function converts a Python object (such as a dictionary, list, tuple, string, integer, float, etc.) into a JSON string:

import json
# Sample Python object (dictionary)
python_object = {
    "name": "John",
    "age": 30,
    "city": "New York"
}
# Encoding Python object to JSON string
json_string = json.dumps(python_object)
print(json_string) # Output: {"name": "John", "age": 30, "city": "New York"}
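json.dumps() also accepts keyword arguments that change how the output looks. Here is a small sketch of a few of them (ensure_ascii, indent, and sort_keys); the record itself is invented for the example.

```python
import json

record = {"name": "Zoë", "tags": ("a", "b"), "active": True, "score": None}

# Defaults: non-ASCII characters are escaped, tuples become arrays,
# True becomes true, and None becomes null
compact = json.dumps(record)
print(compact)
# {"name": "Zo\u00eb", "tags": ["a", "b"], "active": true, "score": null}

# Keep non-ASCII characters, pretty-print, and sort the keys
readable = json.dumps(record, ensure_ascii=False, indent=2, sort_keys=True)
print(readable)
```

Note that the tuple comes back as a list after decoding, because JSON has only one array type.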

Using `json.dump()`

If you want to write the JSON data directly to a file instead of getting a JSON string, you can use `json.dump()`:

import json
# Sample Python object (dictionary)
python_object = {
    "name": "John",
    "age": 30,
    "city": "New York"
}
# Writing JSON data to a file
with open('data.json', 'w') as json_file:
    json.dump(python_object, json_file)

This will create a file named `data.json` containing the JSON representation of the Python object.

Handling Custom Objects

While basic Python objects like dictionaries, lists, strings, and numbers can be easily serialized to JSON, custom objects require additional handling. You can define a custom serialization method by using the `default` parameter in `json.dumps()` or creating a custom JSON encoder by subclassing `json.JSONEncoder`.

Here’s an example of using the `default` parameter:

import json
# Custom object
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age
# Custom serialization function for Person objects
def custom_serializer(obj):
    if isinstance(obj, Person):
        return {"name": obj.name, "age": obj.age}
    # Report the actual type of any other unsupported object
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")
# Sample custom object
person_obj = Person("Alice", 25)
# Encoding custom object using custom serialization function
json_string = json.dumps(person_obj, default=custom_serializer)
print(json_string) # Output: {"name": "Alice", "age": 25}

Always remember to handle exceptions and ensure that the objects you’re serializing are compatible with JSON’s data types and serialization rules. JSON is a simple and widely used format for data interchange, making it an excellent choice for transmitting and storing structured data across different platforms and systems in a human-readable form.
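As a rough illustration of that advice, the sketch below tries to serialize a value that JSON does not support (a date), catches the resulting TypeError, and then falls back to default=str. The payload is invented, and using str as the fallback is only one possible choice.

```python
import json
from datetime import date

payload = {"event": "launch", "when": date(2023, 11, 16)}

# json.dumps() raises TypeError for types it cannot serialize
try:
    json.dumps(payload)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)

# A simple fallback: convert anything unsupported to its string form
fallback = json.dumps(payload, default=str)
print(fallback)  # {"event": "launch", "when": "2023-11-16"}
```

default=str is convenient for logging, but it loses type information, so the decoder will see a plain string rather than a date.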

Decoding a JSON string to a Python object

Decoding a JSON string to a Python object involves the reversal of the process of encoding. The json module in Python provides methods to parse JSON strings and convert them back into Python data structures. The primary functions used for decoding JSON are json.loads() and json.load().

Using json.loads()

The json.loads() function is used to parse a JSON string and convert it into a Python object:

import json
# Sample JSON string
json_string = '{"name": "John", "age": 30, "city": "New York"}'
# Decoding JSON string to a Python object
python_object = json.loads(json_string)
print(python_object)
# Output: {'name': 'John', 'age': 30, 'city': 'New York'}

Using json.load()

If you have a JSON file and want to load its content into a Python object, you can use the json.load() function:

import json
# Reading JSON data from a file
with open('data.json', 'r') as json_file:
    python_object = json.load(json_file)
print(python_object)
# Output: Contents of the JSON file loaded into a Python object
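To see json.dump() and json.load() working together, here is a self-contained sketch that writes a file and reads it back. It uses a temporary directory so that it does not leave a data.json behind; that detail is an assumption made for the example rather than something the functions require.

```python
import json
import os
import tempfile

python_object = {"name": "John", "age": 30, "city": "New York"}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'data.json')

    # Write the object to disk as JSON
    with open(path, 'w', encoding='utf-8') as json_file:
        json.dump(python_object, json_file)

    # Read it back into a new Python object
    with open(path, 'r', encoding='utf-8') as json_file:
        loaded = json.load(json_file)

print(loaded == python_object)  # True
```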

Handling JSON to Custom Python Object Conversion

Sometimes, you might need to convert a JSON object into a custom Python object. To do this, you can use the object_hook parameter in json.loads() or json.load().

Here’s an example:

import json
# Custom object
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age
# Custom object hook for decoding JSON into a Person object
def custom_decoder(obj):
    if 'name' in obj and 'age' in obj:
        return Person(obj['name'], obj['age'])
    return obj
# Sample JSON string representing a Person object
json_string = '{"name": "Alice", "age": 25}'
# Decoding JSON string to a custom Python object
person_object = json.loads(json_string, object_hook=custom_decoder)
print(person_object.name, person_object.age)
# Output: Alice 25

Using object_hook, you can define a function that will be called for each JSON object decoded from the JSON string or file. Inside the function, you can define the logic to convert JSON objects into custom Python objects.
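The phrase "for each JSON object" includes nested objects. The sketch below records every call to a hook (recording_hook is a made-up name) to show that inner objects are passed to the hook before the objects that contain them.

```python
import json

seen = []

def recording_hook(obj):
    # Record the keys of every JSON object the decoder hands us
    seen.append(sorted(obj))
    return obj

json.loads('{"outer": {"inner": {"x": 1}}, "y": 2}', object_hook=recording_hook)

print(seen)  # [['x'], ['inner'], ['outer', 'y']]
```

Because the innermost object arrives first, a hook that builds custom objects will find already-converted children inside the parent dictionary.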

Remember to handle exceptions and ensure that the JSON structure aligns with the expected conversion logic for custom objects. JSON decoding in Python is flexible and allows for easy transformation of JSON data into native Python data structures or custom objects as needed.
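One way to follow this advice is sketched below. The helper parse_person and its error messages are hypothetical; the idea is simply to catch json.JSONDecodeError for malformed input and to check the structure before using it.

```python
import json

def parse_person(text):
    """Return (data, error); exactly one of the two is None."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        # lineno and colno point at where parsing failed
        return None, f"invalid JSON at line {exc.lineno}, column {exc.colno}"
    # Valid JSON is not necessarily the structure we expect
    if not isinstance(data, dict) or not {"name", "age"} <= data.keys():
        return None, "missing 'name' or 'age'"
    return data, None

print(parse_person('{"name": "Alice", "age": 25}'))
print(parse_person('{"name": '))
print(parse_person('{"name": "Bob"}'))
```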

How to create custom encoders and decoders in Python?

You can create custom encoders and decoders to handle the serialization and deserialization of custom objects or non-standard data types when working with JSON.

Custom JSON Encoder

To create a custom JSON encoder, you can subclass the json.JSONEncoder class and override its default() method. This method is called whenever an object cannot be directly serialized to JSON using the default behavior.

Here’s an example:

import json
# Custom object
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age
# Custom encoder for Person objects
class CustomEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, Person):
            return {"name": obj.name, "age": obj.age}
        return super().default(obj)
# Sample custom object
person_obj = Person("Alice", 25)
# Encoding custom object using the custom encoder
json_string = json.dumps(person_obj, cls=CustomEncoder)
print(json_string) # Output: {"name": "Alice", "age": 25}

In this example, the CustomEncoder class inherits from json.JSONEncoder and overrides the default() method to specify how Person objects should be serialized into JSON.
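The same pattern extends to standard-library types that json does not handle by default. The sketch below is one possible encoder (ExtendedEncoder is a made-up name) for datetime, set, and Decimal values; the chosen representations, such as ISO 8601 strings and sorted lists, are assumptions that you may want to change.

```python
import json
from datetime import datetime
from decimal import Decimal

class ExtendedEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, datetime):
            return obj.isoformat()   # ISO 8601 string
        if isinstance(obj, (set, frozenset)):
            return sorted(obj)       # JSON has no set type
        if isinstance(obj, Decimal):
            return str(obj)          # a string avoids float rounding
        # Let the base class raise TypeError for anything else
        return super().default(obj)

order = {
    "placed": datetime(2023, 11, 16, 9, 30),
    "tags": {"b", "a"},
    "total": Decimal("19.99"),
}
encoded_order = json.dumps(order, cls=ExtendedEncoder)
print(encoded_order)
# {"placed": "2023-11-16T09:30:00", "tags": ["a", "b"], "total": "19.99"}
```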

Custom JSON Decoder

For a custom JSON decoder, you can use the object_hook parameter in json.loads() or json.load(). This parameter allows you to specify a function that will be called for each JSON object decoded from the JSON string or file.

import json
# Custom object
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age
# Custom decoder for decoding JSON into Person objects
def custom_decoder(obj):
    if 'name' in obj and 'age' in obj:
        return Person(obj['name'], obj['age'])
    return obj
# Sample JSON string representing a Person object
json_string = '{"name": "Alice", "age": 25}'
# Decoding JSON string to a custom Python object using the custom decoder
person_object = json.loads(json_string, object_hook=custom_decoder)
print(person_object.name, person_object.age)
# Output: Alice 25

In the custom_decoder() function, you define the logic to convert JSON objects into custom Python objects.
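If you prefer a class that mirrors the custom encoder, one option is to subclass json.JSONDecoder and pass the hook to its constructor. This is a sketch, not the only way to do it; PersonDecoder and _to_person are hypothetical names.

```python
import json

class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

class PersonDecoder(json.JSONDecoder):
    def __init__(self, *args, **kwargs):
        # Install our hook; callers should not pass object_hook themselves
        super().__init__(*args, object_hook=self._to_person, **kwargs)

    @staticmethod
    def _to_person(obj):
        # Convert only objects with exactly these two keys
        if obj.keys() == {"name", "age"}:
            return Person(obj["name"], obj["age"])
        return obj

person = json.loads('{"name": "Alice", "age": 25}', cls=PersonDecoder)
print(person.name, person.age)  # Alice 25
```

Packaging the hook in a class lets you pass cls=PersonDecoder to json.loads() in the same way that cls=CustomEncoder is passed to json.dumps().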

Custom encoders and decoders provide a way to handle complex or non-standard data types during the serialization and deserialization processes in JSON, giving you more control over how your data is converted to and from JSON format.
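Putting both halves together, here is a sketch of a full round trip. It adds a "__type__" tag to the encoded output so the decoder does not have to guess from key names; that tag is a convention assumed for this example, not something the json module defines.

```python
import json

class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

class TaggedEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, Person):
            # The tag tells the decoder which class to rebuild
            return {"__type__": "Person", "name": obj.name, "age": obj.age}
        return super().default(obj)

def tagged_decoder(obj):
    if obj.get("__type__") == "Person":
        return Person(obj["name"], obj["age"])
    return obj

team = {"lead": Person("Alice", 25), "members": [Person("Bob", 31)]}

text = json.dumps(team, cls=TaggedEncoder)
restored = json.loads(text, object_hook=tagged_decoder)

print(restored["lead"].name, restored["members"][0].age)  # Alice 31
```

Tagging avoids the false positives of key-based detection, where any dictionary that happens to contain name and age would be turned into a Person.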

Conclusion

In the world of Python programming, handling data serialization and deserialization is a crucial aspect, especially when working with different data formats and structures. The json module within Python provides powerful tools for encoding Python objects into JSON strings and decoding JSON strings back into Python objects. However, there are scenarios where custom handling of encoding and decoding processes becomes necessary, particularly when dealing with complex or non-standard data types.

Custom Encoders

Custom encoders, implemented by subclassing the json.JSONEncoder class and overriding its default() method, empower developers to define specialized serialization rules for specific objects or data structures. This approach allows the conversion of custom Python objects into JSON-compatible representations, enabling seamless serialization of complex data structures.

Through the use of custom encoders, developers can take control of the serialization process, ensuring that instances of custom classes or non-standard data types are encoded into JSON format with precision and accuracy.

Custom Decoders

On the decoding side, employing custom decoders involves using the object_hook parameter in json.loads() or json.load(). This functionality allows developers to specify a function that will be invoked during the JSON parsing process. Custom decoders empower developers to transform JSON data into custom Python objects or handle specific JSON structures with tailored logic, providing flexibility and control over the deserialization process.

By leveraging custom decoders, developers can seamlessly convert JSON data into custom Python objects, facilitating the restoration of complex data structures from JSON representations.

Benefits and Use Cases

The ability to create custom encoders and decoders in the Python json module offers several benefits:

  • Handling Custom Data Structures: Custom encoders and decoders enable the serialization and deserialization of complex Python objects or non-standard data types into JSON format and vice versa.
  • Control and Flexibility: Developers have granular control over how their custom data types are encoded into JSON and decoded back into Python objects, ensuring accuracy and consistency in the data conversion process.
  • Custom Serialization Logic: Tailoring serialization and deserialization logic to handle specific data structures or complex object hierarchies simplifies the exchange of data between different systems or storage formats.
  • Compatibility and Interoperability: By implementing custom encoders and decoders, developers can ensure seamless compatibility between their Python applications and external systems that rely on JSON as a data interchange format.

In conclusion, the ability to create custom encoders and decoders in Python’s json module empowers developers to efficiently handle serialization and deserialization of diverse data types, providing control, flexibility, and customization in data conversion processes. These tools are invaluable when working with complex data structures or custom classes, allowing for smooth integration and interoperability in Python applications.
