A simple guide to JSON in Python, simple and advanced variations

Naren Yellavula
Published in Dev bits
5 min read · Sep 25, 2022


Hi everyone. I hope you are safe and sound amid the tech industry’s current volatility. In this article, I am going to discuss a few simple ways to process JSON in Python. You may already know how to convert JSON to Python objects and vice versa, but extending that knowledge further will simplify your job.

We will use Python 3.9 for all the examples shown as part of this article.

Before you dive into the article, if you are new to programming or want to learn Python in person, please check out:

https://happy-pythonist.com

Starting With Basics

JSON (JavaScript Object Notation) is a data format used to exchange data between two systems connected over a network. It is a text-based (generally UTF-8 encoded) data format that provides interoperability between various clients and servers. Other well-known formats are XML and Protocol Buffers.

As a developer, you may deal with JSON while designing communications with external systems. That’s where Python comes in handy, as its standard library provides constructs to generate or consume JSON.

The data generation part is called JSON encoding, and JSON can be encoded into two targets:

  1. string (in-memory)
  2. file (an in-memory file-like object or a file on disk)

In a similar way, data consumption is called JSON decoding. The decoded content will be transformed into:

  1. Python datatypes like list or dict or None

where the decoding source is either a file or a string. Now, let’s see a few examples to understand the usage.

JSON package from standard library

Python has a built-in package that supports encoding and decoding JSON data.

import json

The main functions to discuss from the package:

  1. loads(): Decode JSON from a string into a Python object
  2. load(): Decode JSON from a file into a Python object
  3. dumps(): Encode a Python object to a JSON string
  4. dump(): Encode a Python object to a JSON file

Let’s see these functions in detail.

Decoding (a.k.a. loading)

A JSON decode is required when a Python program gets JSON data from an external system, and it can be done in two ways:

  • loads() function takes a JSON string and converts it into the respective Python object.

For example, a string containing a JSON array is decoded into a Python list. Most JSON data types have equivalent Python types and vice versa.
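A minimal sketch of loads() in action. The variable names and the sample strings are illustrative assumptions, not data from a real system:

```python
import json

# A JSON array held in a Python string (sample data, chosen for illustration)
foo = '["python", "go", "rust"]'

languages = json.loads(foo)
print(type(languages).__name__)  # list
print(languages[0])              # python

# Other JSON types map onto Python types in a similar way:
# true -> True, null -> None, numbers -> int or float, objects -> dict
mixed = json.loads('{"active": true, "score": 9.5, "count": 3, "owner": null}')
print(mixed)  # {'active': True, 'score': 9.5, 'count': 3, 'owner': None}
```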

Next, let’s see how to decode a JSON file (ex: data.json) instead of a string, using the load() function.

To do this, open the file called data.json and pass the resulting file object to the load function. When the top-level JSON value in the file is an object, its content is loaded into a Python dict.
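A minimal sketch of load(), assuming a data.json whose top-level value is a JSON object. The file content here is hypothetical, and the file is written to a temporary directory first so the example is self-contained:

```python
import json
import os
import tempfile

# Create a sample data.json (hypothetical content, for illustration only)
sample = '{"name": "Python", "usedFor": ["web", "data science", "automation"]}'
path = os.path.join(tempfile.mkdtemp(), "data.json")
with open(path, "w", encoding="utf-8") as f:
    f.write(sample)

# load() reads from an open file object rather than from a string
with open(path, encoding="utf-8") as f:
    data = json.load(f)

print(type(data).__name__)   # dict
print(len(data["usedFor"]))  # 3
```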

This process of loading JSON content is also referred to as de-serialization in other programming languages and technologies. Sometimes, one needs to modify the loaded content while de-serializing JSON data. It can be done with a custom decoder function.

Advanced transformations with custom decoder

Let’s take the previous example, where we load a JSON file into memory. Instead of computing the length of the `usedFor` field after loading the JSON, we can compute that information while the JSON is being decoded.

To do this, we define a custom decoder function called `add_count_to_decoded_dict` and pass it to the `json.load` function through an argument called object_hook. The decoder calls the object hook with every JSON object it decodes (as a dict) and uses the hook's return value in place of that dict, so the hook can add or remove information.

Coming back to the program, we count the uses of a programming language and add the result to the original dict under a new key, usageCount. Practical examples would be (1) decoding raw responses from Amazon DynamoDB, or (2) keeping only the necessary data from a huge JSON source.
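A sketch of such a decoder. The function name, the `usedFor` field and the usageCount key follow the text; the sample JSON content is an assumption, and io.StringIO stands in for an open data.json file:

```python
import io
import json

def add_count_to_decoded_dict(decoded):
    """Called for every decoded JSON object; the return value replaces the dict."""
    if "usedFor" in decoded:
        decoded["usageCount"] = len(decoded["usedFor"])
    return decoded

# In-memory stand-in for data.json (hypothetical content)
source = io.StringIO(
    '{"name": "Python", "usedFor": ["web", "data science", "automation"]}'
)

data = json.load(source, object_hook=add_count_to_decoded_dict)
print(data["usageCount"])  # 3
```

Because the hook runs for nested objects too, the `"usedFor" in decoded` check keeps it from touching dicts that lack that field.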

In this way, one can build custom decoders using object hooks. For more information see here: https://docs.python.org/3.9/library/json.html#json.load

Encoding (a.k.a. dumping)

A JSON encode is required when data in Python memory needs to be converted to a JSON text string or a JSON file, and it can be done in two ways:

  • dumps() function takes a serializable* Python object and converts it into a JSON string.
  • dump() function takes a serializable* Python object and writes it as JSON to a file-like object (in-memory) or a JSON file on disk.

Note: `serializable` means the Python data type can be converted into JSON. Python types like complex numbers, sets, and bytes are not serializable by default.
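A quick sketch of what "not serializable by default" means in practice: dumps() raises a TypeError for a set, and converting the set to a list first is one possible workaround. The sample data is an assumption for illustration:

```python
import json

error_message = None
try:
    json.dumps({"tags": {"a", "b"}})  # a set has no JSON equivalent
except TypeError as err:
    error_message = str(err)
    print("Encoding failed:", error_message)

# Converting to a supported type (here a sorted list) avoids the error
encoded = json.dumps({"tags": sorted({"a", "b"})})
print(encoded)  # {"tags": ["a", "b"]}
```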

For example, a Python list of dicts can be converted into a JSON string and printed to standard output.

The dumps function takes an optional argument called indent that prettifies the JSON.
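A minimal sketch of that conversion. The list content is sample data chosen for illustration:

```python
import json

# A Python list of dicts (sample data)
languages = [
    {"name": "Python", "typed": "dynamic"},
    {"name": "Go", "typed": "static"},
]

# indent=2 puts each item on its own line, indented two spaces per nesting level
pretty = json.dumps(languages, indent=2)
print(pretty)
```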

Similarly, the `dump()` function writes a Python object as JSON to a file, given a file pointer.
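A sketch of dump() with both kinds of target. The file name color.json and the sample dict are assumptions, and a temporary directory is used so nothing is left behind in the working directory:

```python
import io
import json
import os
import tempfile

color = {"name": "RED", "foundIn": "soil"}

# Target 1: a file on disk
path = os.path.join(tempfile.mkdtemp(), "color.json")
with open(path, "w", encoding="utf-8") as fp:
    json.dump(color, fp, indent=2)

# Target 2: any object with a write() method, e.g. an in-memory buffer
buffer = io.StringIO()
json.dump(color, buffer)
print(buffer.getvalue())  # {"name": "RED", "foundIn": "soil"}
```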

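Note: To minify JSON (a.k.a. remove extra whitespace) while dumping, do not provide the `indent` keyword argument to the function. The default separators still put a space after each comma and colon, so also pass `separators=(",", ":")` for the most compact output.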
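Leaving out indent keeps the output on a single line, but the default separators still include a space after each comma and colon. A small sketch of the difference, using the separators argument of dumps() on sample data:

```python
import json

data = {"name": "RED", "foundIn": "soil", "shades": [1, 2, 3]}

default_output = json.dumps(data)
compact_output = json.dumps(data, separators=(",", ":"))

print(default_output)  # {"name": "RED", "foundIn": "soil", "shades": [1, 2, 3]}
print(compact_output)  # {"name":"RED","foundIn":"soil","shades":[1,2,3]}
```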

Advanced features with custom encoder

The Python json package provides a way to create custom encoders. The approach is similar to the object_hook that we saw in the case of a custom JSON decoder.

To create a custom encoder, simply create a function and pass it to the argument named default in the dumps function. The encoder calls this function for any object it cannot serialize on its own. Let us take an example where we serialize/encode an instance of a Python class into JSON using a custom encoder. The example uses a Color class with two properties: “Name” and “Found In”. The “custom_encoder” is a function that iterates over the object's properties and converts their names into JSON keys.

The program prints the string: {"name": "RED", "foundIn": "soil"}
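One way such a program could look. The Color class and custom_encoder names come from the text; the attribute names, the to_camel_case helper and the snake_case-to-camelCase conversion are assumptions chosen to reproduce the printed output:

```python
import json

class Color:
    def __init__(self, name, found_in):
        self.name = name
        self.found_in = found_in

def to_camel_case(snake):
    """Hypothetical helper: 'found_in' -> 'foundIn'."""
    first, *rest = snake.split("_")
    return first + "".join(word.capitalize() for word in rest)

def custom_encoder(obj):
    """Called by dumps() only for objects it cannot serialize on its own."""
    if isinstance(obj, Color):
        return {to_camel_case(key): value for key, value in vars(obj).items()}
    # Anything else stays unsupported, matching the default behaviour
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

red = Color("RED", "soil")
result = json.dumps(red, default=custom_encoder)
print(result)  # {"name": "RED", "foundIn": "soil"}
```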

In addition to manipulating JSON programmatically, Python provides a command-line interface to work with JSON directly.

For example, one can validate a JSON string (from a file or an HTTP response) without even writing a Python program, like this:

echo '{"name":"RED", "foundIn": "soil"}' | python -m json.tool

(or)

curl https://api.github.com/users/narenaryan/repos | python -m json.tool

One can check out the detailed command-line interface here: https://docs.python.org/3/library/json.html#json-commandline

There are third-party alternatives to the standard library for serializing/de-serializing JSON in Python. Notable mentions are:

  1. Orjson (https://github.com/ijl/orjson)
  2. Ultrajson (https://github.com/ultrajson/ultrajson)

Most of these libraries stay largely compatible with the API of the standard library json package, though some details differ, so check each project's documentation. Unless you have performance bottlenecks with serializing/de-serializing JSON, the standard library will be sufficient.

Tip: Don’t optimize prematurely, or you will end up with a dependency that brings no visible benefit.

Stay safe and have a great time coding in Python. If you like this article and want to follow updates: https://twitter.com/Narenarya3

