JSON Streaming — How to Work with Large JSON Files Efficiently

Alexander Obregon
5 min read · Apr 4, 2023


Introduction

Dealing with large JSON files can be a challenge when it comes to memory management and performance. In this article, we will explore JSON streaming, a technique that allows you to work with large JSON files efficiently. We will discuss how JSON streaming works and demonstrate its usage with code examples in Python.

What is JSON Streaming?

JSON streaming is a technique that processes JSON data in chunks instead of loading the entire file into memory. This approach enables efficient memory usage, faster processing, and the ability to work with massive JSON files that might not fit into memory.
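To see why this matters, consider the conventional approach with Python's built-in json module, which parses the whole document in one step (a minimal sketch; the filename is just a placeholder):

import json

# json.load reads and parses the entire file at once,
# so the full data structure must fit in memory.
with open('large_file.json', 'r') as file:
    data = json.load(file)  # the whole document is materialized here

print(len(data))

For a multi-gigabyte file, that single json.load call can exhaust available RAM. Streaming avoids this by never holding more than a small piece of the document at a time.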

Getting Started: Python Libraries

To work with JSON streaming in Python, you can use the ijson library. To install it, simply run:

pip install ijson

Using ijson

ijson is a Python library that allows you to parse and extract data from JSON files iteratively. This means that you can process JSON data in smaller chunks, making it possible to work with large files without running into memory constraints.

Here’s an example of how to use ijson to parse a large JSON file:

import ijson

# Open the JSON file
with open('large_file.json', 'r') as file:
    # Parse the JSON objects one by one
    parser = ijson.items(file, 'item')

    # Iterate over the JSON objects
    for item in parser:
        # Process each JSON object as needed
        print(item)

In this example, we open a large JSON file called large_file.json and create an ijson parser that processes the file object by object. We then iterate over the JSON objects using a for loop, processing each object as required.
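The second argument to ijson.items is a prefix that selects which parts of the document to yield; 'item' matches each element of a top-level array. If the objects you want are nested under a key, you can adjust the prefix accordingly. Here is a sketch assuming a hypothetical file layout like {"users": [...]}:

import ijson

# Hypothetical file layout: {"users": [{...}, {...}, ...]}
with open('large_file.json', 'r') as file:
    # The prefix 'users.item' yields each element of the "users" array
    for user in ijson.items(file, 'users.item'):
        print(user)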

Working with JSON Arrays

When a file's top level is one large JSON array, you can stream its elements one at a time instead of loading the whole array at once. Here's an example of how to use ijson to stream a JSON array:

import ijson

# Open the JSON file
with open('large_array.json', 'r') as file:
    # Parse the JSON array items one by one
    array_items = ijson.items(file, 'item')

    # Iterate over the JSON array items
    for item in array_items:
        # Process each JSON array item as needed
        print(item)

In this example, we open a large JSON file called large_array.json and create an ijson parser that processes the file item by item. We then iterate over the JSON array items using a for loop, processing each item as required.
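If you need finer-grained control than whole objects, ijson also exposes a low-level event stream via ijson.parse, which yields (prefix, event, value) tuples as the parser walks the document. A minimal sketch:

import ijson

with open('large_array.json', 'r') as file:
    # Each iteration yields a (prefix, event, value) tuple; events include
    # 'start_map', 'map_key', 'string', 'number', 'end_map', and so on.
    for prefix, event, value in ijson.parse(file):
        if event == 'number':
            print(prefix, value)  # react to numeric leaf values only

This is useful when you only need a few scalar values from deep inside a huge document and don't want ijson to build full Python objects at all.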

Streaming JSON from Server to Client

Streaming JSON data from a server to a client is a powerful technique for efficiently handling large datasets, particularly in web applications. This method is beneficial when dealing with large streams of data that need to be processed and displayed in real time, without waiting for the entire dataset to load. Here, we'll explore how to implement JSON streaming from a server endpoint and process the chunks of JSON data as they arrive on the client side, using JavaScript.

Server-Side Setup

First, ensure your server is configured to stream responses. For example, in a Node.js environment using Express, you can use the response.write() method to send chunks of data as they become available, and response.end() to close the stream:

const express = require('express');
const app = express();
const PORT = 3000;

app.get('/stream-data', async (req, res) => {
  res.writeHead(200, {
    'Content-Type': 'application/json',
    'Transfer-Encoding': 'chunked',
  });

  // Simulate streaming data by sending JSON chunks
  const data = [
    { id: 1, name: 'Item 1' },
    // Assume more data objects here
  ];

  for (const [index, item] of data.entries()) {
    if (index > 0) {
      res.write(','); // Separate items with commas; the client wraps the stream in [ ]
    }
    res.write(JSON.stringify(item));

    // Simulate a delay for demonstration; sleep must be awaited to take
    // effect, which is why the handler is async and uses for...of, not forEach
    await sleep(1000);
  }

  res.end();
});

function sleep(ms) {
  return new Promise(resolve => setTimeout(resolve, ms));
}

app.listen(PORT, () => {
  console.log(`Server running on port ${PORT}`);
});

Client-Side Processing with JavaScript

On the client side, you can use the Fetch API with a ReadableStream to process JSON chunks as they arrive. This approach is particularly useful for web applications that need to display data in real time:

async function streamJSON(url) {
  const response = await fetch(url);
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let result = '';

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    // Decode and accumulate the chunk
    result += decoder.decode(value, { stream: true });
    // Handle JSON parsing safely, considering incomplete chunks.
    // Note: the accumulated buffer is re-parsed each time, so every
    // successful parse contains all items received so far.
    try {
      let json = JSON.parse(`[${result}]`);
      console.log('Chunk processed:', json);
      // Process or display your data here
    } catch (error) {
      // Error means incomplete JSON; wait for the next chunk
    }
  }
}

// Call the function with your endpoint
streamJSON('http://localhost:3000/stream-data');

In this client-side example, we’re using a loop to continuously read from the stream provided by the Fetch API. The TextDecoder is used to convert the stream chunks from Uint8Array to a string. We attempt to parse the accumulated string as JSON inside a try-catch block; if it fails, we wait for more data to complete the JSON structure.
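The same idea carries over to Python. Here is a rough equivalent of the JavaScript client above, sketched with the third-party requests library (the naive per-chunk UTF-8 decode is fine for this ASCII demo, but it can split multi-byte characters in general):

import json
import requests

def stream_json(url):
    # stream=True tells requests not to download the whole response up front
    with requests.get(url, stream=True) as response:
        buffer = ''
        # chunk_size=None yields chunks as they arrive from the server
        for chunk in response.iter_content(chunk_size=None):
            buffer += chunk.decode('utf-8')
            try:
                items = json.loads(f'[{buffer}]')
                print('Chunk processed:', items)
            except json.JSONDecodeError:
                pass  # incomplete JSON; wait for the next chunk

stream_json('http://localhost:3000/stream-data')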

Considerations

  • Error Handling: Make sure you have strong error handling on both the client and server to manage issues like network failures or malformed JSON (see the sketch after this list).
  • Performance: Test the performance for your specific use case, especially if the data rate is high or if the client-side resources are limited.
  • Security: Implement appropriate security measures, especially if sensitive data is being streamed.
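On the error-handling point, here is a minimal Python sketch of a more defensive client: a request timeout guards against stalled connections, and raise_for_status() surfaces HTTP errors before we start reading the stream (the endpoint URL is the example server from earlier):

import requests

try:
    # timeout guards against a stalled connection; raise_for_status
    # converts HTTP error codes into exceptions before we start reading
    with requests.get('http://localhost:3000/stream-data',
                      stream=True, timeout=10) as response:
        response.raise_for_status()
        for chunk in response.iter_content(chunk_size=None):
            ...  # accumulate and parse as shown earlier
except requests.RequestException as err:
    print('Stream failed:', err)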

Streaming JSON data from a server endpoint and processing it on the client side with JavaScript is an effective strategy for handling large datasets in real-time web applications. This approach minimizes memory usage on the client side and enhances the user experience by displaying data as soon as it’s received. With the right setup and careful handling of data chunks, developers can leverage streaming to build responsive and efficient web applications.

Conclusion

JSON streaming is a key technique for managing large JSON files, enhancing memory efficiency and processing speed. Our exploration included using ijson in Python and extended to client-side streaming from server endpoints. These methods facilitate real-time data handling and improve application responsiveness. As developers embrace these techniques, they empower themselves to address data-intensive challenges effectively.

  1. ijson Documentation
  2. Python Documentation on JSON


Alexander Obregon

Software Engineer, fervent coder & writer. Devoted to learning & assisting others. Connect on LinkedIn: https://www.linkedin.com/in/alexander-obregon-97849b229/