Process HTTP body line-by-line

Mayank C
Tech Tonic


In some use cases, a file upload server may need to process the request body line-by-line. For example: read the uploaded text file, count the words in it, and return the word count to the client. There are two ways to do this:

  • Convert the uploaded file to text, split by LF, and process the lines
  • Pipe the uploaded file stream through an LF stream processor, and process the lines

In this article, we’ll learn how to efficiently process an HTTP body line-by-line.

First way — Without streams

The first way is fast, easy to use, and suitable for small files. There are three simple steps:

  • Convert the body to a string (using the .text() API)
  • Split the body by LF (using the .split() API)
  • Process the lines by splitting each line by space and adding to the word count

Here is an implementation of the above algorithm:

import { serve } from "https://deno.land/std/http/mod.ts";

const port = 9000;

async function handleRequest(req: Request) {
  let words = 0;
  if (req.body) {
    // Buffer the entire request body in memory as one string
    const file = await req.text();
    for (const line of file.split("\n")) {
      words += line.split(" ").length;
    }
  }
  return new Response(words.toString() + "\n");
}

serve(handleRequest, { port });
console.log(`Listening on http://localhost:${port}`);

Let’s do a run with a ~2M text file containing ~39K lines:

$ du -kh textFile2M.txt 
1.8M textFile2M.txt
$ wc -l textFile2M.txt
39012 textFile2M.txt
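
If you want to reproduce the test, any large text file will do. Here is a minimal Deno sketch that generates one; the file name, line contents, and line count are illustrative assumptions, not values taken from this article:

// generate_test_file.ts (hypothetical helper, not part of the server)
// Repeats a short ~60-byte line ~39K times, giving a file of roughly 2M.
// Increase the count (e.g. towards ~1M lines) to produce the larger test file.
const line = "lorem ipsum dolor sit amet consectetur adipiscing elit sed do";
const body = new Array(39_000).fill(line).join("\n") + "\n";
await Deno.writeTextFile("textFile2M.txt", body);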

A hundred requests are made using curl’s parallel mode:

$ curl -i --parallel --parallel-immediate --parallel-max 5 --data-binary "@./textFile2M.txt" --config /var/tmp/curl.txt
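
The contents of the curl config file aren't shown here. A hypothetical /var/tmp/curl.txt for this kind of test would simply list the target URL once per request (the port assumed to match the server above):

# assumed contents of /var/tmp/curl.txt: one url entry per request, repeated 100 times
url = "http://localhost:9000/"
url = "http://localhost:9000/"
...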

The processing was very fast. The peak memory usage was:

Physical footprint (peak):  79.5M

Now, let’s repeat the test with a ~50M text file containing ~1M lines:

$ du -kh textFile50M.txt 
50M textFile50M.txt
$ wc -l textFile50M.txt
1053486 textFile50M.txt

A hundred requests are made using curl’s parallel mode:

$ curl -i --parallel --parallel-immediate --parallel-max 5 --data-binary "@./textFile50M.txt" --config /var/tmp/curl.txt

The processing was still very fast. The peak memory usage was:

Physical footprint (peak):  660.3M

That was a sharp increase in memory usage! This is expected: .text() buffers each request body (and all of its split lines) fully in memory, so a few parallel 50M uploads add up quickly. When the load stops, the memory usage does go down.

That’s all about the first way. Let’s have a look at the stream way and find out if we can do better.

Second way — Streams

The second way is also easy, but a bit slower than the first. Its advantage is that it keeps memory usage under control.

There are three simple steps:

  • Import the LineStream API from the standard library’s streams module
  • Pipe the request body through LineStream (using the .pipeThrough() API)
  • Process the yielded lines by splitting each line by space and adding to the word count

Here is the implementation of the above algorithm:

import { serve } from "https://deno.land/std/http/mod.ts";
import { LineStream } from "https://deno.land/std/streams/delimiter.ts";

const port = 9000;
const td = new TextDecoder();
const dec = (b: Uint8Array) => td.decode(b);

async function handleRequest(req: Request) {
  let words = 0;
  if (req.body) {
    // Split the body stream on LF boundaries; each yielded chunk is one line (as bytes)
    const file = req.body.pipeThrough(new LineStream());
    for await (const line of file) {
      words += dec(line).split(" ").length;
    }
  }
  return new Response(words.toString() + "\n");
}

serve(handleRequest, { port });
console.log(`Listening on http://localhost:${port}`);
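
As a side note (not used in the measurements below), newer releases of the standard library deprecate LineStream in favor of TextLineStream, which yields strings instead of Uint8Array chunks, so the manual TextDecoder is no longer needed. A roughly equivalent handler, assuming a std version whose streams module exports TextLineStream, could look like this:

import { serve } from "https://deno.land/std/http/mod.ts";
import { TextLineStream } from "https://deno.land/std/streams/delimiter.ts";

const port = 9000;

async function handleRequest(req: Request) {
  let words = 0;
  if (req.body) {
    // Decode bytes to text first, then split the text stream on line boundaries
    const lines = req.body
      .pipeThrough(new TextDecoderStream())
      .pipeThrough(new TextLineStream());
    for await (const line of lines) {
      words += line.split(" ").length;
    }
  }
  return new Response(words.toString() + "\n");
}

serve(handleRequest, { port });
console.log(`Listening on http://localhost:${port}`);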

Let’s repeat the first test with a ~2M text file containing ~39K lines:

$ du -kh textFile2M.txt 
1.8M textFile2M.txt
$ wc -l textFile2M.txt
39012 textFile2M.txt

A hundred requests are made using curl’s parallel mode:

$ curl -i --parallel --parallel-immediate --parallel-max 5 --data-binary "@./textFile2M.txt" --config /var/tmp/curl.txt

The processing was quite slow compared to the first way. The peak memory usage was:

Physical footprint (peak):  58.4M

The peak memory usage is lower than with the first way, but not by a big margin.

Now, let’s repeat the second test with a ~50M text file containing ~1M lines:

$ du -kh textFile50M.txt 
50M textFile50M.txt
$ wc -l textFile50M.txt
1053486 textFile50M.txt

A hundred requests are made using curl’s parallel mode:

$ curl -i --parallel --parallel-immediate --parallel-max 5 --data-binary "@./textFile50M.txt" --config /var/tmp/curl.txt

The processing was quite slow when compared to the first way. The peak memory usage was:

Physical footprint (peak):  59.1M

Now, that’s a huge difference from the first way, where the peak memory usage for 50M files was ~660M.

Regardless of the file size (2M or 50M), the memory usage stayed at ~59M, as the stream processor only holds a small chunk of the body (and the current line) in memory at any given time.
