The Fault In Our Responses: Bytes Remaining To Read

Ville Lahdenvuo · Published in Grano · Jul 28, 2017 · 6 min read

This is the story of how a single line of code caused a lot of grey hairs and moments of frustration. I never thought I would be writing such a post, but I guess it’s my time to shine… the light on a hairy bug that halted developer productivity for three full workdays.

We’re currently working on a new e-commerce solution for Grano, aiming to provide a better customer experience by modernizing our platform and improving automation and integration with our production services. Our tech stack is reasonably cutting edge, considering that we’re working with an enterprise-level product and environment: Node.js, Angular, Redux and AWS, accompanied by high code quality and modern development standards.

Day One: The Descent

It all started when we noticed weird errors in the Chrome DevTools and soon realized that they occurred on every error response from our API server.

POST /user/register net::ERR_SPDY_PROTOCOL_ERROR

We had recently switched to HTTPS in our development environment to more closely match the staging and future production environments, and AWS Application Load Balancer (ALB) automatically uses HTTP/2 when HTTPS is enabled. As you might know, Google pioneered HTTP/2 with a custom protocol in Chrome called SPDY, which explains the protocol error we were seeing. That led us to believe that the problem had to be in the ALB.

As we were trying to find more information on this issue, we found that people had had problems on AWS related to the MTU, or Maximum Transmission Unit, size. So we connected to the API instance to conduct some tests and noticed no problems with the MTU.
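On Linux, that kind of check looks roughly like this (a sketch; the exact commands and target host are assumptions, not a record of our session):

ubuntu@aws:~$ ip link show eth0 | grep mtu   # inspect the interface MTU
ubuntu@aws:~$ ping -M do -s 1472 10.0.0.1    # unfragmentable packet that is exactly 1500 bytes on the wire

While still connected to the API instance, we decided to test the API server itself locally.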

ubuntu@aws:~$ curl -H "Content-Type: application/json" \
  -d '{"email":"asd@example.com","password":"foo"}' \
  http://localhost:8888/user/register
curl: (18) transfer closed with 1 bytes remaining to read
{"message":"Invalid password."%

That ruled out any issues with the load balancer or HTTP/2. However, we still could not replicate the issue on our local development machines! So we thought there must be something wrong in the VPS environment, but it was time to log off and rest our eyes.

Day Two: The Despair

With fresh eyes and a mind free of clutter, we set up a new VPS, copied over the build artefacts and the environment configuration, and started the server. But the hopeful mood was soon swept away as we were greeted with the same message: “transfer closed with 1 bytes remaining to read”.

With my mind starting to feel blank, I realized I had not tried running the built version of the backend locally. You see, we use Babel to enjoy the latest and greatest JavaScript features without worrying about runtime support. We only install the Babel plugins for features that modern Node versions are still missing and let everything else run natively, so that our code is automatically optimized once the V8 developers optimize these new language features. For development we use a handy tool called babel-node. It works exactly like your node binary, except that it automatically transpiles your code in-memory.
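In practice the difference between the two modes boils down to something like this (the file names and paths here are illustrative, not our actual setup):

$ babel-node src/index.js      # development: transpile in-memory and run
$ babel src --out-dir build    # build step: write transpiled files to disk
$ node build/index.js          # production: run the built output with plain node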

So we opened the command line in the build directory, ran the transpiled code via regular node and boom, we were able to replicate the issue locally. Progress at last! So the problem must be caused by Babel, right? That sounded very strange, especially since there should be no difference between transpiling in-memory and transpiling to file. We aren’t even using many bleeding-edge features in the backend. So we started removing Babel plugins and checking if the problem persisted. One by one we removed all the plugins, and we still saw the dreaded message: “transfer closed with 1 bytes remaining to read”.

We sank into deep despair until we realized that I still had the production environment configuration in the build directory, which meant that I was running the API server with a different configuration than the local development environment. So we copied in the local environment, restarted the server and the problem was gone, just like that. Now all we had to do was to copy the production configuration back, one variable at a time, until we found which option broke it. Finally, after 61 variables, we had found it:

SESSION_REDIS_URL=dev-session.xxxx.yyyy.euw1.cache.amazonaws.com

Locally we store sessions in-memory; since we are only running one process, there is no need for a shared session store. But as soon as we configured a session database using Redis, we started to get broken error responses. That, however, was only half of the story…

Day Three: The Demystifying

We had pinpointed exactly how to reproduce the problem, so it should have been an easy fix. We just had to figure out why the session store broke our responses. At first I noticed that an express-session option called saveUninitialized seemed to cause the issue. So we started putting together a small Express server using the same configuration as our API server, but we couldn’t get the problem to appear reliably.
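Our repro attempt looked roughly like this (a sketch; the secret, port and everything other than SESSION_REDIS_URL are illustrative, not our actual code):

const express = require('express');
const session = require('express-session');
const RedisStore = require('connect-redis')(session);

const app = express();

app.use(session({
  secret: 'not-our-real-secret',
  resave: false,
  // saveUninitialized tells express-session to persist brand new sessions,
  // forcing a store write even on requests like our failing POST.
  saveUninitialized: true,
  // Without SESSION_REDIS_URL express-session falls back to its default
  // in-memory store; with it, sessions go to a shared Redis instance.
  store: process.env.SESSION_REDIS_URL
    ? new RedisStore({ host: process.env.SESSION_REDIS_URL })
    : undefined,
}));

// A route that always fails, to exercise error responses.
app.get('/fail', (req, res, next) => next(new Error('boom')));

app.listen(8888);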

First we noticed that if the Redis host was either invalid or pointed at a working Redis instance, the problem didn’t occur at all. Then we saw that if the host was valid but did not actually host a Redis instance, the HTTP request timed out and our problem appeared. That did not match what we saw in our app: in our case the error occurred straight away, without any timeout, and only on error responses, while in the example it occurred only as a result of a timeout and on all requests.

At that point I opened a bug ticket on the connect-redis module, asking if anyone had experienced similar behavior. Marc Harter (@wavded) responded saying that he was only able to reproduce the problem with a timeout, or by killing the server before the timeout.

That’s when I remembered that we had some custom error handling logic in a module we use internally. We set up a custom error handler for the Express server to return errors as JSON instead of the default HTML representation. We also filter out almost all error details in production for added security; for those we have logging in place. We added the error handler to our minimal example and boom, responses started breaking just like they did in our API server.
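The handler was equivalent to something like this (a reconstruction; the real code lives in an internal module, so the details here are assumptions):

app.use((err, req, res, next) => {
  // Respond with JSON instead of Express's default HTML error page,
  // filtering out error details in production for added security.
  res.status(err.status || 500).json({
    message: process.env.NODE_ENV === 'production'
      ? 'Internal Server Error'
      : err.message,
  });
  // The single line that caused it all: forwarding the error
  // after the response has already been sent.
  next(err);
});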

We noticed right away that the error handler looked a bit strange: why would we call next(err) after responding to the client? Removing that unnecessary call solved our problem, finally! We opened a new issue on the express-session project to try to figure out why this happens and whether it could be detected so that a better error message could be displayed.

Douglas Wilson (@dougwilson) from the Express team immediately responded, explaining that nothing can be done: the next(err) call proceeds to the default Express error handler, which checks whether the response has already been started. It has, because of the res.send, so the default handler immediately disconnects the socket. This is by design: we might be streaming a long response to the client, and we don’t want to leave the connection open, wasting resources, when an error occurs.

The problem is that the Redis store is still in the process of saving the session when the default error handler destroys the socket, so the response is cut off before its last byte can be written. There we had it: the fault in our responses.
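The fix, then, was simple: respond exactly once and stop. If headers have already been sent (for example mid-stream), delegating to the default handler is the right move. A sketch:

app.use((err, req, res, next) => {
  if (res.headersSent) {
    // Too late to send a clean response; let Express tear the socket down.
    return next(err);
  }
  res.status(err.status || 500).json({ message: 'Internal Server Error' });
  // No next(err) here: the response is complete, and the session store
  // gets to finish saving before the connection closes.
});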

The Takeaway

The lesson of the story is that even the most mysterious bug can be found and fixed. What you need is determination (not wanting to delay the launch), caffeine, and a method to pinpoint and replicate the bug. To summarize:

  1. Replicate the bug in a development environment with all your tools available.
  2. If you can’t replicate the bug, compare the environment configurations first, before suspecting the code, the operating system or the hardware.
  3. The bug is most likely in your own code rather than in third-party code, so check your own code first.
  4. A minimal example that reproduces the bug helps tremendously when testing and is very nice to have when submitting an issue to an open source project.
P.S. We’re hiring.

Ville works as a Lead Developer at Grano, the most versatile content service provider in Finland and the leading graphic industry company in the Nordics. Currently he is working on Grano’s E-Commerce solutions.
