Getting burnt with HttpClient

System.Net.Http offers pretty neat building blocks to create kick-ass HTTP requesting pipelines. The book Designing Evolvable Web APIs with ASP.NET gives a great overview of what can be achieved beyond sending simple GET requests.

This post is more about how to shoot yourself in the foot. At home. Not in production:

  • Spicing up downloading huge files by using HttpClient’s default timeout.
  • The uninvited guest while downloading huge files: OutOfMemoryException.
  • Is HttpCompletionOption.ResponseHeadersRead your friend or your enemy?

It’s storytelling time!

Please go ahead, release it to production. Source: http://i2.kym-cdn.com/entries/icons/original/000/000/043/disaster-girl.jpg

Spicing up downloading huge files by using HttpClient’s default timeout.

Let’s download huge files. I’m sorry, but I hope you have a slow connection. What could possibly go wrong with this code snippet?
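The original listing is not reproduced here, so what follows is only a minimal sketch of the kind of code in question. The class name, the method name and the example URL in the usage note are assumptions made for illustration.

```csharp
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

public static class NaiveDownloader
{
    // Hypothetical helper: downloads the whole body into memory, then writes it to disk.
    public static async Task DownloadAsync(HttpClient client, string url, string path)
    {
        // GetAsync(url) uses HttpCompletionOption.ResponseContentRead by default,
        // so this await finishes only after the entire body has been buffered.
        using (HttpResponseMessage response = await client.GetAsync(url))
        {
            response.EnsureSuccessStatusCode();

            // The complete file now lives in a single byte array.
            byte[] bytes = await response.Content.ReadAsByteArrayAsync();
            File.WriteAllBytes(path, bytes);
        }
    }
}
```

It could be called as `await NaiveDownloader.DownloadAsync(new HttpClient(), "https://example.com/huge-file.bin", "bytes.bin");`, where the URL is a placeholder.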

We’re downloading loads of bytes and, depending on your bandwidth, the request may or may not throw a TaskCanceledException before it has a chance to run to completion.

LEARNING: HttpClient has a default timeout set to 100 seconds. So you better have a good connection or download small files.

Let’s fix this issue once and for all with Timeout.InfiniteTimeSpan.
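A minimal sketch of that change, with a hypothetical class and method name, could look like this:

```csharp
using System;
using System.Net.Http;
using System.Threading;

public static class TimeoutDemo
{
    // Hypothetical helper: returns a client that never times out on its own.
    public static HttpClient CreateClientWithoutTimeout()
    {
        var client = new HttpClient();

        // A fresh HttpClient starts with a timeout of 100 seconds.
        Console.WriteLine(client.Timeout); // 00:01:40

        // Switch the timeout off entirely.
        client.Timeout = Timeout.InfiniteTimeSpan;
        return client;
    }
}
```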

Do you want to hear the bad or the good news first?

The good news: with an infinite timeout, low-bandwidth scenarios will cause fewer headaches.

The bad news: I didn’t invite OutOfMemoryException to the party. Oh wait. You just did.

Take a look at the previous code snippet. All the bytes retrieved from the internet are held in memory, so OutOfMemoryExceptions might occur. Of course, not on your high-end developer machine, where you test only with small videos, but on the machines of your creative, “let’s push it to the limits” kind of end users.

Streaming comes to the rescue

Instead of reading the whole remote response stream into memory, let’s read small chunks and do it right.
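The original listing is missing at this point. A first attempt along these lines might look like the sketch below; the class and method names are hypothetical. It copies the body to the file through a small buffer, but it still calls GetAsync without an HttpCompletionOption.

```csharp
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

public static class ChunkedDownloader
{
    // Copies source to destination through a fixed-size buffer.
    public static async Task CopyInChunksAsync(Stream source, Stream destination, int bufferSize = 81920)
    {
        var buffer = new byte[bufferSize];
        int read;
        while ((read = await source.ReadAsync(buffer, 0, buffer.Length)) > 0)
        {
            await destination.WriteAsync(buffer, 0, read);
        }
    }

    // Hypothetical helper: looks like streaming, but GetAsync(url) has already
    // buffered the body by the time the file is created.
    public static async Task DownloadAsync(HttpClient client, string url, string path)
    {
        using (HttpResponseMessage response = await client.GetAsync(url))
        using (Stream body = await response.Content.ReadAsStreamAsync())
        using (FileStream file = File.Create(path))
        {
            await CopyInChunksAsync(body, file);
        }
    }
}
```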

If you executed the code snippet above, you would see that the “bytes.bin” file is not created until the whole response’s content is downloaded. You can go to File Explorer and see it for yourself. We can come to the following conclusion: “await client.GetAsync(string url)” does not complete until the response’s content is fully downloaded and buffered in memory. We wrote a bunch of code and we’re back at square one.

The mighty HttpCompletionOption parameter

With HttpCompletionOption you can control when the request should be considered complete. By default, the GetAsync, …, SendAsync methods do not complete until the entire response message (headers + content) has been read, which we want to avoid in some cases.

If you want to stream the response’s content instead of loading it entirely into memory, HttpCompletionOption.ResponseHeadersRead is your new best friend. Let’s modify the previous example. If you go to File Explorer, you will see “bytes.bin” keep growing and growing.
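As a rough sketch of the modified example (the original listing is not shown, and the class and method names here are hypothetical), the only essential change is the second argument to GetAsync:

```csharp
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

public static class StreamingDownloader
{
    // Copies the response body to a file as it arrives.
    public static async Task SaveBodyAsync(HttpResponseMessage response, string path)
    {
        using (Stream body = await response.Content.ReadAsStreamAsync())
        using (FileStream file = File.Create(path))
        {
            // CopyToAsync moves the data through an internal buffer, chunk by chunk.
            await body.CopyToAsync(file);
        }
    }

    // Hypothetical helper: the await completes as soon as the headers are read,
    // so the file is created early and grows while the body is downloaded.
    public static async Task DownloadAsync(HttpClient client, string url, string path)
    {
        using (HttpResponseMessage response =
            await client.GetAsync(url, HttpCompletionOption.ResponseHeadersRead))
        {
            response.EnsureSuccessStatusCode();
            await SaveBodyAsync(response, path);
        }
    }
}
```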

Now we’re safe from OutOfMemoryExceptions, but we’re exposed to programming errors.

Is HttpCompletionOption.ResponseHeadersRead your friend or your enemy?

It’s your friend, but you need to be careful. Using HttpCompletionOption.ResponseHeadersRead can go really wrong when you combine an infinite timeout with not disposing the response objects. You can actually create a sort of deadlock situation. Let’s see how:

I simulated the scenario where some media files are not available on my CDN. I set up my application to only allow 2 concurrent requests at a time (btw: I wasn’t aware of this option for a while). As long as two requests are running at the same time, the next requests just block. To make things worse, I didn’t dispose my response object. On top of all this, I used an infinite timeout for the HTTP operation. HttpCompletionOption.ResponseHeadersRead is the whipped cream on this tasty cake.
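The setup described above can be sketched roughly as follows. This is a reconstruction under assumptions, not the original listing: the class and method names are hypothetical, and the URL passed in is expected to point at a file that is missing on the server.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

public static class StarvationDemo
{
    public static HttpClient CreateClient()
    {
        // Allow only two concurrent connections. This is the .NET Framework knob;
        // as far as I know, HttpClient on .NET Core and later does not consult it,
        // and HttpClientHandler.MaxConnectionsPerServer plays that role there.
        ServicePointManager.DefaultConnectionLimit = 2;

        // No timeout to bail us out.
        return new HttpClient { Timeout = Timeout.InfiniteTimeSpan };
    }

    public static async Task RunAsync(HttpClient client, string url)
    {
        for (int i = 1; i <= 3; i++)
        {
            Console.WriteLine("Sending request " + i);

            // Only the headers are awaited. The body is never read and the
            // response is never disposed, so its connection stays occupied.
            HttpResponseMessage response =
                await client.GetAsync(url, HttpCompletionOption.ResponseHeadersRead);

            Console.WriteLine("Request " + i + " returned " + (int)response.StatusCode);
        }
    }
}
```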

The third time the code hits httpClient.GetAsync, it blocks forever, as none of the previous requests ever ran to completion or got disposed. The default timeout didn’t save our sorry asses, because we overrode it with an infinite timeout.

This version looks better:
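The original listing is not available, so the following is only a sketch of what a better-behaved version might look like. The names and the five-minute timeout are assumptions; the point is that every response is disposed and the timeout is finite again.

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

public static class SaferDownloader
{
    public static HttpClient CreateClient()
    {
        // Assumed value: any finite timeout suited to your files and bandwidth.
        return new HttpClient { Timeout = TimeSpan.FromMinutes(5) };
    }

    // Hypothetical helper: returns false when the file is not available.
    public static async Task<bool> TryDownloadAsync(HttpClient client, string url, string path)
    {
        // The using block disposes the response on every path, which
        // releases the underlying connection for the next request.
        using (HttpResponseMessage response =
            await client.GetAsync(url, HttpCompletionOption.ResponseHeadersRead))
        {
            if (!response.IsSuccessStatusCode)
            {
                return false;
            }

            using (Stream body = await response.Content.ReadAsStreamAsync())
            using (FileStream file = File.Create(path))
            {
                await body.CopyToAsync(file);
            }

            return true;
        }
    }
}
```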

Learnings:

  • Defaults, just like the timeout in this case, are there for a reason. Try to find the reason behind them.
  • Timeouts are also important, so make sure you know what you are doing when you set one to infinite. Even if you do, have second thoughts about it and remove the infinite timeout. What’s better? Clogging your whole HTTP requesting pipeline from time to time because some files are downloading at 0 KB/s, or keeping your pipeline responsive and accepting the fact that some files might not be downloaded, or only after a few retries?
  • ServicePointManager.DefaultConnectionLimit is something that I’ve only found by doing my homework, which reminded me that some APIs still have a learning curve, which you may or may not skip, but there’s a chance that your customer will pay the price.
  • Know the context. Just take my file downloading example. It’s essential to know how large the files are, whether you are saving them to disk, what your CDN’s performance is, and so on … With this information in mind, your chances of getting the calibration of the timeouts and the default connection limit right are higher.
  • Experiment with the given frameworks and APIs to find their strengths and weaknesses in time.
  • Look out for IDisposable return types (there are exceptions to this, of course: https://stackoverflow.com/questions/15705092/do-httpclient-and-httpclienthandler-have-to-be-disposed)