A SPDYer web page…

bkchung
10 min read · Feb 20, 2015


Experimenting with SPDY using mod_spdy on Apache httpd on Windows

Now that the HTTP/2 specification has been approved, I wanted to take some time to look back and journal the progress of an (exciting) side experiment that I’ve been spending time on: “Will simply switching SPDY on show better page load time (PLT) performance?” I’ve read too many times that SPDY was the starting point of HTTP/2, and Google has been labeling it SPDY/4, so basically, I’ve been looking at it as just preparing for the new wave (HTTP/2).

I had high hopes because of all the features that are supposed to be going into the new protocol: fewer RTTs, header compression, server push, multiplexing, prioritization and flow control, and more. And at the same time I had doubts, since I’ve read people talking about PLT improvements of 25% to 50%, which sounds too good to be true. People have come up with a lot of tricks throughout the history of HTTP/1 (sharding, bundling/spriting, CDNs, inlining, proxying/caching and so on), exhausting local resources as much as possible to avoid head-of-line blocking and make the page faster (at the cost of more resources). So would multiplexing or compressing a bit more really show that much improvement?

I wasn’t aiming for a general experiment, but rather a specific one constrained to the product that I’m working on at work. (Un)fortunately, our product uses the Windows version of Apache httpd as the frontend, and also (un)fortunately, there was a SPDY plugin called mod_spdy “from Google” that (I thought) supported it.

mod_spdy on Windows

The gist of this section is that the mod_spdy project is currently a zombie project. Google announced that they have donated the code to ASF, and that’s about it. I have no access to what the Apache httpd devs are doing behind the scenes, but from what I can see from the mailing lists or bug trackers, there’s zero activity on the code, which doesn’t even build properly and only supports httpd 2.2.

Apache httpd seems to have its limitation of binding one connection to one request to one thread, which doesn’t fit the bill of multiplexing, among other HTTP/2 features. You can listen to Matthew Steele talking about working around it by creating a fake connection and managing a separate thread pool apart from the MPM here.

The sad part for me was that it doesn’t support Windows; to be precise, mod_spdy doesn’t build on Windows as is. A search easily yields a link to someone’s good effort at updating the code to work with httpd 2.4, which I used as a starting point. I used Visual Studio rather than trying cygwin/mingw, although the chromium dependency itself somewhat depends on cygwin (which I heard causes more headaches).

I guess I should have just fallen back to proxying through another product, or to another solution such as nginx (which has a working SPDY implementation on Windows) just for the sake of the experiment, since there are plenty around. But since I had already spent a bunch of time on this, I just persisted, eventually wasting a lot of time: digging through a myriad of documents (hating gyp), comments, and past experiences (such as standard! library differences among platforms), manually tweaking what the build scripts should have been doing, fixing hundreds of errors, and cherry-picking the warnings that were critical.

Yet again, the next hurdle: Chrome 40 drops support for previous versions of SPDY and only supports the newest version, SPDY/3.1. The code above didn’t have all the changes required for 3.1, so merging them was another manual task. (And who doesn’t want to try out changes to controlling window sizes? ☺)

Here’s the built plugin with some dependencies, no guarantees on anything, so only use it for experimentation purposes if you dare ☺. It’s a Win32 version based on Apache 2.4.10 with openssl 1.0.1i, and I think you can either build those yourself or find a distro with the matching versions. I don’t think anyone would be much interested in the details of what was done to get this built, since SPDY/3.1 is already something from the past now with HTTP/2 on the horizon, so I’ll skip elaborating.
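If you do want to poke at it, hooking the plugin into httpd is the usual LoadModule dance. A minimal sketch of what I’d expect the httpd.conf additions to look like (the directive names come from the mod_spdy documentation; the module path is just wherever you drop the .so):

# httpd.conf sketch: load the plugin and turn SPDY on
LoadModule spdy_module modules/mod_spdy.so
<IfModule spdy_module>
    SpdyEnabled on
</IfModule>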

SPDY over TLS

Back to the actual experiment. The first attempt was to use SPDY over HTTP/1 (I’ll use HTTP/1 instead of plain HTTP) to avoid setting up SSL. You have to run chrome with “--use-spdy=no-ssl” and turn on “SpdyDebugUseSpdyForNonSslConnections” in the httpd configuration. Not only did this not work as intended, I also had to switch back and forth, restarting the server and changing the configuration, which wasn’t so fun. Unlike TLS’s NPN, which switches the connection over to SPDY, SPDY over HTTP/1 just uses it by default, breaking existing HTTP connections. I believe HTTP/2 had a lot of debate over whether it should support this, and the final decision seems to be to do so (but no http2:// of course). But switching to SPDY from HTTP/1 costs extra RTTs because there’s no way to switch in place, unlike TLS choosing SPDY during the handshake.
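For reference, this is roughly what that first attempt looked like, client and server side. The directive is documented by mod_spdy; that its argument is the SPDY version to assume is my reading of the docs, so treat this as a sketch:

# client side: force Chrome to speak SPDY without TLS (the flag existed around Chrome 40)
chrome.exe --use-spdy=no-ssl

# server side (httpd.conf): have mod_spdy speak SPDY on plain connections;
# the argument should be the SPDY version to assume (check the mod_spdy docs)
SpdyDebugUseSpdyForNonSslConnections 3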

Still, I needed to go back and forth between HTTP/1 and SPDY for my experiment, so my next attempt was to configure SSL (or rather TLS) and take the expected initial RTT (Round Trip Time) hit (cert and cipher with the hello). There are multiple ways to optimize the RTT back down to near 1 even with TLS, but I decided to just accept the hit in the numbers since I’m not aiming to test throughput/load. It’s also a local test (most likely 1~2 hops), so the cost of an RTT is relatively tiny. In real world scenarios, this would be a big deal.
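The TLS side itself was nothing special: a stock mod_ssl vhost with a test certificate (hence the ignored cert errors mentioned below). A minimal sketch, with illustrative file names:

# httpd.conf sketch: minimal TLS vhost for the experiment
Listen 443
<VirtualHost *:443>
    SSLEngine on
    SSLCertificateFile    conf/ssl/test.crt
    SSLCertificateKeyFile conf/ssl/test.key
</VirtualHost>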

There are browser plugins that detect whether a page is using SPDY; search for “spdy indicator”. The indicator shows that things are now working as expected, at least functionally. Now I think I’m good to go.

Measure!

Ok, the next fun thing is to choose a tool that can measure the PLT impact of SPDY. It looks like webpagetest is what everyone uses these days, and we have a local instance (that only had the latest Chrome). There’s also been some work done to support SPDY.

Performance Results from initial attempt

Guess which one is the result with SPDY? I wouldn’t ask if the first one were HTTP/1: the result at the top, the slower one, is the page using SPDY (over HTTPS). 707ms to 666ms. The example above might be too small to prove anything, but repeating it hundreds of times with other pages (similar pages from our product) showed the same results.

There’s no way one can conclude a protocol isn’t faster from a few anecdotes that might not be the best examples of a new protocol’s improvements, but even with that aside, there was still something fishy here. Well, of course there’s the obvious extra RTT for the HTTPS handshakes, and wpt is set to ignore cert errors (the --ignore-certificate-errors flag it uses shows a warning but still works). But even excluding that cost, which shows as around 20ms on the waterfall, it was still slow.

Disappointingly, the page that I was trying to test didn’t show a PLT improvement. There could be various reasons, from bugs somewhere (including in the server code) to the configuration.

Cipher

Looking at the network capture with wireshark, I noticed that the cipher spec was the default (the documentation notes that openssl’s default is used). I had left the related httpd configuration at its default, so it had been using relatively heavy encryption.

Example of a default Cipher suite shown by wireshark

Encryption is much cheaper than it was way back when, but it still surely uses extra resources, and the purpose of this experiment is neither security nor a real world scenario. I’ve also read somewhere that a null cipher would optimize performance if the page doesn’t need the extra security.

The client hello didn’t include any null ciphers. I would have had to determine whether that’s because the browser supports null but doesn’t offer it by default, or because it doesn’t support it at all. But even before that, here’s what the mod_ssl documentation says:

aNULL, eNULL and EXP ciphers are always disabled

Beginning with version 2.4.7, null and export-grade ciphers are always disabled, as mod_ssl unconditionally prepends any supplied cipher suite string with !aNULL:!eNULL:!EXP: at initialization.

The exclamation mark (“!”) means it cannot be added back. I had to find the fastest cipher that was actually supported, and this is what I chose:

SSLCipherSuite RC4-MD5
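A quick way to sanity-check what actually gets negotiated, without re-capturing in wireshark, is openssl’s s_client. Something like this from a Windows command prompt (the host/port is whatever your test server listens on):

# ask specifically for RC4-MD5 and check what the handshake settles on
openssl s_client -connect localhost:443 -cipher RC4-MD5 < NUL
# look for the "Cipher : RC4-MD5" line in the session summary it prints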

SPDY Configuration

To see what was happening, I turned to the Concurrency Visualizer for Visual Studio 2013 to visualize the execution, which shows better context/abstraction of what’s happening than just plain xperf with WPA (as in visualizing dtrace outputs, if you’re from a different world).

I added some markers to the mod_spdy code, and here are the results from a single request for a 2MB js file:

mod_spdy serving a SPDY request(instrumented with some custom markers)
httpd serving a non-SPDY request

Comparing the two, some expected results are visually verifiable:

  1. A bunch of apr_poll calls, one around each of the 109 SendFrame sends on the SPDY connection, which shows the worker thread chopped into pieces, whereas the plain HTTP case has a single apr_poll after compression (mod_deflate).
  2. Each SendFrame goes through mod_ssl and is probably encrypted individually.
  3. I didn’t visually verify zlib being called while the data frames were being sent, possibly because it was faster than the sample rate, or because the content isn’t being compressed. But the number of samples collected from zlib (coming from mod_deflate) was similar.

For #1, it’s hard to say whether this could be a benefit for a single request over a single connection. There’s no need for multiplexing to happen, so it might just be spreading out unnecessary function calls. The time spent on execution and synchronization shows this as well, with SPDY spending more time on execution compared to plain HTTP/1 on I/O, 13%+15% to 9%. It’s a leaky abstraction from a higher layer, and it might be nice if this could be more intelligent, but it’s not something that can be controlled for now.

#2 is mitigated by using the weaker cipher described in the previous section. For #3, I couldn’t find anything obvious and didn’t bother to trace into it; I might need to dig into how httpd works more. If zlib were called for each SendFrame, how that would compare to compressing the whole file at once would be a nice experiment, but there’s nothing to do here for now. I’ll probably turn off mod_deflate in the next iteration.
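For turning off mod_deflate, the two easiest options I know of are to simply not load the module, or to exempt the test content via the no-gzip environment variable that mod_deflate honors. A sketch (the URI pattern is just illustrative):

# option 1: don't load mod_deflate at all
#LoadModule deflate_module modules/mod_deflate.so

# option 2: keep the module but skip compression for the test content
SetEnvIf Request_URI "\.js$" no-gzip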

It’s likely that a single request on a single connection is relatively rare in the real world, but I’ll skip pasting the screenshot of the multiplexed case. To mitigate the findings from this case, I changed some of the tweaks that mod_spdy exposes in its configuration: SpdyMaxThreadsPerProcess and SpdyMaxStreamsPerConnection.

There were also some additional changes: I lowered the LogLevel (which affects both the HTTP/1 and SPDY cases) and took out the instrumentation code (the markers used for the Concurrency Visualizer).
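Roughly, those knobs look like this in httpd.conf. The values below are purely illustrative, not the ones I settled on:

# mod_spdy tuning knobs (illustrative values, not a recommendation)
SpdyMaxThreadsPerProcess    30
SpdyMaxStreamsPerConnection 100
# quieter logging for both the HTTP/1 and SPDY runs
LogLevel warn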

Measure Again!

Again, this experiment wouldn’t prove much other than the obvious, since it’s a synthetic test that doesn’t do much to simulate congestion (head-of-line blocking), retransmissions (or mimic their rate), and so on. But just to see if SPDY would even help this artificial experiment…

There we have it. I ran 9 iterations from webpagetest and, well, SPDY did show improvements! Even with the SSL penalty of extra RTTs, it comes out faster. And here’s the overlapped waterfall that webpagetest provides:

Most of the metrics showed improvement, including the speed index, document complete, and of course the PLT.

Well, that was a fun ride. I hope this kind of helps, keeping you at peace (a slight bit) when thinking of adding HTTP/2 support to your web sites. The HTTP/2 (SPDY in this context) specification is one thing; implementing it correctly and using it with that in mind is going to be a long road. After all, HTTP/1 is what, 20 years old?

What’s Next

A list of things that I would want to continue looking at:

  • Did compression work? I’ll just add the server key to wireshark and see if it shows the details.
  • I haven’t noted this, but the repeat view (PL2) was slower because somehow caching didn’t work correctly; HTTP/1 was caching so much faster. Need to figure out what’s going on there.
  • Try nginx instead of apache httpd. Haven’t looked at the implementation of nginx’s SPDY but if it works without all the build headaches out of the box, then it might be a better solution for now.
  • How the browser prioritizes and requests. Plus server push and hinting. Also with actual optimizations on the page itself.
  • And HTTP/2 instead of SPDY. Need to find a good product that would be worthwhile spending time on.
  • mod_pagespeed. I’m trying to build this also, but it is much worse than mod_spdy. Worse, as in, the code being farther away from Windows land.
  • Google.com is defaulting to QUIC on my machine. Would love to look at that too.
