How a bug in Skype prevented our users from using GameVox

A story on how we tracked it down… and squashed it.

verrier
Jun 1, 2014

Update: The amazing Skype team released a new version which fixes the root cause of this bug and removes any excess certificates. Hooray!

At GameVox, we noticed an uptick in the number of support tickets from users that read “I was able to connect to GameVox yesterday, but today I can’t!”

Day after day these support tickets would pile up, more and more users unable to connect. By sheer luck we had two separate users put in a ticket saying that another computer directly next to them could connect to GameVox just fine.

When we heard that, we knew we had to set aside all the projects we were working on to figure this out.

There had to be something on the computer that was preventing GameVox from connecting.

The Root Cause

Inside of the “Trusted Root Certification Authorities” there were tens of thousands of Skype “Click to Call” certificates.

Note that this user has 29,926 certificates in his Trusted Root Certification Authorities… about 29,884 of them were the same “localhost” / “localhost” certificate. A normal computer has around 40 total certificates.

Unfortunately, this screen does not show you much detail about the certificate. However, if you export one of the certificates you can see that they belong to Skype’s “Click to Call” functionality, which is optionally installed along with Skype.

Under some unknown conditions Skype’s Click to Call is adding thousands and thousands of certificates to the Trusted Root Certification Authorities in Windows.
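If you want to check a machine for the same symptom without clicking through the certificate manager, a rough sketch like the one below may help. It is not something we ran at the time. It uses Python's standard `ssl.enum_certificates`, which only exists on Windows, and the helper names `summarize_store` and `read_windows_root_store` are made up for this example. It counts byte-identical entries by fingerprint; if each offending certificate was generated with its own key, only the total will stand out.

```python
import hashlib
import ssl
import sys
from collections import Counter


def summarize_store(der_certs):
    """Return (total, distinct, most_repeated) for DER-encoded certificates."""
    # Identical certificates produce identical SHA-256 fingerprints.
    fingerprints = Counter(hashlib.sha256(der).hexdigest() for der in der_certs)
    total = sum(fingerprints.values())
    most_repeated = max(fingerprints.values(), default=0)
    return total, len(fingerprints), most_repeated


def read_windows_root_store():
    """Read the Windows "ROOT" system store (Windows only)."""
    # enum_certificates yields (cert_bytes, encoding_type, trust) tuples.
    return [
        der
        for der, encoding, _trust in ssl.enum_certificates("ROOT")
        if encoding == "x509_asn"
    ]


if __name__ == "__main__":
    if sys.platform == "win32":
        total, distinct, most_repeated = summarize_store(read_windows_root_store())
        print(f"total={total} distinct={distinct} most_repeated={most_repeated}")
    else:
        print("ssl.enum_certificates is only available on Windows")
```

A total in the tens of thousands, compared with the few dozen on a normal machine, is the red flag.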

Why did this break GameVox? Along with many other cross-platform applications, GameVox uses the Qt framework. By default, Qt imports every certificate in the Trusted Root Certification Authorities store, so our application hung for longer than the connection timeouts while it loaded all of them.

How we figured out what was happening…

First and foremost we made sure our users disabled and/or uninstalled the usual culprits including firewalls and security software on their computer. Nothing. They still could not connect.

Okay… we decided to bust out the big guns. We had these users run the SysInternals tool Process Monitor and send us a dump of what every process on their computer was doing while they attempted to connect. Surely this would expose whatever was interfering with network traffic. Again… nothing.

What.

We needed even bigger guns. We had some of our users install Wireshark and send us complete packet captures of the network traffic while they attempted to connect.

This is what your typical pcap looked like (with IP addresses removed to protect the innocent)

If you have any experience looking at pcaps, you’ll notice the time on the left. 73 seconds pass between the TCP connection being established… and the Client Hello being sent that signals the start of the SSL handshake.

Also note that at the 48-second mark our server says “Nope, this is taking too long!” and closes the TCP connection (FIN).
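To make the timing explicit, here is a small sketch of the check we were doing by eye. The timeline is not real capture data: the values are rounded from the description above and assume the TCP handshake finished at roughly t = 0. The labels and function names are invented for this example.

```python
from typing import List, Optional, Tuple

# (seconds since capture start, short label). Rounded from the prose above,
# assuming the TCP connection was established at roughly t = 0.
TIMELINE: List[Tuple[float, str]] = [
    (0.0, "tcp_established"),
    (48.0, "server_fin"),
    (73.0, "client_hello"),
]


def first_time(events: List[Tuple[float, str]], label: str) -> Optional[float]:
    """Timestamp of the first event with the given label, or None."""
    return next((t for t, name in events if name == label), None)


def client_hello_delay(events: List[Tuple[float, str]]) -> Optional[float]:
    """Seconds between TCP establishment and the Client Hello, if both exist."""
    established = first_time(events, "tcp_established")
    hello = first_time(events, "client_hello")
    if established is None or hello is None:
        return None
    return hello - established


def server_gave_up_first(events: List[Tuple[float, str]]) -> bool:
    """True if the server sent FIN before any Client Hello appeared."""
    fin = first_time(events, "server_fin")
    hello = first_time(events, "client_hello")
    if fin is None:
        return False
    return hello is None or fin < hello


if __name__ == "__main__":
    print(client_hello_delay(TIMELINE), server_gave_up_first(TIMELINE))
```

A healthy client sends the Client Hello almost immediately, so any delay measured in whole seconds points at the client stalling before the handshake rather than at the network.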

73 seconds to do an SSL handshake? What?!

Why is the Client Hello delayed in the SSL handshake? The client is supposed to send it immediately after the TCP connection is established. It is even in the spec.

Could something be messing with SSL handshakes on this computer? Some security software gone rogue? Or perhaps things are being queued up before they reach the NIC.

We do self-sign our certs… perhaps there is something… no, that does not make any sense because we have not even started the SSL handshake.
No certs have been exchanged.

Luckily, we had a very generous user who was willing to remote desktop with us in an attempt to debug his computer. Over the course of 3 days we checked everything from SSL certificate order, eventvwr for errors involving schannel, QoS on the network adapter, MTU size… nothing we tried was working.

Finally we found and decided to use an amazing application called API Monitor which hooks into applications (and DLLs) allowing you to see the Windows API calls that are being made.

Using this application, we added QtNetwork4.dll, libeay32.dll and ssleay32.dll to the external DLL monitoring pane and launched GameVox natively… selecting the “Monitor” option when prompted by API Monitor.

On this user’s system, our GameVox application made exactly the same API calls as it did on a working system, up until the point where QtNetwork4.dll calls X509_STORE_add_cert.

The X509_STORE_add_cert call was repeated tens of thousands of times while the client was attempting to connect to the GameVox network (after TCP connection, before SSL handshake).

We found our problem.

Removing the certificates by launching the Microsoft Management Console (mmc), adding the Certificate Snap-In, browsing to Trusted Root Certification Authorities and deleting the extraneous localhost certs resolved the issue.

That’s not really a solution

You’re right. We can’t have every user who is affected simply delete these certificates that are buried in the Microsoft Management Console.

We dealt with this inside of GameVox by generating a list of all the certificates that were NOT issued by Skype and installing this filtered list as the default CA set via QSslConfiguration::setDefaultConfiguration().
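The filtering step itself is simple. The sketch below shows the idea in plain Python rather than our actual Qt code, so it does not reproduce the Qt calls. `CaCert`, `looks_like_click_to_call` and `filter_ca_certs` are hypothetical names, and the “localhost” / “localhost” match is an assumption based on how the certificates appeared in the certificate manager, not the exact predicate we shipped.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class CaCert:
    """Minimal stand-in for a CA certificate (hypothetical type)."""
    subject_common_name: str
    issuer_common_name: str


def looks_like_click_to_call(cert: CaCert) -> bool:
    # Assumption: the unwanted certificates are the ones that show up as
    # "localhost" issued by "localhost" in the trusted root store.
    return (
        cert.subject_common_name == "localhost"
        and cert.issuer_common_name == "localhost"
    )


def filter_ca_certs(
    certs: Iterable[CaCert],
    is_unwanted: Callable[[CaCert], bool] = looks_like_click_to_call,
) -> List[CaCert]:
    """Keep every certificate the predicate does not flag, preserving order."""
    return [cert for cert in certs if not is_unwanted(cert)]


if __name__ == "__main__":
    store = [
        CaCert("localhost", "localhost"),
        CaCert("Example Root CA", "Example Root CA"),  # made-up CA name
        CaCert("localhost", "localhost"),
    ]
    for cert in filter_ca_certs(store):
        print(cert.subject_common_name)
```

The point of doing this up front is that the TLS layer only ever sees the short, filtered list, so the per-certificate work that stalled the connection is skipped.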

We tried simply uninstalling Skype, which unfortunately did not remove those certificates. They must be deleted by hand.

Closing

The reason we would eventually see the Client Hello was that the client did finish iterating through all ~30,000 CA certificates and, upon reaching the end, carried on sending the Client Hello as expected.

Unfortunately, by this time the client would either send a FIN because its timeout was exceeded or the server would close the connection because the SSL handshake never completed.

Hopefully Skype can identify why this occurs on some machines and fix the issue since it will affect any Qt application that does not specifically work around this.

TL;DR

Skype’s “Click to Call” inserted tens of thousands of certs into the Windows Trusted Root Certification Authorities store, causing our Qt application to hang for minutes after establishing a TCP connection.

Removing the certs fixed it. Yay!
