Wi-Fi Assist: a $5 Million Mess
Apple Insider is reporting that a class action lawsuit has been filed against Apple in California, claiming damages from unexpected usage of metered data plans. The suit contends that the Wi-Fi Assist feature introduced in iOS 9 causes updated devices to improperly download content over an LTE connection when the user expects it to be using Wi-Fi.
I worked on the Mac OS Wi-Fi client user experience at Apple from 2007–2012, implementing a number of features to help users identify when a Wi-Fi connection was not working as expected, as well as the utilities built into the OS to help debug problems when they occurred. When I left Apple, just after my five-year anniversary, I owned the Wi-Fi Utility and Network Utility apps, along with some assorted supporting components deep inside the OS. During my last few years I spent a lot of time working closely with AppleCare on customer Wi-Fi and networking issues: poring over user trouble reports, sitting down at call centers and listening in on calls, and generally doing everything I could to improve the user experience of Wi-Fi for Apple users.
I failed. It may have been possible to succeed, but the structure of the various teams working on Wi-Fi and networking at the time made it a seemingly insurmountable challenge. This current situation makes it clear to me that there are still forces inside Apple which prevent any kind of real, comprehensive solution from being implemented. Balkanization, poor management and some uninformed decisions by executives contributed to the problem; and as I’m all too human, my own limitations and personal struggles played a large part. But it didn’t have to happen this way, and it doesn’t have to continue.
How bad is the problem?
Well, I’m just one user and I don’t have any good way of knowing just how widespread the issue is, but here’s the cellular data usage of one app, Video D/L Pro, which I use exclusively at home when I’m connected to Wi-Fi:
Note that these numbers reflect the most recent billing cycle [Edit: it looks as if those numbers may reflect total download, not just over Cellular, which makes this list not just poorly sorted, but very misleading], and that I upgraded to an iPhone 6s on release day. Here I am, at home, connected to a Comcast Business Class cable connection, faithfully using my AirPort Extreme base station, because I enjoy the idea of a no-touch appliance for my internet access.
Now, I’m a heavy user of the T-Mobile LTE network, almost certainly a 1% customer. I have an unlimited plan with tethering so that I can be connected anywhere and don’t have to worry about data charges. I’m happy to pay for this service, and the overage is not going to cost me anything, but for millions of cellular data customers this could easily lead to a doubling or tripling of their monthly bill.
How did we get here?
Much of what transpired while I was still working at Apple can’t be discussed. I take confidentiality seriously and most of it doesn’t really matter, but one particular directorial edict which I pushed back against at the end of my tenure sticks out as not just particularly telling, but deeply misguided:
“Make it self-healing”
Self-healing, in this context, meant that the networking system, Wi-Fi in particular, should try to correct whatever problems caused the network to fail. If you have spent any time trying to diagnose networking issues, you’ll recognize this as a clear misunderstanding of the issues involved.
You see, the client is rarely the source of the problem in networking. For an Internet connection to work, a million miracles must happen each second. Literally hundreds of specialized computers, playing the roles of Access Point, Router, Bridge, Gateway, Server, &c., must be correctly configured, maintained and monitored. These machines often take several of these roles simultaneously, for thousands of users at the same time, providing DHCP, DNS, IP routing, caching, running dozens of different protocols and doing all of this in perfect concert with each other 24 hours a day, 365 days a year, without fail.
Armies of surly IT administrators watch over all of this, with limited budgets, long hours, little sympathy and all while constantly interrupted by the demands of customers, managers and mouthy software developers on Twitter. The Internet is arguably the largest single machine humanity has ever created and, to my mind, it’s a miracle that it works at all. Asking the devices which connect to this vast, complex network of networks to detect, and then transparently fix, problems in the infrastructure without the permission of the administrators is, well, absolutely the pinnacle of buzzword-driven product management. Real pointy-haired boss territory.
But that’s exactly the request I received, and despite having a clear set of recommendations for things which would help and, I like to think, a very simple and clear argument for why it could not be done, that’s the request which eventually led me to put my badge on the table in front of the HR hit man assigned to put me on a “performance improvement plan”.
Wi-Fi Assist represents possibly the best that a client can do when told to “self heal”: ditch the network connection that’s not working (Wi-Fi) and switch to an alternate. LTE being a paid service with a managed access network, it tends to be more reliable than the $50 Wi-Fi box you got on Amazon.
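To make that concrete, here is a minimal Python sketch of this kind of fallback policy. It is not Apple’s implementation; the `LinkStatus` type, the `choose_interface` function and its parameters are all hypothetical, and `probe_ok` stands in for whatever end-to-end reachability check a real client might run.

```python
from dataclasses import dataclass


@dataclass
class LinkStatus:
    """Hypothetical health summary for one network interface."""
    name: str
    associated: bool  # link layer is up (e.g. joined to the Wi-Fi network)
    probe_ok: bool    # an end-to-end reachability probe succeeded


def choose_interface(wifi: LinkStatus, cellular: LinkStatus,
                     allow_cellular_fallback: bool) -> str:
    """Pick the interface for new traffic under a naive 'self-healing' policy.

    Prefer Wi-Fi whenever it is associated and passes the probe. If Wi-Fi
    looks broken and fallback is allowed, send traffic over cellular;
    otherwise stay on Wi-Fi and let the failure surface to the user.
    """
    if wifi.associated and wifi.probe_ok:
        return wifi.name
    if allow_cellular_fallback and cellular.associated and cellular.probe_ok:
        # Metered link: every byte sent from here on is billed.
        return cellular.name
    return wifi.name
```

Note what the sketch does when the Wi-Fi link is associated but failing its probe: the Wi-Fi association stays up, so a user who believes they are “on Wi-Fi” may not notice that new traffic is now going out over the metered link.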
How do we fix this!?
There’s a phenomenon in networking that I like to call the “network support laser”: when a user calls the vendor of their Wi-Fi access point, nearly the entire profit margin for that box is destroyed by the end of the call. Or, it would be if they paid to hire support reps who could actually fix the problem, and so the script that the minimum-wage, often globalized, support staff reads off the screen instructs them to have the user call the ISP. Similarly, the ISP support staff is instructed to defer as many calls as possible to the AP vendor. And the poor customer is bounced back and forth between the two until they explode. Not exactly a recipe for customer satisfaction.
This lack of coordination and visibility into the operation of the network from the two sides of the distribution network is just one example of where communication breaks down in the networking industry. A large and powerful player in the market (like Apple) could compel the distribution network providers (like Comcast and AT&T) to cooperate with their support organization to resolve this particular problem, but they would clearly rather compete with them on video delivery, making this sort of arrangement difficult to negotiate.
Similarly, low-cost, low-margin networking equipment typically deployed into homes and small businesses is less than perfectly reliable. If the local DNS cache fails, the user might as well not have any internet service at all. One of the most common fixes implemented by network support is to configure the client with Google’s anycast DNS servers, whose IP addresses you can see spray-painted on walls in places where the network infrastructure is attacked during times of unrest. [Edit: this image is from Turkey, the text reads “let your bird speak!” in reference to Twitter.]
When I proposed to my management that Apple should be running its own such servers, and that our devices should use them when the local or ISP DNS servers failed, I was told that the executives would never approve the expense of provisioning and running them. This was around the same time as the company had more money in the bank than the US Government.
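Here is a minimal Python sketch of the fallback I proposed, under some stated assumptions. The actual DNS query is passed in as a callback so the ordering logic can be shown without real network I/O; `resolve_with_fallback` and `Resolver` are illustrative names, not a real API; `192.168.1.1` stands in for a home router’s DNS cache; and `192.0.2.53` is a documentation-only address (RFC 5737) standing in for a hypothetical vendor-run resolver.

```python
from typing import Callable, Optional, Sequence

# Hypothetical callback: (server_ip, hostname) -> address string.
# It should raise OSError (which includes timeouts) when the server fails.
Resolver = Callable[[str, str], str]


def resolve_with_fallback(
    hostname: str,
    local_servers: Sequence[str],
    fallback_servers: Sequence[str],
    query: Resolver,
) -> Optional[str]:
    """Resolve hostname via the local/ISP servers first, then the fallback set.

    Returns the first successful answer, or None if every server failed.
    """
    for server in [*local_servers, *fallback_servers]:
        try:
            return query(server, hostname)
        except OSError:
            # Unreachable, refused or timed out: move on to the next server.
            continue
    return None


# Illustrative configuration (placeholder addresses, see above).
LOCAL_DNS = ["192.168.1.1"]    # home router's DNS cache, as handed out by DHCP
FALLBACK_DNS = ["192.0.2.53"]  # stand-in for a vendor-run anycast resolver
```

In a real client the callback would send the query with a short timeout and raise on failure. The design choice that matters is ordering: the administrator’s configured servers are always tried first, so the fallback only comes into play when they are broken.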
Measure, measure again, then cut
At the end of the day, most problems with networking are solved with better communication between the people who build these systems. My work after Apple took me to a few IETF meetings and I saw firsthand how the chain of sausages that is the Internet is made. It’s not always pretty, but the process generally works and does so on the basis of open discussion and sharing of ideas.
But before we can arrive at solutions, we have to understand not only how often there is a problem but how bad it is and how strongly end users react to it. As social animals we restrict communications as a form of punishment: Boyfriend or girlfriend being a jerk? Give them the cold shoulder. Kids acting up? Put them in timeout. Someone leaks important government secrets? Solitary confinement.
Wi-Fi Assist, then, is an attempt to prevent end users from feeling punished by the failure of the network, but at a high cost to those same users. It salves the pain of being disconnected but puts the user at the mercy of their cellular providers, who know all too well how much users are willing to pay to forgo that particular kind of suffering.
The real solution is for the groups at Apple who design, implement and support networking to communicate better with each other and with outside groups, and for management to step out of their very comfortable no-limit data plan mindsets and consider the costs, both emotional and financial, that their decisions impose on end users.
Every device that Apple makes relies on the Internet to provide an excellent user experience. If the executives don’t take network connectivity seriously at every level, from the hardware to the network services, they will be severely limited in what they can deliver to their customers and all the hard work that goes into their hardware and software will be meaningless. An iPhone, Mac, Watch or TV is worthless if it doesn’t deliver the best connectivity, performance and reliability in the industry, and the massive investments in the network services powering iCloud are wasted if the clients can’t communicate with them.
Apple is one of the only companies in the industry with control over the entire value chain (Internet services, Wi-Fi Access Points and Wi-Fi Client devices) that is required to provide end-to-end visibility into the network, along with the influence and resources to make it happen. All they lack is the apparent will to do so.