The dark corners of the Multipeer Connectivity Framework

Introduction

During my work on the ThaliProject I had an outstanding chance to explore both the strengths and the weaknesses of Apple’s Multipeer Connectivity framework (MPCF). Thali is a cross-platform Cordova plugin for peer-to-peer communication. My job was to enable Thali to run peer to peer on iOS. Unfortunately, Apple’s documentation left something to be desired. During my time on this project I had to dig into the guts of the MPCF to see how it behaves in some undocumented scenarios. All cases listed below are correct as of iOS 9, although Apple may have changed something in iOS 10.


The Multipeer Connectivity framework was introduced by Apple at WWDC 2013. It enables communication between multiple iOS devices even, in the worst case, without any network infrastructure around, i.e. no WiFi access points or cell towers. The MPCF enables many application features, including multiplayer games, chat and file exchange, without requiring a massive backend infrastructure.

Discovery and Session phases

Data exchange in the MPCF consists of 2 parts: the Discovery phase, where devices find each other, and the Session phase, where devices establish connections with each other. The basic flows are explained clearly in the documentation.
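
To make the two phases concrete, here is a minimal sketch of both roles. The class names AdvertiserSketch and BrowserSketch are made up for this example and the service type string is passed in by the caller; this is not Thali's code. The advertiser accepts every invitation and the browser invites every peer it finds, which is only sensible in a demo. A real app would also set a delegate on each MCSession to observe state changes.

```swift
import Foundation
import MultipeerConnectivity

/// Discovery phase, advertising side: announces the service and accepts invitations.
final class AdvertiserSketch: NSObject, MCNearbyServiceAdvertiserDelegate {
    let session: MCSession
    let advertiser: MCNearbyServiceAdvertiser

    init(displayName: String, serviceType: String) {
        let peerID = MCPeerID(displayName: displayName)
        session = MCSession(peer: peerID, securityIdentity: nil, encryptionPreference: .required)
        advertiser = MCNearbyServiceAdvertiser(peer: peerID, discoveryInfo: nil, serviceType: serviceType)
        super.init()
        advertiser.delegate = self
    }

    func start() { advertiser.startAdvertisingPeer() }
    func stop() { advertiser.stopAdvertisingPeer() }

    // Session phase starts here: accepting the invitation joins the inviter's session.
    func advertiser(_ advertiser: MCNearbyServiceAdvertiser,
                    didReceiveInvitationFromPeer peerID: MCPeerID,
                    withContext context: Data?,
                    invitationHandler: @escaping (Bool, MCSession?) -> Void) {
        invitationHandler(true, session) // demo only: accept everyone
    }
}

/// Discovery phase, browsing side: looks for advertisers and invites each one it finds.
final class BrowserSketch: NSObject, MCNearbyServiceBrowserDelegate {
    let session: MCSession
    let browser: MCNearbyServiceBrowser

    init(displayName: String, serviceType: String) {
        let peerID = MCPeerID(displayName: displayName)
        session = MCSession(peer: peerID, securityIdentity: nil, encryptionPreference: .required)
        browser = MCNearbyServiceBrowser(peer: peerID, serviceType: serviceType)
        super.init()
        browser.delegate = self
    }

    func start() { browser.startBrowsingForPeers() }
    func stop() { browser.stopBrowsingForPeers() }

    func browser(_ browser: MCNearbyServiceBrowser,
                 foundPeer foundPeerID: MCPeerID,
                 withDiscoveryInfo info: [String: String]?) {
        // Use the MCPeerID object handed to us by the framework, not a re-created one.
        browser.invitePeer(foundPeerID, to: session, withContext: nil, timeout: 30)
    }

    func browser(_ browser: MCNearbyServiceBrowser, lostPeer lostPeerID: MCPeerID) {
        // Nothing to clean up in this sketch.
    }
}
```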

MCPeerID

In working with the MPCF I thought that dealing with MCPeerIDs would be easy. Looking at the raw MCPeerID class, its public interface contains just an initializer and a displayName: String property:

open class MCPeerID : NSObject, NSCopying, NSSecureCoding {

    public init(displayName myDisplayName: String)

    open var displayName: String { get }
}

So I thought that we could treat an MCPeerID like a value object: if I needed to connect to a discovered peer with the displayName “I am value-type peer id”, I could create an MCPeerID object at any time with this displayName and the MPCF would connect to the peer. Unfortunately for me, my hypothesis was wrong: the MPCF won’t connect to a new MCPeerID object even if it has the same displayName as a previously discovered MCPeerID object. Of course, I was naive to assume this was possible. After all, MCPeerID conforms to the NSSecureCoding protocol, which all but says "if you want to store the MCPeerID and connect to it later then use me!". But anyway, it was worth trying.
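
Since MCPeerID conforms to NSSecureCoding, storing one comes down to a keyed-archiver round trip. The sketch below is only an illustration: the helper names archive and unarchivePeerID are made up, and it assumes the NSKeyedArchiver and NSKeyedUnarchiver class methods that arrived with iOS 11, which is newer than the iOS 9 setup this article is based on.

```swift
import Foundation
import MultipeerConnectivity

/// Serializes a peer ID so that the same identity can be restored later.
func archive(_ peerID: MCPeerID) throws -> Data {
    return try NSKeyedArchiver.archivedData(withRootObject: peerID,
                                            requiringSecureCoding: true)
}

/// Restores a peer ID from archived data; returns nil if the data holds no MCPeerID.
func unarchivePeerID(from data: Data) throws -> MCPeerID? {
    return try NSKeyedUnarchiver.unarchivedObject(ofClass: MCPeerID.self, from: data)
}
```

The data returned by archive can be kept in a file or in UserDefaults and passed to unarchivePeerID later. This is different from calling MCPeerID(displayName:) a second time with the same string, which yields a fresh identity that merely looks the same. Whether the MPCF accepts a restored ID in every situation is not something I verified.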

Apple: 1, Thali: 0

Managing multiple MCSessions

In the simplest scenario one device is always an Advertiser and another is always a Browser. But in the Thali context we have to enable devices to be both an Advertiser and a Browser at the same time. The reason is that in Thali it is quite common for one device to want to send something to another device (advertise) and simultaneously listen for changes from other devices (browse). In the WWDC video about the MPCF, Apple strongly recommends using only one MCPeerID for all your multipeer connectivity needs. This means that if you started advertising with one instance of MCPeerID, you should use the same object for all future advertising and browsing. Relying on this information, Thali used the same instance of MCPeerID for MCNearbyServiceAdvertiser and for MCNearbyServiceBrowser. But this approach caused a serious problem:

  • We started both advertising and browsing at the same time on 2 iOS devices, A and B.
  • Everything looked fine: the MCNearbyServiceBrowserDelegate on each phone discovered the other device, so A found B and B found A.
  • Then the 2 devices tried to invite each other to connect. The related MCSession objects moved to the .connecting state.
  • But after a second all the MCSession objects on both sides moved to the .notConnected state.

At first we thought that the MPCF could not handle multiple MCSession objects. But there is no clear statement in the documentation saying "Yo, we don't allow you to have multiple sessions!" In fact, the MPCF API doesn’t seem to place any restrictions on multiple simultaneous MCSession objects. So it looked like it was time to dig deeper.

In this particular case the answer was fairly obvious: move away from our “one MCPeerID for everyone” policy and instead generate a brand new MCPeerID object for every Browser and Advertiser object. This change solved our issue and now we can easily manage as many MCSessions as we want. Of course, there are still limits on how many connections each device can have, depending on which transport (infrastructure WiFi, Apple's peer-to-peer WiFi, Bluetooth) is used to make these connections. In our case, we only needed 2 active MCSession objects between any two peers.
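
A minimal sketch of that policy could look like the following. DualRolePeer is a name made up for this example, and giving both MCPeerID objects the same display name is my assumption rather than something Thali requires; the only point is that the Advertiser and the Browser no longer share one MCPeerID instance.

```swift
import Foundation
import MultipeerConnectivity

/// Holds an advertiser and a browser that each get their own MCPeerID.
struct DualRolePeer {
    let advertiser: MCNearbyServiceAdvertiser
    let browser: MCNearbyServiceBrowser

    init(displayName: String, serviceType: String) {
        // Two separate MCPeerID objects, one per role.
        let advertiserPeerID = MCPeerID(displayName: displayName)
        let browserPeerID = MCPeerID(displayName: displayName)
        advertiser = MCNearbyServiceAdvertiser(peer: advertiserPeerID,
                                               discoveryInfo: nil,
                                               serviceType: serviceType)
        browser = MCNearbyServiceBrowser(peer: browserPeerID,
                                         serviceType: serviceType)
    }

    func start() {
        advertiser.startAdvertisingPeer()
        browser.startBrowsingForPeers()
    }

    func stop() {
        browser.stopBrowsingForPeers()
        advertiser.stopAdvertisingPeer()
    }
}
```

One likely side effect, which I have not measured: because the two identities differ, the Browser may report the Advertiser running on its own device, so an app built this way may need to filter out its own announcements.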

Apple: 1, Thali: 1

Zombie Advertisers

The next chapter of our story with the MPCF started when we ran automated tests on real devices to make sure that our native code worked as expected. We have several discovery tests — each one executes calls to start advertising and browsing in setup and to stop advertising and browsing in teardown. Everything looked clean. But our tests were failing randomly. Well…

After adding some logging we figured out the problem: the Browser was finding Advertisers from the previous tests, picking one of these zombie Advertisers, trying to connect and immediately failing. But wait! We’re stopping advertising in teardown. What is happening?

There are 2 possible explanations for our Zombies:

  • stopAdvertising is called asynchronously and the Advertiser has time to send several announcements before stopping;
  • announcements from the Advertiser are stored on the Browser’s side in a queue, and this queue isn’t cleared out on a stopListening call.

We figured out that the Zombies were on the Browser’s side. To understand why, consider the following scenario:

  1. Create and start N Advertisers: A1, A2, …, AN;
  2. Create and start Browser B1;
  3. B1 receives foundPeer events for A1, A2, ..., AN;
  4. Call stopAdvertising for A1, A2, ..., AN;
  5. Call stopBrowsing for B1 and asynchronously release B1;
  6. Before actually being destroyed, B1 has time to receive lostPeer for M of the N Advertisers, where M <= N;
  7. Create and start Browser B2;
  8. B2 receives N-M foundPeer announcements for Advertisers from step 1 and, right after that, N-M lostPeer announcements for the same Advertisers.

This example shows that the announcements are stored on the Browser’s side. Advertisers aren’t producing zombies. Moreover, as you can see from steps 2 and 7, we’re working with different instances of MCNearbyServiceBrowser. This means that the announcements are stored in some kind of private queue and we have no chance to avoid these zombies. We just have to accept them as they are and try to live with them.
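
One way to live with them, sketched below, is to hold every foundPeer announcement back for a short grace period and drop it if a lostPeer for the same peer arrives in the meantime, which is exactly the pattern from step 8. This is not what Thali does; PeerQuarantine and the grace period are hypothetical. The type is plain Swift with explicit timestamps so that it can be exercised without real devices.

```swift
import Foundation

/// Tracks discovered peers and only reports those that stayed visible for a grace period.
struct PeerQuarantine<Peer: Hashable> {
    private var firstSeen: [Peer: TimeInterval] = [:]
    let gracePeriod: TimeInterval

    init(gracePeriod: TimeInterval) {
        self.gracePeriod = gracePeriod
    }

    /// Call from the foundPeer callback.
    mutating func found(_ peer: Peer, at time: TimeInterval) {
        // Keep the earliest sighting so repeated announcements do not reset the clock.
        if firstSeen[peer] == nil {
            firstSeen[peer] = time
        }
    }

    /// Call from the lostPeer callback.
    mutating func lost(_ peer: Peer) {
        firstSeen[peer] = nil
    }

    /// Peers found at least `gracePeriod` seconds before `time` and not lost since.
    func connectable(at time: TimeInterval) -> Set<Peer> {
        return Set(firstSeen.filter { time - $0.value >= gracePeriod }.keys)
    }
}
```

In a Browser delegate the Peer type would be MCPeerID and the timestamps would come from a monotonic clock. The right grace period depends on how quickly the stale lostPeer follows the stale foundPeer, which I have not measured.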

Apple: 2, Thali: 1 (Unfair play)

It is worth mentioning that these zombie Advertisers aren’t a problem in real life but only a problem for our tests. In real life you have to be prepared for the fact that when you find someone that you want to connect to they can unexpectedly disappear for any number of reasons (the device was switched to airplane mode, exploded, etc.).

MPCF under pressure

We also wrote a test project to see how the MPCF behaves when it is maintaining active sessions while, in parallel, some peer wants to create a new session or join an existing one. All the code is available on GitHub. The README file contains a detailed description of each test. These tests uncovered some interesting facts. Depending on which network we use, we have limited channel bandwidth (bigger for WiFi and smaller for Bluetooth). This channel is used to transfer data between peers. If the channel isn’t full, we can create new MCSessions or join existing ones. But if we have filled the channel with data, our attempts to create new sessions or join existing ones will fail randomly.
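
Since these failures are random, one common way to cope is to retry the invitation after a growing delay. The helper below is a sketch of capped exponential backoff only; it is not part of our test project, and the function name and default values are arbitrary assumptions.

```swift
import Foundation

/// Delay before retry number `attempt` (0-based): base * 2^attempt, capped at `maxDelay`.
func retryDelay(attempt: Int,
                base: TimeInterval = 0.5,
                maxDelay: TimeInterval = 8.0) -> TimeInterval {
    precondition(attempt >= 0, "attempt must be non-negative")
    let uncapped = base * pow(2.0, Double(attempt))
    return min(uncapped, maxDelay)
}
```

A caller would wait retryDelay(attempt:) seconds after a session drops to the not connected state before inviting the peer again, and give up after some fixed number of attempts.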


I hope this article will help the reader to understand and forgive the multipeer connectivity framework when it doesn’t treat you well. And, of course, Apple won. Apple always wins.