Understanding business dynamics through WiFi packet sniffing

Daniele Mazzei
Oct 29

Recently, several indoor localization solutions based on WiFi, Bluetooth, and RFID have been proposed. Most of these, however, require the active involvement of the subjects, through an active connection to a WiFi network or to Bluetooth devices.

Together with one of my students in the “Design of Interactive Systems” course at the University of Pisa, I prototyped a framework for indoor people tracking. The framework aims to analyze the social behavior of people in business buildings and shared workspaces.

The goal of the project was to develop a non-invasive system that can give shared-space community managers data and insights useful for managing their communities.

So, we built a lightweight WiFi sniffer using a cheap microcontroller development board based on the ESP32 chip, programmed in Zerynth.

For 11 days, 10 devices were placed in a business center in Pisa to understand the behavior of the 7 companies distributed over the building's 2 floors.

The business space map. Company spaces are highlighted in different colors; the blue dots mark where the WiFi sniffers were placed.

Capturing people movements from WiFi data

Using WiFi to estimate the presence and location of people is a widespread technique nowadays. WiFi devices (mobile phones, computers, etc.) continuously broadcast messages to discover available WiFi networks. WiFi Access Points (APs) receive these messages, which contain a device identifier (the MAC address), and by analyzing the radio properties of the received signal they can measure the RSSI (Received Signal Strength Indication). This procedure is called active scanning, and it is well explained in this article, together with its legal and privacy-related consequences.
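To show what a sniffer actually pulls out of one of these messages, here is a minimal Python sketch that parses an 802.11 probe request. It assumes the buffer starts at the raw 802.11 MAC header (no radiotap header), and it takes the RSSI as a separate argument because the RSSI is measured by the receiving radio rather than carried inside the frame. The `ProbeRecord` and `parse_probe_request` names are hypothetical; this is not the Zerynth firmware used in the project.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProbeRecord:
    mac: str   # transmitter address (addr2) of the probe request
    ssid: str  # requested SSID; empty for a wildcard (broadcast) probe
    rssi: int  # dBm, measured by the receiving radio, not part of the frame


def parse_probe_request(frame: bytes, rssi: int) -> Optional[ProbeRecord]:
    """Return a ProbeRecord if `frame` is an 802.11 probe request, else None."""
    if len(frame) < 24:
        return None  # shorter than the fixed management-frame header
    fc = frame[0]
    frame_type = (fc >> 2) & 0x3  # 0 = management frame
    subtype = (fc >> 4) & 0xF     # 4 = probe request
    if frame_type != 0 or subtype != 4:
        return None
    mac = ":".join(f"{b:02x}" for b in frame[10:16])  # addr2 = sender
    ssid = ""
    i = 24  # tagged parameters follow the 24-byte header
    while i + 2 <= len(frame):
        elem_id, length = frame[i], frame[i + 1]
        if elem_id == 0:  # SSID element
            ssid = frame[i + 2:i + 2 + length].decode("utf-8", errors="replace")
            break
        i += 2 + length
    return ProbeRecord(mac=mac, ssid=ssid, rssi=rssi)
```

Only the sender MAC and the RSSI matter for the analysis that follows; the SSID is parsed just to show where the tagged parameters begin.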

In theory, knowing the unique identifier (MAC address) and the signal strength (RSSI) makes it possible to track the position of the packet senders (the mobile devices asking for available networks) using trilateration techniques. In practice, in real scenarios, this is impossible for the following reasons:

a) WiFi device vendors introduced MAC address randomization algorithms to prevent devices from being tracked through their active-scanning messages;

b) signal strength isn’t an absolute value. Each mobile phone has a different radio power and antenna shape, so two different devices placed at the same distance from an access point can send packets that the AP receives with different RSSI values. Moreover, building structures like walls, doors and furniture influence signal propagation, attenuating and reflecting the signal, so the RSSI no longer maps cleanly to distance.
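To put point b) in numbers, the sketch below converts RSSI to distance with the common log-distance path-loss model and then trilaterates a position from three sniffers. The reference RSSI at 1 m (-40 dBm) and the path-loss exponent (2.5) are assumed, illustrative values, and the function names are hypothetical; a real deployment would have to calibrate them per device and per room, which is exactly what cannot be done with unknown phones.

```python
def rssi_to_distance(rssi_dbm: float, ref_rssi_dbm: float = -40.0,
                     path_loss_exp: float = 2.5) -> float:
    """Invert the log-distance model rssi = ref - 10*n*log10(d / 1 m); returns meters."""
    return 10 ** ((ref_rssi_dbm - rssi_dbm) / (10 * path_loss_exp))


def trilaterate(anchors, distances):
    """Estimate (x, y) from three sniffer positions and their estimated distances.

    Subtracting the first circle equation (x - xi)^2 + (y - yi)^2 = di^2 from
    the other two yields a 2x2 linear system, solved here with Cramer's rule.
    """
    (x1, y1), (x2, y2), (x3, y3) = anchors
    d1, d2, d3 = distances
    a11, a12 = 2 * (x2 - x1), 2 * (y2 - y1)
    a21, a22 = 2 * (x3 - x1), 2 * (y3 - y1)
    b1 = d1**2 - d2**2 + x2**2 - x1**2 + y2**2 - y1**2
    b2 = d1**2 - d3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("sniffers are collinear; position is not determined")
    x = (b1 * a22 - a12 * b2) / det
    y = (a11 * b2 - a21 * b1) / det
    return x, y
```

With these assumed parameters, a phone whose radio transmits 6 dB weaker than the calibration device appears about 1.7 times farther away (10^(6/25) ≈ 1.74), and every such distance error propagates directly into the trilaterated position.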

So, we are unable to tell who is asking for the available networks and where they are :(

Let’s think about this some more

In 11 days, the 10 probes collected messages from 121,396 MAC addresses. This number is heavily inflated by MAC randomization and does not represent the real number of devices/people present in the business center.

Step 1: All MAC addresses detected at least once during nighttime (between 12 A.M. and 5 A.M.) were deleted from the datasets. This step removes the MAC addresses of the other sniffers, the WiFi APs, and fixed always-on devices such as desktop computers.
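A minimal sketch of Step 1, assuming each detection is stored as a hypothetical `(mac, timestamp, sniffer_id, rssi)` tuple with a `datetime` timestamp. The record layout and function name are illustrative, and treating 5 A.M. as an exclusive bound is our assumption.

```python
from datetime import datetime


def remove_night_macs(records: list[tuple[str, datetime, str, int]],
                      night_start_hour: int = 0,
                      night_end_hour: int = 5) -> list[tuple[str, datetime, str, int]]:
    """Drop all records of any MAC seen at least once during the night window."""
    night_macs = {
        mac for mac, ts, _sniffer, _rssi in records
        if night_start_hour <= ts.hour < night_end_hour  # 5 A.M. treated as exclusive
    }
    return [rec for rec in records if rec[0] not in night_macs]
```

Note that a single nighttime detection removes the MAC's daytime records too, matching "detected at least once".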

Step 2: This step splits the dataset into two sub-datasets: Workers and Visitors. All MAC addresses appearing in the dataset on more than 5 of the 11 days of acquisition were added to the Workers dataset; the remaining ones were added to the Visitors dataset if they were detected for at least 30 minutes.
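Step 2 could be sketched as follows, again assuming hypothetical `(mac, timestamp, sniffer_id, rssi)` tuples. The article does not say how the 30 minutes are measured, so this sketch assumes it means the span between the first and last detection of a MAC within a single day; "more than 5 days" becomes `min_worker_days=6` distinct calendar days.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def split_workers_visitors(records: list[tuple[str, datetime, str, int]],
                           min_worker_days: int = 6,
                           min_visitor_span: timedelta = timedelta(minutes=30)):
    """Return (workers, visitors) as sets of MAC addresses."""
    # First and last detection of each MAC on each calendar day
    first_last: dict = {}
    for mac, ts, _sniffer, _rssi in records:
        key = (mac, ts.date())
        if key in first_last:
            first, last = first_last[key]
            first_last[key] = (min(first, ts), max(last, ts))
        else:
            first_last[key] = (ts, ts)

    days = defaultdict(set)                # MAC -> days on which it was seen
    longest_stay = defaultdict(timedelta)  # MAC -> longest same-day span
    for (mac, day), (first, last) in first_last.items():
        days[mac].add(day)
        longest_stay[mac] = max(longest_stay[mac], last - first)

    workers, visitors = set(), set()
    for mac, seen_days in days.items():
        if len(seen_days) >= min_worker_days:        # "more than 5 days"
            workers.add(mac)
        elif longest_stay[mac] >= min_visitor_span:  # "at least 30 minutes"
            visitors.add(mac)
    return workers, visitors
```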

Step 3: This step removes randomly generated MACs. MAC Vendors API is a web tool based on a MAC address database that infers the vendor of a WiFi chip from its MAC address. Using the MAC Vendors API, we cleaned the two sub-datasets by removing all the MAC addresses not associated with a vendor, assuming that they were randomly generated.
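We don't reproduce the MAC Vendors API call here. Instead, the sketch below uses a different, offline heuristic aimed at the same problem: the randomized MAC addresses used by modern phones generally have the locally administered bit (0x02 in the first octet) set. This is a swapped-in technique, not the vendor lookup used in the project, and it will not catch globally administered addresses whose vendor prefix is simply unregistered.

```python
def is_locally_administered(mac: str) -> bool:
    """True if the U/L bit (0x02 of the first octet) is set, as in randomized MACs."""
    first_octet = int(mac.replace("-", ":").split(":")[0], 16)
    return bool(first_octet & 0x02)


def drop_random_macs(macs: set[str]) -> set[str]:
    """Keep only globally administered addresses (those built on a vendor prefix)."""
    return {mac for mac in macs if not is_locally_administered(mac)}
```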

Step 4: All the MAC addresses with a maximum RSSI lower than -90 dBm (a very weak signal) were discarded, because we assumed that a device with such a low signal never entered any of the monitored rooms.
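Step 4 in the same spirit, assuming hypothetical `(mac, timestamp, sniffer_id, rssi)` tuples: compute each MAC's peak RSSI across all sniffers and keep only the MACs whose peak reaches -90 dBm.

```python
def drop_weak_macs(records: list, min_peak_rssi_dbm: float = -90.0) -> list:
    """Keep records only for MACs whose strongest detection reaches the threshold."""
    peak: dict[str, float] = {}
    for mac, _ts, _sniffer, rssi in records:
        peak[mac] = max(peak.get(mac, float("-inf")), rssi)
    keep = {mac for mac, value in peak.items() if value >= min_peak_rssi_dbm}
    return [rec for rec in records if rec[0] in keep]
```

The threshold is applied to the peak, not to individual detections, so a kept MAC still retains its weaker readings from distant sniffers.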

After these cleaning steps, the initial 121,396 unique MAC addresses were reduced to 52 addresses in the Workers dataset and 178 in the Visitors dataset.

Workers dataset: for every MAC address, we found the room (sniffer) where it had the highest average RSSI over the 11 days. This was taken as an indicator of where the device spent most of its time, and was therefore used to assign MACs to companies. MAC addresses whose highest average RSSI was in the common space were discarded.
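One possible reading of this assignment rule, sketched in Python: average each MAC's RSSI separately per sniffer over the whole acquisition, pick the sniffer with the highest average, and map it to a room. The records are assumed to be hypothetical `(mac, timestamp, sniffer_id, rssi)` tuples, and the `room_of_sniffer` mapping and `"common"` label are illustrative. The original wording could also be read as averaging daily maxima, so treat this as an interpretation.

```python
from collections import defaultdict


def assign_to_rooms(records, room_of_sniffer: dict, common_room: str = "common") -> dict:
    """Map each MAC to the room of the sniffer with its highest average RSSI.

    MACs whose best sniffer is in the common space are left out.
    """
    totals = defaultdict(lambda: [0.0, 0])  # (mac, sniffer) -> [rssi sum, count]
    for mac, _ts, sniffer, rssi in records:
        cell = totals[(mac, sniffer)]
        cell[0] += rssi
        cell[1] += 1

    best = {}  # mac -> (average rssi, sniffer)
    for (mac, sniffer), (rssi_sum, count) in totals.items():
        avg = rssi_sum / count
        if mac not in best or avg > best[mac][0]:
            best[mac] = (avg, sniffer)

    return {
        mac: room_of_sniffer[sniffer]
        for mac, (_avg, sniffer) in best.items()
        if room_of_sniffer[sniffer] != common_room
    }
```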

Data coming from the same MAC address (from any sniffer) were grouped into five-second windows, and within each window the detected RSSI was averaged antenna by antenna. In each window, every MAC address was considered to be located in the room of the antenna that detected the highest average RSSI.
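The five-second localization step could look like this, assuming hypothetical `(mac, timestamp, sniffer_id, rssi)` tuples with `datetime` timestamps. Windows are aligned to multiples of five epoch seconds; that alignment is our choice, since the article only states the window length.

```python
from collections import defaultdict
from datetime import datetime


def window_start(ts: datetime, window_s: int = 5) -> int:
    """Start of the fixed window containing `ts`, in epoch seconds."""
    return int(ts.timestamp()) // window_s * window_s


def locate_per_window(records, window_s: int = 5) -> dict:
    """Return {(mac, window_start): sniffer} using the highest per-sniffer average RSSI."""
    acc = defaultdict(lambda: [0.0, 0])  # (mac, window, sniffer) -> [rssi sum, count]
    for mac, ts, sniffer, rssi in records:
        cell = acc[(mac, window_start(ts, window_s), sniffer)]
        cell[0] += rssi
        cell[1] += 1

    best = {}  # (mac, window) -> (average rssi, sniffer)
    for (mac, window, sniffer), (rssi_sum, count) in acc.items():
        avg = rssi_sum / count
        if (mac, window) not in best or avg > best[(mac, window)][0]:
            best[(mac, window)] = (avg, sniffer)
    return {key: sniffer for key, (_avg, sniffer) in best.items()}
```

Averaging per sniffer before comparing means a single strong outlier packet cannot outweigh several consistent readings from another antenna in the same window.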

Does it work?

This method is clearly affected by errors due to MAC address randomization. The dataset cleaning is based on very basic heuristics and is probably not optimal.

The 52 worker MAC addresses were assigned to companies (B, F, N, R, Z), giving an estimate of the number of employees per company: B: 4; F: 4; N: 12; R: 18; Z: 14.

The number of employees located in their company rooms during one working day was also calculated.

Number of “workers” devices located in their company room over a day
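A curve like this could be derived from per-window locations. This sketch assumes they are stored as a dict mapping `(mac, window_start)` (epoch seconds) to a sniffer ID, plus hypothetical `worker_company` and `room_of_sniffer` mappings, and that you pass in the windows of a single working day. It counts distinct worker devices per hour that were located in their own company's room. Hourly binning and the UTC default are assumptions; pass `zoneinfo.ZoneInfo("Europe/Rome")` as `tz` to get local hours in Pisa.

```python
from collections import defaultdict
from datetime import datetime, timezone, tzinfo


def workers_in_own_room_per_hour(located: dict, worker_company: dict,
                                 room_of_sniffer: dict,
                                 tz: tzinfo = timezone.utc) -> dict:
    """Count distinct worker MACs seen in their own company room, per hour of day."""
    present = defaultdict(set)  # hour of day -> worker MACs in their own room
    for (mac, window_start), sniffer in located.items():
        company = worker_company.get(mac)
        if company is not None and room_of_sniffer[sniffer] == company:
            hour = datetime.fromtimestamp(window_start, tz=tz).hour
            present[hour].add(mac)
    return {hour: len(macs) for hour, macs in sorted(present.items())}
```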

The 178 visitor MAC addresses were assigned to the companies in the same way, giving the maximum number of visitors each company had:

B: 4; F: 1; N: 11; R: 69; Z: 61, with 32 assigned to the common space.

The number of people visiting each company during one working day was also extracted.

Number of “visitors” devices associated with a company room over a day

We also extracted the movements of workers between companies, inferring how people move across the companies' spaces. This gives an indicator of the business collaborations active in the business center.

Presence of workers associated with a company in other companies' rooms. Each line represents an employee associated with the company indicated in the graph title. The Y axis shows the companies' rooms.

The same data plotted as network graphs give a clearer picture of the business dynamics in the business center, highlighting which companies are more involved in collaborations.

Network graph generated from the Workers dataset. It is clear how the companies hosted in the business center established different collaborations with each other, and which companies are the most collaborative.

The network graph makes it immediately clear which companies are the most collaborative and which collaboration clusters have formed in the business center.
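A minimal sketch of how the graph's edge weights and a "most collaborative" ranking might be computed. It assumes per-window locations stored as a dict mapping `(mac, window_start)` to a sniffer ID, plus hypothetical `worker_company` and `room_of_sniffer` mappings. Every window in which a worker of company A is located in company B's room adds one to the undirected edge A-B, and companies are ranked by weighted degree. The article does not state its exact weighting, so this is one plausible choice.

```python
from collections import Counter


def collaboration_edges(located: dict, worker_company: dict,
                        room_of_sniffer: dict, common_room: str = "common") -> Counter:
    """Count windows in which a worker of company A was located in company B's room."""
    edges = Counter()
    for (mac, _window), sniffer in located.items():
        home = worker_company.get(mac)
        room = room_of_sniffer[sniffer]
        if home is None or room in (home, common_room):
            continue  # unassigned device, own room, or shared space
        edges[frozenset((home, room))] += 1  # undirected edge A-B
    return edges


def rank_by_weighted_degree(edges: Counter) -> list:
    """Sum edge weights per company; higher means more cross-company presence."""
    degree = Counter()
    for pair, weight in edges.items():
        for company in pair:
            degree[company] += weight
    return degree.most_common()
```

The resulting edge counts can be handed to any graph-drawing tool to produce a network view like the one above.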

This technique is clearly affected by errors and isn’t suitable for precise people tracking or for counting people for security purposes. However, with large numbers of devices these errors are mitigated across the entire dataset, which still allows the extraction of the business center's social dynamics.

This is an example of how pervasive, low-power, resource-constrained and low-cost technologies can be used to optimize strategic processes. The point isn’t the technology; it’s how we use it.

We should start thinking of the problems we have to solve and not of the technology we have to improve.

This work started as a project by Andrea Cometa, a student of the Bionics Engineering MSc at the University of Pisa. The WiFi sniffer firmware and the MAC address de-randomization algorithm were then extended and validated by Dimitris Kokkonis during his internship at Zerynth. Zerynth provided the technological infrastructure necessary for the project and the entire software stack (from firmware to data acquisition and cloud server).

Human Centred Technology

Humanizing technology for avoiding de-humanizing humans

Daniele Mazzei: Assistant Professor at University of Pisa. Chief Innovation Officer at Zerynth.
