It is often the case that a simple customer-facing UI element hides complex functionality on the backend. You have probably run into features like this in your own projects. This is exactly the story behind our Device Activity Reporting feature for Jamf Connect. Jamf Connect lets our customers connect their users securely to enterprise resources such as SaaS apps or internal applications running on private networks. It achieves this by establishing a Software Defined Perimeter (SDP) that connects customer devices to the required resources. This SDP is based on WireGuard, a lightweight, fast, and extremely secure VPN protocol. As you can imagine, customers using Jamf Connect want to be able to check the connectivity status of individual devices. This can be useful during enrolment or when investigating potential issues.
When is the last time you saw this device?
“Show me a single timestamp of the last activity” — this short sentence is enough to summarize the purpose of this feature. A customer admin looking into our Jamf Security Cloud portal will see a number showing the last activity of the device (data being passed through our Jamf Connect DNS gateways), displayed next to each device in their device list.
But as many of you have already guessed from the first paragraph, the UI was just the tip of the iceberg from the implementation point of view. Let’s have a look at what lies beneath the surface of this slick UI component.
How to define an “active device”?
We use different methods to define an active device for our various products, e.g. data being passed through our proxy or the device checking in with the customer’s Unified Endpoint Management system (UEM). The output is usually the same: the timestamp of the last activity. With a DNS gateway, the most obvious approach is to update the timestamp every time a DNS request from the device is seen in our DNS gateway. However, DNS requests are generated continuously by every device connecting to our cloud gateways. If you update the last activity status for each device after every single DNS request and multiply this by the number of devices connecting to our data centers, you get a recipe for overloading the service with nothing more than reporting logic.
Obviously, the customer admin looking at the last reported activity of a device isn’t really interested in activity reporting on a millisecond scale. Usually, seconds or minutes are just fine. This led us to introduce a sampling mechanism that provided the necessary granularity.
Our sampling mechanism in the DNS gateways accumulates the newest DNS request sample for each device in an in-memory buffer with a bounded size. This device activity buffer is periodically flushed into an Apache Kafka topic.
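To make the idea concrete, here is a minimal sketch of such a buffer in Python. It is not our production code: the class name, the method names, and the choice to reject samples for new devices once the bound is reached are all assumptions made for illustration.

```python
from typing import Dict


class DeviceActivityBuffer:
    """Hypothetical buffer: keeps only the newest timestamp per device, up to max_devices entries."""

    def __init__(self, max_devices: int) -> None:
        if max_devices <= 0:
            raise ValueError("max_devices must be positive")
        self._max_devices = max_devices
        self._latest: Dict[str, int] = {}

    def record(self, device_id: str, timestamp_ms: int) -> bool:
        """Record one DNS request sample.

        Returns False when the buffer is full and the device is not yet tracked
        (assumed policy: the caller should flush and retry, or drop the sample).
        """
        current = self._latest.get(device_id)
        if current is None:
            if len(self._latest) >= self._max_devices:
                return False
            self._latest[device_id] = timestamp_ms
        elif timestamp_ms > current:
            # Only the newest sample per device is worth keeping.
            self._latest[device_id] = timestamp_ms
        return True

    def drain(self) -> Dict[str, int]:
        """Return the buffered samples and start over with an empty buffer."""
        snapshot, self._latest = self._latest, {}
        return snapshot

    def __len__(self) -> int:
        return len(self._latest)
```

Because each device occupies at most one entry, memory use depends on the number of active devices rather than on the number of DNS requests.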
Kafka and batching
The Kafka topic is consumed by the Device Activity consumer, which takes care of device activity deduplication and proper timestamp propagation to the rest of our backend services and our management console.
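The deduplication step can be pictured roughly as follows. This is a simplified sketch under stated assumptions, not the actual consumer: the function name and the in-memory dictionary standing in for the consumer's state are hypothetical, and the Kafka client itself is left out.

```python
from typing import Dict, Iterable, List, Tuple


def apply_activity_updates(
    last_seen: Dict[str, int],
    updates: Iterable[Tuple[str, int]],
) -> List[Tuple[str, int]]:
    """Merge (device_id, timestamp_ms) updates into last_seen.

    Returns only the devices whose timestamp actually advanced, so that
    downstream services receive one update per device instead of one per sample.
    """
    changed: Dict[str, int] = {}
    for device_id, timestamp_ms in updates:
        known = last_seen.get(device_id)
        if known is None or timestamp_ms > known:
            last_seen[device_id] = timestamp_ms
            changed[device_id] = timestamp_ms
        # Older or repeated timestamps are duplicates and are ignored.
    return sorted(changed.items())
```

Samples for the same device may arrive from several gateways or out of order, so comparing against the last known timestamp keeps the propagated value moving forward only.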
Even with the data being sampled, the number of activity updates would be unnecessarily high if each sample were sent to the collector separately. The overhead of sending a Kafka message for each data sample would be enormous. This is why the samples are collected, held in memory, and flushed once per defined period of time.
As you can see in the image above, we have configured the flush interval so that we always send a reasonable number of device activity updates at once. This allows us to deliver updates efficiently to our device activity data collector, which is then able to process them and present them to the end user.
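The interval-based flush can be sketched like this. Again, this is only an illustration under assumptions: `BatchFlusher`, the JSON payload format, and the `send` callable (a stand-in for whatever actually publishes to Kafka) are hypothetical, and the clock is passed in explicitly to keep the example deterministic.

```python
import json
from typing import Callable, Dict


class BatchFlusher:
    """Hypothetical batcher: collects samples and sends them as one message per interval."""

    def __init__(self, send: Callable[[bytes], None], interval_s: float) -> None:
        self._send = send  # stand-in for a Kafka producer call
        self._interval_s = interval_s
        self._pending: Dict[str, int] = {}
        self._last_flush_s = 0.0

    def add(self, device_id: str, timestamp_ms: int) -> None:
        """Keep the newest timestamp per device until the next flush."""
        previous = self._pending.get(device_id)
        if previous is None or timestamp_ms > previous:
            self._pending[device_id] = timestamp_ms

    def maybe_flush(self, now_s: float) -> bool:
        """Send all pending samples as a single payload once the interval has elapsed."""
        if not self._pending or now_s - self._last_flush_s < self._interval_s:
            return False
        payload = json.dumps(self._pending, sort_keys=True).encode("utf-8")
        self._send(payload)
        self._pending = {}
        self._last_flush_s = now_s
        return True
```

With this shape, the number of messages depends on the flush interval rather than on DNS traffic: a longer interval means fewer, larger messages at the cost of a slightly staler timestamp in the console.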
Even a relatively simple feature like last activity reporting requires careful thought when operating at scale. However, using our more sophisticated approach was worth the effort, as the proper design of this feature prevented us from creating unnecessary bottlenecks or overloading other, more distant parts of our infrastructure.