A mechanism to transfer complex data between dumb systems

Andrew Howden
bioprofile.co
Published in
7 min readJan 1, 2019
Photo by Victor Freitas on Unsplash (Edited)

More recently, I have been thinking about ways to connect a wide range of devices up to a single data plane to make it easy to express data between devices that do not otherwise have data connectivity. Specifically, it should be possible to retrofit either “dumb” devices (that is, devices without a microchip) with some way to convey some complex properties about themselves, or devices that are semi-smart — devices that have simple LCD screens but no network connectivity.

Use Cases

There are several use cases for such data expressions:

Fitness Equipment

When a user starts their fitness training, or perhaps their rehabilitation, they’re thrust into a complex world of specific requirements that often have little to no training material. To fill this gap there are coaches who teach the correct use of such equipment, but practically speaking there’s a limited amount coaches can influence their charges and their charges will spend the majority of their training alone, or in large groups.

The use of smart phones to track both the training and the results thereof is becoming ubiquitous with an almost overwhelming variety of fitness apps, including the under-construction “Moar Fit” that I and colleagues are developing.

However, in my experience, the training environment is not one that’s well suited to data entry; participants can be:

  • Too new to know which data to enter, and why
  • Too tired to care
  • Too energized to work a cell phone

The process of expressing this tracking data is simply too difficult to continue to do sustainably. However, it might be possible to reduce the level of interaction required to track the data to an absolute minimum by expressing properties of itself in a way that can be directly intercepted by applications.

In this scenario, as opposed to a user needing to enter the time, distance or average speed of a run into their device they’d simply have to take a photo of a device displaying a correctly formatted QR code with a device assistant or QR app. The device can then decide what to do with that data; whether to open it in a specific app, push it to a URI etc.

Additionally, with devices expressing complex data about themselves users could look up how to use a device in such a way that fits into their training program, allowing new users to be reassured in the correct way of using such tooling or more experienced users alternative exercises that still work within their training requirements, but allow a certain level of variety.

Retail

There is an inherent disconnect between the retail environments that are “in store” or “online”. Both are useful in their own ways for evaluating products, but at this time there’s not a super elegant way of connecting the experience of in store with that of online.

If a product was able to express a limited amount of data about itself, such its EAN, manufacturing date, model number or reference to an online information repository users would be able to quickly save, search or compare various products easily.

Such things are already possible via barcodes that store the EAN, however there is only a limited amount of information that an EAN can provide, and more complex data may be useful for consumers looking to learn more about, store or search for products.

State of the art

Currently this sort of expression is already in place in a number of areas. The most common is the UPC barcodes attached to nearly every product, but there are examples of QR codes on Gymnasium equipment and references to standards in storing complex data in JSON in a QR code.

However, there are several limitations with these implementations:

  1. The data formats are always bespoke implementations. There’s no “standard” way of expressing data.
  2. There are some practical limits on encoding JSON data in a QR code.
  3. Such data is not readily “discoverable”, and requires a level of knowledge about the format beforehand

Accordingly, there’s a high level of “buy in” required for consumers to use this sort of rich data, and the because of the high cost there’s a limited investment in the tooling around this data and thus low utility for consumers. Broadly, investment remains too high and utility too low.

Protobuf as a solution

The problem of allocating a large amount of data in a small payload is one that mirrors a similar problem of using network bandwidth efficiently. Technology giants such as Google, Facebook, Microsoft etc. have a vested interested in ensuring that their bandwidth costs are low, as it directly reduces the cost of running multitudes of services. In 2008 Google released its RPC library for managing the communication between software services. It has a number of properties that make it excellent as a medium to store complex information in limited bandwidth, particularly as compared to JSON:

  • Its payload is smaller, as it is a binary format and does not need to store full references to keys / values. Gzipping the JSON payload offsets some of this advantage, but not all of it.
  • Because its an “interface definition language” it requires that an API be specified up front, versioned and managed. This means there is always a central definition that all software should implement, but one that can be nicely upgraded, changed or otherwise handled over time.

Implementing discovery, inspired by gRPC

This gives an objective payload format, that in theory applications can implement. However, Protobuf is a “wire format”, but does not specify how such a format should be communicated. For that, we instead need to look at something like gRPC.

Normally, gRPC servers implement an RPC framework that users Protobuf as it’s communication payload. This allows a robust, yet efficient RPC framework; especially given the properties of HTTP/2 designed to maximize the TCP connection bandwidth.

It is not directly suitable for expressing the payload data. However, there is precedent to implement the transfer of protobuf data as part of a URL:

https://www.google.com/get/cardboard/download/?p=CgxNYXR0ZWwsIEluYy4SGFZJRVctTUFTVEVS4oSiIFZSIFZJRVdFUh0xCCw9JWiRbT0qEAAASEIAAEhCAABIQgAASEJYATUpXA89OggpXA8-SOE6P1AAYAM

The protobuf is encoded as “base64” in a URL parameter. Extracting it with:

# Base64 manually corrected for `+` and `=` charsecho 'CgxNY...AYAM=' | base64 -d | xxd

gives

00000000: 0a0c 4d61 7474 656c 2c20 496e 632e 1218  ..Mattel, Inc...
00000010: 5649 4557 2d4d 4153 5445 52e2 84a2 2056 VIEW-MASTER... V
00000020: 5220 5649 4557 4552 1d31 082c 3d25 6891 R VIEWER.1.,=%h.
00000030: 6d3d 2a10 0000 4842 0000 4842 0000 4842 m=*...HB..HB..HB
00000040: 0000 4842 5801 3529 5c0f 3d3a 0829 5c0f ..HBX.5)\.=:.)\.
00000050: 3e48 e13a 3f50 0060 03 >H.:?P.`.

The data seems arbitrary, but at least contains some ASCII data. Looking over the definition we can see data that (probably) corresponds to:

  • Vendor: Mattel, inc
  • Model: VIEW-MASTER VR VIEWER

The other data is harder to pick out, as it’s expressed in relation to the compiled protobuf.

Lastly, viewing the URL shows a page that encourages users to download “Google Cardboard app”, which is something like an app store specifically for VR applications. The application FAQ reads:

Q: Does the viewer work with other Works with Google Cardboard apps?

A: Yes, the View-Master® VR viewer is compatible with the hundreds of Google Cardboard-compatible apps available on Google Play and the App Store.

http://www.view-master.com/en-us/troubleshooting

So, we have all the components required for users to be able to use the data, even without specialized data:

  • A mechanism to express a specific protobuf payload
  • A fallback mechanism to describe how to read this payload

However, there still ways to improve this type of data serialisation. Specifically:

  • There should be support for ${N} types of data
  • The data should be parsable by any application
  • There should be a better fallback for that data

To this end, we can take further lessons from gRPC. Specifically, gRPC defines both a URL format (for the most part) and allows us to send data to an endpoint should the user not have any application they have better designated to handle it.

gRPC defines a URL structure of:

${SCHEME}://${HOST}/${SLUG}?/${SERVICE_NAME}/${METHOD_NAME}

For example,

https://in.healthydata.com/healthydata.anthropometrics.v1.WeightService/CreateMeasurement

This gives directionality or intent to our protobuf definition, as before implied by the URL structure of the “made with cardboard” QR code. Unfortunately here we reach a limitation of HTTP and passing data around without a HTTP body, and need to add it as the card example as a query parameter:

https://in.healthydata.com/healthydata.anthropometrics.v1.WeightService/CreateMeasurement?payload=c2FkamtzYQo...

However, this URL can be intercepted by any application, and if the user chooses, an application can intercept and read this data.

If the user does not choose, the in endpoint could authenticate the user with traditional browser based authentication (cookie or JWT), and if that user is authenticated with the default service, persist the data in the browser.

If the user is not authenticated, the user could be shown the data within the browser window, and encouraged to log in to persist that data.

In Conclusion

There is currently a disconnect between the virtual and the physical world. While this is being somewhat bridged by “internet of things” devices, such devices carry a swathe of their own risks and are often complex. By allowing a low bandwidth transfer capability directly from the devices and in an opt-in capacity from the user, it is possible to roll out electronic integrations with dumb devices at extremely low cost.

Such integrations could be useful in a wide range of areas, though I am most interested in the physical fitness side of things as part of the https:///moar.fit/ project.

If this works for you great! I’m glad you liked it. If you know of something better already, leave a note in the comments!

--

--