IoT on Google Cloud at scale

How fast can you retrofit a factory with “data science ready” IoT?

3 easy stages to verify your IoT business before major investment

1. Hardware for $10?

2. Secure TLS 1.2 or 1.3?

3. Unix timestamps & atomic clocks?

4. Cloud under 10 cents/month?

Iteration 1

What is the absolute minimum cloud and hardware needed to get clean data from industrial installations?

We got all the way down to an ARM Cortex-M3, but had to wipe the flash (solder jumper on the reverse side) and write new code in C and assembler. It fits!

We contacted ARM about getting TLS 1.3 into the Cortex-M3 portfolio.

No room left for edge analytics, but 3 dumb sensors work fine. Legacy Wi-Fi chips have no SNI, and connecting to hotspots in the field, with their timeouts, is useless. Dedicated Wi-Fi or LTE is best. Tested to 1 million devices; total cloud cost is about 5 cents/month per device.
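To make the TLS and SNI points concrete, here is a minimal sketch of a test client that refuses anything below TLS 1.2 and sends SNI. It is written in Python for a bench laptop, not the C firmware on the device. The function names, the `host` parameter and port 8883 (the usual MQTT-over-TLS port) are assumptions for illustration.

```python
import socket
import ssl


def make_tls_context() -> ssl.SSLContext:
    """Client context that verifies certificates and requires TLS 1.2 or newer."""
    ctx = ssl.create_default_context()  # certificate and hostname checks are on by default
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # TLS 1.3 is negotiated when both sides support it
    return ctx


def open_tls(host: str, port: int = 8883, timeout: float = 10.0) -> ssl.SSLSocket:
    """Open a TLS connection; host and port are placeholders for your own endpoint."""
    raw = socket.create_connection((host, port), timeout=timeout)
    # Passing server_hostname is what puts the SNI extension into the ClientHello.
    return make_tls_context().wrap_socket(raw, server_hostname=host)
```

A chip without SNI cannot send that hostname, so an endpoint that hosts many names on one address has no way to pick the right certificate for it.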

Result: data scientists confirm formats. Cloud OPEX verified before investment.

First iteration: ARM M3 test devices. Second iteration: ARM M4 with sensors and Raspberry Pi Zero.

Iteration 2

Build a data science lab for industrial IoT

Most sensors on top side of blue board, about the size of a matchbox

12 sensors with on-board filters (FFT, FIR, LMS) can measure 3-phase electricity at Nyquist sampling rates (harmonic power monitoring at 4k samples/sec >63rd order). All boards sync with an atomic clock pool on boot. Crypto: ATECC608A. We ran Google Cloud Datalab on 8 billion rows of data (n1-highmem-96). Each data point is hashed; potential for asynchronous business models.
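The clock sync on boot can be sketched as follows, assuming the atomic clock pool is reached over NTP (the post does not name the protocol). The sketch builds a basic SNTP request and converts the server's transmit timestamp to Unix time. The function names are hypothetical, and round-trip delay compensation is left out.

```python
import struct

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_TO_UNIX = 2_208_988_800


def sntp_request() -> bytes:
    """48-byte SNTP client request: leap indicator 0, version 3, mode 3 (client)."""
    return b"\x1b" + 47 * b"\x00"


def transmit_time_unix(reply: bytes) -> float:
    """Return the server transmit timestamp (bytes 40 to 47) as Unix seconds."""
    if len(reply) < 48:
        raise ValueError("NTP reply is shorter than 48 bytes")
    seconds, fraction = struct.unpack("!II", reply[40:48])
    # The fraction field counts units of 1/2**32 of a second.
    return seconds - NTP_TO_UNIX + fraction / 2**32
```

In use, the request goes out over UDP port 123 to a pool server and the 48-byte reply is passed to `transmit_time_unix`.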

On the lab bench: reference designs from ST Micro, Panasonic, Silicon Labs, Bosch Sensortec, AMS and Microchip/Atmel were configured (below), then shrunk to one board stacked onto a Raspberry Pi Zero (above). Data can be seen here.


$570/month with BigQuery. After the data science phase, BigQuery is scaled back to lab use only. The production system costs $410 a month. Costs prove to be linear or better (these 2 VMs handle 2000 boards nicely). VAR, Markov and k-means models can later run directly in the pipeline, so there’s no need to store the data: just extract its value at source.

Data scientist: “It’s like a digital combine harvester!”

Latency is crucial for data science: each message gets 3 timestamps so it can be correlated with other sensors or device clusters in different parts of the world at the same moment.
830 million rows of accelerometer data were collected within a few weeks from sites in the USA, EU and Asia.

Data scientists get massive computing power on demand at low cost and never need to download data.

Devices deliver streams from locations around the world with low latency, synced to atomic clocks: if something happens first in Singapore and then New York, it correlates.
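One way to picture the three timestamps is the sketch below. The post does not say which three moments are recorded, so the fields here (sample, publish, ingest) are an assumption, as are the class and function names. The point is that ordering by the synced sample time, not by arrival time, is what lets events from different sites line up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    site: str
    value: float
    t_sample: float   # Unix seconds when the sensor was read (assumed field)
    t_publish: float  # Unix seconds when the device sent the message (assumed field)
    t_ingest: float   # Unix seconds when the cloud received it (assumed field)

    @property
    def device_delay(self) -> float:
        """Time spent on the device between sampling and sending."""
        return self.t_publish - self.t_sample

    @property
    def network_delay(self) -> float:
        """Time spent in transit between device and cloud."""
        return self.t_ingest - self.t_publish


def order_by_sample_time(readings: list[Reading]) -> list[Reading]:
    """Order events by when they happened, not by when they arrived."""
    return sorted(readings, key=lambda r: r.t_sample)
```

With this shape, a reading from Singapore that was sampled first but arrived late still sorts ahead of a New York reading that was sampled later.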

Iteration 3

Remove the Raspberry Pi Zero, integrate the micro server onto the SoC and do system hardening. Scale to 100 million data points per day. Add a hardware crypto accelerator and hash data science outputs to a blockchain (Nano) for sale in a marketplace. “It’s easier to sell cakes (results of data science) than milk, flour and eggs (raw data)”, provided the data scientists (chefs) get good ingredients to cook with!
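Hashing a data point or a data science output can be sketched like this. The post does not name the hash function or the record format, so SHA-256 over canonical JSON is an assumption, and `hash_record` is a hypothetical helper. Writing the digest to a ledger is not shown.

```python
import hashlib
import json


def hash_record(record: dict) -> str:
    """Return a SHA-256 hex digest of a record, independent of key order."""
    # Sorted keys and fixed separators give the same bytes for the same content.
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Because the digest changes if any field changes, a buyer can check that a result on sale matches what was originally recorded without seeing the raw data first.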

Lessons learnt in second iteration:

1. Doing data science on 9 billion rows of data every month needs serious compute power, even with high-memory GPU VMs. The goal is to harvest machine learning labels for later transfer to Edge TPUs. This is viable for industrial applications with central AI.

2. Google IoT Core is not fully needed until production — but try provisioning like this for keys and this for tokens. Data redaction and tokenisation can be planned upfront at scale.

3. Verify each business case at >$5 profit per unit in the first year. Examples: predictive maintenance for factories, real-time insurance adjustments for industrial sites. An entire IoT lab like this (once coded) can be cloned for each new business area within 4 hours.

4. Certified industrial modules are $30 in production and allow retrofit of factories and other installations in under five minutes per unit. They can be adapted quickly to accept interfaces from OEM products with little change to the cloud (the cloud “publishes” whatever it gets). Production runs of 2000 modules can be turned around in 10-day lots.

5. Ten units in 60m² need one AP or LTE hotspot uplinking at 100kbps each. Once machine learning migrates to the edge, data rates drop to 10kbps uplink. Connection costs will therefore be low even at remote locations.
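The uplink figures in point 5 can be turned into rough data volumes with the arithmetic below. It assumes every unit streams continuously at the quoted rate for a 30-day month, which is a worst case and not something the figures above state. The function names are hypothetical.

```python
def site_uplink_kbps(units: int, per_unit_kbps: float) -> float:
    """Total uplink for one access point or hotspot, in kbps."""
    return units * per_unit_kbps


def monthly_gigabytes(kbps: float, days: int = 30) -> float:
    """Data volume for a constant bit rate, assuming continuous streaming."""
    bits = kbps * 1000 * 86_400 * days  # kbps to bits per second, then seconds per month
    return bits / 8 / 1e9               # bits to bytes, then bytes to gigabytes


raw = site_uplink_kbps(10, 100)   # ten units before ML moves to the edge
edge = site_uplink_kbps(10, 10)   # ten units after ML moves to the edge
print(raw, monthly_gigabytes(raw))
print(edge, monthly_gigabytes(edge))
```

Under those assumptions a ten-unit site needs about 1 Mbps and moves roughly 324 GB a month, dropping to 100 kbps and about 32 GB once the analytics run on the device.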

Will Gillette make such devices of the future? Millions of units, billions of data points per day eventually compressed into edge chips on site.

Edge TPU currently in alpha access release — AI at the edge

I can recommend the lab approach: Pharmaceutical companies work with raw ingredients from the Amazon jungle for years without first knowing exactly what they will be able to synthesise into medical products.

It is no different for data, yet the industry still expects miracles from bad data with business cases upfront. If 3-star chefs have wonderful kitchens (platforms) but the raw ingredients are rubbish, they cannot cook. Like medieval kings who genuinely believed their wizards or jesters could do magic, most corps still don’t invest in data sources because “hardware” is not “digital”. That’s why there’s still so little clean and useful data on the platforms.

Have lunch sometime with data scientists from any DAX company. Listen to the 20 different pains about “data”. It is an eye opener! This lab was created to solve those pains.

“It seems the hardware engineers at industry level still don’t include the future customers of that hardware’s output: the data scientists.” (Senior Data Scientist)

Transformational companies take data creation seriously. They know data scientists need pure ingredients to work with. Installations and factories produce the raw ingredients for the cooks — but the cooks must be allowed to define the ingredients, not the kitchen builders! Then the witches and wizards (read: data scientists) may be able to create new data driven services. Real growth via data, not just cheaper IT.

Google has been nurturing its data farms, raw produce, for over ten years. It’s astounding that other companies don’t. They have lots of platforms though! (big kitchens, no food).

P.S. I tried this on Azure and AWS, and it failed on cost (see also the HFT guy’s views, not just mine). Other problems: latency, ease of implementation, lack of machine learning, security and no DLP. Google does data as its core business, with 3 billion Android phones in real time. Maybe that’s why?