Build the world’s largest IoT with RasPi and Google BigQuery

Kaz Sato
Google Cloud - Community
7 min readJul 17, 2015

This is my weather station built with RasPi. It took only several hours in my weekend to build this, but it’s already capable of deploying the world’s largest IoT platform. Why? Because it directly sends the metrics to Google BigQuery, the massively parallel query engine from Google Cloud Platform which is able to collect one million rows of metrics every second and execute query on terabytes of data in 10 seconds.

So, you can start deploying millions of this box to collect temperature, humidity and atmospheric pressure (or any metrics if you add sensors) from everywhere in the world, right now. I don’t have to do anything more to build a large distributed frond ends, load balancers, app servers and super-fast database cluster with scaling out, fail-over, replication and etc — the complications required to build a large production IoT platform. BigQuery has them all in Google’s largest Datacenter with Google scale and quality.

The RasPi box runs a simple Python code to send the metrics to BigQuery that will be aggregated and shown as graphs on Google Spreadsheet.

The total cost for building this is less than $100. You only need to buy the RasPi box and buy some sensors and that’s it. Google Spreadsheet is free. Google BigQuery is outrageously inexpensive: the storage cost is 20 cents / GB / month and the query cost is a few cents for querying on 100M rows each time. Most importantly, it’s a fully managed service. You don’t have to hire tens of senior engineers to build and operate the world’s largest big data cluster.

Let’s take a look at how you can build this box in your weekend.

Connecting Sensors to RasPi

The followings are the parts you need to buy:

By using the NOOBS SD card, it’s so easy to setup RasPi. The USB WiFi should also be easy to set up with the OS’ config tool.

For connecting the sensors to RasPi, you may refer to the following page:

From DHT Humidity Sensing on Raspberry Pi

No special circuitry is required. You can connect GPIO pins of RasPi to the sensors with a breadboard. One caveat is that you have to keep DHT22 away from RasPi box to avoid the heat from CPU. For LPS331, you may use any other pressure sensors that can be attached to RasPi.

Installing Drivers with Ansible

It’s not so easy to write drivers for the sensors. Especially, DHT22 takes a little effort. But the cool thing about using RasPi is that you can find them on GitHub.

Another cool thing is that RasPi is a Linux. You can use Ansible Playbook like the following to install drivers on GitHub for the sensors.

    # Adafruit DHT drivers
- git: repo=git@github.com:adafruit/Adafruit_Python_DHT.git
dest={{ dht_dir }} accept_hostkey=yes
sudo: no
- command: python setup.py install chdir={{ dht_dir }}

Although you may need to wait a while until finishing the Playbook execution :)

Decode Sensor Values by Python

The LPS331 pressure sensor uses I2C bus protocol for communication. With Python, you can use i2ctools command to read the values from the sensor.

def cmd_exec(cmd):
p = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
stdout, stderr = p.communicate()
if stderr != None and len(stderr.strip()) > 0:
raise IOError("Error on executing cmd: " + stderr)
return stdout.strip()

def i2cget(reg):
return cmd_exec("i2cget -y 1 " + LPS331_ADRS + " " + reg)

The code let you execute i2cget command as a sub process. So that you can read values from I2C registors to calculate pressure values.

def read_lps():

# reading from LPS
out0 = i2cget("0x28")
out1 = i2cget("0x29")
out2 = i2cget("0x2a")

# decoding the value
return (int(out0, 16) + (int(out1, 16) * 0x100) + (int(out2, 16) * 0x10000)) / 4096.0

DHT22 temperature and humidity sensor could be read easily by using the Adafruit Python driver.

# read humidity and temp from DHT 
humidity, temp = Adafruit_DHT.read_retry(Adafruit_DHT.DHT22, DHT22_GPIO)

Use Fluentd to Send to BigQuery

Now it’s ready to send the metrics to BigQuery. You can use Fluentd, the popular open source log collector, to do this. The following is the Ansible Playbook for installing Fluentd and its BigQuery plugin:

    # Fluentd
- command: aptitude install ruby-dev
- command: gem install fluentd

# pip, fluent-logger-python, fluent-plugin-bigquery
- command: aptitude install python-pip
- command: pip install fluent-logger
- command: fluent-gem install fluent-plugin-bigquery

On fluentd.conf, you can add the following config to receive the event log from fluent-logger-python and forward it to fluent-plugin-bigquery.

<source>
type forward
port 24224
</source>

<match weather.**>
type bigquery

method insert

auth_method private_key
email YOUR_SERVICE_ACCOUNT_EMAIL
private_key_path YOUR_PRIVATE_KEY_FILE_PATH

project YOUR_PROJECT_ID
dataset YOUR_DATASET
table YOUR_TABLE_NAME

time_format %s
time_field time

fetch_schema true
field_integer time
</match>

In Python code, use fluent-logger-python API to send the metrics to Fluentd.

    # write metrics to local fluentd
event.Event("metrics", {
"atmos": atmos,
"hum": humidity,
"temp": temp
})

Create a Table and Key on BigQuery

Next, set up BigQuery to receive the event log from Fluentd. If this is the first time for you to start using the query service, please take a look at the getting started guide to prepare a Google Cloud Platform project and BigQuery dataset.

For this demo, I’ve created a BigQuery table to receive the metrics. You may need to add a column to hold IDs of each device if you want to collect metrics from multiple devices.

[
{
"name": "time",
"type": "INTEGER"
},
{
"name": "hum",
"type": "FLOAT"
},
{
"name": "temp",
"type": "FLOAT"
},
{
"name": "atmos",
"type": "FLOAT"
}
]

By using bq command, execute the following line to create a table with the schema above.

> bq mk -t <your-project-id>:<your_dataset>.weather_report wr_bqschema.json

Then, create a private key to connect from the RasPi box to BigQuery. At Google Developers Console, open APIs and auth menu and create a new Client ID for a service account.

This will start downloading of the private key. Copy the key to RasPi box, and edit fluentd.conf to set private_key_path field to the path of the key file. Also, set email field to the email address of the service account.

Execute a Query on BigQuery

Now it’s ready to try. On RasPi, execute the following to initiate Fluentd.

> fluentd -c fluentd.conf

And open another shell to execute the python code.

> sudo python weather_report.py

If it runs successfully, you would see anything on the console. Go to BigQuery Console and execute the following SQL.

SELECT 
LEFT(STRING(SEC_TO_TIMESTAMP(time)), 15) + '0:00' as time,
AVG(temp) as temp,
AVG(hum) as hum,
AVG(atmos) as atmos
FROM [YOUR_PROJ:YOUR_DATASET.weather_report]
GROUP BY time
ORDER BY time DESC

Sounds like it’s working! This query aggregates the 10 min average metrics and sort them with the timestamp.

Because it’s running on BigQuery, you could get the result of this query in 10 seconds even when you collect 100 billion rows of metrics. See this white paper to learn why it could be possible. In short, the service runs thousands of servers in parallel for every single query.

Now you are ready to send any metrics of any number of IoT devices — sensors in appliances, cars, mobile devices, factory machinery and etc — to BigQuery without any concerns about the scalability and availability.

Draw a Graph with Google Spreadsheet

By using Google Spreadsheet and its Google Apps Script, you can execute the BigQuery query from it and draw a graph from the result. Please look at Real-time log analysis using Fluentd and BigQuery to learn how to do this. No need to write your own script. You can just copy the sheet and script in the document and use it to draw a graph like this.

As you have seen on this article, it’s so fun to play with RasPi, especially when you are connecting it with the powerful cloud toolings such as BigQuery and Fluentd. Even though it’s just a hobby in weekend, you could already get the fully managed, highly scalable and available IoT platform without spending much money.

The sample code for this demo is available on my GitHub repo.

Originally published at qiita.com.

--

--