The Sensor Service

Greg Cooper
Dwelo Research and Development
5 min read · Jun 1, 2020
Photo by Rene Böhmer on Unsplash

This is the 4th and final part of a series of posts about how Dwelo software uses Twilio Sync as its IoT communications platform. If you missed part 3, you can find it here.

Recap

Previously in this series, we shared how Dwelo uses Twilio Sync to manage the sensor state of the IoT devices in our ecosystem of smart apartments. We explained how sensor state is managed using document-based Sync maps, and how desired state is also expressed with those maps. We also looked at how each sensor state change is reflected from the Sync cloud into a set of AWS entities, collectively known as the sync-reflector service, which publishes those changes to various agents in the Dwelo cloud, as well as to a pipeline that collects them into the Dwelo data lake for analysis.

In this post we’ll look at how a key agent in our system, the Dwelo sensor service, receives and handles state changes propagated to it from Twilio Sync, and makes them available to customer apps.

The Sensor Service

One of the SNS subscribers for the events published by our sync-reflector service is a service we simply call the Dwelo sensor service. The sensor service is responsible for tracking the sensor state of all devices in the system. We use the term “sensor” to describe any part of an IoT device that reports state, even if it isn’t strictly a hardware sensor. For instance, in our terminology a thermostat has an ambient air temperature sensor, an HVAC operating state sensor, a humidity sensor, a heat setpoint sensor, and a cool setpoint sensor.

The sensor service consists of the AWS entities pictured below. The cache is a Redis store in AWS ElastiCache.

The sensor service is broken into two Lambda functions: put-sensor and get-sensor.

Put-Sensor function

The put-sensor lambda function makes sure that we have stored the latest sensor state of each device in the cache.

The lambda subscribes to the SNS topic that the sync-reflector publishes Sync events to, and it configures the subscription to filter out all Sync messages except those with sensor keys such as DoorLocked, BatteryLevel, and ThermostatMode. This is necessary because we use Sync maps for more than just sensor events, but that is a topic for another time.
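
To make this concrete, here is a minimal sketch of how such a subscription filter could be configured with boto3. It assumes the sync-reflector attaches the ItemKey as an SNS message attribute (SNS filter policies match on message attributes), and the subscription ARN constant and key list here are hypothetical:

import json
import boto3

sns = boto3.client('sns')

# Only deliver messages whose ItemKey attribute is one of the sensor
# keys we care about; everything else is dropped by SNS before it
# ever invokes the lambda.
sns.set_subscription_attributes(
    SubscriptionArn=PUT_SENSOR_SUBSCRIPTION_ARN,  # hypothetical constant
    AttributeName='FilterPolicy',
    AttributeValue=json.dumps({
        'ItemKey': ['DoorLocked', 'BatteryLevel', 'ThermostatMode'],
    }),
)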

When the lambda function receives an event, the event['body'] looks like this:

{
  "MapUniqueName": "abcd-1234-efgh",
  "ItemKey": "DoorLocked",
  "ItemData": "{\"state\": \"true\",\"timestamp\":\"2020-01-02T18:00:00Z\"}"
}

When the put-sensor lambda receives the event, it first retrieves the previously stored value from the cache and checks the timestamp recorded with that value. If the newly received event has a more recent timestamp than the cached value, the service writes the new event to the cache. This timestamp check is important because sensor reports can be received out of order.

The hash_name used by the cache is composed of a string prefix + the device_uid for the device that is reporting the state. For example, device:abcd-1234-efgh

The field_name is the same as the Sync map ItemKey (e.g. DoorLocked), and the field_value is extracted from the Sync map ItemData, with the exception that the value also stores the timestamp of the event in a meta field.

{
  "meta": {
    "timestamp": "2020-01-02T18:00:00Z"
  },
  "value": "true"
}
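
Putting those pieces together, the core of put-sensor might look something like the sketch below. This is a minimal illustration rather than our production code; it assumes the same cache wrapper object that appears in the get-sensor snippet later in this post, plus a hypothetical get_hash_field_value helper:

import json

HASH_PREFIX = 'device:'  # prefix shown in the example above

def lambda_handler(event, context):
    body = event['body']  # shaped as shown earlier; ItemData is a JSON string
    hash_name = HASH_PREFIX + body['MapUniqueName']
    field_name = body['ItemKey']
    item_data = json.loads(body['ItemData'])
    new_value = {
        'meta': {'timestamp': item_data['timestamp']},
        'value': item_data['state'],
    }
    # Only overwrite the cached value when the incoming event is newer,
    # so out-of-order sensor reports cannot clobber fresher state.
    cached = cache.get_hash_field_value(hash_name, field_name)  # hypothetical helper
    if cached is None or new_value['meta']['timestamp'] > cached['meta']['timestamp']:
        cache.set_hash_field_value(hash_name, field_name, new_value)

Because the timestamps are ISO-8601 strings in UTC, comparing them as plain strings orders them chronologically.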

Using this simple logic in our put-sensor lambda, we can be sure that the most recent sensor state of a device is always available in our cache for other services to retrieve.

Get-Sensor function

The get-sensor function is responsible for responding to requests for sensor state from other services in the Dwelo cloud, such as our backend API.

When it gets a request, the get-sensor lambda retrieves the sensor values for the device from the cache.

def lambda_handler(event, context):
    # Generate hash_name using device_uid in event data
    ...
    if not cache.hash_name_exists(hash_name):
        hydrate_cache_from_sync(device_uid)
    ret = list()
    values = cache.get_all_hash_field_values(hash_name)
    for key, value in values.items():
        # Prepare list of values for response.
        ...
    return {'statusCode': 200, 'body': ret}

The only complexity here is that if the cache has no entry for the device, the get-sensor function makes a request directly to Twilio Sync for the device’s entire map. It then writes all of the sensor-related keys and values from the Sync map to the cache. This hydration process is a little time-consuming, but it only occurs in the rare event that the cache has been flushed. Once the cache has been hydrated, get-sensor returns the sensor values to the requesting client.

def hydrate_cache_from_sync(device_uid):
    map_items = get_map_items_from_twilio(device_uid)
    for item in map_items:
        if item_key_is_sensor_report(item.key):
            sensor_event = SensorEvent()
            # Initialize using `device_uid`, `item.key`,
            # and `item.data`
            ...
            value = json.loads(sensor_event.field_value)
            cache.set_hash_field_value(
                sensor_event.hash_name,
                sensor_event.field_name,
                value)

def get_map_items_from_twilio(device_uid):
    device_map = twilio_client.sync\
        .services(TWILIO_SYNC_SERVICE_SID)\
        .sync_maps(device_uid)\
        .fetch()
    return device_map.sync_map_items.list()

Having the most recent sensor state available in AWS ElastiCache means that our consumers’ requests for that state are answered extremely quickly.

Traffic and Load

To give you a real-world sense of smart home appliance traffic: for every 10,000 devices installed, we see 750,000 to 1 million state changes per day, and 40,000 to 50,000 requests for state. We have observed a cache hit rate of roughly 95%, meaning nearly all of those requests are answered without a round trip from our backend to Twilio. We pay for the privilege of having those state changes managed and ensured by Twilio, but we offset the cost of the subsequent reads by maintaining our own cache.
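
At the top of that range, 1 million state changes per day works out to an average of roughly 11–12 cache writes per second per 10,000 devices (1,000,000 / 86,400 seconds), a comfortable load for a single Redis node.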

With the sensor service receiving state changes from the sync-reflector and storing the new state in ElastiCache, we can now build a passthrough service for pushing sensor state changes in real time, or allow clients to subscribe directly to Sync through native protocols.

Currently we have opted to keep our mobile clients polling against a service backed by the sensor cache, as it saves money compared to a direct Sync get. Essentially, we can choose between a universally accessible cache, with the ability to overlay a flexible protocol and a low cost of access, or a more performant direct connection to Twilio Sync at a higher cost. Both are good positions to be in, and as we fine-tune our use cases, we could branch our access patterns. For instance, state reflection is most time sensitive right after command dispatch, whereas occasionally glancing at your app to check on state doesn’t need to be as real time.

Conclusion

As you can see, Dwelo has greatly benefited from the managed state platform provided by Twilio Sync. Using the platform has allowed us to rapidly create a highly scalable, highly available system for collecting IoT device state changes and publishing those changes to our large consumer base via web and mobile apps.

Thanks for reading! If you would like to learn more about engineering at Dwelo, check out our IoT in Rust series. We’re also always hiring!
