Perform real time anomaly detection using Google Cloud’s Timeseries Insights API — Part II

Overview of an easy-to-use API that scales to billions of time series with low-latency anomaly detection and forecasting

Nishit Patel
Google Cloud - Community
6 min read · Oct 12, 2022


Photo by Chris Liverani on Unsplash

This is the second part of the real-time anomaly detection series. If you haven’t already, check out the first part of this blog to set up the basics and familiarize yourself with the use case and setup details.

In this article, I’ll focus on ingesting and appending streaming data to an existing Timeseries Insights API dataset and querying for anomalies on the newly added events. I’ll also cover how to delete unused API datasets, which is frankly a very straightforward process, so the major focus of this article is appending new data and querying for anomalies on the appended events.

As discussed in Part I of this series, the Timeseries Insights API is a very efficient, low-code option for real-time anomaly detection as well as forecasting. The API also provides very low query latency, which makes it a good option where real-time business decisions are crucial.

As you know by now, there are four main methods to interact with the Timeseries Insights API. These are:

  • Create and load dataset ✔️
  • Query dataset ✔️
  • Update dataset (append streaming/new events to an existing dataset)
  • Delete dataset

In Part I, I went over creating an API dataset and querying it for anomalies. In this article, I’ll start by appending events to an existing dataset. Before we start ingesting new data, let’s consider the end-to-end architecture below for this use case.

Figure 1 — Anomaly Detection using sensor data architecture

Services used in this architecture include:

  • Pub/Sub
  • App Engine/Cloud Run (not a focus for this tutorial)
  • Cloud Functions
  • Timeseries Insights API
  • BigQuery
  • Data Studio/Looker (not a focus for this tutorial)

Architecture flow:

  1. First, sensors or IoT devices on the customer’s end record readings such as temperature, humidity, light, and hydrogen values and send this data to a Pub/Sub topic in real time (every x seconds or so); a hypothetical example message is sketched just after this list.
  2. Pub/Sub ingests the incoming data payload and triggers the Cloud Function via its HTTP endpoint to process the event.
  3. The Cloud Function authenticates the incoming request using an App Engine service.
  4. Once the request is validated, it performs operations such as data preprocessing and feature engineering.
  5. The Cloud Function calls the Timeseries Insights API for anomaly detection.
  6. The Timeseries Insights API returns the anomaly results.
  7. The Cloud Function also calls the BigQuery streaming API to ingest the raw events and anomaly results into a table for further analysis, such as usage report generation and anomaly result analysis.
  8. End users access the reports/visualizations using tools like Data Studio or Looker.
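
For reference, here is a hypothetical example of the kind of message a sensor client might publish in step 1. The field names, project ID, and topic name are illustrative assumptions, not anything prescribed by the API:

```python
# Hypothetical sensor reading published to the Pub/Sub topic (all names are illustrative).
import json
from google.cloud import pubsub_v1

message = {
    "sensor_id": "sensor-042",            # which device produced the reading
    "timestamp": "2022-10-12T14:05:00Z",  # event time, close to "now" (see the ttl discussion below)
    "temperature": 21.7,
    "humidity": 48.2,
    "light": 512,
    "hydrogen": 0.03,
}

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("my-project", "sensor-events")  # assumed project/topic names

# Publish the JSON payload; .result() blocks until the message is accepted.
publisher.publish(topic_path, json.dumps(message).encode("utf-8")).result()
```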

For the purposes of this tutorial, I’ll focus only on ingesting data and detecting anomalies by calling the API. For the full Cloud Function code, check out the Resources section at the end of this tutorial.

So let’s get started!

3. Append streaming/new events to an existing dataset

This is my favorite part of the Timeseries Insights API: you can add data to an existing dataset in a streaming fashion, and the API performs real-time indexing on that data. Once the new events are appended to the dataset and indexing completes, they are immediately available for anomaly detection and forecasting queries. This is a very common IoT use case where data is streamed in and we need to make business decisions in real time using that data stream.

The appendEvents API method allows you to add new events to an existing dataset. The steps for appending a new event are as follows:

event append timeline

Also, while appending new events to the API, you need to be aware of a few things:

  1. You cannot add events too far in the past using the appendEvents method. The appendEvents method is not meant for backfilling data in the time series, so the timestamps of the events being appended need to be close to the current datetime. How far back in history you can go when appending data is determined as follows.
New event append chronology

Here, ttl stands for time to live. This is an optional attribute that you set when you create a dataset in the Timeseries Insights API, if the dataset is going to be used for appending new events in a streaming fashion. If your dataset is not going to accept new events, you do not need to set it. The ttl value is set in seconds and determines which incoming events the API will accept: events whose timestamps fall within the ttl window of the current time are added to the dataset, while older events are dropped, as sketched below.
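
For example, here is a rough sketch of that acceptance window, assuming a ttl of 7 days was set at dataset creation (the exact cut-off is enforced by the API itself):

```python
from datetime import datetime, timedelta, timezone

TTL_SECONDS = 7 * 24 * 3600  # ttl set on the dataset at creation time (example value)

now = datetime.now(timezone.utc)
oldest_accepted = now - timedelta(seconds=TTL_SECONDS)

# Events with an eventTime older than `oldest_accepted` will be dropped;
# events within the window are appended to the dataset.
print(f"Appended events must have an eventTime after {oldest_accepted.isoformat()}")
```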

2. When you append an event to an existing dataset, you can only see that event using the evaluateSlice method. The row count returned by the list dataset method does not reflect the new number of rows in the dataset, which can be confusing if you are checking the total number of rows with the list method.

To append a new event via the API, we need a payload with all the dimensions (similar to the one we created for the dataset creation method) and a call to the API’s appendEvents method.
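
Here is a minimal sketch of such a call over the REST endpoint. The project ID, dataset name, and dimension names are assumptions; substitute the values from your own setup in Part I and verify the event field names against the API reference:

```python
import google.auth
from google.auth.transport.requests import AuthorizedSession

# Assumed resource name; use the dataset name you created in Part I.
DATASET = "projects/my-project/datasets/sensor_dataset"
URL = f"https://timeseriesinsights.googleapis.com/v1/{DATASET}:appendEvents"

credentials, _ = google.auth.default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
session = AuthorizedSession(credentials)

# One event with the same dimensions used when the dataset was created.
body = {
    "events": [
        {
            "eventTime": "2022-10-12T14:05:00Z",  # must be close to the current time (see the ttl note above)
            "dimensions": [
                {"name": "sensor_id", "stringVal": "sensor-042"},
                {"name": "temperature", "doubleVal": 21.7},
                {"name": "humidity", "doubleVal": 48.2},
            ],
        }
    ]
}

response = session.post(URL, json=body)
response.raise_for_status()
print(response.json())
```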

Once the event has been added, the evaluateSlice method can be used to evaluate a data slice in a loaded dataset. This is an option you can use to view the time series built by the API.

Following is a sample JSON payload illustrating an event that we want to evaluate. The object returned by the evaluateSlice method is an instance of EvaluatedSlice.
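
This is only a minimal sketch, reusing the sensor_id dimension from the earlier examples; the parameter values are illustrative and the exact field names should be checked against the evaluateSlice reference:

```python
# Illustrative evaluateSlice request body; adjust dimension names and parameters
# to match your dataset. POSTing it to
#   https://timeseriesinsights.googleapis.com/v1/<dataset>:evaluateSlice
# returns an EvaluatedSlice describing the time series the API built for that slice.
evaluate_slice_body = {
    "pinnedDimensions": [
        {"name": "sensor_id", "stringVal": "sensor-042"}
    ],
    "detectionTime": "2022-10-12T14:05:00Z",
    "timeseriesParams": {
        "forecastHistory": "86400s",  # how much history to build the series from
        "granularity": "300s",        # bucket size of the series
    },
    "forecastParams": {
        "noiseThreshold": 10.0        # ignore deviations smaller than this (assumed value)
    },
}
```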

Note: It is important to note that appending new events to an existing Timeseries Insights API dataset does not give you direct access to those events, i.e. if you wish to perform additional analysis on this data, you need to keep the raw events in a separate table in addition to appending them to the API dataset. For example, if the historical data used to create the API dataset lives in a BigQuery table, you can use the legacy BigQuery streaming insert API or the Storage Write API, depending on the use case, to write each incoming event to that BigQuery table.
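
For instance, here is a minimal sketch of streaming the raw event plus the anomaly flag into BigQuery using the legacy streaming insert API; the table name and schema are assumptions:

```python
from google.cloud import bigquery

client = bigquery.Client()
table_id = "my-project.sensor_data.raw_events"  # assumed table with a matching schema

rows = [
    {
        "sensor_id": "sensor-042",
        "event_time": "2022-10-12T14:05:00Z",
        "temperature": 21.7,
        "humidity": 48.2,
        "is_anomaly": True,  # result returned by the Timeseries Insights API query
    }
]

# Legacy streaming insert; returns a list of per-row errors (empty on success).
errors = client.insert_rows_json(table_id, rows)
if errors:
    print(f"BigQuery insert errors: {errors}")
```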

4. Delete dataset

Deleting an unwanted dataset in the Timeseries Insights API is very easy. Every dataset that is created and indexed is charged for, and it’s easy to end up creating many datasets for testing purposes, so at some point we need to delete the unwanted and unused datasets from the API. The API has a delete method for this; all you need is an empty body passed to the HTTP API endpoint.
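
As a sketch, the delete call is just an HTTP DELETE against the dataset’s resource name (substitute your own project and dataset name from Part I):

```python
import google.auth
from google.auth.transport.requests import AuthorizedSession

DATASET = "projects/my-project/datasets/sensor_dataset"  # assumed name from Part I
URL = f"https://timeseriesinsights.googleapis.com/v1/{DATASET}"

credentials, _ = google.auth.default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
session = AuthorizedSession(credentials)

# No request body is needed; a successful call returns an empty response.
response = session.delete(URL)
response.raise_for_status()
print(f"Deleted dataset {DATASET}")
```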

Optional: End to End Cloud Function Code

Below is some optional code to create the Cloud Function and Pub/Sub topic and to publish new events to Pub/Sub to trigger the Cloud Function, which you can use to build the end-to-end flow illustrated in the architecture diagram in Figure 1 above. This Cloud Function performs the following operations:

  • Parse the incoming payload (published to the Pub/Sub topic)
  • Create the appropriate JSON format for the new incoming event
  • Append the new event to an existing API dataset
  • Perform anomaly detection or forecasting on the new event
  • Insert the result into a BigQuery raw events table for further analysis
  • Return the anomaly detection or forecasting result for consumption in downstream analysis
  1. requirements.txt for the Cloud Function
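
A plausible requirements.txt for such a function, matching the libraries used in the sketches above (your own dependencies may differ):

```
# requirements.txt (illustrative)
google-auth
google-cloud-bigquery
google-cloud-pubsub
```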

2. main.py for the Cloud Function
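
Here is a condensed sketch of what such a main.py might look like, tying the earlier snippets together. The dataset name, BigQuery table, dimension names, and query parameters are assumptions; the full code is linked in the Resources section:

```python
import base64
import json

import google.auth
from google.auth.transport.requests import AuthorizedSession
from google.cloud import bigquery

DATASET = "projects/my-project/datasets/sensor_dataset"  # assumed dataset from Part I
API_ROOT = "https://timeseriesinsights.googleapis.com/v1"
BQ_TABLE = "my-project.sensor_data.raw_events"           # assumed raw events table

credentials, _ = google.auth.default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
session = AuthorizedSession(credentials)
bq_client = bigquery.Client()


def detect_anomaly(event, context):
    """Background Cloud Function triggered by a Pub/Sub message."""
    # 1. Parse the incoming payload published to the Pub/Sub topic.
    payload = json.loads(base64.b64decode(event["data"]).decode("utf-8"))

    # 2. Build the event in the Timeseries Insights API format and append it.
    ts_event = {
        "eventTime": payload["timestamp"],
        "dimensions": [
            {"name": "sensor_id", "stringVal": payload["sensor_id"]},
            {"name": "temperature", "doubleVal": payload["temperature"]},
        ],
    }
    session.post(f"{API_ROOT}/{DATASET}:appendEvents", json={"events": [ts_event]}).raise_for_status()

    # 3. Query the dataset for anomalies around the new event's timestamp.
    #    Field names follow the v1 query request format; verify against the API reference.
    query_body = {
        "detectionTime": payload["timestamp"],
        "slicingParams": {"dimensionNames": ["sensor_id"]},
        "timeseriesParams": {"forecastHistory": "86400s", "granularity": "300s"},
        "forecastParams": {"noiseThreshold": 10.0},
    }
    result = session.post(f"{API_ROOT}/{DATASET}:query", json=query_body)
    result.raise_for_status()
    anomaly_result = result.json()

    # 4. Write the raw event plus the anomaly result to BigQuery for later analysis
    #    (the table schema is assumed to match the payload fields).
    row = {**payload, "anomaly_result": json.dumps(anomaly_result)}
    errors = bq_client.insert_rows_json(BQ_TABLE, [row])
    if errors:
        print(f"BigQuery insert errors: {errors}")

    # 5. Return the result so downstream consumers can use it.
    return anomaly_result
```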

3. Create the Pub/Sub topic and Cloud Function, and publish a message to the Pub/Sub topic to trigger the Cloud Function
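
A sketch of the deployment commands, assuming the function above is named detect_anomaly and lives in the current directory (topic name, region, and runtime are placeholders):

```bash
# Create the Pub/Sub topic the sensors publish to.
gcloud pubsub topics create sensor-events

# Deploy the Cloud Function with a Pub/Sub trigger on that topic.
gcloud functions deploy detect_anomaly \
  --runtime python39 \
  --region us-central1 \
  --trigger-topic sensor-events \
  --entry-point detect_anomaly

# Publish a test message to trigger the function end to end.
gcloud pubsub topics publish sensor-events \
  --message '{"sensor_id": "sensor-042", "timestamp": "2022-10-12T14:05:00Z", "temperature": 21.7, "humidity": 48.2}'
```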

Resources
