My son wiring up a hygrometer

Last week was hackathon week at work. I decided to do an Azure Internet of Things (IoT) project to learn more about Azure’s IoT offerings.

Project overview

My son and I built a hygrometer, and then I connected it to Azure IoT Central so we could see its telemetry in a real-time dashboard. A hygrometer is a device that measures both humidity (actually, relative humidity, as I discovered during the course of this project) and temperature. It’s good for monitoring attics, crawlspaces and other hard-to-reach locations. …


Photo by Ferd Brundick

App instrumentation generally involves significant manual effort, with application code invoking logging, metrics, and tracing SDKs whenever something interesting happens. This is useful, but it has its challenges. For one, it’s a lot of work. It also clutters the code. The most consequential challenge, however, is that it tends to produce inconsistent observability data (e.g., free-form log messages, metrics embedded in log messages, unconventional metric and dimension names). That leaves little leverage: it’s hard to do anything systematic with the data.

While manual instrumentation isn’t going anywhere, we can automate more than we typically do…
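One way to automate part of this is to wrap functions in a decorator that emits metrics under names derived mechanically from the code, so every instrumented call follows the same naming scheme. This is a minimal sketch under stated assumptions, not the approach the full post describes: `instrumented` and the in-memory `METRICS` registry are hypothetical stand-ins for a real metrics SDK, and `get_booking` is a made-up example function.

```python
import functools
import time
from collections import defaultdict

# Hypothetical in-memory registry standing in for a real metrics SDK.
METRICS = defaultdict(list)

def instrumented(func):
    """Record calls, errors, and latency under names derived from the function itself."""
    name = f"{func.__module__}.{func.__qualname__}"

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        except Exception:
            METRICS[f"{name}.errors"].append(1)
            raise  # instrumentation must not swallow the error
        finally:
            # Runs on success and on failure, so every call is counted and timed.
            METRICS[f"{name}.calls"].append(1)
            METRICS[f"{name}.latency_seconds"].append(time.perf_counter() - start)

    return wrapper

@instrumented
def get_booking(booking_id):
    # Hypothetical application function used only to demonstrate the decorator.
    if booking_id < 0:
        raise ValueError("bad id")
    return {"id": booking_id}
```

The design choice here is that no one hand-writes a metric name: consistency comes from deriving names from `__module__` and `__qualname__`, which is what makes the resulting data easy to treat systematically.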


weatherLayers_002 by Scott Brown (flickr)

1. Start an InfluxDB container

$ docker run -d --name influxdb -p 8086:8086 influxdb:1.7.8

2. Start the InfluxDB shell

$ docker exec -it influxdb influx
Connected to http://localhost:8086 version 1.7.8
InfluxDB shell version: 1.7.8
>

3. Create a database

> create database mydb
> use mydb

4. Insert some time series data

> insert bookings value=102
> insert bookings value=108
> insert bookings value=95

5. Query the data

> select * from bookings
name: bookings
time value
---- -----
1571211243950013300 102
1571211245822776400 108
1571211247850693200 95
>
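The same writes and query can also go through the InfluxDB 1.x HTTP API: the /write endpoint accepts line protocol (the same `bookings value=102` syntax the shell’s insert command uses) and /query accepts InfluxQL. Below is a minimal standard-library sketch, assuming the container from step 1 is reachable at http://localhost:8086 with authentication disabled; the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the container from step 1, no authentication enabled.
INFLUX_URL = "http://localhost:8086"

def line_protocol(measurement, value, timestamp_ns=None):
    """Format one point in line protocol, e.g. 'bookings value=102'."""
    line = f"{measurement} value={value}"
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"  # optional ns timestamp; the server assigns one if omitted
    return line

def build_write_request(db, lines):
    """POST /write?db=<db> with newline-separated line protocol as the body."""
    params = urllib.parse.urlencode({"db": db})
    body = "\n".join(lines).encode("utf-8")
    return urllib.request.Request(f"{INFLUX_URL}/write?{params}", data=body, method="POST")

def build_query_url(db, influxql):
    """GET /query?db=<db>&q=<InfluxQL>; the response body is JSON."""
    params = urllib.parse.urlencode({"db": db, "q": influxql})
    return f"{INFLUX_URL}/query?{params}"

def write_points(db, lines):
    with urllib.request.urlopen(build_write_request(db, lines)) as resp:
        return resp.status  # a successful write returns 204 No Content

def run_query(db, influxql):
    with urllib.request.urlopen(build_query_url(db, influxql)) as resp:
        return json.loads(resp.read())
```

With the container running, `write_points("mydb", [line_protocol("bookings", v) for v in (102, 108, 95)])` followed by `run_query("mydb", "SELECT * FROM bookings")` should mirror steps 4 and 5, though by default the JSON response reports times as RFC3339 strings rather than the nanosecond integers the shell printed.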


This post introduces time series from a technical perspective and describes two key challenges for time series analysis. It is based on the dense theoretical treatment in Mathematical Foundations of Time Series Analysis: A Concise Introduction, by Jan Beran, but the treatment here is lighter, since my aim is to make the material accessible to practitioners like myself.

First we’ll define time series and related concepts. Then we’ll use this foundation to understand the two key challenges for time series analysis.

Understanding time series

When we talk about time series, sometimes we’re talking about time series data (observations) and other times we’re…


Photo by Vimal Kumar

On teams, decision-making by dictator and by committee both suck. Dictators generate mediocre decisions quickly, and committees generate mediocre decisions slowly if at all. Over time both approaches kill team morale.

I had a manager, Joe Natoli, from whom I learned an effective, balanced approach. The idea is that every decision has an owner. If it’s unclear how to proceed on a decision, start by identifying its owner. The rest of us support the owner by offering perspectives that inform the decision.

The team lead can override a decision, but this should happen only in extreme circumstances. I’ve never had to do it.


My team at work is building a time series anomaly detection system that automatically creates anomaly detectors to monitor application health. We started with the humble constant threshold detector, which uses a constant threshold to perform the normal-vs-anomaly classification task. We want to create constant threshold detectors for stationary time series, which are, roughly speaking, series whose statistical properties (e.g., mean, variance, autocorrelation) don’t change over time.
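To make the classification task concrete, here is a minimal sketch of a constant threshold detector. The paragraph doesn’t say how thresholds are chosen, so `fit_detector`’s rule (mean ± k sample standard deviations of a historical window) is purely an assumption for illustration. It only makes sense for a stationary series, since a drifting mean or variance would invalidate a fixed band.

```python
from dataclasses import dataclass
from statistics import mean, stdev

@dataclass(frozen=True)
class ConstantThresholdDetector:
    lower: float
    upper: float

    def classify(self, value):
        # Anything outside the fixed band is an anomaly; everything inside is normal.
        return "anomaly" if value < self.lower or value > self.upper else "normal"

def fit_detector(history, k=3.0):
    """Hypothetical rule: thresholds at mean ± k sample standard deviations."""
    mu, sigma = mean(history), stdev(history)
    return ConstantThresholdDetector(mu - k * sigma, mu + k * sigma)
```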

We can use the Augmented Dickey-Fuller (ADF) test to identify stationary series. In this post I’ll show how to do this in R using the tseries package. …
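The post does this in R with tseries. As an illustration of what the test actually computes, here is a from-scratch NumPy sketch of the ADF regression, Δy_t = α + γ·y_{t−1} + Σ δ_i·Δy_{t−i} + e_t, returning the t-statistic for γ. It assumes a constant-only model and a fixed, hand-picked lag order; −2.86 is the commonly tabulated asymptotic 5% critical value for that case. tseries makes its own choices about deterministic terms and lag order and reports a p-value, so its numbers won’t match this sketch.

```python
import numpy as np

# Asymptotic 5% critical value for the constant-only (no trend) Dickey-Fuller case.
ADF_CRITICAL_5PCT = -2.86

def adf_statistic(y, lags=1):
    """t-statistic for gamma in dy_t = alpha + gamma*y_{t-1} + sum_i delta_i*dy_{t-i} + e_t."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)                                  # dy[j] = y[j+1] - y[j]
    target = dy[lags:]                               # left-hand side dy_t
    cols = [np.ones_like(target), y[lags:-1]]        # constant and lagged level y_{t-1}
    for i in range(1, lags + 1):
        cols.append(dy[lags - i: len(dy) - i])       # lagged differences dy_{t-i}
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)  # ordinary least squares
    resid = target - X @ beta
    sigma2 = resid @ resid / (len(target) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[1] / np.sqrt(cov[1, 1])

def looks_stationary(y, lags=1):
    # Rejecting the unit-root null (statistic below the critical value) suggests stationarity.
    return adf_statistic(y, lags) < ADF_CRITICAL_5PCT

rng = np.random.default_rng(0)
noise = rng.normal(size=300)            # white noise: stationary
walk = np.cumsum(rng.normal(size=300))  # random walk: has a unit root
print(looks_stationary(noise), looks_stationary(walk))
```

A series that passes would be a candidate for a constant threshold detector; one that fails would need differencing or a different kind of detector.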


When building models for forecasting time series, we generally want “clean” datasets. Usually this means we don’t want missing data and we don’t want outliers and other anomalies. But real-world datasets have missing data and anomalies. In this post we’ll look at using Hampel filters to deal with these problems, using R.

For the Jupyter notebook, see https://github.com/williewheeler/time-series-demos/blob/master/hampel/removing-outliers-from-time-series.ipynb.

What is a Hampel filter?

A Hampel filter is a filter we can apply to our time series to identify outliers and replace them with more representative values. The filter is basically a configurable-width sliding window that we slide across the time series. For each window, the…
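Here is a minimal from-scratch sketch of the sliding-window idea, assuming the common formulation. For each point, take the median of a window extending `half_window` points on each side and estimate spread with the median absolute deviation (MAD), scaled by 1.4826 so it approximates a standard deviation for Gaussian data. If the point lies more than `n_sigmas` scaled MADs from the window median, replace it with that median. The post’s notebook works in R and may treat the ends of the series differently; here windows are simply truncated there.

```python
from statistics import median

MAD_SCALE = 1.4826  # makes the MAD a consistent estimate of the std dev for Gaussian data

def hampel_filter(xs, half_window=3, n_sigmas=3.0):
    """Return (filtered copy of xs, indices that were replaced by their window median)."""
    n = len(xs)
    filtered = list(xs)
    outliers = []
    for i in range(n):
        # Window of up to half_window points on each side, truncated at the ends.
        # Windows always come from the original data, not from already-filtered values.
        window = xs[max(0, i - half_window): min(n, i + half_window + 1)]
        med = median(window)
        scaled_mad = MAD_SCALE * median(abs(x - med) for x in window)
        if abs(xs[i] - med) > n_sigmas * scaled_mad:
            filtered[i] = med
            outliers.append(i)
    return filtered, outliers

filtered, outliers = hampel_filter([0, 1, 2, 3, 4, 50, 6, 7, 8, 9])
print(outliers, filtered)  # [5] [0, 1, 2, 3, 4, 6, 6, 7, 8, 9]
```

One caveat with this plain version: if more than half the values in a window are identical, the MAD is zero and any deviation at all gets flagged, so practical implementations often add a guard for that case.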


This post is by my guest writer, Lucy Wheeler.

Hello. Today on this blog you will learn how to make an easy Scratch racing game that should be doable in less than an hour.

Start by creating an airplane sprite.

Then create a sky backdrop.


In my post Reducible vs irreducible error, I briefly explained how you can decompose prediction errors into reducible vs irreducible components. This time we’ll push the decomposition a little further, breaking the reducible error into error due to bias and error due to variance:

prediction error = error due to bias + error due to variance + irreducible error

Prediction errors are closely related to the extent to which a model-building method is sensitive to the details of the training set:

  • Error due to bias. Sometimes the method is too rigid, failing to capture key features in the training set (“underfitting”) and thus yielding models that are too simple-minded. …
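A small simulation can make the two reducible terms concrete. This sketch rests on assumptions that are not from the post: the true function is sin(x) on [0, 2π], the training inputs are fixed, noise is Gaussian, and the model-building method is least-squares polynomial fitting with NumPy’s `polyfit`. Refitting on many independently drawn training sets and examining predictions at one point x0 separates the two terms: squared bias is how far the average prediction sits from the truth, and variance is how widely individual predictions scatter around that average.

```python
import numpy as np

def bias_variance_at(x0, degree, n_datasets=2000, n_points=30, noise_sd=0.3, seed=0):
    """Estimate squared bias and variance of a degree-`degree` polynomial fit's prediction at x0.

    Illustrative assumptions: true f(x) = sin(x), fixed inputs on [0, 2*pi], Gaussian noise.
    """
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, 2.0 * np.pi, n_points)   # same inputs for every training set
    f_x = np.sin(x)
    preds = np.empty(n_datasets)
    for i in range(n_datasets):
        y = f_x + rng.normal(0.0, noise_sd, size=n_points)  # fresh noise each time
        coefs = np.polyfit(x, y, degree)
        preds[i] = np.polyval(coefs, x0)
    bias_sq = (preds.mean() - np.sin(x0)) ** 2   # average prediction vs. the truth
    variance = preds.var()                       # spread of predictions across training sets
    return bias_sq, variance

for degree in (1, 5):
    b, v = bias_variance_at(np.pi / 2, degree)
    print(f"degree={degree}: bias^2={b:.4f}, variance={v:.4f}")
```

The straight line (degree 1) is too rigid to follow the curve and shows the larger squared bias; the degree-5 polynomial tracks the curve but reacts more to each training set’s noise and shows the larger variance.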


Suppose that we want to predict a value Y based upon a set X = (X1, X2, …, Xp) of variables. For the predictions to have any chance of being good predictions, X needs to contain the core set of variables that drive the behavior of Y. But there will almost always be lesser variables, not included in X, that nonetheless exert some minor influence on Y. We capture the situation as follows:

Y = f(X) + ɛ

Here, f is the function describing the relationship between X and Y, and ɛ is an error term that accounts for all the unmeasured influences on Y.
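To see why ɛ puts a floor under prediction error in Y = f(X) + ɛ, here is a small simulation under illustrative assumptions that are not from the post: a hypothetical linear f and Gaussian noise with standard deviation 0.5. Even when we predict with the true f, the mean squared error settles near the noise variance, 0.25.

```python
import random

def f(x):
    # Hypothetical "true" relationship between X and Y (assumption for illustration).
    return 2.0 * x + 1.0

def simulate(n=100_000, noise_sd=0.5, seed=42):
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [f(x) + rng.gauss(0.0, noise_sd) for x in xs]  # Y = f(X) + ɛ
    return xs, ys

xs, ys = simulate()
# Predicting with the true f leaves only ɛ, so the MSE approximates Var(ɛ) = 0.5**2 = 0.25.
mse = sum((y - f(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)
print(f"MSE using the true f: {mse:.4f}")
```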

Willie Wheeler

Interested in applying machine learning and data science to problems in operations. For my stats course and tutorials, see https://learnstats.io.
