Google DeepMind-style datacenter optimization AI model (on the cheap)

commander
6 min read · Aug 17, 2016


There was news recently in Bloomberg about how Google was able to cut electricity usage in its datacenters by using an AI scheme made by DeepMind (of AlphaGo fame). Earlier this week, I decided to make a quick-and-dirty implementation in Python and share it here for anyone interested in a practical example of what exactly they did. First, let's take a quick look at why one would want to build such a thing...

Motivation

Datacenters (and indeed any other large-scale structures that use a lot of energy) need to be carefully optimized for efficiency, as even a 10%-15% saving on the electricity bill can add up to millions of dollars a year. The biggest challenge here is that even though there are certain simple steps anyone can take to reduce energy use (don't use a very low server room set-point, use free cooling when possible, etc...), one can never actually predict quantitatively what effect changing variable x by z% will have on total consumption. This is because there are simply too many variables that affect the net consumption of a datacenter (chillers, AHUs, compressors, condensers, fans, outside conditions, latitude, etc...) and it's impossible to actually write down a formula that quantifies all these relationships.

However, as long as you have a lot of data, ML is perfect for learning complex relationships between multiple features and outcomes. So, first let's have a look at the available data. (I work in the industry, so I was fortunate enough to have these data handy.)

Data

For my feature set, I decided to use 9 input variables from my datacenter: chiller load, pump load, AHU load, condenser load, IT load, outside air temperature, outside air humidity, wind speed, and wind direction. Loads are in kW, temperatures in degrees F, humidity in percent, wind speed in miles/hour and wind direction in degrees. Furthermore, all metered data is measured at 5-minute intervals, while outside conditions are recorded on change. Note: the DeepMind model uses 19 inputs, the list of which can be found in their technical paper. While DeepMind chose to optimize the data center's PUE, I decided it's easier to just optimize the total datacenter electricity consumption. But one can use any relevant time series data here, depending on what one wishes to model.
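In code terms, the feature set boils down to a column schema along these lines (the column names here are my own shorthand, not the actual sensor tags):

# the 9 input features used throughout this post (illustrative names and units)
feature_columns = ['chiller_load',          # kW
                   'pump_load',             # kW
                   'ahu_load',              # kW
                   'condenser_load',        # kW
                   'it_load',               # kW
                   'outside_air_temp',      # degrees F
                   'outside_air_humidity',  # percent
                   'wind_speed',            # miles/hour
                   'wind_direction']        # degrees
target_column = 'mains'                     # total datacenter consumption, kW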

Data Preprocessing

It's important to have good quality data. At the very least, data points should line up (time-wise) and be of the same size. IoT sensors are typically bad at this, so you'll have to do some wrangling to get everything nice and orderly. My sensor data have random missing values and false peaks at different timestamps. For the purposes of this simple demo, I decided to resample the data to a common index of timestamps from start_time to end_time spaced 5 minutes apart, which I generated algorithmically using the datetime package in Python 3.5. Then all sensor data were resampled into a pandas DataFrame on this common index by using nearest matching values from the raw data, and using interpolate/ffill/bfill (take your pick) where data is missing. Here's sample code:

import pandas as pd

chiller = pd.read_csv('chiller_load.csv', sep=',', index_col=0, names=['date', 'chiller_load'], header=0, parse_dates=True)
chiller = sample_to_std(chiller)

where the sample_to_std() function implements something like the following:

def sample_to_std(frame):
    # snap a raw sensor series onto the common 5-minute index (new_index)
    n_series = []
    for j in range(len(new_index)):
        c = new_index[j]
        # position of the raw reading closest in time to the target timestamp
        i = frame.index.get_loc(c, method='nearest')
        val = frame.iloc[i, 0]
        n_series.append(val)
    return pd.DataFrame(n_series, index=new_index, columns=frame.columns.values.tolist())

Note: there are more efficient ways of writing these functions; this version is only meant for clarity.
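For what it's worth, the common index itself is easy to build with pandas, and the whole nearest-match alignment can be done in a single reindex call. A minimal sketch, assuming start_time and end_time are already defined as datetimes:

# build the common 5-minute index covering the whole measurement period
new_index = pd.date_range(start=start_time, end=end_time, freq='5min')

# one-call equivalent of sample_to_std(): align each series to the common index
chiller = chiller.reindex(new_index, method='nearest')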

Next, we combine all input data into one data frame using concat:

df_list = [prod, chiller, pump, hvac, cond, OAT, hum, windspeed, winddir]
table = pd.concat(df_list, axis=1)
X = table.iloc[:, 0:9]
Y = mains  # target: total (mains) electricity consumption

We can then confirm that the input dataframe behaves as expected.
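For example, something along these lines (a minimal sketch, using the variable names from the snippets above):

# quick sanity checks on the combined frame
print(table.shape)           # expect one row per 5-minute timestamp and 9 feature columns
print(table.head())
print(table.isnull().sum())  # should be all zeros after the resampling step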

Baseline

Before we actually build a deep learning model, it's worth running a simple regression to try to predict the output, just so we establish a baseline score. Here I use a simple linear regression using the linear_model module from scikit-learn. First, let's set up the cross-validation scheme using 10 iterations and a train/test split of 80/20:

from sklearn import cross_validation
shuffle_validator = cross_validation.ShuffleSplit(len(X), n_iter=10, test_size=0.2, random_state=0)

Next, we'll set up a function that will allow us to test and score different types of regression algorithms in just one line:

def test_algo(rgr):
    # standardize the features, then fit and score the given regressor with cross-validation
    estimators = []
    estimators.append(('standardize', StandardScaler()))
    estimators.append(('regressor', rgr))
    pipeline = Pipeline(estimators)
    scores = cross_validation.cross_val_score(pipeline, X, Y, cv=shuffle_validator, scoring='mean_absolute_error')
    # the scorer returns negated errors (larger is better), so flip the sign for display
    print("MAE: %0.4f (+/- %0.2f) kW" % (-scores.mean(), scores.std()))

The main thing to note here is that we're using the Pipeline function from scikit-learn to string together a StandardScaler and a regressor. Now all we have to do is set up a regressor and pass it to the test_algo() function like so:

# linear model 
rgr = linear_model.LinearRegression(fit_intercept=True)
test_algo(rgr)

You'll note from the test_algo() function that we are using mean absolute error to score these models. For the linear model, I get a mean absolute error of 16.48 kW.

Let's try another quick one just for fun. This time, a k-nearest neighbor regressor from scikit-learn:

# K nearest neighbor model 
n_neighbors = 5
rgr = neighbors.KNeighborsRegressor(n_neighbors, weights='uniform')
test_algo(rgr)
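To see how well this does beyond a single score, we can plot predicted vs. actual consumption on a held-out test set. Here is a minimal sketch of how such a plot can be produced (train_test_split and matplotlib are my additions here; the pipeline simply mirrors the one inside test_algo()):

import matplotlib.pyplot as plt
from sklearn.cross_validation import train_test_split

# hold out 20% of the data, refit the KNN pipeline, and compare predictions with actuals
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=0)
model = Pipeline([('standardize', StandardScaler()), ('regressor', rgr)])
model.fit(X_train, Y_train)
predicted = model.predict(X_test)

plt.scatter(Y_test, predicted)
plt.plot([300, 500], [300, 500], 'r--')  # y = x guide line over the typical consumption range
plt.xlabel('actual consumption (kW)')
plt.ylabel('predicted consumption (kW)')
plt.show()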

The plot shows a scatter of predicted vs. actual consumption on the test portion of the data, and overlays a simple y = x line as a visual guide to show the match. This nearest neighbor model returns a mean absolute error of 5.78 kW. Not bad, given that total consumption is in the range of 300 to 500 kW. With the baseline established, let's move on to the neural networks.

Multilayer Perceptron

I will use the Keras neural networks library to build my model. It is dead simple to work with, and uses a Theano backend (TensorFlow backend also available). Unfortunately, Keras doesn’t accept data in the pandas DataFrame format, so we’ll have to convert our data to numpy array format. Thankfully, pandas makes this exceedingly easy. It takes just one line of code:

XX = X.as_matrix() ; YY = Y.as_matrix()

Aside from changing variable names, our cross-validation and testing scheme will not change at all. Let's start by building a basic perceptron. Keras makes this easy by providing a Sequential() class that lets one easily stack many layers of neurons. Since we will use a scikit-learn wrapper to make the scoring easy and compatible with the rest of our code, we'll first write a container function for our Keras model:

def simple_MLP():
    model = Sequential()
    model.add(Dense(9, input_dim=9, init='normal', activation='relu'))
    model.add(Dense(1, init='normal'))

    model.compile(loss='mean_absolute_error', optimizer='adam')
    return model

The first layer we added is a Dense (i.e. fully connected) layer with 9 units, an input shape of 9 (same as our feature length), initialized using the 'normal' scheme and with relu activation. We then simply add the output layer with 1 unit and no activation (since we'd like to see the raw regressed predictions). The function simply returns this model after a model.compile() step where we specify the loss function to optimize and a suitable optimizer. The different options are enumerated in the Keras guide. The scikit-learn wrapper class we use is called KerasRegressor and has the following configuration:

rgr = KerasRegressor(build_fn=simple_MLP, nb_epoch=50, batch_size=5, verbose=0) 
test_algo_ndarray(rgr)
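The test_algo_ndarray() function used here isn't shown above; it is just test_algo() with the numpy arrays swapped in, along these lines:

def test_algo_ndarray(rgr):
    # same pipeline and cross-validation scheme as test_algo(), applied to the numpy arrays
    estimators = [('standardize', StandardScaler()), ('regressor', rgr)]
    pipeline = Pipeline(estimators)
    scores = cross_validation.cross_val_score(pipeline, XX, YY, cv=shuffle_validator, scoring='mean_absolute_error')
    print("MAE: %0.4f (+/- %0.2f) kW" % (-scores.mean(), scores.std()))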

Running this model yields a mean absolute error of 14.07 kW, which isn't too bad but is still outperformed by the nearest neighbor estimator. This suggests that there might be room for improvement. I leave further optimization to the reader, but one obvious way to do this would be to expand the topology of the neural net by adding more layers (depth), by using wider layers (width), or both, like so:

def deep_MLP():
    model = Sequential()
    model.add(Dense(30, input_dim=9, init='normal', activation='relu'))
    model.add(Dense(40, init='normal', activation='relu'))
    model.add(Dense(40, init='normal', activation='relu'))
    model.add(Dense(30, init='normal', activation='relu'))
    model.add(Dense(1, init='normal'))

    model.compile(loss='mean_absolute_error', optimizer='adam')
    return model

This model has 4 hidden layers (30, 40, 40, 30 units respectively). With this configuration, I was able to achieve a mean absolute error of 8.34 kW.

Update: I also tried a model with 4 hidden layers of 50 units each.
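That variant would look something like this (the function name deep_MLP_50 is just my placeholder):

def deep_MLP_50():
    model = Sequential()
    model.add(Dense(50, input_dim=9, init='normal', activation='relu'))
    model.add(Dense(50, init='normal', activation='relu'))
    model.add(Dense(50, init='normal', activation='relu'))
    model.add(Dense(50, init='normal', activation='relu'))
    model.add(Dense(1, init='normal'))
    model.compile(loss='mean_absolute_error', optimizer='adam')
    return model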

It seems that the nearest neighbor estimator (5.78 kW MAE, compared to 8.34 kW for the deeper MLP) is hard to beat in this case.

Note: I'm only using 5 months of data (44,065 timestamps in total) on 9 features, whereas the DeepMind model uses 2 years' worth of data (same 5-minute resolution) on 19 features. Their neural net uses 5 hidden layers with 50 units each, so I expect their model to be more robust. They also clean their data and take care to eliminate feature collinearity. But even with fairly unclean data and a quick-and-dirty ANN model, it's nice to get single digit errors :-)

Cheers,
