Deep Learning on Structured Data: part 1

I’ve spent the last year learning as much as I can about machine learning and trying (with varying degrees of success) to apply what I’ve learned. One of the goals I set for myself in 2018 was to apply deep learning to tackle a day-to-day problem in my job. Two months into the year, I have made progress and faced some unexpected challenges. Here’s the story so far…

I lead the team that provides support for one of IBM’s premier database products, Db2. This team spans the globe and helps our customers tackle dozens of technically challenging problems every day. For every problem (or ticket) a set of metadata is captured, including the team member who is handling the ticket, the severity of the ticket, a text description of the problem, the date the ticket was opened, etc. I wanted to find a way to harness deep learning to get useful insights out of this metadata.

I greedily consumed everything I could about deep learning, including four fifths of Andrew Ng’s Deep Learning Specialization and a large slice of Jeremy Howard’s Practical Deep Learning for Coders. See this article for a comparison of the benefits of each of these approaches to learning about deep learning.

Midway through these courses, I came across this article on structured deep learning, which in turn led me to this very elegant Kaggle competition entry. It was exactly what I was looking for: an end-to-end deep learning example with the following characteristics:

  • Keras-based: I thought that Keras would be the right framework for what I wanted to do because it’s widely used (and thus has a large community contributing answers and ideas) and sits at the Goldilocks level of abstraction. TensorFlow was more complexity than I wanted to stomach at this stage, while more abstracted frameworks (like the fast.ai library featured in version 2 of Jeremy Howard’s Deep Learning course) were too much of a black box.
  • Table in -> deep learning result out: I was looking for a working, end-to-end example that took structured data as input and produced a useful result from a deep learning framework.
  • Deal with three classes of data: continuous values (like elapsed time or temperature); categorical values (like country names or days of the week); and text. In particular, I wanted an example that would show how to deal with embeddings for categorical values and text.
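
An embedding, as used for categorical values and text, is essentially a trainable lookup table: each distinct category is mapped to an integer index, and that index selects one row of a weight matrix. The sketch below shows the mechanism in plain NumPy, assuming a toy day-of-week column. `encode_categories` and `embedding_lookup` are hypothetical helpers, and the random matrix stands in for the weights that a framework layer such as Keras `Embedding` would learn during training. Text features work the same way, except that the indices identify words (tokens) rather than whole column values.

```python
import numpy as np


def encode_categories(values):
    """Map each distinct category to a contiguous integer index (0..n-1)."""
    vocab = {v: i for i, v in enumerate(sorted(set(values)))}
    return np.array([vocab[v] for v in values], dtype=np.int64), vocab


def embedding_lookup(indices, embedding_matrix):
    """Return one dense vector per index: a row lookup, which is what an embedding layer computes."""
    return embedding_matrix[indices]


days = ["Mon", "Tue", "Mon", "Sun", "Fri"]
indices, vocab = encode_categories(days)

# Hypothetical embedding size of 3; the random values are placeholders for learned weights.
rng = np.random.default_rng(seed=0)
embedding_matrix = rng.normal(size=(len(vocab), 3))

vectors = embedding_lookup(indices, embedding_matrix)
print(vocab)          # {'Fri': 0, 'Mon': 1, 'Sun': 2, 'Tue': 3}
print(vectors.shape)  # (5, 3)
```

Both "Mon" rows map to index 1 and therefore receive the same vector; training adjusts each row so that the vector for a category becomes useful for the prediction task.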

Armed with what the deep learning courses had taught me and a working end-to-end example of applying deep learning to structured data, I selected a problem to attack: predicting the Time to Relief (TTR) for a ticket as soon as it was opened. Concretely, this meant predicting whether it would take my team 24 hours or more to initially address a customer’s problem (either with a workaround or a permanent solution).
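
Framed this way, the target is a binary label derived from the raw TTR. A minimal sketch with pandas, assuming TTR is available in hours in a hypothetical `ttr_hours` column and that 1 marks tickets whose relief took 24 hours or more (whether exactly 24 hours counts as slow is itself an assumption):

```python
import pandas as pd

# Hypothetical ticket data: hours from ticket open to first relief.
tickets = pd.DataFrame({"ttr_hours": [2.5, 30.0, 24.0, 11.0, 72.0]})

# Assumed labelling convention: 1 = relief took 24 hours or more, 0 = relief came sooner.
RELIEF_THRESHOLD_HOURS = 24.0
tickets["slow_relief"] = (tickets["ttr_hours"] >= RELIEF_THRESHOLD_HOURS).astype("int8")

print(tickets["slow_relief"].tolist())  # [0, 1, 1, 0, 1]
```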

Using ticket metadata from the last 6 years, I was able to assemble a corpus of 180k records and run it through an adaptation of the model cited above. I used Python in DSX Public Cloud as the development environment, with the data pulled from a table in Db2 Warehouse on Cloud.

To get the model working, I built it up incrementally:

  • start with a subset of features, excluding text and categorical features, to work out kinks in the data, such as columns I had assumed were numeric but that contained strings
  • introduce categorical features with embeddings
  • introduce text features with embeddings
  • add more input data to grow the corpus to close to 1 million records

Each of the above steps improved accuracy, but I would have struggled to get a working model if I had tried to incorporate all the features at once.
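
For the first step in that list, a quick way to find the kinks is to ask pandas which values in a supposedly numeric column fail to parse. The sketch below assumes a hypothetical `elapsed` column that arrived as strings; `non_numeric_values` is a hypothetical helper built on `pd.to_numeric(..., errors="coerce")`, which turns unparseable values into NaN. Stripping thousands separators is shown as one possible fix when commas are the culprit.

```python
import pandas as pd


def non_numeric_values(series: pd.Series) -> list:
    """Return the distinct non-null values in `series` that cannot be parsed as numbers."""
    parsed = pd.to_numeric(series, errors="coerce")
    # A value is suspect if it was present in the input but became NaN after parsing.
    suspect = series[parsed.isna() & series.notna()]
    return sorted(suspect.astype(str).unique())


# Hypothetical column that was assumed to be numeric but arrived as strings.
elapsed = pd.Series(["10", "1,234", "7", "n/a", None])
print(non_numeric_values(elapsed))  # ['1,234', 'n/a']

# One possible fix when commas are thousands separators: strip them, then parse.
cleaned = pd.to_numeric(elapsed.str.replace(",", "", regex=False), errors="coerce")
print(cleaned.tolist())
```

Values such as "n/a" still come out as NaN after cleaning, so they need a deliberate decision (drop, fill, or flag) rather than a silent conversion.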

The biggest jump in validation accuracy (from less than 5% to over 60%) came when I corrected the type of the label column (Time to Relief, or TTR) to floating point.
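
A minimal sketch of a guard for the label column, assuming the labels arrive in a pandas column of mixed or string type; `prepare_label` is a hypothetical helper. It converts the column to `float32` and raises an error if any value failed to parse, rather than letting NaN labels flow silently into training.

```python
import numpy as np
import pandas as pd


def prepare_label(series: pd.Series) -> np.ndarray:
    """Convert a label column to float32 and fail loudly if any value did not parse."""
    label = pd.to_numeric(series, errors="coerce").astype(np.float32)
    n_missing = int(label.isna().sum())
    if n_missing:
        raise ValueError(f"{n_missing} of {len(label)} label values are NaN after conversion")
    return label.to_numpy()


# Hypothetical labels that arrived as a mix of strings, ints, and floats.
y = prepare_label(pd.Series(["0", "1", 1, 0.0]))
print(y.dtype, y)  # float32 [0. 1. 1. 0.]
```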

[Figure: A slow start on validation accuracy]

So far, I have been able to get a validation accuracy of about 80%. That is, 4 times out of 5 the model correctly predicts whether a ticket will get relief within one day, based solely on the information available when the ticket is first opened. I had hoped for a better result, but to misquote Meat Loaf, 4 out of 5 ain’t bad.

Here are some of the lessons I have learned through this experience:

  • Problems with the input data manifested themselves in unexpected ways. For example, I spent a couple of days struggling with the type of one of the features before I realised that commas in 3 of the 180k input records caused the whole column to be read as strings in the resulting pandas dataframe.
  • Don’t make assumptions about what values a column can contain. I ran several iterations of the model before I realised that a type error meant almost all the label (TTR) values were being set to NaN. I also wasted several iterations because I assumed that one of the features (initial ticket severity) would always be one of the integers 1, 2, 3, or 4, when in fact the input data also included the strings ‘1’, ‘2’, ‘3’, and ‘4’!
  • Tuning hyperparameters helped, until it stopped helping. I learned a lot about the impact of the learning rate, the dropout rate, the regularization parameter lambda, and so on by varying each in isolation, and by trying different optimizers (Adam and SGD). However, after a few early gains, a scattershot approach to varying hyperparameters didn’t improve validation accuracy any further.
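
The severity problem is easy to reproduce: pandas treats the integer `1` and the string `"1"` as different values, so a column that should have four categories appears to have more. The sketch below normalizes such a column, assuming (as the text does) that 1 to 4 are the only legitimate severities; `normalize_severity` is a hypothetical helper. Calling `value_counts()` on a column before modelling is a cheap way to spot this kind of mix.

```python
import pandas as pd


def normalize_severity(series: pd.Series, allowed=(1, 2, 3, 4)) -> pd.Series:
    """Coerce mixed int/str severities to integers and reject anything outside `allowed`."""
    # Convert everything to stripped strings first so that 1 and "1" become identical.
    normalized = pd.to_numeric(series.astype(str).str.strip(), errors="coerce")
    unexpected = series[~normalized.isin(allowed)]
    if not unexpected.empty:
        raise ValueError(f"unexpected severity values: {sorted(unexpected.astype(str).unique())}")
    return normalized.astype("int64")


# Hypothetical raw column mixing integers and strings.
raw = pd.Series([1, "1", 2, "2", 3, "4", 4])
print(raw.nunique())                      # 7
print(normalize_severity(raw).nunique())  # 4
```

Raising on unexpected values, instead of quietly mapping them to NaN, forces a decision the first time the data surprises you.
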

[Figure: A visualization of the model (not as complicated as it looks!)]

This foray into applying deep learning to structured data has been a great learning experience. In addition to digging into a simple deep learning model, I have learned a lot about Python. I am in the process of cleaning up the code (in particular, removing the hardcoded references to columns in the code responsible for categorization, filling in missing values, and feeding inputs to the model). You can see the results in Deep Learning on Structured Data: part 2.
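
One way to remove hardcoded column references is to describe the columns once, grouped by the three classes of data, and let each preprocessing step loop over those lists. The sketch below is an assumption about how that could look, not the code from part 2: `COLUMN_TYPES`, `fill_missing`, `model_inputs`, and the column names are all hypothetical, and the fill strategies (median, a `"missing"` category, empty text) are just one reasonable choice.

```python
import pandas as pd

# Hypothetical column lists, declared once instead of being hardcoded in each step.
COLUMN_TYPES = {
    "continuous": ["elapsed_hours"],
    "categorical": ["country"],
    "text": ["description"],
}


def fill_missing(df: pd.DataFrame, column_types: dict) -> pd.DataFrame:
    """Fill missing values with a default suited to each class of column."""
    df = df.copy()
    for col in column_types["continuous"]:
        df[col] = df[col].fillna(df[col].median())
    for col in column_types["categorical"]:
        df[col] = df[col].fillna("missing")
    for col in column_types["text"]:
        df[col] = df[col].fillna("")
    return df


def model_inputs(df: pd.DataFrame, column_types: dict) -> list:
    """Return one array per column, in a fixed order, for a multi-input model."""
    ordered = column_types["continuous"] + column_types["categorical"] + column_types["text"]
    return [df[col].to_numpy() for col in ordered]


tickets = pd.DataFrame({
    "elapsed_hours": [1.0, None, 5.0],
    "country": ["IE", None, "CA"],
    "description": ["crash on startup", None, "slow query"],
})
filled = fill_missing(tickets, COLUMN_TYPES)
inputs = model_inputs(filled, COLUMN_TYPES)
print(filled["elapsed_hours"].tolist())  # [1.0, 3.0, 5.0]
print(len(inputs))                       # 3
```

Adding or dropping a feature then means editing `COLUMN_TYPES` rather than every function. A multi-input Keras model can take a list of arrays, one per input, although the categorical and text columns would first need converting to integer indices.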

I am looking forward to applying different architectures to this problem and to related problems (such as predicting whether the resolution for a ticket is going to require a code fix). While the results I have achieved so far are not earth-shattering, I am more convinced than ever that applying deep learning to structured data has massive potential to solve practical problems.