Small data, big value

How simple and quick machine learning projects can beat out massive technology investments

Jacqueline Nolis
Published in T-Mobile Tech
5 min read · Nov 18, 2019


(Hate reading? Head straight to the demo and test the model yourself!)

This post was coauthored with Joel Werdell, a Principal Product Manager at T-Mobile.

Machine learning journalism focuses almost exclusively on massive endeavors: giant datasets loaded into intensive computing machines and months of work to produce results. When the only stories told are ones of massive budgets and GPU clusters that use a town’s worth of power, it creates a belief that for machine learning to be effective it needs to be a Grand Investment That Will Change the Business. But this is ridiculous! Just as you don’t need to be a bona fide tech company to create software, you don’t need to undergo a massive machine learning transformation to meaningfully drive your business using AI. Models can be created and used in small ways that provide simple victories for companies, immediately lowering risks and costs.

This blog post is the story of one such project at T-Mobile.

At T-Mobile, many people prefer to connect with us through text channels such as text messages, Twitter, and messaging in the T-Mobile app. The people who message us may be T-Mobile customers interested in topics like how to upgrade a phone or pay a bill. They may also not yet be customers and want to switch to T-Mobile, so they are looking to have a conversation about joining.

As we looked for ways to improve the messaging experience, we noticed that many conversations opened with a question like “how much to switch 4 lines to T-Mobile?” or “how do I get Netflix on my account?” Our messaging software would respond by asking: “are you a customer?” This was absolutely absurd, since the text of the person’s question usually answered ours: for instance, someone asking about changing their plan is clearly a customer. It seemed like a problem a computer should be able to solve, so the AI @ T-Mobile team immediately dived in to make this better.

Thankfully, we had a great set of training data for a machine learning model: plenty of historic logs from conversations where people messaged T-Mobile and then immediately selected in our UI whether they were a customer. Each person’s first message would be the input to the model, and the selection they made would be what the model tried to predict. With customer data removed from those logs, we were in great shape.

An example of a message a person sent to T-Mobile and the picker the person selected their choice from.

Since the input was a set of strings (the messages) and the output was a yes-or-no variable (whether they selected customer), each data point was small. Further, the number of data points we ultimately needed for the model was in the tens of thousands, not the millions. Altogether, this meant that we could build a machine learning model without having to worry much about big data.

We decided to train a neural network model on this data. Neural networks are especially well suited for text input since they can easily parse the sequence of words in the text. By using a neural network approach we could trust that the model would be able to find the important parts of the message, and we didn’t have to spend time engineering features like we would with a different type of model. Neural networks are built on layers of nodes, where each layer provides a different function for the overall model. Our network had three key layers:

  1. An embedding layer. This layer places the words in a low-dimensional space so that similar words in the message are treated similarly by the model (like “payment” and “payments”).
  2. A convolution layer. This layer creates features out of sets of five words in a row. By doing so it looks for important phrases in the message (like “pay my bill”).
  3. A dense layer. This layer reduces the network down to a single number, which is used as the probability that the person is a customer.
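
To make the three layers concrete, here is a minimal sketch of one forward pass through them. It is written in plain Python rather than the R and Keras code we actually used, and it is not our production model: the vocabulary, the embedding size, the number of filters, the max-pooling step between the convolution and dense layers, and the random (untrained) weights are all assumptions chosen for illustration, so the probability it prints is meaningless.

```python
import math
import random

WINDOW = 5        # from the post: features are built from five words in a row
EMBED_DIM = 4     # assumed size of the embedding space
NUM_FILTERS = 3   # assumed number of convolution filters


def build_model(vocab, seed=0):
    """Create a toy model with random, untrained weights."""
    rng = random.Random(seed)

    def rand(n):
        return [rng.uniform(-1, 1) for _ in range(n)]

    return {
        # Index 0 is shared by padding and unknown words.
        "vocab": {word: i + 1 for i, word in enumerate(vocab)},
        "embedding": [rand(EMBED_DIM) for _ in range(len(vocab) + 1)],
        "filters": [rand(WINDOW * EMBED_DIM) for _ in range(NUM_FILTERS)],
        "dense": rand(NUM_FILTERS),
        "bias": 0.0,
    }


def predict(model, message):
    """Return a number between 0 and 1 for a single message."""
    ids = [model["vocab"].get(w, 0) for w in message.lower().split()]
    ids += [0] * max(0, WINDOW - len(ids))  # pad messages shorter than 5 words

    # 1. Embedding layer: look up a small vector for each word.
    vectors = [model["embedding"][i] for i in ids]

    # 2. Convolution layer: each filter scores every run of five words
    #    (with a ReLU); keeping the best score per filter is an assumed
    #    pooling step.
    features = []
    for filt in model["filters"]:
        scores = []
        for start in range(len(vectors) - WINDOW + 1):
            window = [x for vec in vectors[start:start + WINDOW] for x in vec]
            scores.append(max(0.0, sum(a * b for a, b in zip(filt, window))))
        features.append(max(scores))

    # 3. Dense layer: collapse the features to one number, then squash it
    #    into a probability with a sigmoid.
    z = sum(w * x for w, x in zip(model["dense"], features)) + model["bias"]
    return 1.0 / (1.0 + math.exp(-z))


model = build_model(["how", "do", "i", "pay", "my", "bill", "switch", "lines"])
print(predict(model, "how do I pay my bill"))
```

In a real project the weights would be learned from the labeled messages instead of drawn at random; a library such as Keras handles that training step.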

We used R to train the model, with the Keras and TensorFlow packages for training the neural network. Since our model was only three simple layers, we were able to train and test it quickly. We left a test set of messages out of the training data to validate that the model worked well enough for our messaging channels.
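
Holding out a test set takes only a few lines. The sketch below is in plain Python rather than R, and the function names, the split fraction, the 0.5 cutoff, and the toy labeled messages are assumptions for illustration, not our real data or settings.

```python
import random


def holdout_split(examples, test_fraction=0.1, seed=42):
    """Shuffle labeled examples and set aside a fraction for testing."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


def accuracy(predict_fn, test_set, threshold=0.5):
    """Share of held-out messages where the prediction matches the label."""
    correct = sum(
        (predict_fn(message) >= threshold) == is_customer
        for message, is_customer in test_set
    )
    return correct / len(test_set)


# Toy data: (first message, did the person select "customer"?)
examples = [
    ("how do I pay my bill", True),
    ("how much to switch 4 lines to T-Mobile?", False),
] * 10

train, test = holdout_split(examples, test_fraction=0.2)
```

The model is fit on `train` only, and `accuracy` is then computed once on `test`, so the score reflects messages the model has never seen.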

We then deployed our R neural network as an API in a Docker container. By using Docker containers with the T-Mobile standard engineering pipeline our model was soon put into production and helping people get routed faster. For an introduction to R in production check out our blog posts, GitHub repo, and RStudio::conf() video.
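
For readers who have not put a model behind an API before, the shape of such a service is small. The following is a rough sketch in Python using only the standard library, not our R deployment: the JSON field names, the port, and the keyword rule standing in for the trained model are all hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def score_message(message):
    """Hypothetical stand-in for the trained model."""
    return 0.9 if "my account" in message.lower() else 0.5


def handle_payload(body):
    """Turn a JSON request body into a JSON response body."""
    message = json.loads(body)["message"]
    return json.dumps({"customer_probability": score_message(message)})


class ModelHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the request body, score it, and reply with JSON.
        length = int(self.headers.get("Content-Length", 0))
        response = handle_payload(self.rfile.read(length)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(response)))
        self.end_headers()
        self.wfile.write(response)


def serve(port=8080):
    """Start the server; in a container this would be the entry point."""
    HTTPServer(("0.0.0.0", port), ModelHandler).serve_forever()
```

Keeping the scoring logic in `handle_payload`, separate from the HTTP handler, means it can be tested without starting a server.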

With our tool in place we were able to reduce the number of people who saw the automated “are you a customer” selector by half. This was a great business success, removing a point of friction from the customer experience and getting customers to the experts who can help them faster. At T-Mobile we have the core belief that the customer should come first, and this machine learning model allowed us to drive towards that.
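
One plausible way to turn a probability into a routing decision is to skip the selector only when the model is confident, and to keep asking otherwise. The sketch below assumes that design; the thresholds and return values are illustrative, not the ones used in production.

```python
def route(customer_probability, low=0.1, high=0.9):
    """Decide whether the "are you a customer" selector can be skipped.

    The thresholds are illustrative assumptions, not values from our system.
    """
    if customer_probability >= high:
        return "customer"        # confident: skip the selector
    if customer_probability <= low:
        return "not_customer"    # confident: skip the selector
    return "ask"                 # unsure: show the selector as before
```

Under a scheme like this, moving the thresholds trades how often the selector is skipped against how often a person is routed incorrectly.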

More interesting than the technical approach we took were the things we didn’t need in order to successfully launch this product:

  • Big data platforms like Spark. Because our data was reasonably sized, we were able to train our models locally rather than relying on complex platforms.
  • Setting up virtual machines with GPUs for training. The simplicity of the problem meant that training on CPUs was fast enough to be practical, no graphics cards necessary.
  • Any feature engineering on the data. Since the input data was just text, there wasn’t anything to engineer. We fed it into the neural network and saw immediate results.
  • Cross-validation and other training techniques. Using the traditional method of holding a small percentage of the data out of training and testing on it instead had almost no impact on performance and got us to market more quickly.
  • Fancy retraining methods. When we looked at the data we found almost no change to it over time. Because of this we didn’t have to worry about auto-retraining or complex deployment strategies.

By doing without the complexities that most people discuss when they talk about machine learning, we were able to quickly get to market and easily maintain our work. As you look for use cases for machine learning at your own company, consider what you can do if you limit yourself to simple projects.

Thanks for reading, and check out our live demo of the “are you a customer” model to try it out for yourself.