Detecting Malicious Requests with Keras & TensorFlow

Security is a concern for any public-facing web application. Good development practices help defend against users looking to expose data or bring down an app. However, not every attack vector gets handled, and new exploits are bound to be discovered. This is where security software can assist with monitoring and preventing unforeseen attacks.

So what if you could use the power of Google’s TensorFlow engine to decide whether a given request is malicious? Well, that was the question I was looking to answer while participating in Slalom’s recent AI hackathon. The following post outlines the technical details of a PoC for a security monitoring application, which was built with the help of a couple of other Slalomites.


The objective was to build an application that can analyze incoming requests to a target API and flag any suspicious activity. The app would also host a simple UI to display these flagged requests and provide the ability to take precautionary action when necessary. The team dubbed this malicious request detection application ‘SecuritAI’.

As with any Machine Learning problem, the data is the valuable resource for making an intelligent model. In this case, access logs from an API were what I needed. A mock API had to be built to produce a good dataset of access logs to process. Loggers would need to be added to the mock API to accumulate access logs in batches for training purposes, as well as to stream logs for real-time processing.

Planned Proof of Concept

Getting Started

The first step of this MVP was to choose a prediction model and prove that it was a good fit for our problem: detecting whether a given request is intended to do harm or expose sensitive information. To start, we narrowed our focus to determining whether a request contains content that can be considered an injection attempt. Injection attacks come in various forms, such as clever insertion of SQL, XML, JSON, or source code into requests. We were only interested in knowing whether a given request is an attempted injection or not, which makes this a binary classification problem. Many models can assist with this problem: K-NN, Naive Bayes Classifier, Support Vector Machines (SVM), Neural Networks, and unsupervised models.

Deep Learning/Neural Networks are getting a lot of attention with the latest breakthroughs in academia and practical uses in the field. They have proven to be exceptional at image recognition and natural language processing (NLP). What if we could tap into the NLP capabilities of a neural network for this classification problem? This is what we wanted to test out.

We’re provided access log entries in the form of JSON from the mocked API, like so:

{
  "timestamp": 1502135820943,
  "method": "get",
  "query": {
    "query": "Lawn & Garden Buying Guides"
  },
  "path": "/search",
  "statusCode": 200,
  "source": {
    "remoteAddress": "",
    "userAgent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36"
  },
  "route": "/search",
  "headers": {
    "host": "localhost:8002",
    "connection": "keep-alive",
    "cache-control": "no-cache",
    "user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36",
    "accept": "*/*",
    "accept-encoding": "gzip, deflate, br",
    "accept-language": "en-US,en;q=0.8,es;q=0.6"
  },
  "requestPayload": null,
  "responsePayload": "SEARCH"
}

The JSON log entry cannot be used as-is as input for a neural network model. Models require numeric inputs, so text preprocessing is necessary. Since the contents of a request log include various character strings, symbols, and digits, we chose to preprocess each entry into a sequence of characters.

Processing the log entries as a sequence of characters is performed by mapping each character of the request log text to a numeric value within a populated word dictionary. Each numeric value is the character’s frequency rank: the lower the number, the more often that character was seen. The dictionary is initially created and fitted using the training data, so that characters in subsequent logs can be mapped to previously seen character values. E.g. in this segment of the word dictionary, the “,” character is the 7th most frequent character in the training dataset:

{" ": 39, "(": 77, ",": 7, "0": 8, "4": 26, "8": 28, "<": 64, "D": 71, "H": 59, "L": 61, ...
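To make the mapping concrete, here is a minimal pure-Python sketch of a frequency-ranked character dictionary. It is an illustration rather than the code used in the hackathon: the function names fit_char_index and encode, the sample log strings, and the choice to map unseen characters to 0 are all assumptions made for this example.

```python
from collections import Counter


def fit_char_index(texts):
    """Rank characters by frequency; the most frequent character gets index 1."""
    counts = Counter(ch for text in texts for ch in text)
    # Sort by descending count, breaking ties by character for determinism.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return {ch: rank for rank, (ch, _) in enumerate(ranked, start=1)}


def encode(text, char_index, unknown=0):
    """Map each character to its rank; unseen characters fall back to `unknown`."""
    return [char_index.get(ch, unknown) for ch in text]


# Hypothetical training logs, shortened for readability.
training_logs = [
    '{"method":"get","path":"/search"}',
    '{"method":"post","path":"/login"}',
]
char_index = fit_char_index(training_logs)
sequence = encode('{"path":"/search"}', char_index)
```

The dictionary is fitted once on the training data and then reused, so the same character always maps to the same number at prediction time.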

A great choice for learning a sequence of characters is a Recurrent Neural Network (RNN). More specifically, a Long Short-Term Memory (LSTM) variant of RNNs was chosen because of its wide use and success in learning sequences. LSTMs are a bit complex and they are not the focus of this post, but having a high-level understanding helps when tinkering with a model. More on sequence learning and LSTMs in practice can be found in this great write-up by Andrej Karpathy: Unreasonable Effectiveness of RNNs.

To quickly develop the neural network model, we opted to use the high-level Keras API running on top of TensorFlow, as opposed to interacting with TensorFlow directly. Keras allows us to quickly prototype our model and saves time by providing a boilerplate setup for TensorFlow.

Neural network models in Keras are very easy to work with: simply instantiate a model object and start adding layers! The initial embedding layer specifies the expected dimensions of the vector inputs, the LSTM hidden layer is defined with 64 neurons along with separate dropout layers to reduce variance, and finally a dense output layer produces the classification confidence.

LSTM RNN in Keras
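For readers without access to the original snippet, the layer stack described above could look roughly like the following sketch. Only the 64 LSTM units, the separate dropout layers, and the single dense output come from this post; the function name build_model, the embedding size of 32, the dropout rate of 0.5, the sigmoid activation, and the Adam optimizer are assumptions chosen for illustration.

```python
def build_model(vocab_size, embedding_dim=32, lstm_units=64, dropout_rate=0.5):
    """Build a character-level LSTM binary classifier (illustrative sketch)."""
    # Imported inside the function so the sketch can be loaded
    # on a machine without TensorFlow installed.
    from tensorflow.keras.layers import LSTM, Dense, Dropout, Embedding
    from tensorflow.keras.models import Sequential

    model = Sequential()
    # Turn each character index into a dense vector.
    model.add(Embedding(input_dim=vocab_size, output_dim=embedding_dim))
    model.add(Dropout(dropout_rate))
    # Hidden layer that learns patterns across the character sequence.
    model.add(LSTM(lstm_units))
    model.add(Dropout(dropout_rate))
    # Single sigmoid unit: confidence that the request is an injection attempt.
    model.add(Dense(1, activation="sigmoid"))
    model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
    return model
```

Here vocab_size would be the number of entries in the character dictionary plus one, since index 0 is typically reserved for padding.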


Narrowing our focus to binary classification with an LSTM RNN means we are performing supervised learning with our model. Therefore, every log entry in the training dataset needs an accompanying label describing whether that logged request is normal or an attempted injection attack.

0 = Normal request log entry
1 = Request log indicates attempted injection attack

As for generating data to train the model, the mock API was used to simulate a very simple e-commerce application, e.g. with mocked-out endpoints for /login, /search, and /checkout. Since we don’t have a legitimate flow of users to the mock API, we incorporated a couple of runtime options to run the server and execute requests automatically (e.g. npm run attack inject). These start commands ran through a typical user flow and also randomly performed attempted injection attacks. We tuned these automated request flows to run ~100 API requests per minute to quickly accumulate logs for a training dataset.

A custom logger was plugged into the server framework to output the desired dataset format. In this case, a CSV containing rows of request log JSON and the associated label was all that was needed. In order to differentiate between normal and malicious requests, an ‘attack’ header was appended to all malicious requests performed by the automated client. The label for each request log entry would then be determined by checking for the existence of that header. If the ‘attack’ header exists, the log is labeled ‘1’ and the header is removed before the log is written; otherwise the log is labeled ‘0’.
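The labeling rule is simple enough to sketch in a few lines. The actual logger lived inside the Node.js mock API, so the Python below only illustrates the logic; the function names label_log and write_rows, the sample log entries, and the header value "true" are assumptions.

```python
import csv
import io
import json

ATTACK_HEADER = "attack"  # marker header added by the automated client


def label_log(log):
    """Return (clean_log, label): label is 1 if the attack header was present."""
    headers = dict(log.get("headers") or {})
    label = 1 if ATTACK_HEADER in headers else 0
    # Strip the marker so the model cannot simply learn the header itself.
    headers.pop(ATTACK_HEADER, None)
    return {**log, "headers": headers}, label


def write_rows(logs, out):
    """Write one CSV row per log: the cleaned log as JSON, then its label."""
    writer = csv.writer(out)
    for log in logs:
        clean, label = label_log(log)
        writer.writerow([json.dumps(clean), label])


# Hypothetical log entries; the header value "true" is an assumption.
logs = [
    {"method": "get", "path": "/search", "headers": {"host": "localhost:8002"}},
    {"method": "post", "path": "/login",
     "headers": {"host": "localhost:8002", "attack": "true"}},
]
buffer = io.StringIO()
write_rows(logs, buffer)
```

Removing the header before writing matters: if it stayed in the log text, the classifier would have a trivial shortcut that does not exist in real traffic.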

Due to some limitations of using a mocked API to produce a training dataset, and to ensure we have a more generalized classifier, only the necessary log fields were extracted during preprocessing. These properties were ‘method’, ‘query’, ‘statusCode’, ‘path’, and ‘requestPayload’. Including the varying headers introduced some randomness which hindered the sequence learning, so headers were left out of scope for this phase. Additional data, preprocessing, and possibly a separate classifier could be used to analyze the header contents.
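As a rough sketch of this extraction step, the snippet below keeps only the five listed properties and serializes them to text for the character-level model. The helper names extract_fields and to_model_text, and the decision to sort keys for a stable ordering, are assumptions for this example rather than details from the project.

```python
import json

# The five properties named in the post.
KEPT_FIELDS = ("method", "query", "statusCode", "path", "requestPayload")


def extract_fields(log):
    """Keep only the selected fields; missing fields become None."""
    return {field: log.get(field) for field in KEPT_FIELDS}


def to_model_text(log):
    """Serialize deterministically so identical requests yield identical text."""
    return json.dumps(extract_fields(log), sort_keys=True, separators=(",", ":"))


# A trimmed-down version of the sample log entry shown earlier.
entry = {
    "timestamp": 1502135820943,
    "method": "get",
    "query": {"query": "Lawn & Garden Buying Guides"},
    "path": "/search",
    "statusCode": 200,
    "headers": {"host": "localhost:8002"},
    "requestPayload": None,
}
text = to_model_text(entry)
```

Dropping the timestamp and headers removes values that change from request to request without saying anything about whether the request is an injection attempt.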

An automated client was left to run for several hours to accumulate a decent amount of training data. To train the model quickly, an AWS p2.xlarge Deep Learning EC2 instance was used to perform the computations. AWS’s Deep Learning AMIs come preinstalled with most of the popular AI libraries and APIs you need to start performing heavy training on GPUs. The 90 cents per hour this EC2 instance cost was well worth it, both for the time saved training a model on a large dataset and for the time saved installing and configuring TensorFlow to run properly on a GPU. We found that setting up a new machine is quite time-consuming, since the dependencies and GPU configuration have to be just right for the desired version of TensorFlow.

To reduce bias, the generated dataset contained ~50/50 normal and malicious request logs. The dataset was split into 75% training and 25% evaluation subsets. After a couple of iterations the model was able to obtain a pretty high accuracy. This was somewhat expected due to the limited variation in the request combinations that were generated, but nevertheless it was sufficient to work with in our short timeframe.
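A 75/25 split like this one can be sketched with the standard library alone. The function names split_dataset and positive_ratio, the fixed seed, and the toy rows are assumptions for illustration; the real dataset held the request log text and its label.

```python
import random


def split_dataset(rows, train_fraction=0.75, seed=42):
    """Shuffle (text, label) rows and split them into train and evaluation sets."""
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def positive_ratio(rows):
    """Fraction of rows labeled 1 (attempted injection)."""
    return sum(label for _, label in rows) / len(rows)


# Toy stand-in for the real dataset: half normal, half malicious.
rows = [("normal request %d" % i, 0) for i in range(50)]
rows += [("injection attempt %d" % i, 1) for i in range(50)]
train, evaluation = split_dataset(rows)
```

Shuffling before the cut matters here because the rows were generated in label order; without it, the evaluation subset would contain only one class.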

Training results from ~23,000 request log samples

The accuracy and loss metrics depicted above are also captured in logs via a TensorBoard callback attached within the Keras training script. These logs are an accumulation of checkpoints recorded during training, which are useful for visualizing the model’s performance during and after training.


With a trained model, it was time to implement the SecuritAI UI application that would host the model and perform the real-time predictions. As with the mock API, the team stuck with what we knew best: JavaScript. A React UI was built as a dashboard to monitor the activity coming from the stream of request logs. The stream is managed by AWS Kinesis, which bridges the communication between the mock API and the SecuritAI UI.

At this point the tech stack is looking good, but there is one slight issue… Keras is a Python library, so how is the model supposed to make predictions in JavaScript? Luckily, there’s an npm package for exactly this need: keras-js! Keras-js allows JavaScript apps to run saved Keras models trained on the TensorFlow engine. Using this library enabled us to run everything from our Node.js app servers.

<More info on keras-js:>

In Action

SecuritAI Demo

Enhancements and Lessons Learned

Overall, this was a fun application to develop and a good way to gain experience building a model in Keras with TensorFlow and using it in a practical way. Other, simpler ML models or platforms could have been used in place of the LSTM RNN we chose, but where’s the fun without some hackathon experimentation? An unsupervised anomaly detection model would likely have been better suited to this application, as the training data would simply be a set of normal request logs collected over a set period of time. It is easier to collect a large amount of normal activity and, hypothetically, allow an unsupervised model to detect any anomalous activity against that initial baseline of API activity. New vectors of attack could also be discovered with this method, but that is an enhancement for another day.

As for short-term improvements, the model could be productionized by translating it to run solely on the TensorFlow API, expanded and trained to better analyze the contents of headers, and extended to perform multiclass classification to categorize suspicious requests. In addition, the UI could be expanded to provide more ways to tune activity notifications and react to unwanted requests.

Team Effort

Thanks to the team mentioned throughout this post, which included Panagiotis Psimatikas (the guy behind the SecuritAI idea), Miheer Munjal (who provided business acumen), and myself (a Machine Learning enthusiast).

Check out the code on GitHub: