India Hackathon: Streaming Inference For Real-Time Scoring and Anomaly Detection
By Som Satapathy, Mandeep Gandhi, and Per Andreasen.
This is part of a series on Adobe Experience Platform innovations from our recent hackathon in India. In this post, a team of Adobe Experience Platform professionals describes how they used their time at Adobe India Hackathon 2019 to develop a streaming-inference solution for real-time scoring that helps our enterprise customers improve personalization.
Imagine that $5,000 was just transferred to your bank account. To comply with anti-money-laundering laws, your bank needs to assess swiftly whether the transaction is fraudulent or legitimate. To make its security procedures truly efficient, the bank must be able to evaluate thousands or millions of transactions like yours in real time. And banks are not the only companies that need a swift security system: most businesses need one to protect themselves and their employees from fraud, health risks, and many other threats.
Machine learning models can score any kind of event. They can even score events in real time by basing their decisions, a process known as inference, on prior training with the same kind of events. Most of us would prefer to eliminate delays in monetary transactions, email delivery, and other events that must pass through security procedures. Traditionally, however, these procedures have scored events in batches: monetary transactions have been gathered for manual review, and emails have been run through spam filters in batches, creating significant delays.
That is why we, engineers and developers from the Adobe Experience Platform team in Bangalore, decided to create a proof of concept for real-time scoring during Adobe India Hackathon 2019. For the hackathon, we chose the most common use case we could think of — binary anomaly detection like the one that could be used in a spam filter for email servers. Instead of sending emails through in batches of 100, for instance, we built a model capable of classifying emails as “spam” or “not spam” on the fly. The end goal of our hack was to enable Adobe Experience Platform to offer real-time scoring as an innate capability.
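To make the use case concrete, here is a minimal sketch of binary spam classification of the kind described above. This is not the team's actual model; it is an illustrative scikit-learn pipeline trained on a tiny made-up dataset, where each incoming email is scored individually rather than in a batch:

```python
# A toy "spam" / "not spam" classifier: train offline once,
# then score each email as it arrives instead of in batches.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny illustrative training set; a real model would use
# thousands of labeled emails.
emails = [
    "Win a free prize now, click here",
    "Cheap loans, act now, limited offer",
    "Meeting moved to 3pm, see agenda attached",
    "Please review the quarterly report draft",
]
labels = ["spam", "spam", "not spam", "not spam"]

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(emails, labels)

def score_event(email_text: str) -> str:
    """Score a single incoming email on the fly."""
    return model.predict([email_text])[0]

print(score_event("Win a free prize now, click here"))  # prints "spam"
```

The point of the sketch is the `score_event` shape: a trained model behind a function that takes one event and returns a classification immediately, which is what a streaming context invokes per event.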
Real-time anomaly detection is useful because it empowers security systems to handle anomalous events fast. Based on real-time inference, security systems can detect and predict unwanted scenarios without delay.
Our team consisted solely of experienced Adobe Experience Platform engineers and developers. Unlike the other attendees, who specialize in other Adobe technologies and platforms, we already knew the platform well. This allowed us to focus on streaming connectors and work in close collaboration with the Adobe Pipeline team on Adobe Experience Platform's connections to sources and destinations.
Detecting and predicting unwanted scenarios
Our goal was to build a system that can power real-time intelligent applications on Adobe Experience Platform and integrate with Data Science Workspace to power real-time scoring for custom machine learning models. The result we aimed for was considerably faster processing of any kind of data source to benefit all security systems.
Adobe Experience Platform currently lets you prepare and explore your data sets in several ways before ingesting data into machine learning models as part of Data Science Workspace. The current cycle of building, evaluating, and publishing models is batched, but the recent development of real-time machine learning will help transform streaming segments into the desired inferences. When such inference is applied to a security system, it can quarantine spam emails as soon as they arrive, without delaying important work emails that may need immediate action or responses from the recipient.
Landing data in streaming sinks
First, we wanted to source the streaming data into a sub-sink. From there, we would stream the inferred data out into a streaming sink that can generate alerts or reports. To enable a streaming source and streaming inference, we enabled machine learning functions that can consume models on streams. At the hackathon, we built streaming connectors for ElasticSearch, as it is easy to build on.
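In outline, that flow looks like the sketch below. It uses in-memory stand-ins for the real streaming source and sink, and the function name `stream_inference` is ours, not a Platform API:

```python
from typing import Callable, Iterable

def stream_inference(source: Iterable[dict],
                     model: Callable[[dict], str],
                     sink: Callable[[dict], None]) -> None:
    """Consume events from a streaming source, attach an inference
    to each one, and land the enriched event in a streaming sink
    as it arrives, not in a batch."""
    for event in source:
        event["inference"] = model(event)  # e.g. "spam" / "not spam"
        sink(event)                        # e.g. an alerting index

# Usage with in-memory stand-ins for source and sink:
events = [{"id": 1, "body": "free prize"},
          {"id": 2, "body": "meeting at 3"}]
alerts = []
stream_inference(
    events,
    model=lambda e: "spam" if "prize" in e["body"] else "not spam",
    sink=alerts.append,
)
print(alerts)
```

In the real system, `source` would be a Pipeline stream, `model` a trained machine learning function consumed on the stream, and `sink` a streaming sink connector; the per-event loop is the part that replaces batch scoring.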
Ideally, we wanted to build a proof of concept around the existing services on Adobe Experience Platform, solved using Pipeline and Pipeline-Smarts. Unfortunately, that turned out to be too complex a use case for our three days of work at the hackathon, so we went with anomaly detection instead, a use case that is more common and easier to explain. We keep some more sophisticated and time-consuming artificial intelligence use cases on our long-term wish list.
Our single biggest challenge during the hackathon was time. We had a tight timeline in which to deliver a new set of services on Adobe Experience Platform in the days before and after the hackathon, so we did not have much time for fine-tuning. While we were pleased that the hackathon let us get a good sense of what is feasible, we are now itching to work more on this solution.
Putting the mighty Pipeline Smarts and Kafka Connect to good use
The new stream-computing solutions on Adobe Experience Platform, Pipeline-Smarts and Streaming Connectors, played a big part in our choice of hack. By streaming events into Pipeline and applying pre-built models with Pipeline-Smarts, we were able to use deep machine learning to detect anomalous events in real time. We then egressed the inferred streams into sinks and sent them outbound to ElasticSearch for dashboarding; we created the dashboard in the Jupyter interface.
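For the ElasticSearch leg of that flow, scored events have to be shaped into index actions. The sketch below shows one plausible way to do that with the document shape ElasticSearch's bulk API expects; the index name and field names are our illustrative choices, not the ones from the hack:

```python
def to_bulk_actions(scored_events, index="streaming-inference"):
    """Shape scored events into the action/document dicts that
    ElasticSearch bulk helpers (e.g. the Python client's
    helpers.bulk) accept for indexing, so a dashboard can
    query the inferences."""
    return [
        {
            "_index": index,
            "_id": event["id"],
            "_source": {
                "body": event["body"],
                "inference": event["inference"],
            },
        }
        for event in scored_events
    ]

actions = to_bulk_actions(
    [{"id": 1, "body": "free prize", "inference": "spam"}]
)
print(actions[0]["_source"])
```

Once documents land in an index like this, a Jupyter notebook or any dashboarding tool can aggregate on the `inference` field to chart spam versus legitimate traffic over time.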
We built a template for selecting a sink, a model, and parameters to execute the streaming inference and real-time scoring; the template was automatically packaged and injected into a streaming context. For artifact generation, we used Smarts machine learning functions and eventually streamed the data out through a streaming sink connector. Without the assistance we received at the hackathon, we would not have arrived at such a detailed concept. A lot of brainstorming and exciting discussions with other attendees helped us move forward with our work. Everyone was really supportive and enthusiastic about our solution, as well as about the other work going on around us. The support of our colleagues has made us even more eager to make this a first-class solution in Data Science Workspace.
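A template like the one described above might be declared along these lines. The class and field names are illustrative, not the actual Data Science Workspace schema; the point is that one declarative object captures everything needed to package a model into a streaming context:

```python
from dataclasses import dataclass, field

@dataclass
class StreamingInferenceTemplate:
    """Illustrative template: source, sink, model, and parameters
    for one streaming-inference flow."""
    source: str                         # streaming source, e.g. a Pipeline topic
    sink: str                           # e.g. an ElasticSearch index
    model_uri: str                      # where the trained model artifact lives
    parameters: dict = field(default_factory=dict)

    def to_artifact(self) -> dict:
        """Render the template as the artifact that gets packaged
        and injected into the streaming context."""
        return {
            "source": self.source,
            "sink": self.sink,
            "model": self.model_uri,
            "parameters": self.parameters,
        }

tpl = StreamingInferenceTemplate(
    source="email-events",
    sink="spam-alerts",
    model_uri="models/spam-classifier:v1",
    parameters={"threshold": 0.9},
)
print(tpl.to_artifact())
```

Keeping the flow declarative like this is what makes automated artifact generation possible: the runtime can read the template and wire up source, model, and sink without custom code per flow.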
After our three days spent hacking on Adobe Experience Platform, our solution was voted the "Most Innovative Hack." It is a powerful model for a generic streaming-inference flow and can be used with a range of models for inference: it supports any machine learning model built in any framework, such as Spark, scikit-learn, or TensorFlow. When the solution is made available, it will let you select your source of data, a sink, and screens in your dashboard, and then execute the model for your purpose. Execution prepares an artifact that starts a streaming flow, with automated runtime artifact generation and model embedding into the streaming context. In the end, you are presented with a chart that gives you a clear overview of the flow of streamed data.
Since the hackathon, we have been collaborating with other teams of Adobe engineers and developers in two different regions. Our next steps are not concrete yet, but our goal is to build an extended architecture that Data Science Workspace can leverage. We will be working closely with the Data Science Workspace and Adobe Experience Platform teams to take our learnings from the hackathon further and create a solution that will eventually be generally available. Streaming Connectors will be generally available in May, and we plan to cover them in detail in later blog posts.
Follow the Adobe Tech Blog for more stories from the Adobe India Hackathon 2019, and check out Adobe Developers on Twitter for the latest news and developer products. Sign up here for future Adobe Experience Platform Meetups.
- Machine learning — https://www.adobe.com/sensei.html
- Adobe Experience Platform — https://www.adobe.com/dk/experience-platform.html
- Adobe India Hackathon 2019 — https://medium.com/adobetech/hacking-adobe-experience-platform-to-create-more-valuable-applications-than-we-had-dared-to-hope-b40bc9116fe
- Data Science Workspace — https://www.adobe.com/dk/experience-platform/data-science-workspace.html
- Streaming segments — https://docs.adobe.com/content/help/en/experience-platform/segmentation/api/streaming-segmentation.html
- IoT devices — https://theblog.adobe.com/technology/iot/
- Streaming sink — https://docs.adobe.com/content/help/en/experience-platform/ingestion/streaming/kafka.html
- Streaming connectors — https://docs.adobe.com/content/help/en/platform-learn/tutorials/data-ingestion/understanding-source-connectors.html
- ElasticSearch — https://www.elastic.co/elasticsearch/
- Pipeline and Pipeline-Smarts — https://medium.com/adobetech/stream-processing-at-scale-within-adobe-experience-platform-909ed502da71
- Streaming Connectors — https://docs.confluent.io/3.0.0/connect/
- Artificial intelligence — https://www.adobe.com/dk/sensei.html
- Jupyter Interface — https://jupyter.org/
- Spark — https://spark.apache.org/
- Scikit-learn — https://scikit-learn.org/stable/
- TensorFlow — https://www.tensorflow.org/