Saving Santa’s IT Architecture With AI and Cognitive Services
Automatic email detection, text and image processing, Logic Apps, Functions, Notebooks & more!
In the last few days of work before Christmas, our internal architect team hosted a “Save Santa” hackathon where we were given some pain points that Santa and his elves may have to deal with in their IT system.
As these pains may be transferable beyond the North Pole, here is my team’s complete Azure solution to automate and add AI capabilities to an email-triggered insight-gathering pipeline.
The pains we are targeting:
1. An overwhelming number of emails coming in that have to be answered / sorted / processed / dealt with (Santa’s wish lists being sent in from all over the world).
2. Getting insights out of data at first glance, such as:
- Dealing with incoming data in different languages (which all needs to be converted into one source of truth before further processing)
- Understanding the sentiment of text (deal with sad kids first, or take different actions based on different readings)
- Getting a quick summary of the important details in messages (if kids take forever to get to the point)
- Classifying images (picture attachments of toys kids might want)
3. Aggregating all these insights into one place, automating all these steps, and having it all running in one platform with one authentication method (Azure Active Directory).
Overview of solution
1. Detect emails with attachments and the key phrase “wish list” in the subject.
2. Process the email to remove HTML and store the body (text) and the images of presents.
3. Create a notebook using the Cognitive Services & Custom Vision APIs to process the text and images of the email request. (A. Translate the email and gather more insights from the wish, like sentiment and key phrases. B. Classify the received image into pre-set gift categories.)
4. A. Connect the notebook to the data lake, and B. run a blob trigger when new email items have been stored to start the processing.
5. Centralize all the information and create a dashboard.
1 & 2: Email detection with Logic Apps and cleaning with custom Azure Functions
- This section handles the email detection by monitoring my inbox (or Santa’s) for wishes from children.
- We assume kids would send images of the toys they want, and use a recognizable phrase such as “wish list” in the email subject.
For the Azure solution of this part we adapt this tutorial (see it for step-by-step instructions).
Here is the complete picture of the Logic App we created:
- We check for emails that contain the key phrase “wish list” in the subject and include attachments,
- We then store the HTML-cleaned email body as well as each attachment picture in our blob storage.
- As you can see in the picture on the left, the email and two images from our email example have been stored in the appropriate formats.
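As a reference, here is a minimal sketch of what the HTML-cleaning part of the custom Azure Function could look like in Python. The function name, bindings and regex-based cleanup are my own assumptions; the version in the linked tutorial may be structured differently.

```python
import re
from html import unescape

import azure.functions as func


def main(req: func.HttpRequest) -> func.HttpResponse:
    """Strip HTML markup from the email body that the Logic App sends over."""
    body = req.get_body().decode("utf-8")
    # Remove tags, then decode HTML entities such as &amp; and &nbsp;
    text = unescape(re.sub(r"<[^>]+>", " ", body))
    # Collapse the whitespace left behind by the removed markup
    text = re.sub(r"\s+", " ", text).strip()
    return func.HttpResponse(text, mimetype="text/plain")
```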
3. A. Text analytics notebook
- First we will create the Databricks notebook that we wish to run. See GitHub for my example using Cognitive Services.
- We use two text-based Cognitive Services: Translator (language detection & translation) and Text Analytics (sentiment, key phrase extraction).
See more documentation on how to use the Cognitive Services SDKs or the API structure here.
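As a rough sketch, the two calls could look like this in the notebook. The endpoint, key and region values are placeholders and the helper names are my own; the actual notebook on GitHub may structure this differently.

```python
import requests
from azure.ai.textanalytics import TextAnalyticsClient
from azure.core.credentials import AzureKeyCredential

TEXT_ANALYTICS_ENDPOINT = "https://<your-resource>.cognitiveservices.azure.com/"  # placeholder
TEXT_ANALYTICS_KEY = "<text-analytics-key>"  # placeholder
TRANSLATOR_KEY = "<translator-key>"          # placeholder
TRANSLATOR_REGION = "<translator-region>"    # placeholder


def translate_to_english(text: str) -> str:
    """Call the Translator REST API (v3.0) to detect the language and translate to English."""
    response = requests.post(
        "https://api.cognitive.microsofttranslator.com/translate",
        params={"api-version": "3.0", "to": "en"},
        headers={
            "Ocp-Apim-Subscription-Key": TRANSLATOR_KEY,
            "Ocp-Apim-Subscription-Region": TRANSLATOR_REGION,
            "Content-Type": "application/json",
        },
        json=[{"text": text}],
    )
    response.raise_for_status()
    return response.json()[0]["translations"][0]["text"]


def analyze_wish(text: str) -> dict:
    """Translate the wish, then run sentiment and key-phrase extraction on it."""
    client = TextAnalyticsClient(
        endpoint=TEXT_ANALYTICS_ENDPOINT,
        credential=AzureKeyCredential(TEXT_ANALYTICS_KEY),
    )
    translated = translate_to_english(text)
    sentiment = client.analyze_sentiment([translated])[0]
    key_phrases = client.extract_key_phrases([translated])[0]
    return {
        "translated_text": translated,
        "sentiment": sentiment.sentiment,
        "key_phrases": key_phrases.key_phrases,
    }
```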
Example of the result of the notebook run on the previous sample email:
3. B. Custom vision module
- The only unfilled field from the previous notebook is the toy recognition. For this we can train and call a Custom Vision model.
- You can deploy a Custom Vision resource in the Azure portal and use the portal on customvision.ai to train this model. You can follow the steps here to create a classifier model.
- For my toy detection model I’ve added at least 5 examples of the “car”, “teddy bear” and “negative” classes and run a quick training on them.
- The resulting model can classify teddy bears and cars, and we will run it on the images we got from the email attachments.
- This can be done in the portal, or, to continue our automation pipeline, we can make the call to this service with the Custom Vision SDK, as sketched below.
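A minimal sketch of that call with the Python SDK could look as follows; the endpoint, key, project ID, published iteration name and image path are all placeholders.

```python
from azure.cognitiveservices.vision.customvision.prediction import (
    CustomVisionPredictionClient,
)
from msrest.authentication import ApiKeyCredentials

PREDICTION_ENDPOINT = "https://<your-resource>.cognitiveservices.azure.com/"  # placeholder
PREDICTION_KEY = "<prediction-key>"        # placeholder
PROJECT_ID = "<project-id>"                # placeholder
PUBLISHED_ITERATION = "toy-classifier-v1"  # placeholder

predictor = CustomVisionPredictionClient(
    PREDICTION_ENDPOINT,
    ApiKeyCredentials(in_headers={"Prediction-key": PREDICTION_KEY}),
)

# Classify one of the images stored from the email attachments
with open("attachment1.png", "rb") as image:
    results = predictor.classify_image(PROJECT_ID, PUBLISHED_ITERATION, image.read())

# Pick the most probable gift category ("car", "teddy bear" or "negative")
best = max(results.predictions, key=lambda p: p.probability)
print(best.tag_name, round(best.probability, 2))
```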
4. A. Get Databricks data straight from the data lake
To integrate the notebook operations into our automated pipeline, we need to make sure the notebook can read from our data lake, and create a trigger. For this we need to create a service principal and give it access to the data lake so that Databricks can make use of it to read the files.
See this tutorial for step-by-step info on implementing these steps (it links to a tutorial for the service principal creation).
- I’ve made a santa-access app registration in my Active Directory and given it Storage Blob Data Contributor access on the data lake. Then I mount the data lake in Databricks to be able to access the files.
- Then you can mount your folder and read the files from the notebook, as in the sketch after this list.
- You can use other alternative ways to access the data in Databricks as well. The notebook I put on GitHub does all the Cognitive Services operations on an example string, so you can adapt it to whatever way you choose to bring over your files.
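As an illustration, mounting the data lake with the service principal could look roughly like this in the notebook. The container, storage account, tenant/application IDs and secret scope names are placeholders.

```python
# OAuth (client credentials) configuration for the santa-access service principal.
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type":
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "<application-id>",
    "fs.azure.account.oauth2.client.secret":
        dbutils.secrets.get(scope="santa-scope", key="sp-secret"),
    "fs.azure.account.oauth2.client.endpoint":
        "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}

# Mount the "wishes" container of the data lake under /mnt/wishes
dbutils.fs.mount(
    source="abfss://wishes@<storage-account>.dfs.core.windows.net/",
    mount_point="/mnt/wishes",
    extra_configs=configs,
)

# Read one of the stored, cleaned email bodies from the mounted path
email_df = spark.read.text("/mnt/wishes/email-body.txt")
```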
4. B. Run the Databricks notebook as a blob-triggered pipeline from ADF
Now we want to automatically run our AI modules once the new email data has been stored in our data lake.
We create a Databricks compute linked service and a pipeline with a Databricks Notebook step in Azure Data Factory, adapting this tutorial.
After we create the linked service, we create the Notebook activity with parameters that point to the data location.
You can then create a “Blob trigger” (a storage event trigger) as the way to start your ADF pipeline.
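As a sketch, the notebook could pick up those parameters like this. The parameter name and folder layout are assumptions; ADF’s storage event trigger exposes the triggering blob’s folder path and file name, which you can map to the Notebook activity’s base parameters.

```python
# Pick up the parameter passed from the ADF Notebook activity.
# "fileName" and the folder layout under /mnt/wishes are assumptions.
dbutils.widgets.text("fileName", "")
file_name = dbutils.widgets.get("fileName")

# Read the newly stored email body from the mounted data lake (see section 4. A.)
email_text = "\n".join(
    row.value for row in spark.read.text(f"/mnt/wishes/{file_name}").collect()
)
```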
5. Aggregating the results
- All the information can be stored in a central file that can either be sent directly to Power BI, or the data can be stored in a database that is then connected.
- For this demo we also added fields like “delivery status” or “address” to make the visuals more insightful and showcase other extensions you can add to this solution.
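As a sketch, appending one processed wish to a central results file in the data lake could look like this; the column names, example values and output path are illustrative assumptions, and Power BI can then read the result directly or via a connected database.

```python
from pyspark.sql import Row

# One row per processed wish; the values below are illustrative placeholders,
# including the extra demo fields mentioned above.
result = Row(
    file_name="email-body.txt",
    translated_text="I would like a red toy car for Christmas.",
    sentiment="positive",
    key_phrases="red toy car, Christmas",
    toy_category="car",
    delivery_status="pending",
)

(
    spark.createDataFrame([result])
    .write.mode("append")
    .format("parquet")  # or delta/csv, depending on your setup
    .save("/mnt/wishes/aggregated-results")
)
```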
Possible extensions to this solution:
- Add a Logic App module to gather data from social media as well
- Use OCR modules to also scan hand-written letters in the same way
- Add security layers, like using Key Vault to store secrets and keys and to pass these between the different applications
- Share data with different personas / users who may need access to the PowerBI reports or the raw data for further processing.
Big thanks to Remy Ursem and Jesse van Leth for being on our awesome, winning(!) hackathon team!
The End.
Merry Christmas!!!