Infuse AI in your Apps with Microsoft Cognitive Services

Challenges to smarten up your apps, following our October 13th, Bucharest AI Workshop

Alexandra Petrus
Bucharest AI
11 min read · Oct 25, 2018


Paul Graham, in his ‘Six Principles of Making New Things’, lays out a brief and effective pattern for making anything new:

Find (a) simple solutions (b) to overlooked problems (c) that actually need to be solved, and (d) deliver them as informally as possible, (e) starting with a very crude version 1, then (f) iterating rapidly.

We based our workshop on this pattern. Let’s have a look at the challenges (you can go through them yourself a few scrolls below) and their respective requirements. In the meantime, follow us on GitHub, LinkedIn, and Facebook to stay up to date with our events and initiatives.

Wanna join us for our next workshop on ‘Pre-Trained Models* for NLP’? Register & see you on October 25th @ Impact Hub Timpuri Noi.

*Language modeling is the task of predicting the next word in a text given the previous words. It is probably the simplest language-processing task, with concrete practical applications such as intelligent keyboards and email response suggestion (Kannan et al., 2016).

Intro to Microsoft Cognitive Services

Azure Cognitive Services are APIs, SDKs, and services available to help developers build intelligent applications without having direct AI or data science skills or knowledge.

Microsoft Cognitive Services expands on Microsoft’s evolving portfolio of machine learning APIs and enables you to easily add intelligent features — such as:

  • emotion and video detection
  • facial, speech and vision recognition
  • speech and language understanding

into your applications.

The goal of Azure Cognitive Services is to help developers create applications that can see, hear, speak, understand, and even begin to reason.

The catalog of services within Azure Cognitive Services can be categorized into five main pillars:

  • Vision — Image-processing algorithms to smartly identify, caption and moderate your pictures.
  • Speech — Convert spoken audio into text, use voice for verification, or add speech recognition to your app.
  • Language — Allow your apps to process natural language with pre-built scripts, evaluate sentiment and learn how to recognize what users want.
  • Search — Add Bing Search APIs to your apps and harness the ability to comb billions of webpages, images, videos, and news with a single API call.
  • Knowledge — Map complex information and data in order to solve tasks such as intelligent recommendations and semantic search.

Bucharest AI partnered with Microsoft and Sorin Peste, Data & AI Technical Solutions Professional, to deliver a one-day hackathon showing participants how to build intelligent algorithms into apps, websites, and bots, so that they can see, hear, speak, and understand user needs through natural methods of communication. The main challenges, each addressing a real problem with cognitive services, were:

  1. Recognise faces in images while protecting people’s privacy;
  2. Automatically inspect components on a factory production line;
  3. Automatically detect tools and help workers become more efficient.

It was all about impactful, applied AI use cases.

Audience: Web & Mobile Application Developers looking to make their apps more intelligent

Tools: The programming language and development tools were chosen by participants. These challenges can be completed using plain HTML and JavaScript, Node.js, .NET, C, C++, Java, Python, Go, Rust, Ruby… basically, any development framework that lets you make REST calls to existing services.
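To make that concrete, here’s roughly what such a REST call looks like in Python (a minimal sketch; the region, API version, key, and image URL below are placeholders, not values from the workshop):

```python
# A minimal sketch of a REST call to a Cognitive Service (here, the Computer
# Vision "analyze" endpoint). The key, region, and image URL are placeholders.
import requests

ENDPOINT = "https://westeurope.api.cognitive.microsoft.com/vision/v2.0/analyze"

response = requests.post(
    ENDPOINT,
    params={"visualFeatures": "Description,Tags"},
    headers={"Ocp-Apim-Subscription-Key": "<your-key>"},
    json={"url": "https://example.com/some-photo.jpg"},  # hypothetical image
)
print(response.json())  # tags and a natural-language caption for the image
```

Every service below follows the same pattern: an HTTPS endpoint, a subscription-key header, and a JSON or binary payload.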

Check out the workshop’s GitHub repo and go through the challenges at your own pace, or bookmark it for later.

Read on for the real-life problems. Y’ready? Here’s a little intro video where Sorin walks through a few real examples.

Challenge #1. Recognise faces in images while protecting people’s privacy

Here’s how not to do it :)

The Challenge

Your challenge is to enhance an existing CCTV system with the capability to:

  • receive training to recognise specific people (employees) and distinguish them from other people (visitors) inside a facility
  • recognise faces in the footage it captures
  • visually mark the faces of people it has successfully recognised (employees)
  • blur (or otherwise obscure) the faces of the people it doesn’t recognise.

To keep things simple, you will be working with still images rather than video.

NOTE: There is no image dataset for this challenge. You could either take pictures of yourself and your team members to use on the Face API, or search online for publicly available images.

Tips

  1. The Face API should be useful for this challenge; a minimal call sketch follows this list.
  2. You can create a PersonGroup and then create Persons to define the list of recognisable people (employees). You will need to provide at least 2–4 photos for each person in order for them to be correctly recognised.
  3. If you’ve got a Windows 10 machine, you can use the Intelligent Kiosk app — either the store version or by building it from source — to quickly configure your Person Group and upload training images.
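To give you a head start, below is a rough sketch of the detect-then-identify flow against the Face API’s REST endpoints. Everything specific in it is a placeholder: the key, the region in the URL, the file name, and the ‘employees’ PersonGroup (which you’d create and train first, per tips 2–3).

```python
# Rough sketch: detect faces, then identify which ones belong to a trained
# PersonGroup. Key, region, file name and group ID are placeholders, and the
# "employees" group is assumed to already exist and be trained. No error
# handling.
import requests

KEY = "<your-face-api-key>"
BASE = "https://westeurope.api.cognitive.microsoft.com/face/v1.0"

# 1. Detect faces in a still image (binary upload).
with open("cctv_frame.jpg", "rb") as f:
    faces = requests.post(
        f"{BASE}/detect",
        headers={"Ocp-Apim-Subscription-Key": KEY,
                 "Content-Type": "application/octet-stream"},
        data=f.read(),
    ).json()

# 2. Match detected faces against the PersonGroup (max 10 faceIds per call).
identified = requests.post(
    f"{BASE}/identify",
    headers={"Ocp-Apim-Subscription-Key": KEY},
    json={"personGroupId": "employees",
          "faceIds": [face["faceId"] for face in faces]},
).json()

# 3. Faces with no candidates are visitors: blur their faceRectangle.
candidates = {r["faceId"]: r["candidates"] for r in identified}
for face in faces:
    status = "employee" if candidates.get(face["faceId"]) else "blur"
    print(status, face["faceRectangle"])
```

From there, drawing the marker boxes and the blur is plain image manipulation (for example with Pillow), not a Cognitive Services call.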

Useful Links

  1. Face Service Documentation
  2. Face API Reference and Testing Console
  3. Example: How to identify faces in images
  4. Example: How to Analyze Videos in Real-time
[Figures: original image vs. processed image, with recognised faces marked and unrecognised faces blurred]

Challenge #2. Automatically inspect components on a factory production line

Quality assurance in manufacturing is demanding and expensive, but also absolutely crucial. After all, selling flawed goods results in returns and disappointed customers. Harnessing the power of image recognition and deep learning may significantly reduce the cost of visual quality control while also boosting overall process efficiency.

According to Forbes, automating quality testing with machine learning can increase defect detection rates by up to 90%. Machines never tire, lose focus, or need a break, and every product on a production line is inspected with the same focus and meticulousness.

Yield losses, the products that need to be reworked due to defects, may be one of the biggest cost-drivers in the production process. In semiconductor production, testing cost and yield losses can constitute up to 30% of total production costs.

Traditional quality control is time-consuming: it is performed manually by specialists testing the products for flaws. Yet the process is crucial for business, as product quality is the pillar a brand stands on. It is also expensive: electronics industry giant Flex claims that for every dollar it spends creating a product, it lays out 100 more on resolving quality issues.

Since the inception of image recognition software, manufacturers have been able to incorporate IP cameras into the quality control process. Most implementations are based on complex systems of triggers, and with the conditions predefined by programmers, the cameras can spot only a limited number of flaws. While that technology may not have earned the title of game changer, it brought the image recognition revolution one step closer.

Artificial intelligence can enhance a company’s ability to spot flawed products. Instead of embedding complex and lengthy lists of possible flaws into an algorithm, the algorithm learns the product’s features; with a vision of the perfect product, the software can easily spot imperfect ones.

The Challenge

You are working inside a factory producing components which happen to have the shape of LEGO bricks. There are various shapes and sizes of components, for multiple purposes. The components travel individually on the assembly line; however, not all of them are positioned correctly for the next step in the assembly process.

All components should be positioned face up, with the bindings facing the camera or at the very least clearly visible to the camera. Components facing down or sideways should be flagged by the visual inspection system so that they can be correctly repositioned by a robot or human operator.

We have prepared a set of images to assist you in the training and validation process.

CLICK HERE to download the image archive for this challenge.

NOTE: The image dataset is based on the openly available Images of LEGO Bricks dataset published on Kaggle.com.

The image archive contains two folders:

  • training — images which you have available during the project implementation, and can be used for training a Computer Vision model
  • validation — images from the production phase, which the system should be able to classify correctly despite never having seen them during training.

Your application should allow the user to upload one or more test images of components, then display next to each image whether that component is correctly or incorrectly placed. Below is an example of a possible user interface you could create.

NOTE: In order to make sure that your Computer Vision model can respond to unseen images, when testing ALWAYS use images from the validation folder inside the dataset. Images from the validation folder should NOT be used for training.

Tips

  1. The Custom Vision Service should come in handy for this challenge; a minimal prediction-call sketch follows this list.
  2. Since this is a classification problem, you will need to create a Classification project in the Custom Vision service.
  3. There are many images available inside the training folder, but you might not need to use all of them for training. Indeed, it may prove counterproductive to do so!
  4. Although you could opt to create just two tags (for example Correct and Incorrect), the model may perform better if you create more than two, in order to capture various possible component positions.
  5. The minimum number of labeled examples for each tag (for example, Correct) in Classification is 5. However, the more examples you provide, the better the model will generally perform; we recommend about 50 labeled examples per tag.
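Once you’ve trained a model and published an iteration, calling it is a single request. Here’s a minimal sketch (the project ID, iteration name, key, and URL shape below are placeholders; copy the exact Prediction URL and key the Custom Vision portal shows for your project):

```python
# Sketch: classify a validation image with a published Custom Vision model.
# Project ID, iteration name and key are placeholders from the portal.
import requests

PREDICTION_URL = ("https://westeurope.api.cognitive.microsoft.com/customvision/"
                  "v3.0/Prediction/<project-id>/classify/iterations/"
                  "<iteration-name>/image")

with open("validation/brick_001.jpg", "rb") as f:  # hypothetical file name
    result = requests.post(
        PREDICTION_URL,
        headers={"Prediction-Key": "<your-prediction-key>",
                 "Content-Type": "application/octet-stream"},
        data=f.read(),
    ).json()

# The highest-probability tag is the model's verdict on the placement.
best = max(result["predictions"], key=lambda p: p["probability"])
print(best["tagName"], round(best["probability"], 3))
```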

Useful Links

  1. What is the Custom Vision Service?
  2. How to build a classifier with Custom Vision
  3. Test and retrain a model with Custom Vision Service
  4. Use the prediction API
  5. Custom Vision Prediction API reference

Challenge #3. Automatically detect tools and help workers become more efficient

Computer Vision is the field concerned with giving computers the ability to ‘see’ like humans. Object Detection is a basic visual perception task and one of the key application areas of Computer Vision. It essentially deals with finding and locating specific objects within an image.

For detecting generic objects (like cars, people, tables, or trees), there are open-source, pre-trained models available. However, if you want an algorithm to detect very specific objects (like a small raw tomato or a large ripe tomato), you will need to train an object detection algorithm of your own.

Object detection in Manufacturing

Finding a specific object through visual inspection is a basic task that is involved in multiple industrial processes like sorting, inventory management, machining, quality management, packaging etc.

Quality Management

To date, quality control remains a difficult part of the manufacturing cycle, due to its reliance on human-level visual understanding and on adapting to constantly changing conditions and products.

AI can handle most of these complications: it can automatically distinguish good parts from faulty ones on an assembly line with incredible speed, leaving enough time to take corrective action. This is a very useful solution for dynamic environments where products are constantly changing and time is valuable to the business.

Inventory Management

Inventory management can be very tricky, as items are hard to track in real time: something is always being added, removed, or moved every day.

Poor inventory management can hurt a company in terms of both capital and time. An AI system can perform automatic object counting and localization, allowing you to improve inventory accuracy.

AI automation removes human error from the equation by accurately counting your held and outgoing inventory. With automation, businesses order the right quantity of products at the best possible price, ensuring that no money is wasted on inaccurate or extraneous orders.

Sorting

Manual sorting involves high labor costs and the accompanying human errors. Even with robots, the process is not accurate enough and remains prone to discrepancies. With AI-powered object tracking, objects are classified by whatever parameter the manufacturer selects, and statistics on object counts are displayed. This significantly reduces categorization errors and makes the assembly line more flexible.

In the agriculture industry, for example, sorting plays a critical role on the assembly line: it is imperative for the company to identify and discard damaged fruits and vegetables, which can affect the finished product.

AI-powered object detection can help transform this tedious, manual process into an efficient, automated one while maintaining the same, if not better, level of accuracy.

Assembly Line

Today we have fully automated assembly lines, even for complex products like cars. However, every movement of the robotic arms and of the raw materials and components is defined and replayed according to a script.

To give the modern automatic assembly line more flexibility, it is important to teach machines to locate and move different products and components accurately. AI-powered object detection opens the door to this possibility.

The Challenge

You are tasked with adding object detection capabilities to existing surveillance cameras. You need to make it possible to detect several commonly used tools, such as:

  • hammers
  • ratchets
  • drills
  • screwdrivers
  • chainsaws
  • wrenches.

You have a dataset of images available, which you can use for training the model.

CLICK HERE to download the image archive for this challenge.

NOTE: The image dataset is based on the Open Images Dataset V4 dataset published by Google. Only a subset of data is used for this challenge.

The image archive contains three folders:

  • train — images which you have available during the project implementation, and can be used for training a Computer Vision model
  • test — images which you have available during the project implementation, and can be used for testing the performance of the model. This data is NOT used during training.
  • validation — images from the production phase, which the system should be able to classify correctly despite never having seen them during training.

Each folder also contains text labels inside a Label folder. Each text file describes the rectangular bounding boxes for the objects in the corresponding image, in pixels; for example:

Chainsaw 327.575552 360.854994 782.249984 529.166559
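Here’s a small parsing sketch for these label files, assuming each line is a tag followed by the box corners in pixels, i.e. <tag> <xmin> <ymin> <xmax> <ymax> (worth verifying against a few files). Custom Vision’s training API expects regions normalized to the 0–1 range, hence the conversion using the image dimensions:

```python
# Sketch: read "<tag> <xmin> <ymin> <xmax> <ymax>" label lines (pixel
# coordinates, assumed order) and normalize them to the 0-1
# left/top/width/height form that Custom Vision's training API expects.
def read_labels(path, img_width, img_height):
    regions = []
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) != 5:
                continue  # skip blank or malformed lines
            tag = parts[0]
            xmin, ymin, xmax, ymax = map(float, parts[1:])
            regions.append({
                "tag": tag,
                "left": xmin / img_width,
                "top": ymin / img_height,
                "width": (xmax - xmin) / img_width,
                "height": (ymax - ymin) / img_height,
            })
    return regions
```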

Your job is to build an object detection model and a small application which allows the user to upload a photo containing one or more tools. The application should highlight the tools it recognised in the image, with a confidence level for each tool.

NOTE: In order to make sure that your Computer Vision model can respond to unseen images, when testing ALWAYS use images from the test and validation folders inside the dataset. Images from the test and validation folders should NOT be used for training.

Tips

  1. The Custom Vision Service should come in handy for this challenge; a minimal prediction-call sketch follows this list.
  2. You will need to create an Object Detection project in the Custom Vision service.
  3. There are many images available inside the train folder, but you might not need to use all of them for training. Indeed, it may prove counterproductive to use some of the images!
  4. The minimum number of labeled examples for each tag (for example, Chainsaw) in Object Detection is 15. However, the more examples you provide, the better the model will generally perform; we recommend about 50 labeled examples per tag.
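As with the classification challenge, the prediction call itself is small once an iteration is published. A sketch (the IDs, key, threshold, and file name are placeholders; the detect response returns bounding boxes normalized to the 0–1 range):

```python
# Sketch: detect tools in a validation image with a published Custom Vision
# object detection model. Project ID, iteration name and key are placeholders.
import requests

PREDICTION_URL = ("https://westeurope.api.cognitive.microsoft.com/customvision/"
                  "v3.0/Prediction/<project-id>/detect/iterations/"
                  "<iteration-name>/image")

with open("validation/tools_001.jpg", "rb") as f:  # hypothetical file name
    result = requests.post(
        PREDICTION_URL,
        headers={"Prediction-Key": "<your-prediction-key>",
                 "Content-Type": "application/octet-stream"},
        data=f.read(),
    ).json()

for p in result["predictions"]:
    if p["probability"] < 0.5:  # arbitrary confidence threshold
        continue
    box = p["boundingBox"]      # left/top/width/height as fractions of image
    print(p["tagName"], round(p["probability"], 2), box)
```

Multiply the box fractions by the image width and height to draw the highlight rectangles in your app.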

Useful Links

  1. What is the Custom Vision Service?
  2. Tutorial: Build an object detection project with C#
  3. Tutorial: Build an object detection project with Python
  4. Tutorial: Build an object detection project with Java
  5. Use the prediction API
  6. Custom Vision Prediction API reference

Thank you for joining us at the Cognitive Services Workshop

What’s next?

An NLP workshop and The Meetup of Extraordinary AI Practitioners :)

Wanna join us for our next workshop on ‘Pre-Trained Models for NLP’? Register & see you on October 25th @ Impact Hub Timpuri Noi.

Or tune in to our meetup.

As always, powerful things happen when like-minded people connect. Join us on GitHub, LinkedIn, and Facebook.
