Engineering A Deep Learning Project

Every solution needs to be engineered to solve the problem

As technology progresses, it removes some of the earlier roadblocks in business operations. Adopting new technology is always challenging, and it is one of the top concerns for businesses: they weigh the cost of adoption against the business gain. AI has attracted considerable attention because of its success at big technology companies, but those companies have portrayed huge volumes of data and infrastructure as requirements. Resources, infrastructure, cost, model complexity and the lack of a clear solution have been roadblocks for companies trying to adopt it.

AI is a buzzword that businesses are either pulled into or puzzled by. Unlike most other projects, an AI project comes with caveats that need to be overcome when engineering it.

The Problem You’re Trying To Solve

Factors to consider

Having a clear goal sets up the deep learning project. It involves establishing:

  • a timeline, and the resource and infrastructure requirements;
  • the business problem you are solving, and how hard it is to solve with conventional methods;
  • the availability of data;
  • potential revenue versus cost incurred;
  • the impact of the project on the business in terms of productivity, automation and time to market;
  • whether any competitor has already built a solution, and how it performs;
  • the near-term and long-term revenue opportunities.

These are some of the factors to keep tabs on in a deep learning project. Formulating the problem mathematically provides ways to vary the solution depending on the requirements. Constantly evaluating the solution from both a business and an engineering standpoint gives assurance about its efficiency. You can also benchmark the results against competitors or the state of the art in the field to evaluate the model's performance.

Volume Of Data

Depending on the problem you are trying to solve, the data requirement can grow steeply. A considerable amount of data is available as open source. There are also cross-domain ways to pre-train your model: not directly on your task, but so that the model learns subtleties of real-world distributions that can be reused for your own modelling.

Source Of Data

Having access to real-world data is a big deal; real data is crucial because it carries all the subtlety you want the model to understand. Collecting it is time-consuming, or too costly if you are not the one who owns the data. If the problem is in computer vision, we can apply a dual strategy of synthetic and real-world data. With the advances in GANs, we can create new background environments that are photorealistic. The caveat with synthetic data is that the neural network may learn the noise difference between the foreground and background images to classify them, rather than understanding the actual features and subtleties. When we start combining synthetic data with real-world data, the model learns more of the true distribution.

The advantage of using synthetic data is that you can measure performance on real-world images, and if the results are good, you can use the model itself to create ground truth for your real-world data. This makes building the training dataset far less time-consuming. With synthetic data, we try to mimic real-world conditions: lighting variations, noise, artefacts, blur, different viewpoint angles, scaling, low and high resolution, and so on. We can also use CycleGAN to mimic a real-world dataset, taking a synthetic image as input and a real-world image as output; by this method we try to learn the real-world distribution of the data. Such a distribution space is large for a neural network to learn and would consume a ton of data.
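
As a concrete illustration, here is a minimal augmentation sketch, assuming a PyTorch/torchvision pipeline (the article does not prescribe a framework); the transform choices and parameter values are illustrative, not tuned to any particular dataset.

```python
# Inject the kinds of real-world variation mentioned above into synthetic
# renders: lighting, blur/low resolution, scale and sensor noise.
import torch
from torchvision import transforms

class AddGaussianNoise:
    """Add pixel-level Gaussian noise to a tensor image in [0, 1]."""
    def __init__(self, std=0.02):
        self.std = std

    def __call__(self, img):
        return (img + torch.randn_like(img) * self.std).clamp(0.0, 1.0)

augment = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.5, 1.0)),        # scale / viewpoint
    transforms.ColorJitter(brightness=0.4, contrast=0.4,
                           saturation=0.3, hue=0.05),           # lighting variation
    transforms.GaussianBlur(kernel_size=5, sigma=(0.1, 2.0)),   # blur / low resolution
    transforms.ToTensor(),
    AddGaussianNoise(std=0.02),                                 # pixel-level noise
])
```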

It all depends on the target environment: if it is constrained and controlled, good for you; if it is detection in the wild, brace yourself. The fact that Amazon still uses barcode scanning in its app shows the complexity of recognizing products in the wild. Don't let that stop you; there is always a small technique you can apply to overcome such hurdles. It could be as small as adding a loss function or conditioning the model on some constraints.

Pre-trained Model

ImageNet is the best-known case of pre-trained model availability. A model trained on it classifies around 1,000 image categories, but that does not restrict you from using it on your problem even if your classification scenario is different. A neural network that understands the data distribution of real-world scenes is an asset: it has learned lighting variation, foreground/background differentiation, resolution, pixel-level noise variation and the scaling of objects in images. Training from a pre-trained model is usually beneficial as it saves training time. We humans always use prior knowledge; even when it applies only in small proportion, it is still beneficial.
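
A minimal transfer-learning sketch, assuming PyTorch and torchvision: load ImageNet weights, freeze the backbone and train only a new classification head. `num_classes` is a placeholder for your own label set.

```python
import torch.nn as nn
from torchvision import models

num_classes = 10  # hypothetical number of product classes

# Newer torchvision; older versions use models.resnet50(pretrained=True)
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)

for param in model.parameters():      # freeze the pretrained backbone
    param.requires_grad = False

model.fc = nn.Linear(model.fc.in_features, num_classes)  # new trainable head
```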

Meta-learning

Meta-learning is a method for learning a weight initialization that works across multiple tasks. The idea is to train the neural network on multiple tasks instead of a single data distribution: we model the loss across tasks so that the network learns an initialization from which it can adapt to a new task with only a few updates.
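
The article does not name a specific algorithm, so as one illustrative choice here is a compact first-order sketch in the spirit of Reptile, assuming PyTorch; `task_loader` and `loss_fn` are placeholders for your own task sampler and objective.

```python
# Train a copy of the model on each sampled task, then nudge the shared
# initialization toward the task-adapted weights.
import copy
import torch

def reptile_step(model, task_loader, loss_fn, inner_steps=5,
                 inner_lr=1e-2, meta_lr=0.1):
    task_model = copy.deepcopy(model)
    opt = torch.optim.SGD(task_model.parameters(), lr=inner_lr)
    batches = iter(task_loader)            # must yield >= inner_steps batches
    for _ in range(inner_steps):           # adapt to the sampled task
        x, y = next(batches)
        opt.zero_grad()
        loss_fn(task_model(x), y).backward()
        opt.step()
    with torch.no_grad():                  # meta-update: move the shared
        for p, q in zip(model.parameters(),     # initialization toward the
                        task_model.parameters()):  # adapted weights
            p += meta_lr * (q - p)
```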

GAN

GANs get less attention than they deserve when it comes to creating data for deep learning projects. A GAN has a generator and a discriminator: the generator's task is to create data that matches the input distribution, while the discriminator's task is to classify between real and generated images. Ultimately, GANs optimize for generating realistic data. One of the main applications is image translation, where you can condition the GAN on certain parameters and vary them. This is interesting because you can alter specific features in an image, and the new features still come from the real-world distribution of the data. That is what makes GANs such a compelling choice for creating new datasets for deep learning projects: they produce photorealistic images that resemble the distribution of real-world data. Conditional GANs are a good fit for this type of task, as they let you alter particular characteristics of an image. GANs themselves require data, which we can source from cross-domain datasets on the internet. The best way to get the most out of a GAN is to start from a pre-trained model trained on the COCO dataset or ImageNet; pre-trained weights are advantageous because they have already learned much of the real distribution, such as lighting, noise, foreground/background variation and the structural information of objects.
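
A bare-bones GAN training step, assuming PyTorch; `generator`, `discriminator`, their optimizers and `real_images` are placeholders, and the discriminator is assumed to output a single logit per image.

```python
import torch
import torch.nn.functional as F

def gan_step(generator, discriminator, g_opt, d_opt, real_images, latent_dim=100):
    batch = real_images.size(0)
    z = torch.randn(batch, latent_dim)

    # Discriminator update: push real images toward 1, generated images toward 0.
    d_opt.zero_grad()
    fake = generator(z).detach()
    d_loss = (F.binary_cross_entropy_with_logits(discriminator(real_images),
                                                 torch.ones(batch, 1)) +
              F.binary_cross_entropy_with_logits(discriminator(fake),
                                                 torch.zeros(batch, 1)))
    d_loss.backward()
    d_opt.step()

    # Generator update: make the discriminator label generated images as real.
    g_opt.zero_grad()
    g_loss = F.binary_cross_entropy_with_logits(discriminator(generator(z)),
                                                torch.ones(batch, 1))
    g_loss.backward()
    g_opt.step()
    return d_loss.item(), g_loss.item()
```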

Labelling

Labelling is a time-consuming task, but we can leverage some pre-trained models to do the lifting for us. Semantic segmentation is a good way to differentiate foreground from background. It is still time-consuming, so we use techniques like GrabCut and edge detection. If you have images with a white background, GrabCut can be an effective tool to automatically find object boundaries, and the cut-outs can be overlaid on different backgrounds. There are also techniques like weakly supervised semantic segmentation, or using object detection, to create baseline labelled data.
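
A small GrabCut sketch using OpenCV (the file path and rectangle are hypothetical): for objects shot on a plain background, a rough bounding rectangle is often enough to recover a foreground mask that can seed your labelling.

```python
import cv2
import numpy as np

img = cv2.imread("product.jpg")                         # hypothetical image path
mask = np.zeros(img.shape[:2], np.uint8)
bgd_model = np.zeros((1, 65), np.float64)
fgd_model = np.zeros((1, 65), np.float64)
rect = (10, 10, img.shape[1] - 20, img.shape[0] - 20)   # rough ROI around the object

cv2.grabCut(img, mask, rect, bgd_model, fgd_model, 5, cv2.GC_INIT_WITH_RECT)

# Keep definite and probable foreground pixels as the object mask.
fg_mask = np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD), 1, 0).astype("uint8")
cutout = img * fg_mask[:, :, None]                      # foreground ready to composite
```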

Semi-Supervised Labelling

Here we use a dual strategy of synthetic and real-world data: train the network on synthetic data first, then use the trained model to predict on real-world images and generate labelled data for further training.
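
A pseudo-labelling sketch, assuming PyTorch; `model` and `unlabelled_loader` are placeholders, and the confidence threshold is an illustrative value.

```python
import torch

@torch.no_grad()
def pseudo_label(model, unlabelled_loader, threshold=0.9):
    """Predict on unlabelled real-world images and keep confident predictions."""
    model.eval()
    images, labels = [], []
    for x in unlabelled_loader:                 # batches of unlabelled images
        probs = torch.softmax(model(x), dim=1)
        conf, pred = probs.max(dim=1)
        keep = conf > threshold                 # keep only confident predictions
        images.append(x[keep])
        labels.append(pred[keep])
    return torch.cat(images), torch.cat(labels)
```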

Refining The Labelling

Combining classical techniques with deep learning produces better results. One such technique is semantic soft segmentation, which combines spectral matting with semantic segmentation to produce high-quality labels. High-quality ground truth produces good results in deep learning, so techniques like Laplacian matting and whitening-and-colouring transforms can be used to refine semantic segmentation masks.

The Complexity Of The Problem

How we represent the objective of the problem shapes the solution. Most deep learning solutions revolve around adding skip connections, new loss functions, conditioning, depth information or semantic information, and there are many other techniques to speed up convergence. Break down the solution you are after: does the loss function provide enough feedback for the network to learn faster? Where does the model fail to generalize?

In a computer vision classification task, a model may fail to recognize smaller objects. There are multiple scenarios and edge cases that may appear in an image classification, object detection or semantic segmentation task: varying resolution, lighting, the scale of the object, and so on. The problem to solve depends on the end goal; if your cameras are fixed inside a store with constant lighting and image resolution, you have already eliminated some of these scenarios. The type of network to use depends on whether prediction must happen in real time or can run on the backend.

Problem Statement

For example, take an object detection task and list the conditions it must handle:

1. Varying Lighting

2. Camera Resolution

3. Background variation

4. Object Size Variation due to the camera position

5. Noise

6. Video/ Still Images

7. How distinct are the products to classify among classes? Do they have overlapping features? As a human, how would you have classified those objects based on features? What features do you want the model to use for classification?

8. Orientation of the object in the images.

9. Do you require the depth of the object as well? How much improvement would adding depth maps make? Am I classifying objects based on features alone, or do I need to evaluate the geometry of the product to classify it?

Structuring the problem like this narrows down the type of architecture to go after.

Neural Network Selection

The first option is to look at state-of-the-art models for the task you are trying to accomplish. Network selection depends on the objective: start with a shallow network, then progressively add more layers, and always try to find a base model that has already been tested. If you are solving a computer vision task, try utilizing VGG or ResNet networks. Then look at the shortcomings reported in the paper and validate how well the model fits your objective. Use these networks to establish a baseline.

There are many parameters to tweak, and there is no single solution. If your training or test accuracy is stuck at a certain value, there are multiple ways to resolve it: varying learning rates (cyclic learning rates, super-convergence), increasing the network size, newer architectures, skip connections, adding more inputs, reshaping the loss function, and so on. In deep learning projects it is good practice to improve progressively as you get more data from the real world.
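
A short scheduler sketch, assuming PyTorch; the placeholder model, learning-rate bounds and step counts are illustrative.

```python
import torch

model = torch.nn.Linear(128, 10)                      # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

# Cyclic learning rate: sweep between a base and a maximum rate.
scheduler = torch.optim.lr_scheduler.CyclicLR(
    optimizer, base_lr=1e-4, max_lr=1e-2, step_size_up=2000)

# Alternative, one-cycle ("super-convergence" style) schedule:
# scheduler = torch.optim.lr_scheduler.OneCycleLR(
#     optimizer, max_lr=1e-2, total_steps=10000)

# Inside the training loop, call scheduler.step() after each optimizer.step().
```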

Loss Function

Classification losses and a pixel-wise L2 loss should be a good fit in the above scenario. Some problems require considering the geometry of the object, so an additional loss term would be needed. There are multiple ways to deal with false positives, such as using a focal loss or online hard negative mining.
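
A minimal focal-loss sketch, assuming PyTorch; the gamma and alpha values are the commonly used defaults, not values tuned for any specific problem.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, alpha=0.25):
    """Down-weight easy examples so training focuses on hard ones."""
    ce = F.cross_entropy(logits, targets, reduction="none")
    pt = torch.exp(-ce)                          # probability of the true class
    return (alpha * (1.0 - pt) ** gamma * ce).mean()
```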

Optimizer

The choice of optimizer can affect the outcome of the model. SGD and Adam work well on most deep learning projects. Use variable scopes in the model and pass a var_list to the optimizer if you are freezing the graph and training only the last layers or a new layer. Cyclic learning rates can help find an optimal learning rate and speed up convergence, while super-convergence uses large learning rates to converge faster. Pruning can be an effective tool if you are trying to reduce the model size to fit a mobile environment.
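
The scope/var_list pattern above refers to the TensorFlow 1.x-style graph API; here is a minimal sketch of that pattern using tf.compat.v1, with the backbone reduced to a feature placeholder and the new head to a single dense layer (both are illustrative stand-ins).

```python
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

x = tf.placeholder(tf.float32, [None, 2048])    # stand-in for frozen backbone features
y = tf.placeholder(tf.int32, [None])

with tf.variable_scope("head"):                 # new trainable layers live in this scope
    logits = tf.layers.dense(x, 10)

loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits))

# Only variables under the "head" scope are handed to the optimizer,
# so any pretrained layers outside that scope stay frozen.
head_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope="head")
train_op = tf.train.AdamOptimizer(1e-4).minimize(loss, var_list=head_vars)
```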

Generalization

If you used a triple strategy of synthetic data, GAN-generated data and real-world data, your model will generalize well because you have covered more of the distribution. There may still be recognition failures due to limitations of the model. If you have not run inference at multiple scales, you will likely find that smaller objects go undetected. With resolution variation at inference, an object may appear too bloated in the image and the network may fail to recognize it, although this can be mitigated with augmentation or by varying filter sizes. Even in the COCO dataset, most objects occupy only about 10% of the image, which is relatively small.
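
A multi-scale inference sketch for the classification case, assuming PyTorch and a model that accepts variable input sizes (for example one with adaptive pooling); the scale factors are illustrative.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def multi_scale_predict(model, image, scales=(0.75, 1.0, 1.25)):
    """Run one (N, C, H, W) batch at several scales and average the probabilities."""
    model.eval()
    probs = []
    for s in scales:
        size = [int(image.shape[-2] * s), int(image.shape[-1] * s)]
        resized = F.interpolate(image, size=size, mode="bilinear",
                                align_corners=False)
        probs.append(torch.softmax(model(resized), dim=1))
    return torch.stack(probs).mean(dim=0)
```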

The ability to reason about why the model fails to generalize will speed up the process of building a robust network. Visualization plays an integral part in establishing generalization.

ROI Vs Time & Cost

Deep learning involves costs that have to be modelled from time to time: servers and the number of requests need to be estimated repeatedly to stay on top of cost. The source of revenue shouldn't be constrained; the model should be diversified enough to solve other allied problems. If it's a B2C consumer-facing app, revenue relies on how much customers will pay for the service. Then certain market dynamics have to be evaluated: how much better your product is than the alternatives, your points of distribution, customer acquisition strategies and their cost, customer retention cost, the level of usability of the app (how often customers use it), and the price points customers are willing to pay.

Find other strategies for creating revenue, such as crowd-sourcing customer behaviour (Amazon does this all the time: the point of a Prime sale with big discounts is to acquire you as a customer, learn your purchase pattern, and provide products and services that benefit you). Until now, customer behaviour has mostly been modelled from textual searches made on the internet; video-based behaviour analytics can open up a whole new level of customer modelling. Text-based behaviour modelling uses the preferences customers exhibit while visiting a website, along with purchase history; e-commerce and retail companies have large volumes of customer data, which they use to model purchase patterns and build recommendation engines to influence customer behaviour, or, in the case of a retail store, to lay out the store in accordance with that behaviour. People usually hand-engineer features, and it is a multi-objective rather than single-objective optimization problem, since brand diversity also has to be considered. Coca-Cola uses cash-back codes on bottle caps to engage with customers and learn their purchase behaviour.

The idea behind finding new sources of revenue is the same as in the earlier section: how you represent the objective of a problem creates the solution. Rethinking objectives opens up new revenue channels. Beware that every strategy has real-world difficulties in implementation, so clear objectives have to be evaluated.

Multiple products and services don't strain the relationship with the consumer, as they feel the cost they pay is validated by the gains. Staying close to the consumer provides insight into their pain points and how new solutions can be engineered to solve them.

B2B

When it comes to solutions for businesses, the game becomes how beneficial the product is to the existing business, what value it can add over time, and what the impact on revenue is after adding these services. How easy is it to integrate the model into their existing framework? Most companies have a budget and often don't take up new services or solutions; but if the solution offered carries the potential to generate revenue, businesses are far more interested in integrating it.

Content brings contracts in a B2B environment. Establishing domain expertise in the marketplace and finding different ways of expressing content light up the minds of organizational decision makers. The internet is filled with content, but the key is extracting what is relevant and creating business value around it. The space to innovate is so large that small variations on existing models bring better results, and we are seeing cross-domain infusions of technology producing better results.

Resource

Having a statistical modeller, data scientists and data science engineers is a necessity. Cross-domain knowledge comes in handy when trying to create solutions: some domains will have overcome the same problem in a different form, and adapting such ideas produces better products and services. For example, we have seen aircraft engineering applied to household products, producing better mousetraps that solved consumer problems. Cognitive diversity prevents organizations from being boxed in with fewer solutions.

Inference

The inference stage provides insight into how well a model has generalized. Precision, recall and F1 scores give an overview of the misclassifications happening during inference. The number of misclassifications should be weighed against the merits of the model; sometimes misclassification is expensive, as with flagging a fraudulent transaction, rejecting an insurance claim or cancelling a loan application.
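
A small evaluation sketch using scikit-learn; `y_true` and `y_pred` are illustrative stand-ins for your held-out labels and model predictions.

```python
from sklearn.metrics import classification_report, confusion_matrix

y_true = [0, 0, 1, 1, 2, 2]   # illustrative ground-truth labels
y_pred = [0, 1, 1, 1, 2, 0]   # illustrative model predictions

print(classification_report(y_true, y_pred))   # per-class precision / recall / F1
print(confusion_matrix(y_true, y_pred))        # which classes get confused with which
```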

What Does A Neural Network Learn Inherently

The surprising part about deep learning projects is that a neural network learns far more than just the solution to the problem:

  • Semantic segmentation models learn an implicit sense of depth by differentiating foreground from background.
  • Classification models create linearly separable Euclidean spaces for the classified objects (see the sketch after this list).
  • The features a neural network learns can be reused for product design with constraint optimization, since the feature space has been learnt.
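
A sketch of the second point, assuming PyTorch, torchvision and scikit-learn: take the penultimate-layer features of a trained classifier and fit a plain linear model on them; high linear-probe accuracy indicates the classes are close to linearly separable in that feature space. Data loading is elided.

```python
import torch
import torch.nn as nn
from torchvision import models
from sklearn.linear_model import LogisticRegression

backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
backbone.fc = nn.Identity()            # expose the 2048-d penultimate features
backbone.eval()

@torch.no_grad()
def embed(images):                     # images: (N, 3, 224, 224) float tensor
    return backbone(images).numpy()

# With your own data (placeholders shown), a linear probe looks like:
# clf = LogisticRegression(max_iter=1000).fit(embed(X_train), y_train)
# print(clf.score(embed(X_val), y_val))
```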

Challenges in generalization

Modelling real-world distributions is challenging, but progressively improving accuracy and F1 score will make the model generalize to most scenarios. Augmenting data improves generalization and makes the model more robust at inference time.

Remaining Future-Adaptive

Deep learning is an open field of study, and new research papers are published constantly, so the technology can change very quickly. Remaining future-adaptive is a key parameter in deep learning projects. Businesses need to evaluate the adoption of AI and figure out ways to collect data that ultimately give them an edge in the marketplace. There are many ways to engineer a deep learning project with a small amount of data and progressively improve as access to more data becomes available. Extracting information and converting it into behaviour patterns increases productivity and aids sales in the organization.

Thanks for reading through. At Hertz AI we build reasoning modules in NLP and computer vision for the enterprise.
