The world is experiencing an explosion of digital assets. Companies and brands now store and serve digital experiences composed of images and videos to their customers, and these assets are being produced faster than ever before. Managing and searching for relevant digital assets is becoming an overwhelmingly difficult task for content managers, and Adobe is at the forefront of technological innovation to enable efficient management of this digital content with Adobe Experience Manager (AEM) Assets.
Enhanced Smart Tags is a new service introduced with AEM 6.4 that enables library managers to automatically tag their image assets with their own taxonomies. It is powered by Adobe Sensei using machine learning to ease the pain of manually tagging images.
Normally, it would take a library manager hours to tag every image in their digital asset management systems. AEM 6.3 introduced the Smart Tags feature that enabled automatic tagging of images with generic keyword tags. With Enhanced Smart Tags in AEM 6.4, the automatic tagging process is further extended to now learn the taxonomy of the customer’s business.
Not entirely a new problem
The task of automatic image tagging has been well studied and, with the advent of powerful feature extractors like neural networks, it has advanced significantly in recent years. These systems, although very efficient, can only generate tags from a fixed taxonomy.
Enhanced Smart Tags fixes this issue by enabling training on a business-specific vocabulary. Figure 1 illustrates the requirement.
Enhanced Smart Tags works by training models on images tagged manually by a user. The user must tag a few images with tags from their business-specific taxonomy (henceforth referred to as custom tags). They then trigger a training workflow on those images. Once the model is trained, any new images added to the repository can be automatically tagged with the appropriate business-specific tags.
Training with little data
The key challenge in automatically adding custom tags is the paucity of training examples. The training data is exclusive to a specific user or organization and not universally applicable. Thus, it is difficult to employ powerful classifiers based on deep neural networks, which require large amounts of training data for best performance. Enhanced Smart Tags tackles this problem by combining neural network features, heuristics, and multiple lightweight classifiers.
Customer taxonomy evolves
Another important problem is that a classifier-based tagging system must work with a fixed number of classes, while the vocabulary of tags in a real business is continuously evolving.
With a regular classifier, the model would have to be re-trained for each new custom tag added to the taxonomy. Enhanced Smart Tags employs an algorithm that selectively trains a subset of the multiple lightweight classifiers, thus reducing the compute time for training for new custom tags.
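The idea of retraining only the affected per-tag models can be sketched as follows. This is a minimal illustration, not the service's actual algorithm: the `CentroidModel` class, the `train_tags` function, and the nearest-centroid rule are all hypothetical stand-ins for the lightweight classifiers, which the article does not specify.

```python
from dataclasses import dataclass

Vector = list[float]


def _mean(vectors: list[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def _sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


@dataclass
class CentroidModel:
    """Toy binary classifier (assumption): nearest of two class centroids."""
    pos: Vector
    neg: Vector

    def predict(self, x: Vector) -> bool:
        return _sq_dist(x, self.pos) < _sq_dist(x, self.neg)


def train_tags(models: dict, examples: list, tags_to_train: list) -> dict:
    """Train or retrain only the tags listed; every other model is left as-is.

    `examples` is a list of (feature_vector, set_of_custom_tags) pairs.
    """
    for tag in tags_to_train:
        pos = [x for x, tags in examples if tag in tags]
        neg = [x for x, tags in examples if tag not in tags]
        if pos and neg:  # skip tags that lack either kind of example
            models[tag] = CentroidModel(_mean(pos), _mean(neg))
    return models
```

With this layout, adding a tag such as "scarf" to the taxonomy means calling `train_tags(models, examples, ["scarf"])`; the models already stored for other tags are not touched.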
Look the same but aren’t
Yet another challenge with accurately predicting custom tags for images is that many custom tags correspond to visually similar images. Figure 2 demonstrates this problem. In order to distinguish between them, the models must be trained with good negative examples for each custom tag. With Enhanced Smart Tags, a user need not identify images as hard-negative examples for particular tags. Enhanced Smart Tags uses a proprietary algorithm (patent pending) to automatically determine accurate hard-negatives for training and produce high-quality tags for images.
Users are humans
Training images and tags for Enhanced Smart Tags come from customer users. Sometimes the images are not fully tagged. For example, a training image might be associated with two custom tags, but the user only tags it with one. In this case, without any checks, the training would end up using the image as a negative for the missing custom tag. This would result in poor accuracy of the corresponding lightweight classifier model.
The Enhanced Smart Tags service has checks to determine if a tag may have been missed by the user. It works best if the missing tag already has a corresponding trained model. Sometimes, the training images have too many distracting objects, besides those that are represented by the associated custom tag. These images do not make for very good training examples and result in a drop in the accuracy of the model. The Enhanced Smart Tags service ensures that it uses only the best available training examples for training and ignores the noisy ones.
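One way to picture the missed-tag check is shown below. It is a sketch under stated assumptions: the function name `select_negatives`, the `score` callable (standing in for an already trained model for the tag), and the 0.5 cutoff are all hypothetical and are not taken from the service.

```python
from typing import Callable, Optional


def select_negatives(
    images: list,
    tag: str,
    score: Optional[Callable[[list[float]], float]] = None,
    skip_above: float = 0.5,
) -> list:
    """Collect negative examples for `tag`, skipping likely missed tags.

    `images` is a list of (feature_vector, set_of_custom_tags) pairs.
    `score` is an existing model for `tag`, if one has been trained; it
    returns a confidence in [0, 1] that the tag applies (assumption).
    """
    negatives = []
    for features, tags in images:
        if tag in tags:
            continue  # a positive example, never a negative
        if score is not None and score(features) >= skip_above:
            continue  # the user may have forgotten this tag; do not use as negative
        negatives.append(features)
    return negatives
```

When no model exists yet for the tag, `score` is `None` and every untagged image is kept, which matches the observation that the check works best once the missing tag already has a trained model.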
Finding balance in examples
Another important challenge in training an accurate model is that of unbalanced example classes. Normally, a binary classifier is trained with nearly equal numbers of positive and negative examples. In the case of Enhanced Smart Tags, however, the negatives usually far outnumber the positives. The service employs a sampling mechanism to choose the best negative examples for training.
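The article does not describe the sampling mechanism, so the following is only an illustrative guess at the general shape of such a step: keep a number of negatives proportional to the positives, preferring those closest to the positive examples in feature space. The function name `sample_negatives` and the `ratio` parameter are hypothetical.

```python
def sample_negatives(positives: list, negatives: list, ratio: float = 1.0) -> list:
    """Keep roughly ratio * len(positives) negatives, nearest to the positives first.

    Vectors are plain lists of floats. Closeness is measured as squared
    Euclidean distance to the centroid of the positives (an assumption).
    """
    if not positives:
        raise ValueError("at least one positive example is required")
    keep = max(1, round(ratio * len(positives)))
    centroid = [sum(col) / len(positives) for col in zip(*positives)]

    def distance(vector: list) -> float:
        return sum((a - b) ** 2 for a, b in zip(vector, centroid))

    # sorted() is stable, so ties keep their original order
    return sorted(negatives, key=distance)[:keep]
```

Negatives that sit close to the positives are the ones a classifier most needs to see, which is why this sketch favors them over a uniform random sample.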
How it all works
Figure 3 illustrates the overall training and tagging process for Enhanced Smart Tags. For training, a few images are tagged manually with the appropriate custom tags. Fixed-size renditions of these images are passed to the Enhanced Smart Tags service, along with the custom tags. The service assigns each image to specific custom tag models as either a positive (if the image is tagged with that particular custom tag) or a negative. It then computes features for these images from the activation values of a deep, pre-trained classifier neural network. These are used to look up generic tags in an inverted index. Finally, it trains a binary classifier for each tag using the features of the corresponding positive and negative examples.
For custom tag prediction, features and generic tags are used to identify the potentially applicable custom tag models. The selected binary classifiers then determine which tags are applicable to the image. The automatic custom tagging module in the Enhanced Smart Tags service is made more robust by adding heuristics that filter out noisy predictions and competing custom tags. The service chooses to be conservative about predicting false custom tags and therefore produces tags with high precision whilst remaining at acceptable recall levels. Also, training for a custom tag does not happen until there are enough positive and negative examples for it.
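The two-stage prediction described above, candidate selection followed by conservative classification, can be sketched as follows. The details are assumptions: the index is keyed here by generic tag and maps to custom tags, each classifier is modeled as a callable returning a confidence in [0, 1], and the 0.8 threshold is arbitrary. The names `build_index` and `predict_custom_tags` are hypothetical.

```python
from collections import defaultdict


def build_index(training: list) -> dict:
    """Map each generic tag to the custom tags it co-occurred with in training.

    `training` is a list of (generic_tags, custom_tags) pairs of sets.
    """
    index = defaultdict(set)
    for generic_tags, custom_tags in training:
        for generic in generic_tags:
            index[generic].update(custom_tags)
    return index


def predict_custom_tags(features, generic_tags, index, classifiers, threshold=0.8):
    """Run only the candidate classifiers and keep confident predictions."""
    candidates = set()
    for generic in generic_tags:
        candidates |= index.get(generic, set())
    # A high threshold trades some recall for precision.
    return sorted(
        tag
        for tag in candidates
        if tag in classifiers and classifiers[tag](features) >= threshold
    )
```

Because only the candidate models are evaluated, the cost of a prediction depends on how many custom tags share generic tags with the image rather than on the size of the whole taxonomy.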
Custom tag prediction is fast because individual custom tag models work very quickly, and only a small subset of the stored custom tag models is used for the actual prediction. Serialized models are cached in memory to speed up prediction, and the lightweight binary classifiers typically train in a fraction of a second. Most of the compute time is spent identifying good negative examples prior to training. Negative examples are made available through the example collection process. This is done automatically, without any manual intervention, and depends on the rate at which examples trickle into the service.
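In-memory caching of serialized models can be illustrated with the Python standard library. This is a generic sketch, not the service's implementation: the `MODEL_STORE` dictionary stands in for whatever persistent storage holds the serialized models, and the model contents are made up.

```python
import pickle
from functools import lru_cache

# Hypothetical stand-in for persistent storage of serialized per-tag models.
MODEL_STORE = {
    "oxford": pickle.dumps({"weights": [0.2, 0.8], "bias": -0.1}),
    "fedora": pickle.dumps({"weights": [0.7, 0.1], "bias": 0.3}),
}


@lru_cache(maxsize=256)
def load_model(tag: str) -> dict:
    """Deserialize a model once; later calls for the same tag hit the cache."""
    return pickle.loads(MODEL_STORE[tag])
```

The second call to `load_model("oxford")` returns the cached object without deserializing again, and `load_model.cache_info()` reports hits and misses. A retrained model would need its cache entry invalidated, for example with `load_model.cache_clear()`.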
Putting it into practice
Figure 4 illustrates a typical customer journey. There is an initial onboarding step to enable and configure the service. If the customer does not already have a tag taxonomy, they may define one by creating new namespaces and tags. In the initial training phase, customer users manually tag training images (the recommended number is 25–30 images per tag) and trigger a training workflow.
After this, the service can be used to predict custom tags for new images ingested into the repository. Training the service is a continuous process, because new images sent to it may already be tagged, and these serve as ongoing training data for the service.
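Before triggering the training workflow, it can help to check which tags still fall short of the recommended number of examples. The helper below is a hypothetical convenience, not part of AEM; it simply counts tagged images per tag against the lower bound of 25 mentioned above.

```python
from collections import Counter


def tags_needing_examples(tagged_images: list, minimum: int = 25) -> dict:
    """Return {tag: number_of_images_still_needed} for under-represented tags.

    `tagged_images` is a list with one set of custom tags per training image.
    """
    counts = Counter(tag for tags in tagged_images for tag in tags)
    return {tag: minimum - n for tag, n in counts.items() if n < minimum}
```

An empty result means every tag that appears in the training set has at least `minimum` examples.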
For tagging images, the users may trigger the “DAM Smart Tag Assets” workflow in AEM 6.4+. The workflow associates both generic keyword tags and custom tags with the assets.
A training report (Figure 5) provides detailed information on custom tag models that the Enhanced Smart Tags service has trained and can be used to understand the current behavior of the automatic custom tagging process. It also provides useful recommendations to the users on how to improve the training quality for a specific tag.
Enhanced Smart Tags fills the gap between generic image auto-tagging and enterprise-specific custom auto-tagging requirements. It is designed to make asset library managers more productive in managing digital content, saving them time and money and producing a more accurately tagged library.