Exploring the Scope of Automation and Artificial Intelligence in Image Data
I discuss information that cannot be found in a picture itself, especially contextual information. I then describe some business benefits of taking such an approach.
The purpose of this article is to suggest what can be done next in the field of image analysis, to show how context can be taken into account, and to answer questions such as: “Can we consider human inputs to marry Automation and AI?”
Looking at the picture above (figure 1), could anybody have predicted the essence of the story? The context is that Sebastien de Sainte Marie’s skiing partner had been swept away by an avalanche a few meters from a couloir that the two of them had skied together the day before, which prompted Sebastien to ski alone. Perhaps the prediction was not so difficult, because the only thing shown in this picture is a man staring several feet (maybe hundreds of feet) down the mountain gorge.
Does this ring a bell? A world-famous sportsperson, Michael Schumacher, was almost killed the last time he indulged in a similarly insane drive for the conquest of a useless stunt (see the image below). However, it is stories like these that go on to become fables and help label such people as brave-hearts and invincible role models.
It is because of this impact that such images or photos (and, later, videos) should be analyzed with more care using technologies such as artificial intelligence, machine learning and deep learning.
Next, look at figure 3 below. It is similar to the one shown above: the risk of dying is still there, though perhaps somewhat lower.
Figures 1 and 3 are taken from the cover page of the Outlook Journal Spring 2015 magazine.
Now, depending on the time of day, your mood and your state of mind, several thoughts can come to your mind. For instance, if you are joining a new project and a new team on a Monday morning (which can supposedly awaken and even inspire you!), you may think that this photo illustrates how to be driven at the workplace and how to steer past all the ‘roadblocks’ or, in this case, ‘water-blocks’.
You would almost certainly need to know the context too. For instance, was the source that produced this image an ecology institute? Was it an immunology department? Or was it taken by the Haryana Roadways Authority? If it was taken by an ecology institute, then it would matter to you why there are no trees or greenery in the picture (or why the water does not look as pure as it should). If it was taken by an immunology department, then you would be bound to worry about the viruses or bacteria that the person driving this boat is carrying into the clean, fresh water, especially if that water is the source of all potable water in the locality. If the Haryana Roadways Authority took it, then you would conclude that, although there may be a person with a helmet, it is a garbage photo because the image contains only water and no land.
The person may also be carrying a drone that is not visible (perhaps on purpose) and that predicts what lies ahead and alerts him to any impending danger, such as a waterfall (maybe he has an earpiece plugged in to receive audio signals from the drone). That machine will have some level of error, and the person will have a certain degree of faith (or lack thereof) in it. Both will be factors in his decision making at crucial and potentially life-threatening points in the water.
The important point here is that several thoughts can come to your mind upon seeing the photo. Some such thoughts are “What is the level of risk involved?”, “Who is this daring person?”, “Is he near the Himalayas?”, “Is he already fatigued?”, “Has he taken out a life insurance policy?”, “This water looks like smoke”, “Why am I seeing yet another image with water, when there is already so much scarcity of water in Cape Town, South Africa?”, “Is this an image that is meant to promote some brand or company, such as a soft drink company?”, “Is this a motivational image?”, and of course, “Is the water ahead deep enough for him to drown?” and “Is there a massive rock ahead of him?” In a nutshell, the mere thought that the person may soon lose his life in a death-defying stunt can grip all your thoughts.
A multitude of other questions can arise in your head.
For the person in the photo, the most important things to consider may be the slope (or gradient) of the water at the point where he is shown and the depth of the water ahead of him.
If up to six thoughts come to your mind in the first few seconds after looking at the photo, you can categorize them based on the context, the time of day and week, and your state of mind (happy, angry, thoughtful, depressed, anxious, preoccupied, soporific, etc.). Otherwise, rank-order the (tens of) thoughts that pervade your mind and then categorize them. We can have humans intervene in devising a library or catalogue of all such hierarchies or categories of concepts.
Another interesting point is that you may think of phrases such as “In the air”, “Bird’s eye view”, “Road to perdition”, “Time and tide wait for no one”, “Invest now before it is too late”, “Ahead of fear lies victory”, etc. We can have humans intervene in devising a library or catalogue of all such phrases and consider assigning them to the hierarchy.
Before the final step of categorizing the words or phrases that come to mind upon seeing the photo, we can also consider combining similar words, identifying and algorithmically discarding duplicate words or phrases, etc.
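The steps described above (discard duplicates, rank-order the thoughts, then categorize them) can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the `CATALOGUE` of categories and keywords is hypothetical and stands in for the human-curated library, duplicates are detected by simple text normalization rather than by any real similarity model, and ranking is by how often a thought is repeated.

```python
import re
from collections import Counter

# Hypothetical catalogue: in practice, people would curate these categories.
CATALOGUE = {
    "risk": {"risk", "danger", "drown", "rock", "deep"},
    "person": {"who", "daring", "fatigued", "insurance"},
    "place": {"himalayas", "cape", "town"},
    "purpose": {"brand", "promote", "motivational"},
}


def normalize(thought):
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9\s]", " ", thought.lower())
    return " ".join(cleaned.split())


def deduplicate(thoughts):
    """Drop thoughts that are identical after normalization, keeping order."""
    seen = set()
    unique = []
    for thought in thoughts:
        key = normalize(thought)
        if key and key not in seen:
            seen.add(key)
            unique.append(thought)
    return unique


def categorize(thought, catalogue=CATALOGUE):
    """Return the category sharing the most keywords, else 'uncategorized'."""
    words = set(normalize(thought).split())
    best, best_hits = "uncategorized", 0
    for category, keywords in catalogue.items():
        hits = len(words & keywords)
        if hits > best_hits:
            best, best_hits = category, hits
    return best


def rank_and_categorize(thoughts, top_n=6):
    """Rank normalized thoughts by frequency, keep top_n, group by category."""
    keys = [normalize(t) for t in thoughts]
    counts = Counter(k for k in keys if k)
    grouped = {}
    for key, _ in counts.most_common(top_n):
        grouped.setdefault(categorize(key), []).append(key)
    return grouped


thoughts = [
    "Is there a massive rock ahead?",
    "is there a massive rock ahead",
    "Who is this daring person?",
    "Is he near Himalayas?",
]
print(deduplicate(thoughts))
print(rank_and_categorize(thoughts))
```

The default `top_n=6` mirrors the "up to six thoughts" threshold mentioned earlier. A keyword overlap is a crude stand-in for meaning, so a real system would likely need a richer notion of similarity between phrases.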
The business benefits of taking such an approach can be manifold.
1. If you give a photo to a person, ask them to write down on a piece of paper the first thing that comes to mind, and then repeat the exercise for the next photo(s), you have a manual way of judging a candidate’s psychological fit for a job. Can we apply AI to automate this process, especially considering that AI could do a better job than judging a person’s psyche from just one word written upon seeing a picture?
2. Also, are we talking about a natural inference or a forced (and thus analytical) inference when we ask an individual to look at a photo versus to look carefully at a photo?
3. Now that we are considering phrases along with words, can we create a richer database or corpus of natural language on which to perform machine learning and deep learning?
4. Can we ascertain the analytical and creative power of the person who came up with the largest number of phrases upon seeing images?
5. Can we apply this thought process or approach to videos, and hence consider transfer learning?
6. Do random thoughts come up in the human mind upon seeing an image? If yes, then what is the proportion of such thoughts? Can we design an AI algorithm to measure such randomness in thoughts and to filter them out upon seeing a similar image in future?
7. The advice (advisory) or recommendation that should be given upon seeing such an image can be automated using AI.
8. Finally, can AI be used to predict whether some other person died a few days earlier by hitting a large rock a few meters away from the point shown in this image, and thereby to update the algorithmic model with this insight?
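The question in point 6 about the proportion and randomness of thoughts can be made concrete in a small way. The sketch below assumes that each thought has already been given a category label, and that a "random" thought is one that fell outside every known category (labelled `"uncategorized"` here). Both are assumptions made for illustration, not a definition taken from this article. It reports the share of such thoughts and, as one possible measure of scatter, the Shannon entropy of the labels scaled to the range 0 to 1.

```python
import math
from collections import Counter


def off_topic_proportion(labels, off_topic_label="uncategorized"):
    """Fraction of labels equal to the (assumed) off-topic label."""
    if not labels:
        return 0.0
    return sum(1 for label in labels if label == off_topic_label) / len(labels)


def normalized_entropy(labels):
    """Shannon entropy of the label distribution, scaled to [0, 1].

    0 means every thought fell into one category; 1 means the thoughts
    were spread evenly over all categories that were observed.
    """
    counts = Counter(labels)
    if len(counts) <= 1:
        return 0.0
    total = len(labels)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return entropy / math.log2(len(counts))


def filter_off_topic(thoughts, labels, off_topic_label="uncategorized"):
    """Keep only the thoughts whose label is not the off-topic label."""
    return [t for t, label in zip(thoughts, labels) if label != off_topic_label]


labels = ["risk", "risk", "person", "uncategorized"]
print(off_topic_proportion(labels))
print(normalized_entropy(labels))
```

Entropy here measures how scattered the thoughts are across categories, which is not the same as how irrelevant they are, so the two numbers are best read together.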