Steve Jobs & His Storytelling Bot
It was 1985. Steve Jobs was giving a lecture on the computer as a new medium for receiving the latest software. This was almost 20 years before the invention and growth of social media on the personal computer. In words that now seem prophetic, he proclaimed that an enticing application for the computer was the ability to ask questions about a particular piece of writing or story and get insightful answers. Why does this still resonate so many years later? Because, as John Mayer put it in one of his songs, reading all the books does not give us all the answers.
If a picture is worth a thousand words, is the opposite true? Can a thousand words capture the essence and spirit of the scenery of a play or movie? And can we easily query the properties of a scene, given that the whole is more than its parts? Do the parts, once split into n-grams or logical notation, really capture the full meaning of the scene and the mood of the setting? We should take another look at translation from word to picture, or vice versa, in terms of how much information is lost in the process.
One point to note is that the screenplay of a major motion picture, together with its cinematography, art design, and costume design, captures what the screenwriter wrote as terse descriptions and what the director, with the producer's backing, then put on film. So couldn't we take the screenplay and train a model that puts it in a key-and-value relationship with the scenes in the movie? Techniques already exist for going from pictures to words, e.g. NeuralTalk from Stanford University, which generates sentence descriptions of images. We could hopefully build on that to automate the creation of a screenplay from video: a terse, imaginative recollection of the events in the film that serves as a narrative soundtrack to the video and makes the whole series of events interesting and fun to watch.
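To make the key-and-value idea a little more concrete, here is a minimal Python sketch. It assumes the screenplay uses standard scene headings (sluglines such as "INT. KITCHEN - NIGHT") and only indexes the text side; pairing each entry with actual video frames is left out. The `Scene` record, the `index_screenplay` and `find_scenes` helpers, and the sample text are all hypothetical names made up for illustration, not part of any existing tool.

```python
import re
from dataclasses import dataclass


# Hypothetical record for one scene; the field names are illustrative only.
@dataclass
class Scene:
    interior: bool      # True for INT., False for EXT.
    location: str
    time_of_day: str
    description: str


# Matches sluglines of the form "INT. KITCHEN - NIGHT" or "EXT. HARBOR - DAY".
SLUGLINE = re.compile(r"^(INT|EXT)\.\s+(.+?)\s+-\s+(\w+)$")


def index_screenplay(text):
    """Map each scene number (the key) to a structured Scene (the value)."""
    scenes = {}
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        match = SLUGLINE.match(line)
        if match:
            # A new slugline starts a new scene.
            current = len(scenes) + 1
            scenes[current] = Scene(
                interior=(match.group(1) == "INT"),
                location=match.group(2),
                time_of_day=match.group(3),
                description="",
            )
        elif current is not None and line:
            # Any other non-empty line is appended to the current scene's description.
            scene = scenes[current]
            scene.description = (scene.description + " " + line).strip()
    return scenes


def find_scenes(scenes, **properties):
    """Return the numbers of scenes whose fields equal all the given properties."""
    return [
        number
        for number, scene in scenes.items()
        if all(getattr(scene, key) == value for key, value in properties.items())
    ]


# Made-up sample screenplay fragment.
SAMPLE = """\
INT. KITCHEN - NIGHT
Rain taps the window. ANNA reads a letter.

EXT. HARBOR - DAY
Gulls circle a fishing boat.
"""

scenes = index_screenplay(SAMPLE)
print(find_scenes(scenes, time_of_day="NIGHT"))  # prints [1]
```

Because each scene is stored as named fields rather than free text, a question such as "which scenes happen at night?" becomes a simple lookup that does not depend on the language the description was written in.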
Furthermore, asking questions about scenes or events in the movie would work better if the description were in a form that is independent of any particular language and easy to compute on. Any thoughts?
P.S. The natural extension would be two-way lossless translation between video and narration. That would make storytelling fun and exciting!