Hey Google: Designing AI to Be Culturally Inclusive Will Take Time and Teamwork!
This week, Google will reportedly publish its ethical guidelines on the use of Artificial Intelligence (AI). It’s a critical moment for the industry and the public to pause and understand what’s lacking. In 2018, world cultures, traditions, and global voices are missing from algorithms that will shape the future of AI.
At IVOW, we believe that we can train AI software to be much more inclusive — but it will take time and teamwork. Our journalists and data scientists are exploring the future of storytelling and culture in AI. We are sharing our early approach and looking for your feedback and support.
AI is used more and more to help meet our ever-growing appetite for on-demand content. Yet the data that drives our narratives represents only a fraction of our story. A report today in the Financial Times notes that a simple Google search for “CEO” returns mostly white faces. And the recent book Algorithms of Oppression illustrates how search engines foster a dominant male, Western-centric point of view.
Several groundbreaking works on AI, culture, and storytelling show that it is both possible and critical to bring cultural consciousness to the digital world. They include the work of Boyang “Albert” Li and Mark Riedl on The Scheherazade System; Mark Finlayson’s annotation of a corpus of Russian folktales; Wolfgang Victor Yarlott’s Old Man Coyote Stories; and D. Fox Harrell’s Imagination, Computation and Expression Laboratory at MIT.
At IVOW, we have been seeking advice from AI researchers across the world as we develop our prototype focusing on cultural images and global voices. In addition, our academic partnership with Morgan State University has allowed us to collaborate with Professor Mahmudur Rahman on image recognition and captioning work in AI. It’s important to note that we are an early-stage startup, but we are confident that together we can incorporate cultural data into future automated stories.
To do this work, we are relying on these main components:
- Story narration templates
- Culturally sensitive image captioning Deep Learning Model
- Natural Language Processing (NLP)
Story Narration Templates
Cultural story narrations are not as straightforward as stories about weather and sports. Each culture or ethnicity has not only its own style of storytelling but also its own histories, experiences, and traditions that must be incorporated. This requires multiple story narration templates for each culture, including different ways to start and end a story. Our team of journalists and international partners will collaborate on the story narration templates.
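One way to picture such templates is as a small per-culture collection of opening and closing lines with slots to fill. The sketch below is illustrative only: the `StoryTemplates` class, the `"example-culture"` key, and the placeholder wording are assumptions made for demonstration, not templates written by our journalists or partners.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StoryTemplates:
    """Hypothetical container for one culture's opening and closing lines."""
    openings: List[str] = field(default_factory=list)
    closings: List[str] = field(default_factory=list)

    def open_story(self, rng: random.Random, **slots: str) -> str:
        # Pick one of several ways to start a story and fill in its slots.
        return rng.choice(self.openings).format(**slots)

    def close_story(self, rng: random.Random, **slots: str) -> str:
        # Pick one of several ways to end a story and fill in its slots.
        return rng.choice(self.closings).format(**slots)


# Placeholder wording only; real templates would come from journalists and partners.
TEMPLATES: Dict[str, StoryTemplates] = {
    "example-culture": StoryTemplates(
        openings=[
            "Every year, the community gathers for {event}.",
            "The story of {event} begins long before the first dancer arrives.",
        ],
        closings=[
            "When {event} ends, the tradition passes to the next generation.",
            "And so {event} closes, as it has for generations.",
        ],
    )
}

rng = random.Random(0)  # seeded so the sketch is repeatable
templates = TEMPLATES["example-culture"]
print(templates.open_story(rng, event="the festival"))
print(templates.close_story(rng, event="the festival"))
```

Keeping several openings and closings per culture is what lets the same body of sentences be framed in more than one way.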
Deep Learning Model to Caption Cultural Images
We started training our Culturally Sensitive Deep Learning model with a relatively small set of images related to Hispanic culture, provided by world-renowned ethnographer and photographer Miguel Gandert. Gandert has spent his career documenting the various cultures and traditions of the Indo-Hispanic community in the American Southwest. Although this dataset of only hundreds of images is small, accurately tagging each photo lets us train our model to produce culturally sensitive captions.
The first step in creating a culturally sensitive story based on a photo is coming up with tags for the image. These tags can be specific to the culture and event being depicted: Indo-Hispanic, dance, feathered costume, headpiece, Festival of Our Lady of Guadalupe, etc. The tags can then be combined to start the caption-making process. We first trained our deep learning model with thousands of generic images to help it come up with sensible captions. These are some real caption examples that our deep learning model produced for the given images:
At IVOW, story generation with tags is a work in progress. We believe an extended caption can be the first form of story we tell. The cultural examples shown here were generated from images only, so the model can produce sentences, or an entire story, with an image as its only input. We are working on adding tags as an extra input, which could make story generation more robust.
Here is an early example produced by our model:
Note: Punctuation is filtered out by the tokenizer’s current settings but can easily be added back. The “word-by-word” approach learns a distribution over all possible captions, which allows for variation in language as more data comes in.
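To make the note above concrete, here is a toy sketch of both ideas. It is not our captioning model: a regular-expression tokenizer stands in for the real tokenizer, and a table of bigram counts stands in for the neural network that predicts each next word. The function names and the sample captions are made up for illustration.

```python
import re
from collections import Counter, defaultdict

START, END = "<start>", "<end>"


def tokenize(caption):
    # Lowercase and keep only letters, digits, and apostrophes,
    # so punctuation such as "." and "!" is dropped.
    return re.findall(r"[a-z0-9']+", caption.lower())


def train_bigrams(captions):
    # Count how often each word follows the previous one.
    counts = defaultdict(Counter)
    for caption in captions:
        words = [START] + tokenize(caption) + [END]
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def next_word_distribution(counts, prev):
    # Normalise the counts into a probability distribution over next words.
    total = sum(counts[prev].values())
    return {word: n / total for word, n in counts[prev].items()}


def greedy_caption(counts, max_words=12):
    # Build a caption word by word, always taking the most likely next word.
    words, prev = [], START
    for _ in range(max_words):
        if not counts[prev]:
            break
        prev = counts[prev].most_common(1)[0][0]
        if prev == END:
            break
        words.append(prev)
    return " ".join(words)


# Made-up captions, used only to exercise the sketch.
captions = [
    "Dancers wear feathered headpieces.",
    "Dancers wear bright costumes!",
    "Dancers perform at the festival.",
]
model = train_bigrams(captions)
print(tokenize(captions[0]))
print(next_word_distribution(model, "dancers"))
print(greedy_caption(model))
```

Because each step is a distribution rather than a fixed choice, sampling from it instead of always taking the most likely word is one way to get varied captions, and the distribution itself shifts as more captions are added.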
Natural Language Processing (NLP)
We will use natural language processing algorithms to generate short sentences, using tags provided either by the user or generated via automated AWS services. Here is an example that shows how tags can be converted into brief sentences.
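As a rough illustration of the idea, the sketch below turns a handful of tags into short sentences with hand-written rules. It is a stand-in, not the NLP algorithm we will use: the tag categories (`event`, `culture`, `activity`, `attire`), the sentence patterns, and the helper names are all assumptions made for this example.

```python
def with_article(noun_phrase):
    # Rough heuristic for "a" versus "an"; good enough for a sketch.
    article = "an" if noun_phrase[0].lower() in "aeiou" else "a"
    return f"{article} {noun_phrase}"


def join_phrases(items):
    # Produces "a", "a and b", or "a, b, and c".
    items = list(items)
    if len(items) <= 1:
        return "".join(items)
    if len(items) == 2:
        return f"{items[0]} and {items[1]}"
    return ", ".join(items[:-1]) + f", and {items[-1]}"


def tags_to_sentences(tags):
    """Turn a dict of tags into short sentences; the keys are assumed categories."""
    sentences = []
    if "event" in tags:
        sentences.append(f"This photo was taken at the {tags['event']}.")
    if "culture" in tags and "activity" in tags:
        sentences.append(
            f"It shows a {tags['activity']} from the {tags['culture']} community."
        )
    if tags.get("attire"):
        attire = join_phrases(with_article(item) for item in tags["attire"])
        sentences.append(f"The attire includes {attire}.")
    return sentences


# Tags borrowed from the tagging example earlier in this post.
tags = {
    "event": "Festival of Our Lady of Guadalupe",
    "culture": "Indo-Hispanic",
    "activity": "dance",
    "attire": ["feathered costume", "headpiece"],
}
for sentence in tags_to_sentences(tags):
    print(sentence)
```

Fixed patterns like these are brittle, which is part of why a learned approach is attractive, but they show the shape of the input and output.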
We are considering using EventRank or PlotShot to identify normal, optional, and conditional events, each of which can be marked with its typicality. This can help in selecting, omitting, and arranging sentences: choosing the most typical events creates a short summary of a situation. The method can also be applied to cultural narratives. Multiple short sentences will be generated using multiple methods:
- The story will start by drawing on available story templates for a specific culture, provided by our journalists.
- Captions are generated from images by our Deep Learning model.
- Sentences are generated with Natural Language Processing based on user-generated tags.
- The story will end by drawing on available story templates for a specific culture, provided by our journalists.
In this way, sentences generated by the above methods can be arranged using algorithms such as EventRank or PlotShot, which we will explore in detail, to give the narration a sensible flow.
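The assembly step can be sketched roughly as follows. This is a simplified stand-in and not EventRank or PlotShot: each candidate sentence carries a hand-assigned typicality score between 0 and 1, the most typical ones are kept, and they are placed between an opening and a closing template line. The `Sentence` class, the `assemble_story` function, the scores, and the sample sentences are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sentence:
    text: str
    source: str        # "caption" or "nlp" (labels assumed for this sketch)
    typicality: float  # 0.0 to 1.0; hand-assigned here, not computed


def assemble_story(opening: str, candidates: List[Sentence],
                   closing: str, max_body: int = 3) -> str:
    # Keep the most typical sentences for the body of the story...
    indexed = list(enumerate(candidates))
    top = sorted(indexed, key=lambda pair: pair[1].typicality, reverse=True)
    top = top[:max_body]
    # ...then restore their original order so the narration still flows.
    body = [sentence.text for _, sentence in sorted(top, key=lambda pair: pair[0])]
    return " ".join([opening] + body + [closing])


# Made-up sentences and scores, used only to exercise the sketch.
candidates = [
    Sentence("Dancers gather in the plaza.", "caption", 0.9),
    Sentence("A vendor sells candles nearby.", "nlp", 0.2),
    Sentence("They wear feathered headpieces.", "nlp", 0.8),
    Sentence("The dance continues until dusk.", "caption", 0.6),
]
story = assemble_story(
    "Every year, the community gathers for the festival.",
    candidates,
    "And so the festival closes, as it has for generations.",
)
print(story)
```

Here the least typical sentence is omitted and the rest keep their original order; a real ordering algorithm would also decide the sequence rather than inherit it.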
Share your thoughts and collaborate with us!
The frontiers of artificial intelligence and journalism require that we apply deep learning from our past and our traditions. So far, society has gathered an enormous variety of data and leveraged AI algorithms to foster innovation in self-driving cars, cybersecurity, healthcare applications, and facial recognition. We now need to focus on creating culturally conscious data for future AI consumer storytelling applications.
We need systems that empower communities to participate in the emerging digital narrative. We must understand that the majority of human diversity does not reside in our genes but in our cultures, languages, and traditions. IVOW will bring global narratives into AI & IoT with multiple applications.
It is worthwhile to remember something the late Stephen Hawking once said, “We should shift the goal of AI from creating pure undirected artificial intelligence to creating beneficial intelligence. It might take decades to figure out how to do this, so let’s start researching this today.”
We’d love to hear from you. Critique us, share your feedback — email Davar@ivow.ai
Our IVOW colleagues Vishal Raj, Ziheng Lin, Robert Malesky, and Kee Malesky contributed to this report.
IVOW is a group of multimedia storytellers and data scientists merging timeless principles of storytelling with Artificial Intelligence and culture. It’s our vision to build AI storytelling applications that comb through data on world cultures, history, and traditions to tag more meaningful datasets for machine intelligence. We vow to design the next generation of AI storytellers to be culturally conscious; to promote the dignity, health, and wellbeing of all life in ways that respect and celebrate cultural heritage and identity; and to embrace inclusivity.