When AI Meets Nowruz
From the 14th edition of my newsletter, Humane AI
We Iranians, along with more than 200 million other people (mostly in Eurasian countries), celebrate Nowruz: the first day of spring and the beginning of the Persian New Year. This is a special edition of Humane AI covering AI in the countries that celebrate Nowruz and anything related to their cultures and languages. Read on to learn about Natural Language Processing in Farsi, the Digital Silk Road, early warning systems, AI and Nowruz, and more. Nowruz Mobarak!
✏How does OpenAI’s language model describe Nowruz?
Here is the text that I entered into the GPT-2 language model: “Nowruz is the first day of Spring. The exact moment of Spring Equinox. It’s when Iranians, Uzbeks, Afghan, Tajiks celebrate and start their new year.” And here is how GPT-2 🤖 continued it: “Nowruz means that we celebrate with joy because of spring and because we want to make sure that we are happy. The first time I saw Iranians celebrating was in the village of Nangarhar. I was very surprised to see that they were celebrating with fireworks and singing songs and dancing. So I thought that maybe they were just trying to show off.” I used GPT-2 Explorer by the Allen Institute for AI to make this. It’s fun; play around with it.
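A continuation like the one above is produced one step at a time: the model predicts a likely next word, appends it, and repeats. The sketch below shows that loop with a deliberately tiny stand-in, a word-level bigram model with greedy decoding trained on a made-up three-sentence corpus. GPT-2 itself is a large Transformer that works on subword tokens, so treat this only as an illustration of the "predict the next word" idea. The function names and the corpus are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigrams(corpus: str) -> dict[str, Counter]:
    """Count, for every word, which words follow it in the corpus."""
    words = corpus.lower().split()
    table: dict[str, Counter] = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        table[prev][nxt] += 1
    return table


def continue_text(table: dict[str, Counter], prompt: str, n_words: int) -> str:
    """Greedily append the most frequent next word, one word at a time."""
    words = prompt.lower().split()
    for _ in range(n_words):
        followers = table.get(words[-1])
        if not followers:  # the last word was never seen with a successor
            break
        # Ties go to the follower encountered first in the corpus.
        words.append(followers.most_common(1)[0][0])
    return " ".join(words)


# Hypothetical toy corpus; GPT-2 was trained on millions of web pages.
corpus = (
    "nowruz is the first day of spring . "
    "nowruz is the start of the new year . "
    "spring is the season of renewal ."
)
table = train_bigrams(corpus)
print(continue_text(table, "Nowruz", 5))  # nowruz is the first day of
```

Real language models usually sample from the predicted distribution instead of always taking the single most likely word. This is why their continuations vary from run to run.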
GPT-2 is OpenAI’s latest language model. It has a simple goal: “predicting the next word.” OpenAI decided not to release the full trained model. They have a legitimate concern that the model could be used by malicious actors to “generate misleading news articles, impersonate others online, automate the production of abusive or faked content to post on social media.” But the decision was controversial, because AI research communities around the world have benefited significantly from a culture of open research. In this blog post, Partnership on AI reflects on the culture of openness in the AI community and the possibility of setting up community-wide norms: “There is not a consensus on proposed AI research review norms… However, there is a consensus that norms and review parameters should be standardized across the AI community if the community chooses to restrict or reduce research openness (in contrast to organizations independently designing their own review processes).”
+ This talk of standardization, process, review, and moving slowly reminded me of this amazing piece by W. Patrick McCray, a historian of technology:
“At the same time, when we step out of the shadow of innovation, we get new insights about the nature of technological change. By taking this broader perspective, we start to see the complexity of that change in new ways. It’s then we notice the persistent layering of older technologies […] We start to see the intangibles — the standards and ideologies that help to create and order technology systems, making them work at least most of the time. We start to see that technological change does not demand that we move fast and break things. Understanding the role that standards, ideologies, institutions — the non-thing aspects of technology — play, makes it possible to see how technological change actually happens, and who makes it happen.”
✏Keeping people safe with AI-enabled flood forecasting
In the past month, right at the height of the Nowruz celebrations, many cities in Iran suffered massive flash floods. It was heartbreaking. Are early warning systems mature enough to help mitigate the damages of natural disasters? Google has been working on developing ML models for flood forecasting and crisis response. According to a recent blog post and workshop, Google is using “AI and significant computational power to create better forecasting models that predict when and where floods will occur and incorporating that information into Google Public Alerts. A variety of elements — from historical events to river level readings, to the terrain and elevation of a specific area — feed into [their] models. From there, [they] generate maps and run up to hundreds of thousands of simulations in each location.” However, neither Google Public Alerts nor SOS Alerts seems to be functional in Iran.
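Google’s post does not spell out its models, so what follows is only a toy illustration of the general idea of running many simulations per location. It is a Monte Carlo "bathtub" sketch. The forecast river level is uncertain, and we assume it is normally distributed. Each simulation draws one level, and a terrain cell counts as flooded whenever that level is above the cell’s ground elevation. Real inundation models simulate how water actually flows over the terrain. The function name, parameters, and numbers here are hypothetical, and elevations and water levels are assumed to share the same vertical reference.

```python
import random


def flood_probability(
    elevations_m: list[float],
    level_mean_m: float,
    level_sd_m: float,
    n_sims: int = 10_000,
    seed: int = 0,
) -> list[float]:
    """Estimate, per terrain cell, the share of simulations in which it floods.

    Toy "bathtub" assumption: a cell floods when the simulated water surface
    is higher than the cell's ground elevation. No flow or timing is modeled.
    """
    rng = random.Random(seed)  # fixed seed so results are reproducible
    flooded = [0] * len(elevations_m)
    for _ in range(n_sims):
        level = rng.gauss(level_mean_m, level_sd_m)  # one plausible water level
        for i, ground in enumerate(elevations_m):
            if level > ground:
                flooded[i] += 1
    return [count / n_sims for count in flooded]


# Hypothetical cells: riverbank, low street, hillside (elevations in meters).
probs = flood_probability([1.0, 5.0, 50.0], level_mean_m=5.0, level_sd_m=1.0)
print([round(p, 2) for p in probs])
```

The per-cell probabilities are what a flood map would color. The riverbank floods in nearly every run, the street at the forecast level floods in about half, and the hillside never floods.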
+ On a related topic, I’ve been collecting Tweet IDs related to the recent flash floods in Iran for a research project on information control during times of crisis. Please get in touch if you are interested in accessing the dataset or in collaborating.
✏“Farsi Shekar Ast”: Natural Language Processing in Farsi
We Iranians hold the language of Rumi, Khayyam, and Hafiz close to our hearts. As the saying goes, “Farsi Shekar Ast,” or Farsi is sugar! As I watch my fellow Iranians “caramelizing” one another on social media, I’ve been thinking a lot about the challenges of automated content moderation in Farsi. How is Farsi content moderated when it comes to hate speech and disinformation? This led me to look into the state of NLP in Farsi, and sentiment analysis in particular. Here I have listed some of the available datasets and crowd-sourcing projects (if you are working on this topic or know of other resources, please let me know).
In addition, this is a great paper on the challenges of Natural Language Processing in East African languages. These challenges include handling rapid code-switching (switching between languages) with models trained on single-language corpora, extracting locations from natural language text, different accents, and the idiomatic nature of East African English.
✏Eurasian countries, the Digital Silk Road, and “smart” cities
“As a global leader in developing and using surveillance technologies, China is exporting its high-tech brand of authoritarianism to its neighbors in Eurasia. All of the systems are part of the ‘smart city’ and ‘safe city’ concepts that have been gaining popularity around the world as a response to rapid urbanization. Closed-circuit television cameras and facial recognition technologies are particularly sought products. Chinese tech companies have the strongest position in Eurasian markets in these types of capabilities thanks to their regional presence, long-term credit structures, and a high tolerance for financial risk. Eurasian countries looking to leapfrog into the digital future are adopting virtual authoritarianism, with a lack of oversight legislation, which risks suppressing political dissent and further marginalizing already disadvantaged groups.”
+ Here in the Bay Area, the Bay Area Surveillance Project looks into smart city technologies that have raised concerns about privacy and equal protection. Technologies such as CCTV cameras, Automated License Plate Recognition (ALPR), ShotSpotter, cell-site simulators (Stingrays), and social media monitoring have all extended the reach of surveillance. Check out this interactive website to see “how each system can be used by organizations from the city level — for example, Oakland — to the federal level like DHS’ agencies.” You should also read Ben Green’s The Smart Enough City.
✏Afghanistan’s all-girl robotics team and the future of STEM in the country
In Afghanistan, the all-girl robotics team has been a source of inspiration for many. This NYT article profiled their journey to major high-school robotics competitions and their plans to start a robotics club in Afghanistan. “Fatemah [the captain of the robotics team] and Ms. Mahboob [Founder and CEO of Afghan Citadel Software Company] met with Afghanistan’s president, Ashraf Ghani, about a plan to start a robotics club in Kabul, as well as a technology-focused school, which the Yale Engineering Department has offered to design.” In addition, through a partnership with Ada-AI and Digital Citizen Fund, they have started a crowd-funding campaign to “build a high school in Afghanistan that provides students with critical technical skills and exposure to emerging technologies and AI.” Read more here.
✏Preserving historical manuscripts and the art of calligraphy with AI
The Library of Congress has made 155 rare Persian manuscripts, lithographs, and books dating back to the 13th century available online for the first time. Check out the beautiful passages in Persian, Turkish, and Arabic, “written in fine calligraphic styles, and illustrated with miniatures.” While browsing these passages, I noticed how difficult it is for me to read the different styles of Persian calligraphy. I guess I have to blame that on the force of technology, but can the same force preserve our connection to our old manuscripts and cultural heritage? Here are some recent advances in using machine learning to preserve and understand old manuscripts: “Word Spotting” in Historical Manuscripts, Image enhancement of old manuscripts using machine learning, and Handwritten Text Recognition System. To all of you out there: can you think of something to do with all these beautiful Persian manuscripts? All pages are in the public domain :)
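Here is one possible starting point for experimenting with these scans. A common classical baseline for cleaning up manuscript page images is global binarization, which separates dark ink from lighter paper. Learned enhancement methods are often compared against it. This is not machine learning. Otsu's method simply picks the gray-level threshold that maximizes the variance between the two resulting classes. The sketch assumes the page has already been flattened into a list of 8-bit grayscale values (0 = black). Loading real scans, and handling the uneven, stained backgrounds typical of old paper, would need more than this. The function names are illustrative.

```python
def otsu_threshold(pixels: list[int]) -> int:
    """Return the gray level t that best splits pixels into <= t (ink) and > t (paper).

    Otsu's criterion: maximize the between-class variance, computed here as
    w_ink * w_paper * (mean_ink - mean_paper) ** 2 with pixel counts as weights.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_score = 0, -1.0
    w_ink = 0    # number of pixels with level <= t
    sum_ink = 0  # sum of their gray levels
    for t in range(256):
        w_ink += hist[t]
        sum_ink += t * hist[t]
        w_paper = total - w_ink
        if w_ink == 0 or w_paper == 0:
            continue  # one class is empty: not a valid split
        mean_ink = sum_ink / w_ink
        mean_paper = (sum_all - sum_ink) / w_paper
        score = w_ink * w_paper * (mean_ink - mean_paper) ** 2
        if score > best_score:  # strict ">" keeps the lowest of tied thresholds
            best_t, best_score = t, score
    return best_t


def binarize(pixels: list[int]) -> list[int]:
    """Map ink to 0 (black) and paper to 255 (white)."""
    t = otsu_threshold(pixels)
    return [0 if p <= t else 255 for p in pixels]


# Hypothetical 100-pixel strip: 20 dark ink pixels, 80 light paper pixels.
strip = [30] * 10 + [40] * 10 + [200] * 50 + [210] * 30
print(otsu_threshold(strip))  # 40
```

A single global threshold works poorly when ink fades or the paper darkens unevenly across a page. Adaptive, per-region thresholds and learned models aim to handle those cases.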