Zhuangfang NaNa Yi: Building Machine Learning Applications that Empower Policymakers with Insights to Support Vulnerable Communities

A conversation about the nuances of applying machine learning algorithms to Earth observation for global development organizations.

Radiant Earth
Radiant Earth Insights
13 min read · Oct 5, 2020

It is our pleasure to introduce Dr. Zhuangfang NaNa Yi, a machine learning engineer at Development Seed who supports international development organizations like UNICEF, the World Bank, and USAID in making data-driven policy decisions. She has extensive experience in applying machine learning algorithms to geospatial and satellite data, from building applications that farmers can use to track crop types and changes to water bodies, to mapping forests and measuring food security.

Dr. Yi was born Dai/Tai Lue, an ethnic minority in China and Southeast Asia, and her hometown in China borders Laos and Myanmar. She decided to pursue machine learning, driven by a desire to understand the changing social and environmental landscape of her homeland. Dr. Yi holds a Ph.D. in Ecological Economics from the Chinese Academy of Sciences. To date, she has co-authored 24 publications, including peer-reviewed scientific papers and technical reports.

Fresh out of her Ph.D. program, Dr. Yi worked for the World Agroforestry Centre (ICRAF), studying forest cover and hydrological balance, and detecting changes to livelihoods and ecosystems. Today, she leads a team that builds machine learning algorithms that help support vulnerable communities, aid international development efforts, and respond to natural disasters worldwide.

In this Q&A, Dr. Yi talks to us about the nuances of applying machine learning algorithms to Earth observation for global development organizations.

You have progressed from researching the interdependence of social economics and natural ecosystems to using technology to understand (and predict) the interactions. Why the change?

I was born particularly poor in a tiny, remote, ethnic minority village in China that borders Laos and Myanmar. I didn’t own a pair of shoes until my second year of primary school. Our school was a shed built by parents in the village. The kids dug and built the school playground because there was a small hill outside of our classroom. We were financially poor, but in other ways, we were actually rich. My childhood was never boring. We climbed large trees, swam in rivers, caught fish, and picked fruits, nuts, and vegetables from the nearby tropical jungle.

However, life changed dramatically in the early 1990s because of one crop, the rubber tree. Globally, it was the second-biggest agricultural commodity, and it lifted millions of smallholder farmers across Southeast Asia out of poverty. However, something went wrong; rubber trees were planted as a monoculture crop. This practice required clear-cutting the rainforest. Rubber trees became the dominant tree across the landscape for hundreds and thousands of hectares. They would remain there for the next 30+ years. Today, rivers are siltier; fish, insect, and bird populations have fallen; and villages run out of drinking water during dry seasons.

Ecologists and conservationists blamed local governments for incentivizing locals to expand rubber cultivation while failing to protect the environment. Governments blamed farmers for their shortsightedness, irresponsible investments, and clear-cutting forests. All the while, farmers didn’t have the luxury of prioritizing the environment over their livelihood. When such problems emerge among multiple stakeholders, finger-pointing is easy, but finding a common solution is a struggle.

A global map of natural rubber producers vs. major consumers. (Credits: Yi & Cannon, 2017).

I was the first female Ph.D. candidate of my ethnicity and spoke the languages of the three major stakeholders. So, I was motivated to answer several scientific questions:

  1. What is the total area of monoculture rubber plantations?
  2. What is the value of the ecosystem services of the tropical forest?
  3. What is the cost of losing these ecosystem services?
  4. How much do farmers earn from their plantations, and how do these economic returns look across the landscape?

Natural rubber profits vs. biodiversity in Southeast Asia at the county level (Credits: Yi & Cannon, 2017).

Long story short, my studies widened my lens. I started to look at the monocultural rubber economy in the whole of Southeast Asia. I began to build data tools and models that feed directly into policymaking, for instance, a prototype tool to support sustainable and responsible natural rubber cultivation and investment. I became more interested in large-scale geospatial modeling and open-source tools for open science. This was the turning point when I decided I would rather help others get closer to the data and more involved in building tools than become a scientist focused on publishing peer-reviewed journal articles or books.

“The most efficient problem-solving happens when the problem solvers are closer to the root of the issue. This needs to be done using open and easy-access data and open-source tools to incentivize multiple stakeholders to engage in conversation and find common-ground solutions.”

You work primarily with open data to build your applications. What are the advantages of using open data, especially in an international development context?

Even if we suppose our collective goal in the geospatial industry is producing data and building tools for the right people at the right moments to solve their data problems, open data itself is not enough. Open data is not only open-sourced and openly licensed data; it also requires easy access. Ensuring the right people have access to open data also requires a lot of outreach and capacity building, especially in the international development context.

It’s also an interesting and exciting time we are all experiencing right now. Many great leaders from the private sector, research groups, and large space agencies are now pushing open data as another international development milestone. These leaders include NASA and ESA, both of which are working towards processing, archiving, and hosting Earth observation data on the cloud. AWS, Maxar, and Planet have open data programs. CosmiQ’s SpaceNet challenges, Radiant Earth’s MLHub, Zindi Africa, Kaggle, and more have created high-quality, standardized machine learning training datasets for our sector. They help us produce more analysis-ready data that is easily accessible and available through Cloud Optimized GeoTIFF, SpatioTemporal Asset Catalog, Datacube, and other open-source community tools, like Pangeo.

Our sector has adopted a Western-centric viewpoint in training data creation and ML model development and fitting. However, we need to be honest with ourselves and admit that we can’t solve many local problems in the international development context this way. Thankfully, we see fresh minds and young tech communities pop up in all developing countries. The most efficient problem-solving happens when the problem solvers are closer to the root of the issue. This needs to be done using open and easy-access data and open-source tools to incentivize multiple stakeholders to engage in conversation and find common-ground solutions. Unfortunately, I don’t think we are quite there yet.

The projects you’ve helped launch are very impactful: From finding unmapped schools in Liberia, Colombia, and Eastern Caribbean nations using machine learning and satellite imagery for UNICEF to supporting equal access opportunities for school; and more recently, helping to create the algorithm that powers the COVID-19 dashboard for NASA. What are some of the most significant challenges you face in creating global products using machine learning and Earth observation?

Training data create challenges. When it comes to machine learning for Earth observation, the challenges are:

  1. Most of the time, training data doesn’t exist;
  2. It’s rare to find data for training and prediction that have the same statistical distribution spatially and temporally (e.g., geo-diversity, temporal, image color space).

I can use the projects you mentioned as examples. When it comes to the first point, our machine learning engineers at Development Seed are quite ‘lucky’ since we have a team of expert mappers who can create high-quality training data on demand. They are certified super mappers who have generated more than 2.5 million objects in OpenStreetMap. They designed their own data tools and a highly efficient workflow to produce high-quality map data for our machine learning workflow.

When it comes to the second point, we can assume schools, hospitals, other buildings, roads, and trees are more static and won’t move for months or years. Therefore, creating training data for these classes is more straightforward, though we still need to find many image features that are visually distinct from other objects (e.g., what features make a school a school). However, measuring economic impact and people’s mobility under COVID-19 requires quite a different R&D process. Training data for objects that need high temporal and high spatial resolution (e.g., planes, cars, and ships) is harder and slower to create. To solve this issue, we are using open ML benchmark datasets (e.g., xView (Maxar WV-3 imagery) and DOTA (aerial imagery)) as training datasets, and PlanetScope and SkySat as the imagery sources for model inference. Figuring out how to make the WV-3 and DOTA imagery in the training set ‘look like’ PlanetScope and SkySat imagery at model inference was the first problem we tackled. We applied image color and contrast shifts and Gaussian blurring to the training data, which improved model performance.
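The color, contrast, and blur adjustments described above can be pictured with a minimal sketch. This is an illustration only, not Development Seed's pipeline: the function names, the per-channel brightness offset, and every parameter range below are assumptions chosen for the example, and it uses plain NumPy on images scaled to the 0–1 range.

```python
import numpy as np


def gaussian_kernel(sigma: float) -> np.ndarray:
    """Return a normalized 1-D Gaussian kernel covering about +/- 3 sigma."""
    radius = max(1, int(round(3 * sigma)))
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    kernel = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return kernel / kernel.sum()


def gaussian_blur(image: np.ndarray, sigma: float) -> np.ndarray:
    """Blur an H x W x C float image with a separable Gaussian filter."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    # Reflect-pad the two spatial axes so the output keeps the input size.
    padded = np.pad(image, ((radius, radius), (radius, radius), (0, 0)), mode="reflect")

    def smooth(vector: np.ndarray) -> np.ndarray:
        return np.convolve(vector, kernel, mode="valid")

    # Filter along the vertical axis first, then along the horizontal axis.
    blurred = np.apply_along_axis(smooth, 0, padded)
    return np.apply_along_axis(smooth, 1, blurred)


def shift_color_and_contrast(image, rng, max_offset=0.1, contrast_range=(0.8, 1.2)):
    """Apply a random per-channel brightness offset and a global contrast scale.

    The ranges are hypothetical defaults, not values from the interview.
    """
    offsets = rng.uniform(-max_offset, max_offset, size=(1, 1, image.shape[2]))
    contrast = rng.uniform(*contrast_range)
    mean = image.mean(axis=(0, 1), keepdims=True)
    shifted = (image - mean) * contrast + mean + offsets
    return np.clip(shifted, 0.0, 1.0)


def degrade_like_target(image, rng, sigma_range=(0.5, 1.5)):
    """Make a sharp training chip look more like softer inference imagery."""
    sigma = rng.uniform(*sigma_range)
    blurred = gaussian_blur(shift_color_and_contrast(image, rng), sigma)
    return np.clip(blurred, 0.0, 1.0)


rng = np.random.default_rng(0)
chip = rng.random((16, 16, 3))  # stand-in for a real training image chip
augmented = degrade_like_target(chip, rng)
```

In practice the ranges would be tuned by comparing the statistics of the augmented training chips against real samples of the inference imagery.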

A second challenge is building scalable machine learning training and MLOps orchestration. To find the best-performing model, we may end up training hundreds of models for a single machine learning problem. Hyperparameter search and tuning are computationally intensive and time-consuming, and need to run on the cloud 100% of the time. Since we work with multiple partners who bring sponsorship from different cloud providers, we need to build our MLOps orchestration to be flexible, portable, scalable, and deployable to multiple cloud environments. That creates another layer of challenge. I will share how we overcame these challenges and built the MLOps pipelines at our GeoAI team soon.
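To give a sense of why a search can mean hundreds of training runs, here is a minimal random-search sketch. It is a hedged illustration rather than the team's setup: the hyperparameter names and ranges are assumptions, and `toy_validation_score` is a hypothetical stand-in for a full training and validation run, which in a real pipeline would be a cloud job.

```python
import math
import random


def sample_config(rng: random.Random) -> dict:
    """Draw one hypothetical hyperparameter configuration."""
    return {
        "learning_rate": 10 ** rng.uniform(-5, -1),  # log-uniform sample
        "batch_size": rng.choice([8, 16, 32, 64]),
        "weight_decay": 10 ** rng.uniform(-6, -2),   # log-uniform sample
    }


def toy_validation_score(config: dict) -> float:
    """Stand-in for training a model and measuring validation performance.

    This made-up function peaks near learning_rate=1e-3 and weight_decay=1e-4.
    """
    lr_penalty = (math.log10(config["learning_rate"]) + 3) ** 2
    wd_penalty = (math.log10(config["weight_decay"]) + 4) ** 2
    return 1.0 / (1.0 + lr_penalty + 0.1 * wd_penalty)


def random_search(n_trials: int, seed: int = 0):
    """Evaluate n_trials random configurations and return the best (score, config)."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        config = sample_config(rng)
        trials.append((toy_validation_score(config), config))
    return max(trials, key=lambda trial: trial[0])


best_score, best_config = random_search(n_trials=200)
```

Each trial here is independent of the others, which is what makes this kind of search straightforward to spread across many cloud workers.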

A third challenge is producing timely and insightful machine learning outputs. Disaster relief, humanitarian aid, and international development interventions are sometimes very time-sensitive. ML inference at a large scale goes beyond ML itself and into the fields of engineering and cloud infrastructure building. At Development Seed, we built open-source tools like Chip N Scale, Fastai serving, and ML-Enabler. Essentially, these tools free up our ML engineers’ time for more R&D so that they can gain a deeper understanding of the science and business perspectives of the problem, find the best-performing ML model, and containerize the models. As a result, we can present timely end-to-end solutions to our clients and partners.

“If we imagine producing meaningful, insightful, and timely ML outputs, we must focus on the data value chain for either quick decision-making or long-term policymaking.”

Can you give an example of a project you helped create with real-world policy implications? In your opinion, how has that affected communities on the ground?

If we imagine producing meaningful, insightful, and timely ML outputs, we must focus on the data value chain for either quick decision-making or long-term policymaking. The ML role and functions sit right in the middle of this value chain. Upstream in the value chain are Earth observation providers and researchers and scientists from computer vision, ML, and other domains (e.g., climate change, agriculture, or conservation). Policymakers and on-the-ground data users are downstream. Given where the ML role sits, it’s hard to evaluate and measure how exactly we are changing the policy downstream. Here, however, are a few things we do to get it right:

  1. We find the right partners or domain experts who understand the problems deeply to contextualize the science and business sides of problems/issues;
  2. We ensure the designers’ mindset is geared to understanding how our end-user will query, use, or extract the information we produce for policymaking. This can be done through in-depth user surveys and interviews, and product/data vis tool design.

When it comes to real-world ML model development, I’ve enjoyed working with the World Bank’s Global Facility for Disaster Reduction and Recovery (GFDRR) to deploy our ML model to identify houses vulnerable to natural disasters using street-level imagery in developing countries. Local governments coordinate with the World Bank to retrofit homes and other vulnerable buildings, saving lives and strengthening local economies. Another fun challenge has been working with scientists in wildlife conservation to deploy machine learning models that identify wildlife, human activities, and livestock in Tanzania and map potential human-wildlife conflicts. Personally, I found it very gratifying to work with the Chinese Ministry of Commerce on the tradeoff between the economic returns of smallholder rubber farming and biodiversity conservation in Southeast Asia, with the aim of directing international natural rubber investors to be more sustainable and responsible.

I grew up poor, but I was lucky enough to earn a Ph.D. degree. I feel connected to kids, especially girls, living in less developed environments with little to no access to educational resources. Access to education for kids in remote and underdeveloped countries is critical. Since my first day at Development Seed, I’ve been working with UNICEF to use machine learning and high-res image search to expand school mapping to many more countries in Africa, South America, and Asia. UNICEF’s GIGA mission is well-aligned with my goals and Development Seed’s, and I take much pride in contributing to the mission.

Working with UNICEF to find unmapped schools from space with AI. Red dots represent schools mapped before the AI-assisted workflow; yellow dots represent schools mapped after applying ML with high-res Maxar Vivid imagery.

Research reveals that women in the geospatial industry are gradually progressing into management positions, but the gender gap remains wide. What advice do you have for young women — especially from developing countries — navigating the career ladder in your profession?

This issue has long existed in our society, and it’s not only in the geospatial industry, so how can we solve it together? Over the course of my career, I’ve worked with so many brilliant women scientists and now, at Development Seed, women engineers and developers. Though I do NOT observe intellectual differences between men and women, women, especially women of color, do show severe imposter syndrome more often than men. In other words, women tend to beat themselves up too much in the workspace, telling themselves, “I am not good enough to deserve certain opportunities,” which can have a huge impact on career development.

Interestingly, I’ve been writing and thinking about this since 2015 (My Chinese blog). I don’t think things have progressed much. To start, companies, research institutes, and organizations need to address how to be more diverse and inclusive, and how to help new hires, especially women, face and challenge imposter syndrome. I love Development Seed because many open-minded and thoughtful individuals are proposing and pushing these important conversations at an organizational level and working to implement change. We need to approach issues of gender equality and solve them collectively.

“I am the queen of imposter syndrome. So what?”

Growing up as an ethnic minority girl in China was not a smooth journey. It took me six whole years to learn to read and write in Chinese at school. Then I had to use this second language to compete with 6.8 million high-schoolers to get into top universities.

As a competitive individual, beating myself up and thinking I was helpless and hopeless was not new. For a while, I thought I was good at tackling my imposter syndrome. However, when I came to the US and settled down here, the imposter syndrome was like an old friend who followed me everywhere. I had a voice to raise but felt that I was not articulate. I knew how to solve a problem, but I didn’t know how to say it aloud. To survive (evolutionarily), I am getting good at being honest with myself, asking myself, “so what?” and leaning on positive thinking and problem-solving. As a colleague, being a patient listener, less judgmental, and open-minded helps many people out there become more confident. Such kindness can go a long way in our society.

When it comes to building a career, especially for young women and women of color, creating a trustworthy reputation and finding a purpose in day-to-day work is more important than how successful you can make yourself. I get a lot of inspiration from the late Ruth Bader Ginsburg. From her, I’ve learned about being your best self and working on something larger than yourself to improve society and the lives of the most vulnerable. I am inspired by how much you can accomplish in a day if you can set aside unproductive emotions (e.g., anger and jealousy).

Good people management is very demanding work. If you know that’s what you want to pursue, be a respectful and thoughtful leader. If you want to be a technically brilliant engineer and developer, try to code every day and learn from the best. Sometimes, it’s ok to rely on your team members. Allow them to lead the activities you don’t know a lot about. Be honest with yourself. Find tangible solutions. Work as a team player. Ultimately, nothing is new under the sun. Problems exist in every corner of our workspace, so turn on your creative, problem-solving, and optimistic mindsets for your team.

In true renaissance form, you are an academic, scientist and a talented artist who paints, draws, and can converse in six languages. How does your creativity support your work?

Everyone finds different ways to de-stress. Painting, drawing, and dancing are my forms of therapy. My great-grandpa, my uncles, and my brother are painters and wood carving artists. They work in many spaces, including in local Buddhist temples and tourist shops. One of my grandmas is a traditional Tai Lue singer, and the other was a Dai/Tai Lue dancer. I grew up dancing and performing on big and small stages quite a bit. I used to dance and perform at my mom’s restaurant after school, then wash dishes afterward. I have taught Dai dance and its choreography in different forms. Dai dance is one of the well-known ethnic dances in China. There are plenty of examples on the internet, and here I’d like to share one of the performances my friends and I gave back in 2010.

When it comes to painting and drawing (my Instagram art profile, if you’re interested in taking a peek), I think color and brush strokes carry a lot of de-stressing powers. I like mixing colors and mixing mediums and find the process very therapeutic. When I feel super inspired or disappointed with humanity, I like to draw women of different ages and backgrounds. Through them, I see hope, determination, curiosity, toughness, and softness. When I feel small and soft, I like to draw mountains, landscapes, and dancers. I think creativity is in my blood, and I feel close to my roots and family by creating. I am not 100% sure if my art/hobbies support my work, but they have made me a happier, reasonable human being who is continually searching for beauty, caring, and self-support through art.

Zhuangfang NaNa Yi is a “self-taught weekend painter.”
