How Gene Editing Can Be Used to Stop Both Famine and Economic Loss in Africa

The Story of How I Researched and Found a Way to Modify Whiteflies (going from complete novice in AI and gene editing to a validated solution and working knowledge of this intersection in 2 months): A New Way of Combating Cassava Mosaic Disease

Josh Roy
May 31, 2024

Most of us eat carbs in our daily diet. And we love it. Bread, pasta, rice, etc. For the average person (in the U.K. and the U.S.), this makes up about 65% of our daily calories. Well, could you take a guess at what is the third largest source of carbs in the world? It’d probably be one of these three I mentioned (especially if you’re from the Western world), right?

Wrong.

It’s… Cassava.

Have you heard of it? Comment below 👇

Cassava chips I bought from my local vegetable market in England!

Assuming you live in the West, you probably won’t find them in your major retailer. But if you go to a vegetable or a foreign farmer’s market, you may be lucky to get something like cassava chips (which I can tell you, is yum yum!).

Well, what if I told you that one of the most damaging crop viruses in the world, Cassava Mosaic Disease (CMD), causes annual economic losses in East and Central Africa estimated at between US$1.9 billion and $2.7 billion? Moreover, on a human level, this disease can lead to yield losses of a staggering 50–100%, which ruins a primary source of income for many smallholder farmers. Such significant losses reduce the availability of cassava, which is a crucial source of calories for over half a billion people in the tropics. That’s crazy!

In the regions that are most heavily dependent on cassava, malnutrition rates are the highest due to the crop’s poor nutritional profile caused by CMD. For example, in some parts of sub-Saharan Africa, where cassava is a main staple food, the prevalence of ‘stunting’ (a sign of chronic malnutrition) among children under the age of five can exceed 30%.

The Problem — 4 reasons

Let’s focus on Nigeria for a second, as it’s the world’s largest producer of cassava. The country accounts for 26% of total production worldwide and faces an annual loss of USD 2.3 billion due to CMD. In 2024, 26.5 million people in Nigeria will suffer from a food crisis. CMD reduces cassava yields by 70% annually, with a 47.5% decline attributed to CMD transmitted via whitefly, exacerbating food insecurity for the more than 65% of the population dependent on this staple food.

1. Supply is Unable to Meet Demand

The annual demand for cassava starch in Nigeria is 485,000 metric tonnes.
To fulfil the country’s demand, Nigeria requires 28.3 million metric tonnes of fresh cassava root planted annually.

However, CMD has led to decreased production, prompting the need for land expansion. Yet, with increasing demands for land and a growing population, expanding cassava cultivation is not sustainable.

2. Barrier to Economic Development

Cassava comprises 45% of the agricultural GDP in Nigeria. It has the potential to generate $427.3 million from domestic value addition and $2.98 billion from exports, yet the current domestic value at 59.5 million metric tonnes falls short.

Low productivity, particularly due to CMD, holds back Nigeria’s economic growth, food security, and development, leading to missed opportunities in the global cassava market.

3. Nutritional Deficiencies, Poisoning and Deaths

Due to food insecurity, Nigerians consume CMD-infected cassava, which carries a risk of cyanide poisoning. This contributes to 2.5% of deaths among affected individuals and also causes nausea, vomiting, headache, dizziness and neurological damage.

Additionally, food insecurity stemming from cassava production results in malnutrition, hunger, and disparities in food access for 138.71 million people.

4. Cassava Deficiency

There are 30 million farmers who depend on cassava cultivation for their livelihoods, with smallholder farmers accounting for 95% of Nigeria’s agricultural workforce.

The 47.5% decline in cassava yields due to CMD transmitted through whiteflies directly affects these farmers, decreasing not only the productivity of their crops but also their incomes.

So you ask: “We can surely use gene editing, right?” What has been done about this, and why has nothing happened so far?

In 2017, Nigeria approved the cultivation of genetically modified (GM) cassava, but it encountered challenges. Research on cassava has spanned 28 years due to difficulties in breeding better varieties caused by unpredictable flowering, resulting in limited genetic improvement.

In fact, 87 organizations objected to the release of GM cassava.

Here are the disadvantages:

  • Inserting transgenic lines that are resistant to CMD showed results, but due to unpredictable prolongation they haven’t been developed further.
  • Genetic modification of cassava to enhance nutrient content, whether by inserting foreign DNA or by using CRISPR-Cas9, has also been ineffective due to the same issue.
  • The population remains critical to agricultural systems and policy processes when it comes to targeting new innovations, and 25% of farmers are highly opposed to cultivating GM cassava.

Ok, so modifying the cassava won’t work. But what about… the whitefly!

Bingo.

Through research a few team members and I did during The Knowledge Society’s (TKS) annual focus hackathon, we found a way to counter this big problem, and in fact it can be deployed in the next 1–3 years.

Introducing… Modified Whiteflies!

Editing Whiteflies

Here’s what we found in our research:

Whiteflies transmit viruses when they feed on plants and deposit virus particles in their saliva. By modifying genes involved in the production of salivary gland proteins, we can impact the whiteflies’ ability to transmit viruses effectively without gene editing cassava.

  1. Observation

Creating whiteflies resistant to Ugandan cassava brown streak virus (UCBSV).

i. A comprehensive analysis of the transcriptome of whiteflies before and after feeding on UCBSV-infected plants needs to be conducted.

ii. RNA silencing pathways in whiteflies should be investigated as these pathways are the ones in charge of the defence mechanisms of plants against viral infections.

iii. Salicylic acid (SA) pathways. Examining the expression of genes associated with SA in whiteflies provides insights into whether this pathway can be targeted to enhance whitefly resistance to UCBSV.

2. Preventing

  • Once the receptor that the virus uses to enter and infect whitefly cells has been identified, hairpin RNA to induce RNAi needs to be employed to modify whitefly genes associated with virus transmission.
  • Resistance-conferring genes or RNA interference constructs are then introduced into whiteflies to disrupt these receptors, preventing the virus from effectively establishing an infection in the whitefly and disrupting crucial steps in the virus transmission process.
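To make the hairpin RNA idea a bit more concrete, here is a minimal Python sketch of how a hairpin template is laid out: a sense arm copied from the target gene, a loop spacer, and an antisense arm that is the reverse complement of the sense arm, so the transcript can fold back on itself. The target fragment, the loop sequence and the function names are hypothetical placeholders (not real whitefly receptor sequences), and real construct design would need many more checks, such as off-target screening.

```python
# Illustrative only: TARGET_FRAGMENT and LOOP are made-up placeholder
# sequences, not real Bemisia tabaci genes.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string (A/C/G/T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(dna.upper()))


def build_hairpin(sense_arm: str, loop: str) -> str:
    """Lay out a hairpin template: sense arm + loop + antisense arm."""
    return sense_arm.upper() + loop.upper() + reverse_complement(sense_arm)


TARGET_FRAGMENT = "ATGGCTTACGGATCCGTA"  # hypothetical stretch of a receptor gene
LOOP = "TTCAAGAGA"                      # assumed spacer sequence

hairpin = build_hairpin(TARGET_FRAGMENT, LOOP)
print(hairpin)
```

The two arms pair with each other once transcribed, which is what gives the molecule its hairpin shape; the loop only has to be long enough to let the strand bend.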

And How Exactly Will This Be Done? Here Are 2 Combined Ways

1. Anti-UCBSV Kit for Whiteflies Study

The Anti-UCBSV kit encompasses essential components for experiments on Bemisia tabaci whiteflies.

It features tools for precise gene editing and RNA interference, designed to target specific virus receptivity genes in whiteflies.

An anti-UCBSV kit

Specialised tubes optimised for the experimental process will be provided, facilitating the collection and analysis of genetic material.

In this experimental study, we will collect whiteflies from Nigeria to investigate the effects of genetic modification on their interaction with Cassava mosaic virus (CMV), which causes CMD. Using our anti-UCBSV kit, we will genetically modify the whiteflies by targeting specific genes associated with virus receptivity. Post-genetic editing, a thorough observation will be conducted to identify changes in feeding patterns, reproductive capabilities, and, crucially, the whiteflies’ efficiency in transmitting CMV.
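One simple way to read the transmission result of such an experiment is to compare the share of test plants infected by edited whiteflies against the share infected by unedited ones. The Python sketch below does that arithmetic. The function names are my own and the counts are invented purely for illustration; they are not results from our study.

```python
def transmission_rate(infected_plants: int, exposed_plants: int) -> float:
    """Fraction of exposed plants that became infected."""
    if exposed_plants <= 0:
        raise ValueError("exposed_plants must be positive")
    return infected_plants / exposed_plants


def relative_reduction(control_rate: float, edited_rate: float) -> float:
    """Relative drop in transmission compared with the control group."""
    if control_rate == 0:
        raise ValueError("control_rate must be non-zero")
    return (control_rate - edited_rate) / control_rate


# Hypothetical counts, for illustration only
control = transmission_rate(infected_plants=40, exposed_plants=50)
edited = transmission_rate(infected_plants=10, exposed_plants=50)

print(f"Control transmission rate: {control:.0%}")
print(f"Edited transmission rate:  {edited:.0%}")
print(f"Relative reduction:        {relative_reduction(control, edited):.0%}")
```

A real study would also need enough replicates and a statistical test before reading anything into a difference like this.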

2. Data Analysis — Where Artificial Intelligence Can Come In

Enter… artificial intelligence! Wow, that was an unexpected turn.

We found that integrating DNA sequencing into data analysis post-experimentation enhances our capacity to rapidly process and interpret vast datasets, accelerating the process while also providing more nuanced insights and more precise identification of patterns related to whitefly behaviour and virus transmission.

This is the process of how it would be done:

i. Data Preprocessing:

Clean and prepare the collected data, addressing inconsistencies and missing values to ensure dataset quality.

ii. Feature Selection:

Identify relevant features crucial for the study’s objectives, focusing on key variables related to whitefly behavior and virus transmission.

iii. Machine Learning Model Training:

Train selected machine learning models to analyze patterns in the dataset, allowing for a comprehensive understanding of the impact of genetic modification on whitefly behavior and virus transmission efficiency while avoiding an 85% failure rate. The results will be iteratively refined and validated, informing targeted interventions in disease management and vector control strategies.
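Steps i to iii above can be chained into a single scikit-learn pipeline and checked with cross-validation, which is one common way to do the "refine and validate" part. This is only a sketch under assumptions: the data is a synthetic stand-in generated with make_classification, and the number of features, the value of k and the choice of five folds are arbitrary picks for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.impute import SimpleImputer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

# Synthetic stand-in data: 200 samples, 6 numeric features, binary label
X, y = make_classification(
    n_samples=200, n_features=6, n_informative=3, random_state=0
)

pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),   # i. data preprocessing
    ("select", SelectKBest(f_classif, k=3)),      # ii. feature selection
    ("model", RandomForestClassifier(n_estimators=100, random_state=0)),  # iii. training
])

# 5-fold cross-validation: each fold is held out once for evaluation
scores = cross_val_score(pipeline, X, y, cv=5)
print("Fold accuracies:", [round(float(s), 2) for s in scores])
print("Mean accuracy:", round(float(scores.mean()), 2))
```

Keeping the imputation and feature selection inside the pipeline means they are refitted on each training fold, so the held-out fold never leaks into the preprocessing.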

A Prototype

To make things easier to demonstrate, post-hackathon I learnt a bit of Python and built a prototype that replicates this exact process of using AI to identify and predict resistant strains of cassava that can withstand CMD. The purpose was for the prototype to serve as a proof of concept and a foundation for the project, which we could demonstrate when we talk to AI and gene editing experts (more on the feedback we got later on).

However, a major hurdle I came across was that I simulated a dataset — so I can replicate what the actual process would be like. The real data would come from labs or fields — or at least that’s what I thought.
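For illustration, a simulated dataset of this kind could be generated along the following lines. The column names match the ones assumed in the prototype code in this article; the value ranges, the labelling rule and the 10% noise level are arbitrary assumptions, not measurements.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=42)
n_samples = 500

# Made-up distributions; real values would come from lab or field data
gene_seq_length = rng.integers(500, 3000, size=n_samples)
mutation_rate = rng.uniform(0.0, 0.1, size=n_samples)

# Arbitrary toy rule for the label, then flip about 10% of labels as noise
label = (mutation_rate > 0.05).astype(int)
flip = rng.random(n_samples) < 0.10
label = np.where(flip, 1 - label, label)

simulated = pd.DataFrame({
    "Gene_Seq_Length": gene_seq_length,
    "Mutation_Rate": mutation_rate,
    "Label": label,
})

print(simulated.head())
print(simulated["Label"].value_counts())
```

A table like this is handy for checking that the code runs end to end, but any accuracy measured on it only reflects the toy rule that generated the labels.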

It was my program director at TKS (shoutout to Steven Ritchie!) who pointed out that I should be using real data sets. Of course! Working with fake data that doesn’t map to a real data set won’t get you the experience of working with real data. An obvious blindspot I had at the time.

So I found out that there are tons of open-source data sets available to use, from the following websites:

However, during this process I found that real data can be much more challenging than, let’s say, simulated data, as the data quality can be bad. Real-world datasets often contain noise, missing values, and inconsistencies that require extensive cleaning and preprocessing. Real data can also be complex and require more sophisticated techniques to analyse and model, especially when dealing with high-dimensional data. Furthermore, real datasets can be HUGE, necessitating significant computational resources for processing and analysis. To top it all off, it’s very complex: understanding the context and nuances of the data often requires domain-specific knowledge, which was a barrier as I’m not familiar with the field. Not great for a beginner ):

Why I Used Random Forest Classifier

Random Forests are easier to interpret compared to deep learning models. They provide feature importance scores, which can be very insightful. And Random Forests typically require less training time and computational resources compared to deep learning models, making them suitable for initial prototypes. For many tabular datasets, Random Forests provide competitive performance without the need for extensive tuning.

So, as a workaround that made sense for my level and was suitable for a first prototype, here’s what I built (based on the Cassava Leaf Disease Classification from Kaggle):

import pandas as pd
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt

# Load the real dataset
df = pd.read_csv('cassava_leaf_disease_data.csv')

# Display basic information about the dataset
print(df.head())
print(df.info())

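# Assuming the dataset has the columns 'Gene_Seq_Length', 'Mutation_Rate', and 'Label'
# Data preprocessing: fill missing numeric values with the column mean
# (numeric_only=True avoids an error if the file has non-numeric columns)
df = df.fillna(df.mean(numeric_only=True))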

# Selecting features and target
X = df[['Gene_Seq_Length', 'Mutation_Rate']]
y = df['Label'] # Assuming 'Label' is the target variable

# Feature selection
selector = SelectKBest(f_classif, k=2)
X_new = selector.fit_transform(X, y)

print("Selected Features:", X.columns[selector.get_support()])

# Splitting the dataset
X_train, X_test, y_train, y_test = train_test_split(X_new, y, test_size=0.2, random_state=42)

# Training a Random Forest Classifier
model = RandomForestClassifier()
model.fit(X_train, y_train)

# Predicting and evaluating the model
predictions = model.predict(X_test)
print("Model Accuracy:", accuracy_score(y_test, predictions))

# Visualize feature importance
feat_importances = pd.Series(model.feature_importances_, index=X.columns[selector.get_support()])
feat_importances.plot(kind='barh')
plt.title('Feature Importance')
plt.show()

The Output

Using the above code with real data will result in a display of the accuracy score of your model on the test set.

For Random Forest, feature importance scores are visualised.

This would differ from deep learning, where training and validation accuracy curves would be plotted.

This approach would provide a comprehensive view of our model’s performance and the importance of features, at the same time leveraging real-world data for more realistic and reliable results.

Models built on real data are more credible and can be validated more effectively by experts.

An output example showing mutation rate and gene sequencing length. It would be a graph like this.

Here’s some advice we got from interviewing some super-legit AI and gene editing industry experts as soon as this hackathon was finished (all were conducted between March and May 2024)…

1. Kabir Mathur, Founder & CEO of Leen.dev (ex-Typeform, Kiip and Forbes Development Council) — 15+ years of professional experience

Kabir Mathur. Source: LinkedIn.

Background: “I have been in the start-up space for quite some time now, it’s been about 15 years. Worked in various different industries, everything from gaming to ad-tech. Most recently, I was at a B2B SaaS company called Typeform. And about a year and a half ago, decided to quit my full-time job and start my own thing. Did an exploration across a variety of different spaces. I looked at web3, AI, tried to build a co-pilot for customer success teams with my co-founders. That didn’t quite work out so now, what we’re building is in the cybersecurity space. So we’re building a unified API for cybersecurity tools and data, we got some funding, had a team that we hired out. So yeah, I’m just a start-up guy through and through. Got a bunch of exposure in different industries, so hopefully some of my experience will be helpful to you.”

Q1: “What strategies would you recommend for balancing the comprehensiveness of our model against the need for rapid data processing and interpretation?”

A: “Good question. So I’ll caveat this by saying that, and I told you this in the conversation we had prior, I’m not an expert in the data processing side of AI, but I’ll tell you how we try to do things on our side, which is the breadth of my experience. We were looking into this project for this customer success team. There were a number of different systems we had to integrate with and each of them had different data formats. I don’t think that would be necessarily a problem on your end because you’re probably trying to look at one type of data. Can you actually tell me a little bit about that, like what is the data you’ll be looking at and how is AI going to play a part in this project overall? And what is the ideal outcome for you and based on that I can cater my answer.”

So I answered: “Our source of data, so how we are going to get these massive data sets to then do the pre-processing and then throw it at some type of AI model, would be from analysing data on the field, so from whiteflies, or in the lab (knowing what I know now, I just could’ve said real data set due to my lack of understanding back then). So that would be our two sources of data but it would be DNA sequencing in both cases.

AI will play a role in this in that it will execute data pre-processing — so cleaning and collecting the collected data and addressing any inconsistencies/missing values to ensure dataset quality. The overview of this aspect is to, as I mentioned, integrate DNA sequencing, into data analysis after our experiment stage.

The ideal outcome would be for the whole process to be more rapid and precise given the vast and inconsistent data we’d be handling.”

A (Kabir continued): “And are these external labs or are these affiliated with you? I’m just trying to understand the supply chain because where you get the data from will kind of determine how much pre-processing you’d need to do. So anyway, going back to the example that I was sharing with you in the early days of the tool that we were building. In our case, there were different datasets from the SaaS tools that these companies used, anything from Slack, Notion, that kind of thing. Some of it is structured data, some of it is unstructured. Google Docs, for example, you build from a spreadsheet, and that’s structured data where you can expect a specific type of format. Whereas if you’re building from Notion, it’s unstructured text-based data. So it required us to create a type of graph to tie those datasets together. Um, I don’t think you’d have to do that on your side but if you’re working with multiple labs and they’re all sending you different datasets, you will have to put all of that into a singular format before you can throw it at a model that can then understand that format, right? Because you’d have to do a mapping to help the model understand the context of what the data is before you can get the type of outcome that you want from it.”
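Kabir’s point about mapping every lab’s data into a single format can be sketched with pandas as below. The two lab exports, their column names and their units are hypothetical examples I made up to show the idea; real labs would each need their own mapping.

```python
import pandas as pd

# Hypothetical exports from two labs with different column names and units
lab_a = pd.DataFrame({
    "sample": ["A1", "A2"],
    "seq_len_bp": [1200, 1500],
    "mut_rate": [0.02, 0.04],
})
lab_b = pd.DataFrame({
    "SampleID": ["B1", "B2"],
    "length_kb": [1.1, 1.8],
    "mutation_pct": [3.0, 5.0],
})


def from_lab_a(df: pd.DataFrame) -> pd.DataFrame:
    """Lab A already uses base pairs and fractions, so only rename columns."""
    out = df.rename(columns={
        "sample": "sample_id",
        "seq_len_bp": "gene_seq_length",
        "mut_rate": "mutation_rate",
    })
    out["source"] = "lab_a"
    return out


def from_lab_b(df: pd.DataFrame) -> pd.DataFrame:
    """Lab B reports kilobases and percentages, so convert the units too."""
    out = pd.DataFrame({
        "sample_id": df["SampleID"],
        "gene_seq_length": (df["length_kb"] * 1000).round().astype(int),  # kb to bp
        "mutation_rate": df["mutation_pct"] / 100,  # percent to fraction
    })
    out["source"] = "lab_b"
    return out


# One table in one schema, ready for preprocessing and modelling
combined = pd.concat([from_lab_a(lab_a), from_lab_b(lab_b)], ignore_index=True)
print(combined)
```

Keeping a source column makes it possible to trace odd values back to the lab they came from later on.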

Q2: “In the context of using AI for genetic modification purposes, what might be some ethical considerations that we should prioritise during our R&D process?”

A: “That’s really interesting. You’re dealing with a pretty narrow use case as far as gene editing goes so there aren’t massive ethical concerns with what you’re doing. Because it doesn’t go into the human sphere and whatnot, but yeah of course you should have some consideration for what downstream effects these changes that you make to these… it’s a parasite, right? I don’t know much about this. Are they parasites, the whiteflies?”

Me: “Basically insects. Not parasites.”

A (Kabir continued): “So the ethical consideration would be what impact it does to the ecosystem if you genetically modify this one element. They probably have some sort of natural predator or something that will have an impact on those things and has a domino effect potentially. That’s the first thing that comes to mind for me.”

Q3: “Are there any emerging technologies or methodologies in AI that you believe could significantly impact our project’s success in the near future?”

A: “Nothing super new. I mean it’s all changing very quickly so it’s just about what the latest models are and they’re probably going to be specific forks of the larger models that are fine-tuned to this type of data. But during my search, when I was talking to potential co-founders, I came across a guy who had fine-tuned whatever OpenAI’s latest model was at the time to the production of new molecules because he was looking into something related to regulations in the EU and the use of certain chemicals in beauty products and how those SKUs need to be modified per the country that you’re selling into because certain substances are banned in certain countries. So the same face cream can’t be sold in the same formulation for example in one country versus the other. So it’s more about using those kind of fine-tuned models likely, in your case, is what you’ll see the most benefit from… And people in the research space probably open-source some of their work around this so you can look into that and see what there is. I’m sure there are existing open-source projects on this.”

2. 5 Gene Editing experts — <15+ years of professional experience

People in the field we spoke to:

  • Shea Tough, Co-Founder and CEO of ForestCity Synbio (ex-startup mentor, LOI Accelerator Batch 7, Biotech and pharma consultant, Research assistant)
  • Gaurav Sanganee, Founder of ClosingDelta (Consultancy for Pharma companies, experience with clients from the top 20 global pharmaceutical companies)
  • Vidushi Valli Surendran, Founder of Metanome
  • Hayley So, Prev. Intern at Medipage
  • Anika Gandhi, Founder of Ultrarice (raised $25k in pre-seed funding)

Given these are relatively newer entrants to the field, I summarised our key takeaways/action items after speaking to them here:

  • Clarify economic impact
  • Know what type of data we’re collecting (clarified this earlier)
  • Do a GAP analysis
  • Get validation from people ideally with 10+ years of industry experience
  • Reach out to professors, uni and pre-uni students
  • Perhaps create a company on this to secure grants
  • Problem understanding is great and clear
  • Include your USP, avoid selling but rather show how your project is unique
  • Include scalability and the cost/economics aspect
  • Speeding up the R&D phase is tricky; we’ll need to consider government regulations, trials, etc.

Companies we’re still in the process of trying to reach out to: Bayer and Monsanto (if you can intro us with anybody, that’d be great!).

I aim to answer these in an upcoming review article, for which I have gone through a ton of research papers to formulate an answer, while we’re still in the process of getting validation from experienced individuals in the gene editing industry.

The Roadmap

Our proposed roadmap from our hackathon deck.

So, you may ask… what about those who are already working on this?

The typical way to mitigate CMD is for farmers to carry the plants away from the field, expose them to sunlight to dry, and then burn them to kill the viruses. That’s the traditional way. But in this case, old is not gold! This doesn’t always work well, and is of course not a scalable solution.

A farming community in Nigeria. Source: Cornell University.

While various initiatives have been launched to develop and distribute CMD-resistant varieties, several limitations hinder their full effectiveness.

In fact, we found that no companies are working on this problem in the way we propose. The most similar existing solutions (i.e. studies that aim to demonstrate how GM insects can effectively prevent the spread of diseases) are:

  • In 2014, World Mosquito Program introduced modified mosquitoes across Brazil to safeguard up to 70 million people from diseases transmitted by these insects. The intervention in Niterói resulted in an impressive 69% decrease in dengue cases, while in Rio de Janeiro, the reduction was significant at 38%.
World Mosquito Program study in Brazil. Source: Our World In Data.
  • Similarly, Oxitec released mosquitoes to combat dengue, and within 13 weeks the mosquito population in dengue-infected urban areas of Brazil dropped by 95%.
  • Via paratransgenesis, which involves infecting mosquitoes with bacteria that prevent them from transmitting malaria, and gene drive technology, modified mosquitoes are replacing the population of malaria-carrying mosquitoes.

Back in February, we created this slide deck as a team and pitched our solution to judges of backgrounds from MIT, United Nations, McKinsey and more… and we were fortunate to have been recognised as a global winner for the annual TKS Focus Hackathon in 2024 and overall best solution for creativity. The solution beat 400+ competitors from 300 cities and 80 teams. Not too bad of a validation for our project! Read my key takeaways from the hackathon in my personal monthly newsletter here. After this, I decided to immediately begin speaking to experts to get from 0 to 1, starting with the R&D phase, and decided to take this on as a full-on project (and potentially later a start-up) due to a few convictions.

A few reasons why I believe we should work on this (i.e. why we’re building this):

  1. This idea, I believe, is unique and unexplored
  2. Brandon (MIT alumnus) encouraged us in his feedback to take it on as a proper project and “enter the arena”, which I couldn’t agree more with
  3. I’ve got a great team that I get on with, and I’m positive we will continue to move this forward as a meaningful project that will save lives.

Vision

Reduce food insecurity by bringing this solution to the market and positively impacting several millions of lives.

If this vision is brought to life…

Nigeria’s cassava production is projected to increase to 77.96 million tons, reducing food insecurity in the country to 49.56%.

By preventing CMD from spreading, there will be an increase of 13.464 million tons of cassava after 11 months. This will improve food security and provide food to an additional 34.94 million people. And because the cassava plants do not need to be genetically modified, concerns regarding GM food will be alleviated.

My ask to you!

If you’re aligned with my mission, do you know anyone from the following companies: Bayer or Monsanto? Or relevant and super-legit researchers, NGOs, or technology companies that could bring this solution to the market and positively impact millions of lives? Or someone who can write a recommendation/intro for us from the following grant programs we found? Would love an intro for further validation and grants! My contacts are below.

Contact me — Project Manager of Modified Whiteflies

If you have any questions or want to connect, feel free to email me at joshroy004@gmail.com or go to my LinkedIn.

Thanks for reading!

Sources & further reading

Can be found here.
