Geek Culture
Published in

Geek Culture

Project Geneva: State Censorship beat by Genetic Algorithms

One of the most impactful usages of Artificial Intelligence in today’s society

The internet is a great tool. It’s not surprising that I am a huge fan of all the potential it provides (I have relied on it heavily for my journey). The internet is an integral part of humanity’s progress and will continue to be a large part of the advancements of the future.

Every country in the world puts large amounts of information in tracking and censoring the internet

Unfortunately, as technology grows, so does the potential for misuse. Machine Learning can be used to automate many complex tasks in ways that would’ve been impossible just 15 years ago. Unfortunately, this opens up censorship and mass surveillance at a scale previously unimaginable. Ironically enough, AI and Machine Learning become one of the best tools for dealing with such problems as well.

I came across one such project in the video: AI against Censorship: Genetic Algorithms, The Geneva Project, ML in Security, and more! This is a fantastic interview with a leader of the project. The interview was particularly interesting because the author talks about how Genetic Algorithms are well suited for this task. As someone who has talked about Evolutionary Algorithms being underutilized for a while, this had me interested. In this article, I will share some interesting (Machine Learning Takeaways) along with some ideas I had looking into this idea. This article is meant to introduce you to this crucial project and bring it to your attention. Make sure you look into them as they are instrumental for the future.

About Geneva

Looking at the website, “Geneva is a novel experimental genetic algorithm that evades censorship by manipulating the packet stream on one end of the connection to confuse the censor.” It has 2 components-

  1. Strategy Engine: The strategy engine is responsible for running a given censorship evasion strategy over active network traffic.
  2. Genetic Algorithm: The genetic algorithm is the learning component that evolves new strategies (using the engine) against a given censor.

The tool is open source, and their Github can be found here. Anyone can set up and run Geneva from their machines. Running it will start the algorithm that will try to test different strategies. It tried more and more refined strategies. If breaks through encryption, the team studies the results in order to extract information about the underlying censorship system.

This disclaimer is so badass. Be safe if looking into this though

Given the nature of the project, there are a lot of networking-related terminologies in their documentation. I know nothing about the field, so I won’t pretend to be able to break those aspects down. If anybody more experienced in these areas wants to get on and talk about those factors, I’d be more than happy to open up my platform to you. But I can explain the interesting AI aspects.

The Range

In the video I shared about Differential Evolution, I talked about how Evolution Based Algorithms have a larger possible search space than traditional gradient-based methods. This advantage really shows itself in problems like this. An evolutionary algorithm (like the Genetic Algorithms mentioned here) can traverse through a much more varied search space.

Sample Strategy evolved by Geneva algorithm: The reason that GAs can build solutions by chaining together components is that they don’t need a smooth, continuous search space.

Take the search space for example. Gradient-based methods need a smooth, continuous search space. Genetic Algorithms can operate in cases when this isn’t the case. This is why they can tweak solutions and chain components of their search space to create new candidate solutions.

Google found great success using Evolutionary Algorithms for their Neural Architecture Search work. Check out my content for more information on that.

The leader actually touched upon this during the interview linked. He mentioned how there is no gradient for ML methods to compute over. This is true for both the search space (we build strategies by chaining commands) and the output (passing through the censorship). In fact, he even mentioned that since Genetic Algorithms can test everything, they actually had to constraint the algorithm to some basic commands.

Search Space Commands

Evolution Based methods always have a valid basic set of commands that they can test out. This is what they use to create candidate solutions, and in the tweaking existing solutions during the recombination/mutation phases. These are domain-specific. For this project, we have the following building blocks.

These are the 4 commands that the algorithm can use/chain to win

It’s important to understand how they came up with these as the basis. The team used a straightforward definition of censoring, Censorship is merely the modification of network traffic. Thus, the strategy is simply a “description of how network traffic should be modified”. From that lens, it’s clear that the block should consist of the possible ways that network packets might be modified.

The goal of a censorship evasion strategy is to modify the network traffic in a such a way that the censor is unable to censor it, but the client/server communication is unimpacted. — From the authors

The Github readme provides a pretty concise description of each of the building blocks.

  1. duplicate: takes one packet and returns two copies of the packet
  2. drop: takes one packet and returns no packets (drops the packet)
  3. tamper: takes one packet and returns the modified packet
  4. fragment: takes one packet and returns two fragments or two segments

It is important to note that duplication and fragmenting “introduce branching, these actions are composed into a binary-tree structure called an action tree. The trigger describes which packets the tree should run on, and the tree describes what should happen to each of those packets when the trigger fires.” An example tree can be here

The simple strategy shared earlier was also an example. I’m sure my followers (who are all very intelligent) will come up with more samples when they look into this.

The action trees are combined together to create full-fledged evasion strategies.

Fitness Function

Fitness functions are crucial to all Genetic Algorithms. They determine what the algorithm will consider good and bad, and will ultimately direct what kinds of strategies are involved. Just like the building blocks, designing this is not trivial. The authors of this protocol had a relatively simple function (which is great because it allows for many enhancements and modifications). In the documentation, they share the following priorities

Taken from the documentation

As mentioned earlier, the possible search space for solutions is infinite. This means that cost-effective performance is crucial. This is especially crucial because the system is running from local machines (not large servers like most ML these days), and is against the computing power of a state. This is why the authors put such a premium on tripping potential deadweight solutions early

This hierarchy accomplishes a significant search space reduction. Instead of Geneva fuzzing the entire space of possible strategies (for which there are many!), it instead quickly eliminates strategies that break the underlying connection and encourages the genetic algorithm to concentrate effort on only those strategies that keep the underlying connection alive.

The fitness function creates learners with an emphasis on minimization of regret (regret here being the useless solutions). This works out for them because the search space is naturally large so we can hope that we are hitting the best viable performance. I would be interested in trying out runs with a “reservation” for a few bad solutions. Sometimes they can introduce strong learners when mixed with strong candidates after many generations. Since the building blocks are simple, it might not be super effective in this case, but I would love to see it.

Closing

This is an interesting example of adversarial design, where the learner is learning by trying to break the censors. During the interview, Kevin talked about how they were trying to design a few censors themselves. It would be interesting to explore an automated censor that evolves with this evader. Not only is this likely to happen in the future, but such a solution would accelerate the process. This kind of approach has seen great success in most ML

Combining this and a setup where we can link the results of all the runs back to the main system (like we do for self-driving cars) would be a great way to supercharge the evasion agent. It would also leverage scale very well. Make sure you look into the project and share what your thoughts are.

The system operations in a nutshell

If you liked this article, check out my other content. I post regularly on Medium, YouTube, Twitter, and Substack (all linked below). I focus on Artificial Intelligence, Machine Learning, Technology, and Software Development. If you’re preparing for coding interviews check out: Coding Interviews Made Simple, my weekly newsletter. You can get the premium version for less than 0.5 USD dollars/day. The premium version will unlock high-quality solutions to weekly coding problems, special discussion posts, and a great community. It has helped a ton of people in their preparations.

To help me write better articles and understand you fill out this survey (anonymous). It will take 3 minutes at most and allow me to improve the quality of my work.

Feel free to reach out if you have any interesting jobs/projects/ideas for me as well. Always happy to hear you out.

For monetary support of my work following are my Venmo and Paypal. Any amount is appreciated and helps a lot. Donations unlock exclusive content such as paper analysis, special code, consultations, and specific coaching:

Venmo: https://account.venmo.com/u/FNU-Devansh

Paypal: paypal.me/ISeeThings

Reach out to me

If you’d like to discuss tutoring, text me on LinkedIn, IG, or Twitter. Check out the free Robinhood referral link. We both get a free stock (you don’t have to put any money), and there is no risk to you. Not using it is just losing free money.

Check out my other articles on Medium. : https://rb.gy/zn1aiu

My YouTube: https://rb.gy/88iwdd

Reach out to me on LinkedIn. Let’s connect: https://rb.gy/m5ok2y

My Instagram: https://rb.gy/gmvuy9

My Twitter: https://twitter.com/Machine01776819

If you’re preparing for coding/technical interviews: https://codinginterviewsmadesimple.substack.com/

Get a free stock on Robinhood: https://join.robinhood.com/fnud75

--

--

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store