If I had 4 months to create AI, how would I do it?

Chester XYZ
6 min read · Aug 9, 2014


Today is August 9th. Suppose you had to create Artificial General Intelligence by November 9th: how would you do it? This is my take on how I would go about it.

If you look at Artificial Intelligence today, the papers are riddled with mathematical equations, most of which I don't believe I could understand and synthesize into AI in 4 months. So I will not try to figure out Artificial General Intelligence that way; I will use the only approach that has already proven it can produce general intelligence: evolution.

The problem with evolution is the vast search space. Say you were trying to evolve Artificial General Intelligence in BrainF*ck, a programming language with 8 commands, and say it would take at most 100 million commands to implement (Ray Kurzweil estimates about 100 million bytes of code to implement the brain). That is a search space of 8^100,000,000 programs. It is a huge search space, larger than the number of atoms in the whole universe. The exact size is irrelevant; all we need to know is that it is huge. The universe managed to traverse this search space in about 4.5 billion years to create humans. I think we can do better.
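A quick back-of-the-envelope check of those numbers (the ~10^80 atoms figure is the commonly cited estimate; this is just illustrative arithmetic):

```python
import math

n = 100_000_000   # Kurzweil's ~100 million byte estimate
alphabet = 8      # BrainF*ck has 8 commands

# Number of decimal digits in 8^n, computed via logarithms
# (8^n itself is far too large to write out).
digits = math.floor(n * math.log10(alphabet)) + 1
print(digits)     # a number with roughly 90 million digits

# Atoms in the observable universe: ~10^80, a number with just 81 digits.
print(digits > 81)  # True: the search space dwarfs it
```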

Why do I think we can do better? For one, we can probabilistically select what code works together. For instance, if we were evolving a language like Python, I know it is highly unlikely I would need 4 nested for loops. Evolution doesn't know that, so throughout the process it will keep trying to put 4 nested for loops together and consequently waste resources (time) evaluating unlikely sequences. There are far more unlikely sequences of code than likely ones, so we can cut down the search space by informing the search of the probability of certain sequences of code appearing together.

Next, we can evaluate the partial contribution of each line of code and choose whether to update it or not. For instance, suppose we have the code:

x = x^2;  (+10)

x = x + 4;  (-4)

x = x - 2;  (+2)

We can evaluate each line in the code and see how many points it adds to or subtracts from our goal of creating Artificial General Intelligence. The numbers in parentheses give the points added by including that line of code in our program. Given the points for each part of the code, the algorithm should work on improving line 2 rather than any other part. Evolution will get there eventually, but only after countless tries at figuring out which parts are good and which are bad; we can evaluate the good and bad parts of the algorithm in one step.
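The per-line scores above can be operationalized as leave-one-out contributions: re-run the fitness with each line deleted and attribute the difference to that line. A toy sketch, using a made-up fitness (distance of the final x from an arbitrary target of 7, starting from x = 3), so the numbers here are illustrative, not the (+10)/(-4)/(+2) from the example:

```python
def fitness(lines):
    """Toy fitness: run the lines starting from x = 3 and score
    closeness of the final x to an arbitrary target of 7."""
    env = {"x": 3}
    for line in lines:
        exec(line, {}, env)  # fine for this toy; never exec untrusted code
    return -abs(env["x"] - 7)

program = ["x = x**2", "x = x + 4", "x = x - 2"]

# Leave-one-out: contribution of a line = fitness with it - fitness without it
base = fitness(program)
contributions = []
for i, line in enumerate(program):
    without = program[:i] + program[i + 1:]
    contributions.append(base - fitness(without))
    print(i, line, contributions[-1])

# The line with the most negative contribution is the one to improve first.
worst = min(range(len(program)), key=lambda i: contributions[i])
print("improve line", worst + 1)  # line 2, as in the example above
```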

We can improve upon evolution by giving programs that match existing algorithms higher points. We could create a list of, say, the top 1000 computer science algorithms (if there are that many) and check how similar our code is to them. The premise is that many of the parts needed for Artificial General Intelligence have probably already been established; the difficulty is in putting them together in some meaningful way. It's like the backpropagation algorithm: the calculus needed for it existed long before, someone just had to connect it to neural networks, which took a decade or more.
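One way to score "how similar is this code to a known algorithm" is sequence similarity over token streams, e.g. with Python's standard difflib. Everything here is a toy stand-in: the two entries below are not a real algorithm library, just short token sequences for illustration:

```python
from difflib import SequenceMatcher

# Tiny stand-in for the "top 1000 algorithms" library: token sequences
known_algorithms = {
    "binary_search": "lo hi while lo < hi mid = lo + hi // 2 if a[mid] < x lo = mid + 1 else hi = mid".split(),
    "bubble_sort":   "for i in range n for j in range n - 1 if a[j] > a[j+1] swap a[j] a[j+1]".split(),
}

def similarity_bonus(candidate_tokens):
    """Extra fitness (0.0-1.0) for resembling any known algorithm."""
    return max(
        SequenceMatcher(None, candidate_tokens, tokens).ratio()
        for tokens in known_algorithms.values()
    )

# An evolved fragment that is most of a binary search scores highly
candidate = "lo hi while lo < hi mid = lo + hi // 2 if a[mid] < x lo = mid + 1".split()
print(similarity_bonus(candidate))
```

This bonus would be added to the regular fitness, nudging the search toward reusing established building blocks rather than rediscovering them.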

We can also significantly reduce the search space by writing some of the code ourselves. We have some idea of what code to expect and how it should be structured. We know, for one, that there is some mechanism for eliminating hypotheses, some mechanism for mapping high-dimensional input down to lower dimensions, some mechanism for pattern invariance, and so on. We know roughly what the code should look like, just not exactly.

We can still further improve evolution by subdivision. Say we were again trying to evolve our code in BrainF*ck, with its eight commands (<>+-.,[]). Say we wanted to evolve a 10-character Artificial General Intelligence (for example purposes), and the final program was supposed to be something like <+---[.],>. The search space is 8^10 programs. If it were a 20-character program, the search space would be 8^20, and in general, for an n-character program the search space is 8^n, i.e. it grows exponentially with length. Instead of trying to evaluate each point in the search space, let's try evaluating the best character at each position in the program. For the first character, we find out which command maximizes our score when placed in that position; in this case it is <. At most we have to evaluate 8 characters to find that out. If we do that for each character in the program, a program of length n takes 8*n evaluations in total, i.e. the number of evaluations grows linearly with the length of the program. For Artificial General Intelligence, at an estimated length of 100 million characters, that is 800 million evaluations. With 100,000 evaluations running in parallel, that is 8,000 evaluations each; at 1 evaluation per minute, that is about 134 hours, or under 6 days, making evolution doable. I wouldn't even call it evolution any more; it's a hill-climbing algorithm now.
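The position-by-position search can be sketched as coordinate-wise hill climbing. The fitness here is a deliberately easy stand-in (count of positions matching a hidden target program), chosen because it decomposes perfectly across positions; a real fitness would run the program, and would not decompose so cleanly:

```python
import random

COMMANDS = "<>+-.,[]"   # the 8 BrainF*ck commands
random.seed(0)

# Toy stand-in fitness: a hidden target program, scored by how many
# positions of the candidate match it. Perfectly decomposable by design.
N = 20
target = [random.choice(COMMANDS) for _ in range(N)]

evaluations = 0
def fitness(program):
    global evaluations
    evaluations += 1
    return sum(a == b for a, b in zip(program, target))

# Coordinate-wise hill climbing: fix the best command one position at a time
program = [COMMANDS[0]] * N
for i in range(N):
    best = max(COMMANDS, key=lambda c: fitness(program[:i] + [c] + program[i + 1:]))
    program[i] = best

print("".join(program) == "".join(target))  # True: target recovered
print(evaluations)                          # 8 * N = 160 evaluations, not 8**20
```

The exponential-to-linear collapse only holds because each position can be scored independently, which is exactly the catch discussed next.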

Well, that's how I think it could be done. Except the last point is not exactly how it would be done. The problem with code is that it is hard to subdivide, because the whole is greater than the sum of its parts: evaluating a command by itself isn't the same as evaluating it alongside the other commands. So the question is, how can we transform code so that it is the sum of its parts? I think we can do this partially. The way I define intelligence is the ability to model functions and optimize them. So the fitness function for our evolution is simply to give the program some functions to model and see how well it can model their behaviour. As you know, there is more than one way to write a function, but in this case we only allow the evolving code to model a function one way, i.e. it must have the same syntax as the functions we are trying to model. This means that at some point during its run the program must traverse the same syntax as the functions being modelled. A program is given more fitness the more similar its syntax is to the function it is trying to model. Now the program being evaluated is (partially) the sum of its parts, since a change in syntax changes the fitness. Not all of the code will be like this, where a change in code is a change in fitness evaluation, but a significant portion of it will be. For the parts that aren't, the other methods of reducing the search space will prove useful.

But, you say, what if the evolved program can only implement the functions it was given as a fitness test? We can prevent that by specifying that the smaller the program, the better the fitness. This allows for generalization, since there isn't a one-to-one correspondence between the syntax of our evolved program and that of our test cases. Furthermore, we can randomly select the functions it must model, so that it sees each function too infrequently to build up a memory of that function in particular; maybe we can even generate functions to be modelled on the fly.
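Putting those pieces together, the fitness might look something like the sketch below: randomly sampled target functions plus a size penalty. Every name here (`combined_fitness`, `run_program`, the weights) is made up for illustration, and the "evolved program" is faked as a lambda:

```python
import random

def combined_fitness(program_tokens, run_program, function_pool,
                     n_samples=3, size_weight=0.01):
    """Toy combined fitness, as described above:
    - sample a few target functions at random, so none is memorized
    - score how closely the program's behaviour matches each one
    - subtract a penalty proportional to program size, to push generalization"""
    targets = random.sample(function_pool, n_samples)
    behavior_score = 0.0
    for f in targets:
        for x in range(-5, 6):
            behavior_score -= abs(run_program(program_tokens, x) - f(x))
    return behavior_score - size_weight * len(program_tokens)

# Toy setup: a pool of functions to model, and a fake "evolved program"
pool = [lambda x: x * x, lambda x: x + 1, lambda x: 2 * x, lambda x: x - 3]

def run_program(tokens, x):
    return x * x          # pretend the evolved program computes x^2

short_prog = ["tok"] * 10
long_prog  = ["tok"] * 1000
random.seed(1)
s1 = combined_fitness(short_prog, run_program, pool)
random.seed(1)          # same seed, so both see the same sampled targets
s2 = combined_fitness(long_prog, run_program, pool)
print(s1 > s2)          # True: identical behaviour, but the shorter program wins
```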

Well, thanks for reading. If you want to get involved with this project or generally talk about Artificial General Intelligence, I created a group with 50+ members (very active, sometimes with over 10,000 posts per week) on Slack (which has an IRC API) and Skype. If you would like to join, add me on Skype at chester.hercules.grant. I will need your email address to send you an invitation to the Slack group. If you don't want to use Skype and want to join the Slack group alone, send me an email with your name or pseudonym to chester dot grant at yahoo dot co dot uk.
