A Commentary on the Abstraction and Reasoning Challenge — Kaggle Competition

Mehran Kazeminia
Aug 16 · 7 min read

This competition was hosted by François Chollet.


This report has been prepared by Somayyeh Gholami and Mehran Kazeminia.

Current machine learning techniques can only exploit patterns they have already seen: a model is first given a fixed structure and is then exposed to relevant data so it can learn new skills. But could machines one day answer reasoning questions they have never encountered, the way humans do? Could they learn complex, abstract tasks from just a few examples? That was exactly the theme of the Abstraction and Reasoning Challenge, which ended recently and was one of Kaggle’s most controversial competitions. Participants were asked to develop, within three months, an artificial intelligence that could solve reasoning questions it had never seen before. Introducing the contest, Kaggle wrote:

“It provides a glimpse of a future where AI could quickly learn to solve new problems on its own. The Kaggle Abstraction and Reasoning Challenge invites you to try your hand at bringing this future into the present!”

The reasoning questions of this challenge resembled human intelligence tests and included simple, medium, and sometimes rather difficult items. An ordinary person could answer all of them in a reasonable time, and none was extremely complex. The challenge was how to teach machines reasoning concepts such as changing colors, resizing, and reordering, so that they could pass a human intelligence test they had never seen before.
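As an illustration, concepts like recoloring or resizing can be expressed as plain grid-to-grid functions, a grid being a list of lists of color codes (0–9). The two helpers below are our own illustrative sketch, not part of any official competition API:

```python
def recolor(grid, mapping):
    """Replace each color via a lookup table, keeping unmapped colors as-is."""
    return [[mapping.get(c, c) for c in row] for row in grid]

def upscale(grid, factor):
    """Resize a grid by repeating every cell `factor` times along each axis."""
    return [[c for c in row for _ in range(factor)]
            for row in grid for _ in range(factor)]

demo = [[1, 0],
        [0, 2]]

print(recolor(demo, {1: 3}))  # color change: every 1 becomes a 3
print(upscale(demo, 2))       # resize: the 2x2 grid becomes 4x4
```

A real solver would need dozens of such primitives, plus a way to decide which ones a given task calls for.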

The total prize was twenty thousand dollars, divided between the first three teams. But as one might guess, even the results at the top of the leaderboard were not promising. Nearly a thousand participants entered, and half of them did not answer a single question correctly. An algorithm that did not work at all scored 1.00, while one that answered a few questions correctly might score, say, 0.98. In the end, only twelve teams managed to score below 0.90. The following is the final leaderboard for the top thirty.

[Final leaderboard: scores of the top thirty teams]

This was not a classification challenge: every answer had to be produced as a picture (a matrix) rather than selected from several visual options, which made the competition harder. Perhaps for this reason, those who thought they could train machines in the conventional, classical way, or make progress by guesswork, were utterly disappointed. Some participants did work out by hand the tasks that had simpler solutions, but these were the exception; even in the best case they solved only a handful of instances and had little success.
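Because the answer is a full grid rather than a class label, each prediction had to be serialized as text. A minimal sketch of the grid flattener used in the competition’s starter notebooks, where rows are delimited by `|` characters (check the competition’s sample_submission.csv for the exact expected format):

```python
def flatten_grid(grid):
    """Serialize a 2-D grid of color codes into the '|..|..|' string form."""
    return '|' + '|'.join(''.join(str(c) for c in row) for row in grid) + '|'

print(flatten_grid([[1, 0],
                    [0, 2]]))  # -> |10|02|
```

Scoring then compares this string against the flattened ground-truth grid, so a single wrong cell makes the whole answer wrong.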

Although the ingenuity and effort of the winners and participants are admirable, a glance at the scoreboard suggests we are still far from a final answer, and there is no guarantee that the participants chose the best possible approach. The winners, however, have generously described their creative methods in the following links, and some of them have shared their complete code.

List of gold medal solutions shared:

by icecuber

by Alejandro de Miquel

by Vlad Golubev

by Ilia

by alijs

by Zoltan

by Andy Penrose

by Maciej Sypetkowski

by Jan Bre

by Hieu Phung

by Alexander Fritzler

If you are interested in this topic, you can find plenty of information about the challenge on the Kaggle website as well as on François Chollet’s GitHub. And if you want to try your own approach, we have a tip to get you started: first study François Chollet’s 64-page paper on measuring intelligence:

On the Measure of Intelligence | François Chollet

You can also refer to the Discussion and Notebooks section of this challenge on the Kaggle website and read the recommendations of the host, winners, and all participants directly. Finally, here are some key tips from François Chollet:

How to get started?

fchollet — Competition Host:

If you don’t know how to get started, I would suggest the following template:

Take a bunch of tasks from the training or evaluation set — around 10.
For each task, write by hand a simple program that solves it. It doesn’t matter what programming language you use — pick what you’re comfortable with.
Now, look at your programs, and ponder the following:
1) Could they be expressed more naturally in a different medium (what we call a DSL, a domain-specific language)?
2) What would a search process that outputs such programs look like (regardless of conditioning the search on the task data)?
3) How could you simplify this search by conditioning it on the task data?
4) Once you have a set of generated candidates for a solution program, how do you pick the one most likely to generalize?

You will not find tutorials online on how to do any of this. The best you can do is read past literature on program synthesis, which will help with step 3). But even that may not be that useful :)

This challenge is something new. You are expected to think on your own and come up with novel, creative ideas. It’s what’s fun about it!
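The template above can be sketched concretely. Below is a toy version under our own assumptions: a tiny DSL of three grid primitives and a brute-force search over short programs, conditioned on the task data by keeping only programs that reproduce every demonstration pair. It deliberately ignores step 4, picking the candidate most likely to generalize:

```python
from itertools import product

# A tiny, illustrative DSL of grid-to-grid primitives.
def flip_h(g):    return [row[::-1] for row in g]      # mirror left-right
def flip_v(g):    return g[::-1]                       # mirror top-bottom
def transpose(g): return [list(r) for r in zip(*g)]    # swap rows/columns

DSL = [flip_h, flip_v, transpose]

def search(demos, max_len=2):
    """Return every program (tuple of primitives) consistent with all demos."""
    hits = []
    for length in range(1, max_len + 1):
        for prog in product(DSL, repeat=length):
            def run(g, prog=prog):
                for f in prog:
                    g = f(g)
                return g
            if all(run(inp) == out for inp, out in demos):
                hits.append(prog)
    return hits

# One demonstration pair whose hidden rule is "flip horizontally".
demos = [([[1, 2], [3, 4]], [[2, 1], [4, 3]])]
print([tuple(f.__name__ for f in p) for p in search(demos)])
```

With one demonstration pair and three primitives this search is trivial; the hard part of the competition was exactly what this sketch leaves out, a rich enough DSL, a search that scales, and a principled way to choose among the surviving candidates.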

Does hard-coding rules disqualify?

fchollet — Competition Host:

You can hard-code rules & knowledge, and you can use external data

Can we “probe” the leaderboard to get information about the test set?

fchollet — Competition Host:

Using your LB score as feedback to guess the exact contents of the test set is against the spirit of the competition. In fact, it is against the spirit of every Kaggle competition. The goal of the competition is to create an algo that will turn the demonstration pairs of a task into a program that solves the task — not to reverse-engineer the private test set.

Further, this is a waste of your time. It is extremely unlikely that you would be able to guess an exact output or an exact task. This is why we decided not to have a separate public and private leaderboard: probing is simply not going to work.

That is because:
1) test tasks have no exact overlap with training and eval tasks (although they look “similar” in the sense that they’re the same kind of puzzle, built on top of Core Knowledge systems)
2) the space of all possible ARC tasks is very large, and very diverse.

So you’re not going to get a hit by either trying everything found in the train and eval set, or by just randomly guessing new tasks. You would have better luck trying to guess the exact melodies of the top 100 pop songs of 2021.

Is the level of difficulty similar in evaluation set and test set?

fchollet — Competition Host:

The difficulty level of the evaluation set and test set are about the same. Both are more difficult than the training set. That is because the training set deliberately contains elementary tasks meant to serve as Core Knowledge concept demonstration.

Can we use data from both the training and evaluation sets in our solutions?

fchollet — Competition Host:

I would recommend only using data from the training set to develop your algorithm. Using data from both the training set and evaluation set isn’t at all against the rules, so you could do it, but it would be bad practice, since it would prevent you from accurately evaluating your algorithms.

The goal of this competition is to develop an algorithm that can make sense of tasks it has never seen before. You’ll want to be able to check how well your algorithm performs before submitting it. For this purpose, you need a set of tasks that your algorithm has never seen, and further, that you have never seen. That’s the evaluation set. So don’t leak too much information from the evaluation set into your algorithm, or you won’t be able to evaluate it.

Note that the “test” set is a placeholder (copied from the evaluation set) for you to check that your submission is working as intended. The real test set used for the leaderboard is fully private.
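For reference, tasks in Chollet’s public ARC repository are individual JSON files, each holding "train" and "test" lists of {"input": grid, "output": grid} pairs. A minimal loader sketch (the directory path is whatever you downloaded the data to; the demo writes a throwaway task to a temporary folder so the snippet runs self-contained):

```python
import json
import tempfile
from pathlib import Path

def load_tasks(directory):
    """Map task id (file stem) -> parsed task dict for every JSON file."""
    return {p.stem: json.loads(p.read_text())
            for p in sorted(Path(directory).glob('*.json'))}

# Demonstrate with a throwaway one-pair task in a temporary directory.
with tempfile.TemporaryDirectory() as d:
    task = {"train": [{"input": [[1]], "output": [[2]]}],
            "test":  [{"input": [[3]]}]}
    (Path(d) / "0a1b2c3d.json").write_text(json.dumps(task))
    tasks = load_tasks(d)
    print(sorted(tasks))                    # -> ['0a1b2c3d']
    print(len(tasks['0a1b2c3d']['train']))  # -> 1
```

Keeping the training and evaluation folders in separate variables (and only ever scoring against the evaluation one) is an easy way to enforce the discipline Chollet describes above.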

[Image by Maciej Sypetkowski]

So everything is ready.
Have a coffee and get started.

Good luck.
Somayyeh Gholami & Mehran Kazeminia

The Startup

Medium's largest active publication, followed by +705K people. Follow to join our community.

Mehran Kazeminia

Written by

Tech Researcher, Solidity Developer, Senior Civil Structural Engineer, Historian, https://www.soliset.com , https://www.farsi.media , https://www.newchains.info
