Hackathon overview

Tomas Gogar
Published in Rossum
2 min read · Jul 15, 2017

The hackathon is over, and we want to thank everyone who came and had a great time with us! More than 20 people participated in the networking part, and we hope they enjoyed it and made some new friends, as we did.

Seven brave men and one woman accepted the Focused Crawling Challenge and formed two teams (named A-Team and A-Star Team for simplicity :-) ). The goal was to create a focused crawling algorithm that finds publicly available invoices on the Internet. The teams fed the URLs they found to our API, and our algorithms automatically classified whether each URL represented an invoice.
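To make the task more concrete, here is a minimal sketch of what such a crawling loop might look like. It is not any team's actual solution: `search` and `is_invoice` are hypothetical stand-ins for a web search call and for our classification API, and the term-weighting scheme is just one simple assumption about how queries could be generated.

```python
import random
from typing import Callable, Iterable, Optional, Sequence, Set


def focused_crawl(
    seed_terms: Sequence[str],
    search: Callable[[str], Iterable[str]],  # hypothetical: query -> result URLs
    is_invoice: Callable[[str], bool],       # hypothetical: URL -> classifier verdict
    rounds: int = 10,
    terms_per_query: int = 2,
    rng: Optional[random.Random] = None,
) -> Set[str]:
    """Return the URLs that the classifier accepted as invoices."""
    rng = rng or random.Random(0)
    weights = {term: 1.0 for term in seed_terms}
    seen: Set[str] = set()
    found: Set[str] = set()

    for _ in range(rounds):
        # Sample query terms in proportion to how well they have worked so far.
        terms = list(weights)
        picked = rng.choices(
            terms, weights=[weights[t] for t in terms], k=terms_per_query
        )
        chosen = list(dict.fromkeys(picked))  # drop repeated terms, keep order
        query = " ".join(chosen)

        hits = 0
        for url in search(query):
            if url in seen:
                continue  # never submit the same URL twice
            seen.add(url)
            if is_invoice(url):
                found.add(url)
                hits += 1

        # Reward the terms whose query surfaced new invoices.
        for term in chosen:
            weights[term] += hits

    return found
```

Passing the search and classification steps in as functions keeps the sketch independent of any particular search provider or API client.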

Most of the participants decided to use the provided Bing Search API and tuned their “query generation” algorithms. The results are in the figure below. The winning A-Team pushed 8996 URLs, of which 1826 were identified as invoices (a 20.3% precision rate). The second team, A-Star, uploaded 5320 URLs, which resulted in 376 unique invoices (a 7.1% precision rate). An interesting note: both teams uploaded only 48 duplicate invoices.

Results of Focused Crawling Challenge
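For readers who want to check the numbers, the precision rates above are simply the invoices found divided by the URLs submitted. The short sketch below recomputes them from the reported counts; the `precision` helper is our own illustrative name, and it treats the "identified" and "unique" invoice counts as directly comparable, which is an assumption.

```python
def precision(invoices: int, submitted: int) -> float:
    """Fraction of submitted URLs that were classified as invoices."""
    if submitted <= 0:
        raise ValueError("submitted must be positive")
    return invoices / submitted


# (URLs submitted, invoices found) as reported in this post
results = {
    "A-Team": (8996, 1826),
    "A-Star Team": (5320, 376),
}

for team, (submitted, invoices) in results.items():
    print(f"{team}: {precision(invoices, submitted):.1%}")
# A-Team: 20.3%
# A-Star Team: 7.1%
```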

It’s also worth mentioning that one of the participants, Karel Ha, didn’t want to bother with crawling the Internet and opted for a “moonshot” strategy: generating artificial invoices with Generative Adversarial Networks (GANs). A few hours of GPU time and 10 000 iterations were not enough to fool our invoice classifiers, but the result looks pretty cool (see the image below).

“Invoice” generated by Karel Ha

We want to thank all the participants, especially those who accepted the challenge:

A-Team (Winners): Elnaz Babayeva, Yury Kasimov, Miroslav Spousta, Petr Zika
A-Star Team: Karel Ha, Petr Marek, Roman Long, Stepan Prochazka
