The field of Machine Learning and Artificial Intelligence is changing rapidly. Five years ago, classical Machine Learning was the hottest trend; now it’s just like an iPhone 6S — outdated. Deep Learning dominates the market these days, and if you come back to this post in 2025, there’s a good chance we’ll have moved way beyond it (self-note: I bet on Deep Reinforcement Learning).
Being a data scientist requires you to keep up with the latest innovations and discoveries, but there’s so much information coming in from so many directions that it’s easy to get lost in the stream. So what should you focus on? Well, it depends on you — but to get you going, here’s my reading list of AI sources that keeps me up to date with the latest innovative ideas. You can give it a shot by going over one or two sources whenever you’ve got a few minutes to kill. …
In case you missed it, there’s a pandemic out there, and it has forced all of us to shut down all public events. As time goes by, we all begin to understand the impact of the lockdowns, social distancing and absence of gatherings. One of the things we realized — and by “we” I refer to the Algo group at Taboola, where I work — is the impact this has on those who are just beginning their career path or are about to shift it.
We used to host and attend many data science meetups and conferences, and noticed that many junior data scientists and data scientists-to-be used these gatherings to ask for and receive guidance and unofficial consulting regarding their career paths. Now that all of these are canceled, they have no one to reach out to. And so we came up with a new initiative, which we named Algo Boost (algoboost.me), to allow everyone to schedule a 30-minute, one-on-one Zoom session with us and get the guidance they seek. …
This blogpost is now available in Polish too; read it on BulldogJob.pl
About two years ago I published my very first data-science related blogpost. It was about Categorical Correlations, and I honestly thought no one would find it useful. It was just an experiment, written mostly for myself. 1.7K claps later, I’ve learned that I cannot determine what other people will find useful, and I’m quite happy I can assist others on the web just like others on the web assist me.
I was also quite new to Python and GitHub at that time, so I also experimented with writing the code for these categorical correlations and publishing it on GitHub. …
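For reference, here’s a minimal sketch of one such categorical correlation measure, Cramér’s V, built on pandas and scipy. The actual implementation in dython may differ in details (e.g., bias correction):

```python
import pandas as pd
from scipy.stats import chi2_contingency

def cramers_v(x, y):
    """Cramer's V between two categorical series: 0 (no association) to 1."""
    confusion = pd.crosstab(x, y)           # contingency table of co-occurrences
    chi2 = chi2_contingency(confusion)[0]   # chi-squared statistic
    n = confusion.to_numpy().sum()          # total number of observations
    r, k = confusion.shape
    return (chi2 / (n * (min(r, k) - 1))) ** 0.5
```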
The ROC-graph-generating code used in this post is available as part of the dython library, which can be found on my GitHub page. Examples seen in this post are also available as a notebook.
Assessing the predictions of any machine-learning model is probably the most important task of a Data Scientist — perhaps even more important than actually developing the model. After all, while building super-complex algorithms is the coolest thing, not knowing how to properly evaluate their output is definitely not.
There are several algorithms and tools dedicated to providing a clearer view of how a model uses the inputs we supply it (like LIME and SHAP), but we rarely discuss how the output of the model is analyzed — which is exactly what we’ll do today. …
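As a taste of what such output analysis looks like, here’s a minimal ROC-curve sketch using scikit-learn’s metrics on a toy classifier (the post’s own graphs come from dython, whose API differs):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, auc
from sklearn.model_selection import train_test_split

# Toy binary-classification problem
X, y = make_classification(n_samples=1000, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = LogisticRegression().fit(X_train, y_train)
scores = model.predict_proba(X_test)[:, 1]  # predicted probability of class 1

fpr, tpr, _ = roc_curve(y_test, scores)
plt.plot(fpr, tpr, label=f'AUC = {auc(fpr, tpr):.3f}')
plt.plot([0, 1], [0, 1], '--', color='gray')  # random-classifier baseline
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.legend()
plt.show()
```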
Implementations of all algorithms discussed in this blogpost can be found on my GitHub page.
The Qrash Course Series:
The previous — and first — Qrash Course post took us from knowing pretty much nothing about Reinforcement Learning all the way to fully understanding one of the most fundamental algorithms of RL: Q-Learning, as well as its Deep Learning version, Deep Q-Network. Let’s continue our journey and introduce two more algorithms: Policy Gradient and Actor-Critic. …
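For readers who want the punchline of part one in code, here’s a generic sketch of the tabular Q-Learning update (not the post’s own implementation):

```python
import numpy as np

# Q-table: one row per state, one column per action
n_states, n_actions = 16, 4
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.1, 0.99  # learning rate and discount factor

def q_update(s, a, r, s_next, done):
    """One Bellman update: nudge Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    target = r if done else r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])
```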
This blog post was originally published on Taboola’s Engineering Blog.
Our core business at Taboola is to provide surfers of the web with personalized content recommendations wherever they might surf. We do so using state-of-the-art Deep Learning methods, which learn what to display to each user from our growing pool of articles and advertisements. But as we challenge ourselves to build better models and better predictions, we also find ourselves constantly facing another issue — how do we not listen to our models? Or in other words: how do we explore better?
As I’ve just mentioned, our pool of articles is growing, meaning more and more items are added every minute — and from an AI perspective, this is a major issue we must tackle, because by the time we finish training a new model and push it to production, it already has to deal with items that never existed in its training data. In a previous post, I discussed how we use weighted sampling to allow more exploration of items with low CTR (Click-Through Rate) while attempting not to harm the traffic of high-CTR items. In this post, I’ll take this dilemma even further, and discuss how we can allow meaningful exploration of items our models have never seen before. …
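As a rough illustration of what weighted sampling for exploration might look like, here’s a sketch where items are drawn with probability proportional to a (smoothed) score rather than greedily. The temperature knob is a hypothetical illustration, not the mechanism from the original post:

```python
import numpy as np

def sample_item(predicted_ctrs, temperature=1.0):
    """Sample an item index with probability proportional to CTR ** (1 / temperature).

    temperature > 1 flattens the distribution, giving low-CTR items
    more exposure; temperature -> 0 approaches greedy argmax.
    """
    weights = np.asarray(predicted_ctrs) ** (1.0 / temperature)
    probs = weights / weights.sum()
    return np.random.choice(len(probs), p=probs)

# A new item with a low prior CTR still gets sampled occasionally
print(sample_item([0.08, 0.05, 0.01], temperature=2.0))
```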
This blog post was originally published on Taboola’s Engineering Blog.
If you happen to write code for a living, there’s a pretty good chance you’ve found yourself explaining to yet another interviewer how to reverse a linked list or how to tell if a string contains only digits. Usually, the need for this B.Sc. material ends once a contract is signed, as most of these low-level questions are handled for us under the hood of modern programming languages and external libraries.
Still, not long ago we found ourselves facing one such question in real life: find an efficient algorithm for real-time weighted sampling. As naive as it might seem at first sight, we’d like to show you why it’s actually not — and then walk you through how we solved it, in case you run into something similar. …
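One well-known approach to this problem (not necessarily the one described in the post) is the Efraimidis–Spirakis trick: draw a uniform random key per item, raise it to the power 1/weight, and keep the item with the largest key — all in a single streaming pass:

```python
import random

def weighted_sample(items, weights):
    """Pick one item with probability proportional to its weight,
    in a single pass (Efraimidis-Spirakis)."""
    best, best_key = None, -1.0
    for item, w in zip(items, weights):
        key = random.random() ** (1.0 / w)  # heavier weights yield larger keys
        if key > best_key:
            best, best_key = item, key
    return best

# 'c' is drawn roughly 70% of the time
print(weighted_sample(['a', 'b', 'c'], [1, 2, 7]))
```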
The Tic-Tac-Toe game described in this post, as well as all algorithms and pre-trained models, can be found in the tic_tac_toe repository on my GitHub page.
When I’m asked to describe what fascinates me so much about Reinforcement Learning, I usually explain that I see it as training my computer the same way I trained my dog — using nothing but rewards. My dog learned to sit, wait, come over, stand, lie down and play dead (kudos to my wife), all in the exact same way — I rewarded her every time she did what I asked. She had no idea what we wanted from her whenever we tried to teach her something new, but after enough trial-and-error, and some trial-and-success, she figured it out. The exact same thing happens when a Reinforcement Learning model is taught something new. …
Whenever I begin learning a subject that is new to me, I find the hardest thing to cope with is its terminology. Every field has many terms and definitions which are completely obscure to an outsider, and they can make a newcomer’s first steps quite difficult.
When I took my first steps into the world of Reinforcement Learning, I was quite overwhelmed by the new terms that popped up every other line, and it always surprised me how simple and logical the ideas behind those complex words turned out to be. I therefore decided to write them all down in my own words, so I’ll always be able to look them up in case I forget. …
I recently posted an introduction to Reinforcement Learning and Deep Q-Networks. But as we all know, there’s a huge difference between understanding the theory and actually implementing it in the real world.
It took me a while to find a worthy first challenge for DQN. Most tutorials I saw combined a DQN with convolutional nets and attempted to design an agent that beats Atari or Doom games. This seemed like a distraction: I saw no point in spending time designing image-processing networks to solve a Reinforcement Learning problem. …