I recently came across a post asking how likely one is to meet someone with COVID-19 in a group of a certain size. In this quick post, I describe two approaches for computing this number.

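Assuming we know the probability P(positive) of a random individual in the population being positive, and assuming individuals are positive independently of one another, we can compute the probability that at least one individual in a group of size N is positive using the following formula:

$$P(\text{at least one positive}) = 1 - \bigl(1 - P(\text{positive})\bigr)^{N}$$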

The above equation says that the probability of at least one individual in a group of size N being positive is equal to 1 minus…
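The calculation can be sketched in a few lines of Python. This is only an illustration, not code from the original post: it assumes each individual is positive independently with the same probability, and the function name and the example prevalence are hypothetical.

```python
def prob_at_least_one_positive(p_positive: float, group_size: int) -> float:
    """Probability that at least one person in the group is positive.

    Assumes (hypothetically) that individuals are positive independently,
    each with the same probability p_positive.
    """
    if not 0.0 <= p_positive <= 1.0:
        raise ValueError("p_positive must be between 0 and 1")
    if group_size < 0:
        raise ValueError("group_size must be non-negative")
    # Complement rule: 1 minus the probability that everyone is negative.
    return 1.0 - (1.0 - p_positive) ** group_size


if __name__ == "__main__":
    # Illustrative prevalence of 1%; not a figure from the original post.
    for n in (1, 10, 50, 100):
        print(n, round(prob_at_least_one_positive(0.01, n), 4))
```

Even with a small per-person probability, the result grows quickly with group size because the chance that everyone is negative shrinks geometrically.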

Visual search has been around for a while as part of Google Images and Pinterest Lens. It is becoming increasingly popular in e-commerce, where it helps merchants boost sales by letting customers simply upload a picture of what they are looking for instead of working through a long list of attribute filters. I decided to look at how one could build such a visual search engine, first from scratch and then using AnnDB.

Architecture overview

Let’s have a brief look at what we’ll need in order to provide the most basic form of visual search service.

Our service exposes an HTTP API…

In this article I want to give you an overview of an RNN model I built to forecast time series data. The main objectives of this work were to design a model that can generate a sequence of predictions rather than only the very next time step, and that can use multiple driving time series together with a set of static (scalar) features as its inputs.

Model architecture

At a high level, this model uses a fairly standard sequence-to-sequence recurrent neural network architecture. Its inputs are past values of the predicted time series concatenated with other driving time series values (optional) and timestamp embeddings…

In this article I’d like to show you a model I used for the Quora question pairs competition. First, I’ll describe a Decomposable Attention Model for Natural Language Inference (Parikh et al., 2016) and then extend it with a convolutional layer to improve loss and classification accuracy. For the purpose of this article I’ll use the Stanford NLI corpus for comparison.


Natural language inference (NLI) and paraphrase detection are among the main research topics in natural language processing. Natural language inference refers to the problem of determining entailment and contradiction between two statements, and paraphrase detection focuses on determining…

I recently found this publicly available dataset of credit card transactions on Kaggle, so I thought it might be interesting to play with it a bit and see how good a classification result I could get. In this article I’d like to share how to overcome imbalance in the target classes, how to choose the right metrics for your model, and the results I came up with.

Looking at the data

As the first step we’ll load our dataset into a Pandas data frame and print out some basic statistics about individual columns. This tells us whether we’ll need to normalize values before feeding…

It’s been a while since I had the idea of estimating object distance using just a single camera and machine learning. First, I have to admit that this approach only works under controlled conditions and is pretty difficult to get working for new objects. If you are interested in how it’s done for real-world use cases, check out the following papers: monocular distance estimation, stereoscopic distance measurement.


This is the less interesting part of this project, but let me quickly describe what hardware I’ve used and how it all fits together. The signal from the Raspberry Pi camera is…

Marek Galovič

CTU Prague, Former Data Scientist @Shopify. I like data, machine learning and algorithms.
