Ok, we all work remotely at the moment anyway. But this chair will be waiting for you.

Modeling lung imaging data using deep neural networks following ResponsibleAI principles. No worries, you don’t need to have prior experience with medical data.

Who we are

MI2DataLab is a research group led by Przemysław Biecek that brings together research-oriented individuals interested in machine learning from the Warsaw University of Technology and the University of Warsaw.

We love open software. We strive to work on current challenges such as ExplainableAI (EMA, dalex, arena, explainable meta learning), COVID (epidemics, XAI for lung images), robustness of NLP models (WildNLP, NER), and applications in the finance and medical domains (scoring, segmentation).

Who are we looking…


The end of the year is a great time to summarize the accomplishments of the team. This year in MI2DataLab we summarized the good things that happened in the form of baubles on the Christmas tree (yes, this is the only known exception that justifies a 3D plot). Each bauble color represents a different kind of result (yellow for articles, red for package releases, blue for conferences, orange for blogs, and light blue for workshops and trainings).

Here is the decorated Christmas tree, and below I will describe how to make it from scratch using the rgl package for R.

Christmas Tree with our articles, workshops, software releases
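To give you the idea before the full walkthrough, here is a minimal sketch of such a tree: a cone of green points drawn with rgl, with spheres3d() baubles colored by kind of result. This is not the original script, and the point counts and bauble numbers below are made up for illustration.

library(rgl)

# the tree: a cone of green points, apex at the top
set.seed(2020)
n <- 3000
h <- runif(n)                 # position along the trunk, 0 = top, 1 = bottom
a <- runif(n, 0, 2 * pi)      # angle around the trunk
r <- h * runif(n)             # cone radius grows towards the bottom
open3d()
points3d(r * cos(a), r * sin(a), 1 - h, col = "darkgreen", size = 2)

# baubles on the cone surface, one color per kind of result
# (the counts here are illustrative, not our real 2020 numbers)
counts <- c(yellow = 10, red = 8, blue = 6, orange = 5, lightblue = 4)
for (col in names(counts)) {
  bh <- runif(counts[[col]], 0.1, 0.95)
  ba <- runif(counts[[col]], 0, 2 * pi)
  spheres3d(bh * cos(ba), bh * sin(ba), 1 - bh, radius = 0.04, col = col)
}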


Risk calculator for severe condition after Covid-19 https://crs19.pl/

TL;DR: If you want to better understand the relationship between an explanatory variable and the target variable, you should build many different models (glm, boosting, rf) and compare their PD profiles (e.g. with DALEX).
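Here is a minimal sketch of that workflow, using the titanic_imputed data shipped with DALEX rather than the CRS-19 data: two models of different flexibility, one model_profile() call each, plotted together.

library(DALEX)
library(ranger)

# two models of different flexibility for the same binary target
model_glm <- glm(survived ~ ., data = titanic_imputed, family = "binomial")
model_rf  <- ranger(as.factor(survived) ~ ., data = titanic_imputed,
                    probability = TRUE)

exp_glm <- explain(model_glm, data = titanic_imputed[, -8],
                   y = titanic_imputed$survived, label = "glm")
exp_rf  <- explain(model_rf, data = titanic_imputed[, -8],
                   y = titanic_imputed$survived, label = "ranger")

# partial-dependence profiles for age, overlaid for both models
pd_glm <- model_profile(exp_glm, variables = "age")
pd_rf  <- model_profile(exp_rf, variables = "age")
plot(pd_glm, pd_rf)

If the two profiles diverge strongly, that is usually a hint that the more flexible model picked up a relation the simpler one cannot express, or that it overfits.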

The CRS-19 (Covid-19 Risk Score) model

Recently, the MOCOS group (MOdeling COronavirus Spread) developed the second version of its model for severe condition after Covid-19 infection. It was built on a sample of over 52 thousand cases in Poland with a positive PCR test for Covid-19 (more about the data later). You can play with the model at https://crs19.pl/.

The main goal of this…


Number of deaths in consecutive weeks. See the second plot for the whole story.

Recently there have been several blog entries showing the excess number of deaths in different countries. I discovered that in the Eurostat database (1) one can find current data on the number of deaths, (2) this number is broken down by age, gender and geographical area, and (3) one can use the ‘eurostat’ package to easily read and plot these data.

It turns out that the difference in the number of deaths by age leads to interesting observations.

Read the data from the demo_r_mwk_10 table from Eurostat.

library(eurostat)
# weekly number of deaths, broken down by age, gender and geographical area
mdata <- get_eurostat("demo_r_mwk_10")
# keep only records from 2010 onwards
mdata2010 <- mdata[as.character(mdata$time) >= "2010", ]

Do some cleaning in order to…


Joint work with Szymon Maksymiuk and Alicja Gosiewska.

The growing demand for fast and automated development of predictive models has contributed to the popularity of machine learning frameworks. ML frameworks allow us to quickly build models that maximize a selected performance measure. However, it turns out that as a result we get black-box models, and too often it is difficult to detect problems early enough. Insufficiently tested models quickly lose their effectiveness, lead to unfair decisions, discriminate, are rejected by users, and offer no possibility of appeal.

In order to build models responsibly, we need tools…


Author: Jakub Wiśniewski


TL;DR

The fairmodels R package facilitates bias detection through model visualizations. It implements a few mitigation strategies that can reduce bias. It enables easy-to-use checks of fairness metrics and comparisons between different machine learning (ML) models.
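As a quick illustration, here is a minimal sketch using the german credit data shipped with fairmodels; the details follow my reading of the package docs and should be treated as a sketch, not a definitive recipe.

library(fairmodels)
library(DALEX)

# german credit data shipped with fairmodels; Risk is the target
data("german")
y <- ifelse(german$Risk == "good", 1, 0)
model <- glm(Risk ~ ., data = german, family = "binomial")
explainer <- explain(model, data = german[, -1], y = y)

# check standard fairness metrics with Sex as the protected attribute
fc <- fairness_check(explainer,
                     protected = german$Sex,
                     privileged = "male")
print(fc)
plot(fc)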

Longer version

Fairness in ML is a quickly emerging field. Big companies like IBM or Google have already developed some tools (see AIF360) with a growing community of users. Unfortunately, there aren’t many tools for discovering bias and discrimination in machine learning models created in R. Therefore, checking the fairness of a classifier created in R might be a difficult task. …



Today I had the pleasure to give a workshop on Explanatory Model Analysis at the eRum 2020 conference. The conference was completely online, so were the workshops.

All the materials from my workshop are at http://tiny.cc/eRum2020.

The complete three-hour workshop is summarized in this 8-page cheatsheet. Special thanks to Anna Kozak for the cover.

Delivering online workshops is different from the classical formula. Here are my experiences.

In a classic workshop it is easy to wander around the room and talk to participants about potential problems. In the online formula it is tricky to guess how many participants need more…



TL;DR

The xai2cloud package allows for simple deployment of an R model as a cloud service. Any predictive model from the local R console can be converted into a REST service available remotely in the DigitalOcean cloud. GET and POST methods are created automatically for elementary XAI methods such as Break Down and Ceteris Paribus. There will be more in the future.
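For a flavour of the workflow, here is a sketch based on my reading of the package README; deploy_explainer() and its arguments are assumptions on my part and may differ from the current API.

library(DALEX)
library(xai2cloud)

# a DALEX explainer wrapping any local model
model <- glm(survived ~ ., data = titanic_imputed, family = "binomial")
explainer <- explain(model, data = titanic_imputed[, -8],
                     y = titanic_imputed$survived, label = "titanic_glm")

# assumption: deploy_explainer() wraps the explainer in a plumber-based
# REST service; a configured DigitalOcean droplet deploys it remotely
deploy_explainer(explainer, model_package = "stats",
                 title = "Titanic GLM explainer")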

Longer version

For years, I have been suffering from the fact that R objects created in the R console cannot be easily governed, managed, nor shared between computers. To work around this problem, since 2013 we have been developing…


The new version of modelStudio has recently been released on CRAN.
modelStudio is an R package that automates the exploration of ML models and allows for interactive examination. It works in a model-agnostic fashion and is therefore compatible with most ML frameworks (e.g. mlr/mlr3, xgboost, caret, h2o, scikit-learn, lightGBM, keras/tensorflow).
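A minimal sketch of the basic workflow, again on the titanic_imputed data from DALEX; any model wrapped in a DALEX explainer will do.

library(DALEX)
library(modelStudio)

# wrap a model in a DALEX explainer
model <- glm(survived ~ ., data = titanic_imputed, family = "binomial")
explainer <- explain(model, data = titanic_imputed[, -8],
                     y = titanic_imputed$survived, label = "glm")

# one call produces an interactive HTML dashboard that combines
# local and global explanations
modelStudio(explainer)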

Recently, we have uploaded to arXiv an article presenting the main principles behind this tool: The Grammar of Interactive Explanatory Model Analysis. Here are the highlights.

The first generation of model explanations aims at exploring individual aspects of a model’s behaviour. The second generation aims at integrating these individual aspects into a vibrant, multi-threaded, customisable story about the model that addresses the needs of different stakeholders.

Local and global level model explanations complement each other. There is an increasing number of voices arguing that a single method of…


Arena performs a detailed comparative analysis of ML models regardless of their internal structure or the language in which they were trained.

TL;DR: Piotr Piątyszek from MI2DataLab developed a new R package for the interactive juxtaposition of multiple ML models: http://arenar.drwhy.ai/
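A minimal sketch of a live Arena session; the function names follow my reading of the arenar documentation, so treat them as assumptions.

library(DALEX)
library(arenar)

model <- glm(survived ~ ., data = titanic_imputed, family = "binomial")
explainer <- explain(model, data = titanic_imputed[, -8],
                     y = titanic_imputed$survived, label = "glm")

# start a live Arena and push the model plus a few observations to compare
arena <- create_arena(live = TRUE)
arena <- push_model(arena, explainer)
arena <- push_observations(arena, titanic_imputed[1:5, ])
run_server(arena)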

Most predictive ML models are based on a simple assumption: the future will be similar to the past. We can learn some relations from historical data and use them to predict the future.

The COVID-19 pandemic shows us how fragile this assumption is.

Explainability is now more important than ever, because without understanding how black-box ML models work we risk meaningless predictions due to data drift, out-of-distribution errors or other issues.

As part of the DrWhy initiative, we are…

Przemyslaw Biecek

Interested in innovations in predictive modeling. Posts about eXplainable AI, IML, AutoML, AutoEDA and Evidence-Based Machine Learning. Part of r-bloggers.com.
