Modeling lung imaging data using deep neural networks following ResponsibleAI principles. No worries, you don’t need to have prior experience with medical data.
Who we are
MI2DataLab is a research group led by Przemysław Biecek that brings together research-oriented individuals interested in machine learning from the Warsaw University of Technology and the University of Warsaw.
We love open software. We strive to work on current challenges such as ExplainableAI (EMA, dalex, arena, explainable meta learning), COVID (epidemics, XAI for lung images), robustness of NLP models (WildNLP, NER), and applications in finance or medical domains (scoring, segmentation).
Who are we looking…
The end of the year is a great time to summarize the accomplishments of the team. This year in MI2DataLab we summarized the good things that happened in the form of baubles on a Christmas tree (yes, this is the only known exception for using 3D plots). Each bauble color represents a different kind of result (yellow for articles, red for package releases, blue for conferences, orange for blogs, and light blue for workshops and trainings).
Here is the decorated Christmas tree, and below I will describe how to make it from scratch using the rgl package for R.
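As a rough illustration of the idea (not the code from the original post), the sketch below scatters colored baubles over the surface of a cone and draws them with rgl. The counts per color are placeholders rather than the lab's real numbers, and the cone geometry is an assumption; only `rgl::open3d()` and `rgl::spheres3d()` come from the rgl package.

```r
# Toy sketch: place coloured baubles on the surface of a cone (the "tree").
# The counts below are placeholders, not the lab's real numbers.
set.seed(2020)
counts  <- c(yellow = 5, red = 4, blue = 6, orange = 3, lightblue = 2)
colours <- rep(names(counts), times = counts)
n <- length(colours)

height <- runif(n, min = 0, max = 1)       # 0 = bottom, 1 = top of the tree
angle  <- runif(n, min = 0, max = 2 * pi)  # position around the trunk
radius <- 1 - height                       # the cone narrows towards the top

baubles <- data.frame(
  x = radius * cos(angle),
  y = radius * sin(angle),
  z = height,
  colour = colours,
  stringsAsFactors = FALSE
)

# Draw only in an interactive session with rgl installed.
if (interactive() && requireNamespace("rgl", quietly = TRUE)) {
  rgl::open3d()
  rgl::spheres3d(baubles$x, baubles$y, baubles$z,
                 col = baubles$colour, radius = 0.05)
}
```

Keeping the coordinates in a plain data frame makes it easy to swap in real counts per result type before drawing.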
TL;DR: If you want to better understand the relationship between a predictor and the target variable, you should build many different models (glm, boosting, rf) and compare their PD profiles (e.g. with DALEX).
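To make the TL;DR concrete, here is a minimal hand-rolled sketch of comparing partial dependence (PD) profiles across models. It uses synthetic data and two glm specifications so that it runs in base R; in practice you would plug in boosting or random forest models and could use DALEX instead. The data, the variable names and the helper `pd_profile()` are all assumptions made for illustration.

```r
# Synthetic data with a U-shaped effect of age (purely illustrative).
set.seed(1)
n <- 500
toy <- data.frame(age = runif(n, 20, 80), bmi = rnorm(n, 27, 4))
p <- plogis(-2 + 0.002 * (toy$age - 50)^2 + 0.05 * (toy$bmi - 27))
toy$y <- rbinom(n, 1, p)

# Two competing models: one linear in age, one allowing curvature.
models <- list(
  glm_linear = glm(y ~ age + bmi, data = toy, family = binomial()),
  glm_poly   = glm(y ~ poly(age, 2) + bmi, data = toy, family = binomial())
)

# Hypothetical helper: PD profile = average prediction after setting
# `variable` to each grid value for every observation.
pd_profile <- function(model, data, variable, grid) {
  sapply(grid, function(value) {
    data[[variable]] <- value
    mean(predict(model, newdata = data, type = "response"))
  })
}

grid <- seq(20, 80, by = 10)
profiles <- sapply(models, pd_profile, data = toy, variable = "age", grid = grid)
rownames(profiles) <- grid
round(profiles, 3)
```

If the columns of `profiles` disagree strongly, at least one model is probably misrepresenting the shape of the relationship, which is the kind of signal the comparison is meant to surface.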
The CRS-19 (Covid-19 Risk Score) model
Recently, the MOCOS group (MOdeling COronavirus Spread) developed the second version of the Covid-19 model for severe condition after being infected with Covid-19. It was built on a sample of over 52 thousand cases in Poland with a positive PCR test for Covid-19 (more about the data later). You can play with the model at https://crs19.pl/.
The main goal of this…
Recently there have been several blog entries showing the excess number of deaths in different countries.
Recently I discovered that in the Eurostat database (1) one can find current data on the number of deaths, (2) this number is broken down by age, gender and geographical area, (3) one can use the ‘eurostat’ package to easily read and plot these data.
It turns out that the difference in the number of deaths by age leads to interesting observations.
Read the data from the demo_r_mwk_10 table in Eurostat.
library(eurostat)
mdata <- get_eurostat("demo_r_mwk_10")
# keep observations from 2010 onwards
mdata2010 <- mdata[as.character(mdata$time) >= "2010",]
Do some cleaning in order to…
The growing demand for fast and automated development of predictive models has contributed to the popularity of machine learning frameworks. ML frameworks allow us to quickly build models that maximize a selected performance measure. However, it turns out that the result is often a black-box model, and too often it is difficult to detect certain problems early enough. Insufficiently tested models quickly lose their effectiveness, lead to unfair decisions, discriminate, are rejected by users, and do not give the possibility of an appeal.
In order to build models responsibly, we need tools…
Author: Jakub Wiśniewski
The fairmodels R package facilitates bias detection through model visualizations. It implements a few mitigation strategies that could reduce the bias. It enables easy-to-use checks for fairness metrics and comparison between different Machine Learning (ML) models.
Fairness in ML is a quickly emerging field. Big companies like IBM or Google have already developed some tools (see AIF360) with a growing community of users. Unfortunately, there aren’t many tools for discovering bias and discrimination in machine learning models created in R. Therefore, checking the fairness of a classifier created in R might be a difficult task. …
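To give a feel for what a group fairness check measures, here is a minimal base R sketch that computes two common metrics by hand and compares a protected group against a privileged one. It does not use the fairmodels API; the synthetic data, the column names and the helper `group_metrics()` are assumptions made for illustration.

```r
# Synthetic data: the privileged group has a shifted `income` feature.
set.seed(42)
n <- 1000
toy <- data.frame(
  group  = sample(c("privileged", "protected"), n, replace = TRUE),
  income = rnorm(n)
)
toy$income <- toy$income + ifelse(toy$group == "privileged", 0.5, 0)
toy$y <- rbinom(n, 1, plogis(toy$income))

# A simple classifier that never sees `group` directly.
model <- glm(y ~ income, data = toy, family = binomial())
toy$pred <- as.integer(predict(model, type = "response") > 0.5)

# Hypothetical helper: per-group metrics.
group_metrics <- function(d) {
  c(positive_rate = mean(d$pred),        # basis of statistical parity
    tpr = mean(d$pred[d$y == 1]))        # basis of equal opportunity
}

metrics <- sapply(split(toy, toy$group), group_metrics)
ratios  <- metrics[, "protected"] / metrics[, "privileged"]
round(ratios, 2)
```

A ratio close to 1 means both groups are treated similarly on that metric; a common rule of thumb (the four-fifths rule) treats ratios below 0.8 as a warning sign. Note that the model can still behave differently across groups even though `group` is not among its inputs.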
All the materials from my workshop are at http://tiny.cc/eRum2020.
The complete three-hour workshop is summarized in this 8-page cheatsheet. Special thanks to Anna Kozak for the cover.
Delivering online workshops is different from the classical format. Here are my experiences.
In a classic workshop it is easy to wander around the room and talk to participants about potential problems. In the online format it is tricky to guess how many more need more…
The xai2cloud package allows for simple deployment of an R model as a cloud service. Any predictive model from the local R console can be converted into a REST service available remotely in the DigitalOcean cloud. GET and POST methods are created automatically for elementary XAI methods such as Break Down and Ceteris Paribus. There will be more in the future.
For years, I have been suffering from the fact that R objects created in the R console cannot be easily governed, managed, or shared between computers. To work around this problem, since 2013 we have been developing…
The new version of modelStudio has recently been released on CRAN.
modelStudio is an R package that automates the exploration of ML models and allows for interactive examination. It works in a model-agnostic fashion and is therefore compatible with most ML frameworks (e.g. mlr/mlr3, xgboost, caret, h2o, scikit-learn, lightGBM, keras/tensorflow).
Recently, we have uploaded to arXiv an article presenting the main principles behind this tool: The Grammar of Interactive Explanatory Model Analysis. Here are the highlights.
Most predictive ML models are based on a simple assumption: the future will be similar to the past. We can learn some relations on historical data and use them to predict the future.
The COVID19 pandemic shows us how fragile this assumption is.
Explainability is now more important than ever, because without understanding how black-box ML models work we risk meaningless predictions due to data drift, out-of-distribution errors, or other issues.
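One simple way to notice that the future no longer resembles the past is to compare the distribution of a feature in the training data with the distribution in newly arriving data. The sketch below does this with a two-sample Kolmogorov-Smirnov test from base R. The data are synthetic, and the helper `drift_check()` and its `alpha` threshold are assumptions made for illustration, not part of any package mentioned here.

```r
# Synthetic feature values: training data, new data from the same
# distribution, and new data whose mean has shifted.
set.seed(7)
train_age <- rnorm(1000, mean = 45, sd = 10)
new_same  <- rnorm(1000, mean = 45, sd = 10)
new_shift <- rnorm(1000, mean = 55, sd = 10)

# Hypothetical helper: flag drift when the KS test rejects equality
# of the two distributions at level `alpha`.
drift_check <- function(reference, current, alpha = 0.01) {
  test <- ks.test(reference, current)
  list(statistic = unname(test$statistic),
       p_value   = test$p.value,
       drift     = test$p.value < alpha)
}

shifted   <- drift_check(train_age, new_shift)
unchanged <- drift_check(train_age, new_same)
c(shifted = shifted$statistic, unchanged = unchanged$statistic)
```

A check like this only looks at one variable at a time, so it complements rather than replaces an analysis of how the model's explanations change between the old and the new data.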
As part of the DrWhy initiative, we are…