EuroPython 2018 for data scientists

Kamil Kaczmarek
Published in neptune-ai
Jul 23, 2018

This week, Monday to Sunday, Edinburgh will host quite a large group of Python gurus, professionals and enthusiasts. EuroPython 2018 is the central Python community event in Europe. This short article introduces a selection of talks and trainings that sound like time well spent.

Is there anything relevant to the data scientist?

There is a lot to see and explore, simply because EuroPython has a PyData track. It is, in short, a track for Pythonistas (or aspiring Pythonistas 😉) who develop or use open source data science and machine learning tools.

Below are two, intentionally subjective, lists of talks and trainings. The first list is about the Python programming language and covers topics that can come in handy for any intermediate-level coder willing to improve their skills. The second list is pure data science and represents just a small fraction of what the PyData track can offer.

At the end, our own contribution to EuroPython is briefly introduced 😊.

EuroPython 2018 conference logo

EuroPython Python talks and trainings — improve your engineering skills.

  1. Ridiculously Advanced Python (training) is there to show you how to move beyond the intermediate Python programmer level. Properties, class decorators, annotations, data classes 👉 all are advanced features of the Python language (a short sketch follows after this list). The tutorial’s source code is available on GitHub, published by Francesco Pierfederici.
  2. Get your documentation right is a training session that focuses on documentation structure. The GitHub open source survey clearly states that documentation is highly valued, but often overlooked. Indeed, 93% of respondents to the 2017 edition of the survey reported that incomplete or outdated documentation is a pervasive problem. I have to confess that our own open source initiatives (steppy and steppy-toolkit) also need more attention to the docs 😜.
  3. Domain-Driven Design Patterns in Python (talk). DDD, in short, is an approach to software development that puts the problem domain first. Modeling and understanding the problem leads to cleaner, less complicated systems. Complex machine learning projects that involve feature extraction from multiple sources, modeling, ensembling and post-processing may benefit from the DDD approach. Food for thought for any data scientist looking for high-level ideas about software design.
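To make item 1 a bit more concrete, here is a tiny, self-contained sketch (my own illustration, not material from the tutorial) showing two of the listed features, data classes and properties, working together. Data classes landed in Python 3.7, so a recent interpreter is needed.

```python
from dataclasses import dataclass, field

@dataclass
class Experiment:
    """Data class: __init__, __repr__ and __eq__ are generated for us."""
    name: str
    learning_rate: float = 0.01
    tags: list = field(default_factory=list)

    @property
    def slug(self) -> str:
        # Property: a derived, read-only attribute computed on access.
        return f"{self.name}-lr{self.learning_rate}"

exp = Experiment(name="unet-baseline", tags=["segmentation"])
print(exp)       # Experiment(name='unet-baseline', learning_rate=0.01, tags=['segmentation'])
print(exp.slug)  # unet-baseline-lr0.01
```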
PyData at EuroPython 2018, conference logo

PyData track — improve your data skills

For the data science community, the PyData track is the main reason to attend EuroPython. Talks ranging from reinforcement learning to privacy issues suggest a rather fun time in Edinburgh 😄. The talks mentioned below are just the tip of the iceberg; the full listing is available on the EuroPython site.

  1. Best Practices for a Blazing Fast Machine Learning Pipeline is a training session that guides you from preprocessing all the way to cloud deployment. It seems interesting mainly because it will cover options for scaling and pipelining. The ability to develop end-to-end pipelines should be under the belt of any data scientist. For pipelining purposes I use the steppy library, which Kuba Czakon and I maintain as open source. I’m very interested in the experiences of David (the tutorial author 😉) in developing pipelines…
  2. Deep Learning with PyTorch for Fun and Profit will be a rather fun talk built around cool ideas in deep learning like style transfer or speech generation.
  3. Alexander Hendorf proposes the Data Wrangling & Visualisation with Pandas, Jupyter & Dask training session 👉 a gentle introduction to EDA with Pandas and Jupyter Notebooks. Dask is also mentioned as a computing library that lets users scale or parallelize their analyses (a minimal sketch follows after this list).
  4. Privacy for Data Scientists (talk). The name speaks for itself. Because of the GDPR, which came into force in May 2018, European data scientists should practice data science with attention to data privacy.
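As a companion to item 3, here is a minimal Dask sketch. It is my own illustration, not material from the training, and the file pattern and column names are made up; it simply shows the pandas-style, lazily evaluated API that makes parallelizing an analysis straightforward.

```python
import dask.dataframe as dd

# Hypothetical CSV files, read lazily as a single partitioned dataframe.
df = dd.read_csv("data/transactions-*.csv")

# Familiar pandas-style API; nothing runs yet, Dask only builds a task graph.
mean_amount = df.groupby("user_id")["amount"].mean()

# .compute() triggers execution, scheduled in parallel across local workers.
print(mean_amount.compute().head())
```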

Finally, let me introduce our contribution to EuroPython: Best practices for elegant experimentation in data science projects (case study). Here, we use the Mapping Challenge 🌐, a recent machine learning competition, to show our daily practices and project results. It was an instance segmentation challenge whose goal was to segment roofs on satellite images like the ones below:

example images

All of our work is open, that is: the source code, the solution write-up (where we present both good and bad ideas) and the experiment results. The goal of our work was to create an entirely open solution to a complex deep learning problem. Below, check the visualization of the custom loss function 👉 this idea worked particularly well. In principle, this loss uses additional information about the distances between buildings and about their size.

The 1st column is the input image, the 2nd column is the mask, the 3rd column visualizes distances between buildings (darker color means higher value), and the 4th column visualizes the weight assigned to each roof (smaller roofs are assigned higher values; the background is fixed to black).
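For readers curious how such a loss can look in code, here is a minimal PyTorch sketch of the general idea: per-pixel weighting of a binary cross-entropy term by precomputed distance and size maps. The function name, tensor shapes and the simple additive weighting are my assumptions for illustration, not the exact loss from our solution, which you can find in the open source code mentioned above.

```python
import torch
import torch.nn.functional as F

def weighted_segmentation_loss(logits, targets, distance_weights, size_weights):
    """Binary cross-entropy where each pixel is scaled by precomputed weight maps.

    All tensors have shape (batch, 1, height, width). distance_weights is larger
    in the narrow gaps between buildings, size_weights is larger for pixels that
    belong to small roofs, so the network is pushed to get both right.
    """
    per_pixel = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    weights = 1.0 + distance_weights + size_weights
    return (weights * per_pixel).mean()

# Toy usage with random tensors, just to show the expected shapes.
logits = torch.randn(2, 1, 64, 64)
targets = torch.randint(0, 2, (2, 1, 64, 64)).float()
dist_w = torch.rand(2, 1, 64, 64)
size_w = torch.rand(2, 1, 64, 64)
print(weighted_segmentation_loss(logits, targets, dist_w, size_w))
```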

I’m happy to present the results of the hard work of Kuba Czakon, Andrzej Pyskir, Piotr Tarasiewicz and me. My slot is on Friday, July 27th, so feel free to come by to talk :)

Feel free to contact me at any time with either feedback or questions.


Kamil Kaczmarek
neptune-ai

I’m a systems neuroscience researcher turned data scientist with an intrinsic desire to learn new things • github.com/kamil-kaczmarek • Kaggle Master • neptune.ai