The COVID Tracking Project has been one of the most successful citizen-driven data collection projects in history. Led by The Atlantic and supported by an army of volunteers, it has collected nuggets of information about testing and case counts, often beating federal and state authorities to the punch. Yet sustaining such a project over the long run, especially one driven primarily by volunteer engagement, is quite difficult. And so, after a year, the COVID Tracking Project is shutting down on 7 March.

The good news is that if you have been accessing the COVID Tracking Project’s data via the…


The COVID-19 outbreak is in many ways an outlier. It emerged with unusual speed, spread rapidly throughout the globe and has elicited a public health response that is unprecedented in recent history. Yet this time, humanity faces a pandemic with a wealth of information that would have been unimaginable just a few years ago. Writing for ZDNet, Larry Dignan referred to this outbreak as the most visualized pandemic ever, and the sheer number of data sets available to the general public (some more accurate, timely and reliable than others) is staggering. Yet in data, as in all…


In the subtitle of his remarkable history about the race for the nuclear bomb, science writer and historian of science Jim Baggott referred to World War II as the “first war of physics”.

Today, the efforts waged to curb the COVID-19 pandemic may be the first example of a large-scale, global data-driven response to a worldwide crisis, and as such perhaps the first war of data science.

It is difficult to overstate just how much data has become available in an extremely short time, and open science and the networks for sharing clinical and epidemiological data have enabled an unprecedented…


A Data Scientist’s take on defending Machine Learning models

Source: https://pasadenaweekly.com/permanent-record-is-accused-spy-edward-snowdens-defense-brief-to-the-american-people/

Introduction

I recently read Edward Snowden’s Permanent Record during my holiday. I think it is a great book that I would recommend to basically anyone, but it is particularly interesting for IT folks, for obvious reasons. It is the story of a young man who grows up together with the internet, starts serving his country in a patriotic fervour after 9/11, and becomes a whistleblower when he notices that the US has gone too far in violating privacy in the name of security. Moreover, the paradox I found most interesting is one that a Data Scientist can easily relate to.

The systems that collect…


Part II — Case Study

Natural Language Processing is a catchy phrase these days

This is Part 2 of a pair of tutorials on text pre-processing in Python. In the first part, I laid out the theoretical foundations. In this second part, I’ll demonstrate the steps described in Part 1 in Python on texts in different languages, and discuss how their effects differ due to the different structures of those languages.

If you haven’t, you should first read Part 1!

You can check out the code on GitHub!

Relevance

In the first part, I outlined text pre-processing principles based on a framework from an academic article. The underlying goal of all these techniques is to reduce the dimensionality of text data while keeping the relevant information contained in the text. In this second part, I will present the effect of the following techniques on two central properties of text (word count and unique word count, the latter representing the dimensionality of text data):

  1. Removing stopwords
  2. Removing both extremely frequent and infrequent words
  3. Stemming…
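To make the two measures concrete, here is a minimal sketch in plain Python. It shows how word count and unique word count shrink as stopwords are removed, and then again as extremely frequent and infrequent words are dropped. Stemming is left out. The toy stopword list, the three sample sentences and the thresholds (`min_df=2`, `max_df_ratio=0.9`) are hypothetical choices for illustration only, not the setup used in the tutorial. A real pipeline would more likely rely on a curated stopword list such as NLTK's.

```python
import string
from collections import Counter

# Hypothetical toy stopword list; a real project would use a curated list (e.g. NLTK's).
STOPWORDS = {"the", "a", "of", "and", "is", "in", "to"}

# Hypothetical sample corpus, one string per document.
SAMPLE_DOCS = [
    "The cat sat on the mat.",
    "The dog sat on the log.",
    "A cat and a dog sat in the park.",
]


def tokenize(text: str) -> list[str]:
    """Lowercase, split on whitespace and strip surrounding punctuation."""
    tokens = (tok.strip(string.punctuation) for tok in text.lower().split())
    return [tok for tok in tokens if tok]


def corpus_stats(docs: list[list[str]]) -> tuple[int, int]:
    """Return (word count, unique word count) over the whole corpus."""
    tokens = [tok for doc in docs for tok in doc]
    return len(tokens), len(set(tokens))


def remove_stopwords(tokens: list[str]) -> list[str]:
    """Drop every token that appears in the stopword list."""
    return [tok for tok in tokens if tok not in STOPWORDS]


def remove_extremes(docs: list[list[str]], min_df: int, max_df_ratio: float) -> list[list[str]]:
    """Keep only words found in at least min_df documents and in at most
    max_df_ratio (a fraction) of all documents."""
    n_docs = len(docs)
    # Document frequency: in how many documents each word occurs at least once.
    doc_freq = Counter(word for doc in docs for word in set(doc))
    keep = {w for w, df in doc_freq.items() if df >= min_df and df / n_docs <= max_df_ratio}
    return [[tok for tok in doc if tok in keep] for doc in docs]


tokenized = [tokenize(doc) for doc in SAMPLE_DOCS]
no_stop = [remove_stopwords(doc) for doc in tokenized]
no_extremes = remove_extremes(no_stop, min_df=2, max_df_ratio=0.9)

for label, docs in [("raw", tokenized), ("no stopwords", no_stop), ("no extremes", no_extremes)]:
    n_words, n_unique = corpus_stats(docs)
    print(f"{label}: words={n_words}, unique={n_unique}")
```

Running it prints `raw: words=21, unique=11`, then `no stopwords: words=12, unique=7` and `no extremes: words=6, unique=3`. The word "on" survives stopword removal only because the toy list omits it. The word "sat" is dropped as extremely frequent because it occurs in every document. Both cases show how strongly the results depend on the chosen list and thresholds.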


The year is 1943. Britain’s survival still hangs by a thin, precarious thread, despite America having joined the war effort. Not long ago, in February 1942, two German battleships, the Scharnhorst and the Gneisenau, alongside the heavy cruiser Prinz Eugen and their escort fleet, executed Operation Cerberus: a daring raid through the English Channel from their French port of Brest, at the western tip of Brittany. Britain is not merely humiliated (not even the Spanish Armada had been able to cross the Channel in 1588); it is also existentially threatened. If it cannot protect…


Part I — Theoretical Background

This is Part 1 of a pair of tutorials on text pre-processing in Python. In this first part, I’ll lay out the theoretical foundations. In the second part, I’ll demonstrate the steps described below in Python on texts in different languages, and discuss how their effects differ due to the different structures of those languages.

Introduction

Text mining is an important topic for both business and academia. Business applications include chatbots, automated news tagging and many others. In academia, political texts are often the subject of rigorous analysis. …


When I was a kid, I drove my parents and teachers absolutely insane with my need to know why things are the way they are. This was particularly tough on my poor chemistry teacher, who was constantly inundated with questions. Fine, potassium cyanate is toxic, but why? Also, why are nitriles normally not particularly pungent, while isonitriles smell like some eldritch horror a chemically talented Lovecraft would have dreamed up? …


A vector map of Budapest. https://www.shutterstock.com/image-vector/black-white-vector-city-map-budapest-1035519106

Ever wondered how to draw a map of less common geographical areas? And color them based on some data? This pair of tutorials shows how to build such a map from scratch! First, you need to construct the borders of your polygons; Part 1 covers this task. After that, you need to create a map and color those polygons according to some value of interest. That will be shown in Part 2.

Part 1 of this tutorial is available here.

There are many tutorials on the internet for drawing maps in Python, even more sophisticated maps like heatmaps (where…


A map of Budapest. Source: https://hebstreits.com/product/budapest-hungary-downtown-vector-map/

Ever wondered how to draw a map of less common geographical areas? Perhaps even color them based on some data? This is the first in a series of two tutorials that show you how to build such a map from scratch! First, you need to construct the borders of your polygons; Part 1 is about this task. After that, you need to create a map and color those polygons according to some value of interest. That will be shown in Part 2.

There are many tutorials on the internet for drawing maps in Python, even more sophisticated maps like heatmaps…

Starschema Blog

Leveraging technology to support digital transformation across the enterprise.
