The mechanics of contributing to open source software

Contribute to open source software and get free stickers!

This is intended to be a step-by-step reference for uploading your changes to an open source library, perhaps as the result of a sprint. In GitHub-speak, we would call this “making a pull request.”

$ git clone <LINK_FROM_FORKED_REPOSITORY_PAGE>

If you already have the fork cloned, run the following in the terminal to pull in the latest changes:

$ git pull origin master

3. All changes should be made on your own branch. Create a branch, then “check out” the branch to save your work there:

$ git branch…

Helpful Tips for Attending Your First Open Source Sprint

Let’s face it: getting a great score on a Kaggle competition doesn’t require adherence to PEP8 or really any other software development best practices. And yet, code is our craft, and at some point in your career you may want or need to learn to write production-level code. Contributing to open source can be a great way to hone your code skills, learn professional best practices, and give back to the wider tech community.

Great, you say, but where do I start? There’s no “right” way to get involved in open source, and there are plenty of guides to getting…


Survival analysis refers to a suite of statistical techniques developed to infer “lifetimes”, or time-to-event series, without having to observe the event of interest for every subject in your training set. The event of interest is sometimes called the subject’s “death”, since these tools were originally used to analyze the effects of medical treatment on patient survival in clinical trials.
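The preview doesn't say which estimator the post goes on to use, but the Kaplan-Meier estimator is the usual starting point, and it shows how subjects whose “death” was never observed (censored subjects) still contribute information while they remain under observation. Here is a minimal pure-Python sketch; the function name and input format are illustrative, not taken from the original post or from any library.

```python
from collections import Counter


def kaplan_meier(durations, observed):
    """Kaplan-Meier estimate of the survival function S(t).

    durations: how long each subject was followed (until the event or until censoring)
    observed:  True if the event ("death") was seen, False if the subject was censored
    Returns a list of (time, S(t)) pairs at each time at which an event occurred.
    """
    deaths = Counter(t for t, seen in zip(durations, observed) if seen)
    exits = Counter(durations)  # subjects leaving the risk set at each time, for any reason
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Censored subjects with the same time still count as at risk here.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


if __name__ == "__main__":
    # Toy data: subjects 3 and 5 are censored (their event was never observed).
    print(kaplan_meier([1, 2, 2, 3, 4], [True, True, False, True, False]))
```

For churn, a customer who has cancelled counts as an observed event, and a customer who is still active is censored at their current tenure. The lifelines library's KaplanMeierFitter wraps the same estimator and adds confidence intervals.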

Meanwhile, customer churn (defined as the opposite of customer retention) is a critical cost that many customer-facing businesses are keen to minimize. There is no silver bullet methodology for predicting which customers will churn (and, one must be careful in…


This article takes inspiration from Andrew Gadius’ blog on the same topic, using updated libraries and Python 3 to achieve a similar effect. If you do not already have a working environment, check out Part One of this two-part blog post. UPDATE June 4, 2019: Or get started using this notebook in Google Colab.

Prepare your workspace

In a Jupyter notebook (launch one from the terminal with jupyter notebook) or your preferred Python 3 IDE, first import the relevant libraries:

import geopandas as gpd
import requests
import zipfile
import io
import matplotlib.pyplot as plt
# Jupyter "magic" to display plots inline in the notebook
%matplotlib inline

Load the data

For this example we’ll download…
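The imports above (requests, zipfile and io) suggest the data arrives as a zipped shapefile that gets unpacked from memory. The actual download link is cut off in this preview, so here is a minimal sketch under that assumption; the helper names and the SHAPEFILE_ZIP_URL placeholder are hypothetical, not from the original post.

```python
import io
import zipfile
from pathlib import Path


def extract_zip_bytes(data: bytes, dest: str) -> list:
    """Extract a zip archive held in memory into `dest`; return the extracted paths."""
    dest_dir = Path(dest)
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        zf.extractall(dest_dir)
        return [dest_dir / name for name in zf.namelist()]


def find_shapefile(paths) -> Path:
    """Return the first .shp file among the extracted paths."""
    for path in paths:
        if path.suffix.lower() == ".shp":
            return path
    raise FileNotFoundError("no .shp file found in the archive")


# Usage, with SHAPEFILE_ZIP_URL as a hypothetical placeholder for your data source:
# response = requests.get(SHAPEFILE_ZIP_URL)
# response.raise_for_status()  # stop early on HTTP errors
# paths = extract_zip_bytes(response.content, "data")
# gdf = gpd.read_file(find_shapefile(paths))
```

Keeping the extraction separate from the download makes it easy to re-run on a cached copy of the archive without hitting the network again.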


Getting started with open-source spatial analysis in Python can be tricky. For one, there are many libraries to choose from, with varying and overlapping functionality but not enough cross-compatibility. Piecemeal and sparse documentation also remains a challenge, and I’ve found that most of the top Google results for common questions use older code conventions.

This two-part article aims to get you started with a geodata stack that is modern and works. Part One (below) will walk you through installation on Windows. Part Two more or less follows the example of Andrew Gadius’ blog on the same topic to give…
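Once the installation steps are done, one quick way to confirm the stack is in place is to ask Python which package versions it can find. This is a minimal sketch using the standard library’s importlib.metadata (Python 3.8+); the package list is an assumption about what a typical GeoPandas stack includes, not the exact list from Part One.

```python
from importlib import metadata


def report_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Assumed package list; adjust it to whatever your install guide covers.
    stack = ["geopandas", "shapely", "pyproj", "fiona", "matplotlib"]
    for pkg, ver in report_versions(stack).items():
        print(f"{pkg:12} {ver or 'NOT INSTALLED'}")
```

Checking installed distribution metadata, rather than importing each package, avoids triggering slow or failing imports just to see what is there.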


So you saw the latest Stack Overflow chart on the popularity of new languages and, deciding there may be something to this “big data” trend after all, you feel it’s time to get familiar with Apache Spark.

Apache Spark — almost as big a deal as deep learning

Sure, you could get up and running with a few keystrokes on UNIX/macOS, but what if all you have at home is an old Windows laptop? I tried following the installation instructions from the O’Reilly book Learning Spark (which, like many wonderful tech reference materials, may be available for free from your…
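On Windows, a Spark install commonly depends on a few environment variables being set correctly. The helper below is a hypothetical sketch, not from the original post or the Learning Spark book, that checks the variables a common Windows setup relies on: JAVA_HOME, SPARK_HOME, and HADOOP_HOME with winutils.exe in its bin folder. Treat the variable list as an assumption and adjust it to the instructions you follow.

```python
import os
from pathlib import Path

# Variables a common Windows Spark setup relies on (an assumption, not from the post).
REQUIRED_VARS = ("JAVA_HOME", "SPARK_HOME", "HADOOP_HOME")


def check_spark_env(env=None):
    """Return a list of problems found in Spark-related environment variables."""
    env = os.environ if env is None else env
    problems = []
    for var in REQUIRED_VARS:
        value = env.get(var)
        if not value:
            problems.append(f"{var} is not set")
        elif not Path(value).is_dir():
            problems.append(f"{var} points to a missing directory: {value}")
    # Windows setups usually also need winutils.exe under %HADOOP_HOME%\bin.
    hadoop = env.get("HADOOP_HOME")
    if hadoop and Path(hadoop).is_dir():
        if not (Path(hadoop) / "bin" / "winutils.exe").is_file():
            problems.append("winutils.exe not found in HADOOP_HOME\\bin")
    return problems


if __name__ == "__main__":
    issues = check_spark_env()
    print("\n".join(issues) if issues else "Spark environment variables look OK")
```

Passing an explicit dictionary instead of reading os.environ makes the check easy to try against a planned configuration before you change any system settings.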

Lauren Oldja

Data Scientist. Social Scientist.
