Using scikit-learn to quickly build and evaluate regression models

Over the past couple of weeks, I’ve been slowly analyzing some publicly available data on the number of bicyclists that cross the Fremont Bridge in Seattle, WA (part 1 and part 2). I say slowly because I’ve simultaneously been taking advantage of some Q3 hiring increases and applying for lots of jobs. I’ve even managed to land my first interview since I started looking for jobs in March! Needless to say, it’s been a long few months, and I’m happy to be making some progress.

Last week, I merged the bicycling data with weather data from the Dark Sky API and looked at correlations between features in the two datasets. I decided on a set of features to use for my modeling, and this week I used that dataset to test a few regression methods. This is another short update on the progress of this project, but it’s been a great refresher on using the standard Python data science toolkit of pandas, scikit-learn, and matplotlib/seaborn. …

Using public data from the City of Seattle and Dark Sky API to visualize biking trends

Last week, I started analyzing some data tracking bicycle ridership from the City of Seattle. …

City-owned bicycle counters provide unique data for transportation planning

As the summer weather creeps toward consistently beautiful here in Seattle, I’ve been hitting the roads and trails on my bike more frequently. …

Using the sequence alignment software wrappers in Biopython

Last week I started playing around with some bioinformatics tools in Python with the library Biopython. In my previous post, I introduced the field of bioinformatics and provided an example of downloading data from GenBank with Biopython’s API. Today, I want to move to a typical next step for analyzing DNA sequence data: the alignment process. I’ll give an introduction to sequence alignments, and then give a brief example of using Biopython to filter some data and run alignment software.

Introduction to Sequence Alignments

When working with biological sequence data, whether DNA, RNA, or protein, biologists often want to compare one sequence to another in order to make inferences about the function or evolution of the sequences. Just as you wouldn’t want to analyze a data table where values had ended up in the wrong columns, we need to make sure our sequence data is well organized, or “aligned,” before we can make robust inferences from it. Unfortunately, sequence data does not come with nice labels, like a date, miles per gallon, or horsepower. Instead, all we have is the position number in the sequence, and that is relative to that sequence only. Luckily, many sequences are highly conserved, or similar, between related organisms (and all organisms are related to some degree!). If we’re fairly certain that we’ve obtained data from the same sequence from multiple organisms, we can put that data into a matrix that we call an alignment. If you’re only comparing two sequences, it’s called a pairwise alignment. If you’re comparing three or more sequences, it’s called a multiple sequence alignment (MSA). …
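To make the idea of a pairwise alignment concrete, here is a minimal sketch in plain Python. It uses the classic Needleman-Wunsch global alignment algorithm, which is not necessarily what the alignment software in the post does. The function name and the scoring values (match, mismatch, gap) are assumptions chosen for illustration only.

```python
def global_alignment(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment of two sequences.

    Returns (score, aligned_a, aligned_b), with '-' marking gaps.
    The scoring values are illustrative, not biologically tuned.
    """
    n, m = len(a), len(b)

    def pair_score(i, j):
        # Score for aligning a[i-1] with b[j-1] (1-based table indices).
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(i, j),  # align the two characters
                score[i - 1][j] + gap,                   # gap in b
                score[i][j - 1] + gap,                   # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    total, row_a, row_b = global_alignment("ACGT", "AGT")
    print(total)
    print(row_a)
    print(row_b)
```

Each row of the result is one sequence padded with gaps so that matching positions share a column, which is the two-row version of the alignment matrix described above. Real alignment tools use more sophisticated scoring schemes and heuristics, especially for multiple sequence alignments.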

Accessing molecular biology data through the browser and with Biopython

As I mentioned a few posts ago, I’ve been working on a couple manuscripts for publication from my dissertation research in plant evolutionary biology. Since I’ve been learning more and more data science skills, I’ve been revisiting some of the common tasks I did to manipulate and analyze biological data. …

Using Facebook Prophet to predict changes in median home values for nearly 15,000 US communities over 5 years

A few months ago I collaborated on a project to identify the most profitable investment opportunities in real estate across the United States. The project was open-ended: we were given the data and asked to provide some business-oriented insights using time series analyses. We assumed the roles of data scientists for a hypothetical real estate investment firm and dove into the data. In this post, I’ll give a high-level overview of the project’s goals, methods, and results.


Smart (and lucky) investments in real estate can be highly profitable. The practice of buying homes or property in quickly growing markets and reselling those at a higher price for a profit at a later date is known as “house flipping.” We aimed to identify real estate markets across the United States that were predicted to grow rapidly in the next five years so that investors could buy property in those areas now at a low price and then sell the property for profit in the future when the price had appreciated significantly. Specifically, we wanted to identify the top five real estate markets in the US (by Zip Code) with the highest percent return on investment after 5 years. …
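As a rough illustration of the ranking step, percent return on investment can be computed from a current value and a forecast value, and the markets sorted by it. This is a minimal sketch, not the project’s actual code: the function names, the placeholder ZIP codes, and the dollar values are all made up for the example, and the forecast values stand in for whatever a time series model would predict.

```python
def percent_roi(current_value, predicted_value):
    """Percent return on investment between the value now and the forecast value."""
    return (predicted_value - current_value) / current_value * 100.0


def top_markets(forecasts, n=5):
    """Rank markets by percent ROI, highest first.

    `forecasts` maps a ZIP code to (current_value, predicted_value).
    Returns the top `n` entries as (zip_code, percent_roi) pairs.
    """
    scored = [
        (zip_code, percent_roi(current, predicted))
        for zip_code, (current, predicted) in forecasts.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


if __name__ == "__main__":
    # Placeholder ZIP codes and values, invented for this sketch.
    example = {
        "00001": (200_000, 260_000),
        "00002": (100_000, 150_000),
        "00003": (400_000, 420_000),
    }
    for zip_code, roi in top_markets(example, n=2):
        print(zip_code, round(roi, 1))
```

Ranking by percent rather than absolute change keeps inexpensive markets comparable to expensive ones, which matters when the goal is return on the money invested.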

Renewed positivity and outlook from the first half of this well-written book by Emily Robinson and Jacqueline Nolis

Since early March I’ve been actively applying for data science jobs in Seattle. I had just finished my data science bootcamp with Flatiron and I was very excited to get out there and find my dream job. …

Developing a blind study for testing generative model performance

Over the past couple of weeks I’ve been continuing work on my project called BeatMapSynth, a program for generating custom user content for the VR game Beat Saber. …

Laying out the framework for a user-friendly Python program

My project BeatMapSynth is starting to get some interest from Beat Saber players, which is awesome, but as a result I’ve realized that setting up a Python environment and running BeatMapSynth from the command line just isn’t easy enough for many users. Since one of the main goals for BeatMapSynth was to allow users to easily create their own custom content, I really want to streamline the installation and use of my program. This is going to be the first post in a series showing how I’m taking a Python script and turning it into a cross-platform, GUI program.

PyQt5 and…


Wyatt Sharber, PhD

Data scientist and plant evolutionary biologist. Seattle, WA, USA.
