Using scikit-learn to quickly build and evaluate regression models

Over the past couple of weeks, I’ve been slowly analyzing some publicly available data on the number of bicyclists who cross the Fremont Bridge in Seattle, WA (part 1 and part 2). I say slowly because I’ve simultaneously been taking advantage of some Q3 hiring increases and applying for lots of jobs. I’ve even managed to land my first interview since I started looking for jobs in March! Needless to say, it’s been a long few months, and I’m happy to be making some progress.

Last week, I merged the bicycling data with weather data from the Dark Sky…


Using public data from the City of Seattle and Dark Sky API to visualize biking trends

Last week, I started analyzing some data tracking bicycle ridership from the City of Seattle. As a biker myself, I was curious to know how much the weather impacts the number of bicyclists deciding to ride for fun or for commuting back and forth to work each day. In my previous post, I showed that the number of bicyclists is higher during the summer months, which is a pretty good indicator in itself, but I wanted to dive into this with more data and compare the ridership data to weather data directly. …


City-owned bicycle counters provide unique data for transportation planning

As the summer weather creeps toward consistently beautiful here in Seattle, I’ve been hitting the roads and trails on my bike more frequently. One of my favorite recreational routes involves crossing the Fremont Bridge, which has dedicated bike and pedestrian lanes. I recently noticed on the southbound side that there is a sign with a digital readout of the number of bicyclists who have crossed the bridge so far that day. …


Using the sequence alignment software wrappers in Biopython

Last week I started playing around with some bioinformatics tools in Python using the Biopython library. In my previous post, I introduced the field of bioinformatics and provided an example of downloading data from GenBank with Biopython’s API. Today, I want to move to a typical next step for analyzing DNA sequence data — the alignment process. I’ll give an introduction to sequence alignments, and then give a brief example of using Biopython to filter some data and run alignment software.

Introduction to Sequence Alignments

When working with biological sequence data, either DNA, RNA, or protein, biologists often want…


Accessing molecular biology data through the browser and with Biopython

As I mentioned a few posts ago, I’ve been working on a couple of manuscripts for publication from my dissertation research in plant evolutionary biology. Since I’ve been learning more and more data science skills, I’ve been revisiting some of the common tasks I did to manipulate and analyze biological data. Back then, I only used clunky GUI programs and specialized command-line programs for these tasks. Now I have a lot more confidence in using programming languages, mainly Python and R, and I’ve realized that many of the tasks that took me hours to complete manually with point and click…


Using Facebook Prophet to predict changes in median home values for nearly 15,000 US communities over 5 years

A few months ago I collaborated on a project to identify the most profitable investment opportunities in real estate across the United States. The project was open-ended — we were given the data and asked to provide some business-oriented insights using time series analyses. We assumed the roles of data scientists for a hypothetical real estate investment firm and dove into the data. In this post, I’ll give a high-level overview of the project’s goals, methods, and results.

Objective

Smart (and lucky) investments in real estate can be highly profitable. The practice of buying homes or…


Renewed positivity and outlook from the first half of this well-written book by Emily Robinson and Jacqueline Nolis

Since early March I’ve been actively applying for data science jobs in Seattle. I had just finished my data science bootcamp with Flatiron and I was very excited to get out there and find my dream job. Unfortunately, early March was also the beginning of the COVID-19 crisis in the US. The crisis has significantly dampened the job market for data scientists, but luckily not as badly as some other careers. …


Developing a blind study for testing generative model performance

Over the past couple of weeks I’ve been continuing work on my project called BeatMapSynth, a program for generating custom user content for the VR game Beat Saber. Although the program has been available for users to download and use themselves for a couple of months, I still wanted to test the models on users other than myself and a few friends. In this post, I’m going to talk about the challenges of evaluating a generative model such as this and present my plan for a user experience survey. …


Making an interactive GUI using Python and PyQt5

Last week, I introduced PyQt5 as a way to create graphical user interfaces (GUIs) for Python programs. For that post, I focused on using QtDesigner for creating the backbone of the GUI, which resulted in a custom Python class of objects such as text boxes, labels, radio buttons, and clickable buttons. Instantiating this class creates the GUI, but at that point all of the interactive features were still dead — they weren’t connected to the actions, or functions, that actually make those objects do things! …


Laying out the framework for a user-friendly Python program

My project BeatMapSynth is starting to get some interest from Beat Saber players, which is awesome, but as a result I’ve realized that setting up a Python environment and running BeatMapSynth from the command line just isn’t easy enough for many users. Since one of the main goals for BeatMapSynth was to allow users to easily create their own custom content, I really want to streamline the installation and use of my program. This is going to be the first post in a series showing how I’m taking a Python script and turning it into a cross-platform GUI program.

PyQt5…

Wyatt Sharber, PhD

Data scientist and plant evolutionary biologist. Seattle, WA, USA.
