Currently a biological data scientist blogging about side projects and things learned through brute force.

Select Hidden Markov Models (HMM) by String Searching

In bioinformatics, hidden Markov models (HMMs) are used to identify sequences that may be evolutionarily related and share a function. Hidden Markov models try to assign X (in this example, function) given the observed Y (amino acid sequence) or, for the Bayesian purists, P(X | Y). In the logo above, other genes with the function of interest (metallophos) likely contain D (aspartic acid) at position 8 and H (histidine) at position 10. If those two amino acids are present in an unlabeled gene, then the confidence in calling the function of this new gene a metallophos increases. …
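The reasoning about conserved residues can be illustrated with a toy check. This is a minimal sketch and not a real HMM: a profile HMM scores every position probabilistically, whereas this only tests the two residues named in the text. The function name `count_conserved_matches`, the 1-based position convention, and the example windows are assumptions invented for illustration.

```python
# Toy illustration only: checks the two conserved residues named in the text.
# A real profile HMM assigns a probability to every position, not just two.

# Conserved residues, keyed by 1-based position within the motif (assumed convention).
CONSERVED = {8: "D", 10: "H"}


def count_conserved_matches(motif_window: str) -> int:
    """Count conserved positions that match in a window already aligned to the motif."""
    matches = 0
    for position, residue in CONSERVED.items():
        index = position - 1  # convert 1-based position to 0-based index
        if index < len(motif_window) and motif_window[index] == residue:
            matches += 1
    return matches


# Hypothetical aligned windows, made up for this example.
print(count_conserved_matches("GAVLIMFDGHKK"))  # D at 8 and H at 10 both present
print(count_conserved_matches("GAVLIMFAGHKK"))  # only H at 10 present
```

The more conserved positions match, the more plausible the functional call becomes; a real model weighs that evidence across the whole sequence.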

And they contain some nasty genes that can help the bacteria thrive

Open source article related to story:

Nontuberculous mycobacteria (NTM) are bacteria that cause NTM lung disease. Additionally, NTM have been linked to diseases outside the lung, with some studies suggesting a link between NTM and Crohn’s disease. NTM are commonly found in natural and built environments, such as soils and shower heads.

A major question is how bacteria found everywhere become bacteria that can cause illness. One potential mechanism is a virus that carries bacterial genes. These viruses are called bacteriophages, or phages for short. Phages only infect bacteria. This means phages will not cause any…


Some helpful tips to build your personal data science portfolio

Data science is the intersection of statistics, programming, and story-telling. Fundamentally, a data science role derives value from the ability to create actionable insights. The responsibilities of a data scientist differ from those of a research scientist in academia. Industry tasks move at a faster pace and are less rooted in ambiguity than traditional research. This article will provide guidance to those looking to make the jump from academia to a role as a data scientist.

1: Keep learning

What makes an adept data scientist is the ability to adapt to new technologies and information. Continued learning of new techniques and technologies will…

Exploring relative data in pie charts using matplotlib, plotly, and python

For those familiar with the movie Anchorman, the plot revolves around the shenanigans of various characters as a woman breaks the glass ceiling to host the local, then national, news. Apple’s TV series, The Morning Show, also uses the shattering of on-air television norms as a plot device. According to US Census data, women outnumber men in the United States 50.8% to 49.2%. Seeing women on cable news networks has been normal for some time. However, given the renewed focus this year on racial justice, how well represented are minorities as on-air correspondents across the networks? In…

Hands-on Tutorials

How to improve vision model performance by reshaping and resampling data

The popularization of machine learning has changed our world in wonderful ways. Some notable applications of machine learning allow us to do the previously unthinkable, like determining if an image is a hot dog or not a hot dog.

A network analysis of the 2020 college football schedule using NetworkX and musings about difficult decisions in college football

When Notre Dame paused in-person learning on Aug. 17th, it was a response to an explosion of positive cases of COVID-19. I questioned how this move would affect the continuation of football activities if the campus was closed to students. Thankfully, after returning to in-person learning, the numbers on the Notre Dame dashboard show positive tests have fallen and remained stable. However, the threat of another cluster remains, especially as classes and football present a taste of normalcy. Other football powerhouses did not take such swift action. For example, the University of Alabama Tuscaloosa (aka Alabama) continued in-person learning…

A population pyramid animation using Seaborn and Celluloid

The effects of the 2008 US housing market crash are still rippling through the real economy. Many in Generation X and Millennials saw their accumulated equity vanish. Those close to or in retirement witnessed one of their most trusted assets, their property, fall off a precipice. Across all age groups, people suffered financial difficulties. Those who were able to afford a new property or retain owned assets saw substantial gains in the last decade, not only recovering to pre-crash highs but in some areas doubling or even tripling in value. Meanwhile, a perfect storm of debt, a dwindling number early career…

A Tutorial to Create Star Charts using Diamonds and Matplotlib

Diamonds are a data scientist’s best friend. More specifically, the diamond dataset found on Kaggle. In this article, I will walk through a simple workflow to create a Star Chart (aka Spider Chart or Radar Chart). This tutorial was adapted from the wonderful workflow of Alex at Python Charts. All of the code in this article and the needed dataset are available on GitHub.

Diamonds are a Data Scientist’s Best Friend

To begin, you will need a few libraries. I am running Python 3. I created this workflow using a Jupyter notebook, pandas, matplotlib, and numpy. …

Age Classification using Deep Learning on photos with and sans beard

Machine learning applied to computer vision is a growing field in data science, and some interesting problems have been solved using these methods. Computer vision is identifying poachers and bringing them to justice as well as powering the wave of self-driving cars. In this article, I am using computer vision for a more frivolous reason: predicting my own age with and without a beard. I incorporate a lot of my code and reasoning from prior posts that you should definitely check out if you are interested in age prediction using machine learning. OpenCV is a must if you're interested in…

An NBA style lottery system written as a web application

COVID-19 has affected sports and, by extension, fantasy sports. However, our fantasy football league is still primed and ready to draft for an uncertain season. Our league is competitive and to ensure no one tanks for the first pick, we implemented an NBA style lottery system to allow everyone an opportunity to get the coveted first pick. In this article, I will explain how I created a web application for our fantasy league to implement an NBA style lottery system.
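The idea of a weighted lottery can be sketched in a few lines of Python. This is a minimal sketch under assumed inputs, not the web application described in this article and not the NBA's exact procedure: the function name `draw_lottery_order`, the team names, and the weights are all hypothetical. It draws one team at a time with probability proportional to its weight among the teams still remaining, then removes the winner before the next draw.

```python
import random


def draw_lottery_order(weights, seed=None):
    """Return a full pick order; a larger weight gives better odds at each pick.

    `weights` maps a team name to a positive number (hypothetical input format).
    """
    rng = random.Random(seed)      # a seed makes a draw reproducible
    remaining = dict(weights)      # copy so the caller's dict is not modified
    order = []
    while remaining:
        teams = list(remaining)
        team_weights = [remaining[team] for team in teams]
        # Pick one team with probability proportional to its weight.
        winner = rng.choices(teams, weights=team_weights, k=1)[0]
        order.append(winner)
        del remaining[winner]      # a team can only be drawn once
    return order


# Hypothetical league: the worst finisher gets the largest weight.
league_weights = {"Team A": 4, "Team B": 3, "Team C": 2, "Team D": 1}
print(draw_lottery_order(league_weights, seed=2020))
```

Passing a fixed `seed` lets league members rerun and audit a draw; leaving it as `None` gives a fresh random order each time.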

NBA style lottery (weighted lottery)

The NBA adopted a weighted lottery system in 1990 to give the team with the worst record the best chance…

