Bioinformatics 1: K-mer Counting

A challenging yet intriguing interdisciplinary problem

Published in The Startup · 4 min read · Jul 2, 2020


K-mer counting is an interesting yet challenging problem in bioinformatics. In this article, we’ll talk about what k-mers are, the problem of k-mer counting, its applications, and some interesting insights from the computer science perspective.

What are k-mers?

In simple terms, k-mers are substrings of length k in a given string (a DNA, RNA, or protein sequence, or any other string). Since our interest is in bioinformatics, we will focus on k-mers in a DNA sequence.

Consider the DNA sequence “ACGAGGTACGA” which consists of 11 nucleotides. Let’s try to obtain all the 4-mers (substrings of length 4) in this DNA sequence.

The idea is simple. We create a window of length 4 and slide it from left to right, shifting one character at a time. If the length of the given DNA sequence is N, we end up with N - k + 1 k-mers.

Total no. of k-mers = N - k + 1
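The sliding window can be sketched in a few lines of Python. This is only a minimal illustration, not an optimized k-mer counter: the function name `extract_kmers` and the variable names are made up for this example, and tallying the k-mers with `collections.Counter` is an assumption about the simplest way to count them.

```python
from collections import Counter


def extract_kmers(sequence, k):
    """Return every length-k substring of `sequence`, from left to right."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    # The window starts at positions 0 .. N - k, which gives N - k + 1 k-mers.
    # If k > N the range is empty, so the result is an empty list.
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


sequence = "ACGAGGTACGA"  # the 11-nucleotide example sequence
k = 4

kmers = extract_kmers(sequence, k)
print(kmers)

# The number of k-mers matches the formula N - k + 1.
print(len(kmers) == len(sequence) - k + 1)

# Counting occurrences of each distinct k-mer (illustrative only).
kmer_counts = Counter(kmers)
print(kmer_counts["ACGA"])
```

For this sequence the window yields ACGA, CGAG, GAGG, AGGT, GGTA, GTAC, TACG, and ACGA again, so the 4-mer ACGA is counted twice.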

In the above example, the given DNA sequence is 11 characters long (N=11) and k = 4, thus we get eight…

Written by Gunavaran Brihadiswaran