Finding (real) peaks in your signal with SciPy and some common-sense tips

Piero Paialunga
Analytics Vidhya
Published in
5 min readNov 9, 2020

--

If you are a data scientist, one of your typical task is to analyze a certain signal and find its peaks. This may be your goal for a ton of reasons, but the bigger one is that peaks somehow tend to show a certain property of your signal. It is not surprising that a library that helps you finding peaks does already exist in Python, and it is SciPy (with its find_peaks function). Nonetheless it is important to use this powerful library wisely to get exactly what we want and to do so we need to make our definition of “peak” more explicit.

What are you looking for?

In the example I’m going to show a peak is a star :)

In the astrophysics domain, the Hubble telescope furnishes the image file data of the Tucanae star . It looks like this:

Mathematically speaking, it is just a matrix of N rows and M columns. Each number of the matrix represents the intensity of the pixel. As we would like to have a star to be as closed as possible to a pixel we would like to:

A) If two peaks are too close to each other we may want to consider them as one single peak

B) Not to have higher values of intensity too close to the peak we are considering (i.e. if our peak is close to 1000, we don’t want to have a 2000 peak close to this one)

C) We may want that peak to decay to a certain value (0 in this example) in both ends of the range we are studying

As it is possible to see, there is no astrophysics in these condition, as they are common sense consideration about what it really is a “peak”. They may also sound trivial but it is important to consider them in our code. Speaking of, let’s start with the Python talk!

Import your wizards and name your elves.

In this specific example, you may use this setup:

import astropy
from astropy.io import fits
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors
from matplotlib import cm
import numpy as np
import math
import seaborn as sns
from scipy.misc import electrocardiogram
from scipy.signal import find_peaks
hdu_list= fits.open('imagefilepath')
img_data=hdu_list[0].data

Use find_peaks

Let’s say we are interested in detecting the 1000 height peak. Find peaks is a powerful tool to do that, but it does include the A, B and C evil peaks. Anyway it is a good start. Fix a certain Y on your image and find your peaks with find_peaks. Generally you may want to do that for all the Y-es.

HEIGHTS_fixed_y=[]
X_HEIGHTS=[]
Y=[]
for y in range(len(img_data)):
x = img_data[y]
peaks, _= find_peaks(x,height=(950, 1050))
if len(peaks)!=0:
for i in range(len(peaks)):
HEIGHTS_fixed_y.append(x[peaks].tolist()[i])
X_HEIGHTS.append(peaks.tolist()[i])
Y.append(y)
#X_HEIGHTS save all the x coordinates of the peaks
#HEIGHTS_FIXED_y save all the heights of the peaks (they will be between 950 and 1050)
#Y save all the y coordinates of the peaks
thousand=pd.DataFrame()
thousand['Heights']=HEIGHTS_fixed_y
thousand['$X_{heights}$']=X_HEIGHTS
thousand['Y']=Y

A) Deleting the peaks that are too close to each other

Let’s say Y=400. You may have that at this coordinate, X=20 is a peak and X=9800 is a peak too, and that is ok. But if X=20 is a peak and X=24 is a peak too, then we are not doing things right! Let’s say that the Xes for a fixed Y has to be separated by at least 10 pixels, and delete them if it is not the case.

Not-to-be-taken peak, class A)
y_unique_values=list(set(thousand.Y.tolist()))
wrong_y=[]
for y in y_unique_values:
wrong_x=thousand[thousand['Y']==y]['$X_{heights}$'].tolist()
differences=[]
if len(wrong_x)!=1:

for x in range(len(wrong_x)-1):
first_value=wrong_x[x]
for j in range(x+1,len(wrong_x)):
difference=abs(first_value-wrong_x[j])
differences.append(difference)
for d in differences:
if d<=10:
wrong_y.append(y)
for y in wrong_y:
thousand=thousand[thousand.Y!=y]
thousand=thousand.reset_index().drop(columns=['index'])

B) It is not a peak if there is something higher

Not-to-be-taken peak, class B)

If you want to peak height to be close to 1000, you can’t accept that peaks with height close to 2000 are too close to the x of your peak. Get rid of them with this code:

not_peak_index=[]
for i in range(len(thousand)):
y_values=img_data[thousand.Y.tolist()[i]].tolist()
x_min,x_max=thousand['$X_{heights}$'].tolist()[i]-10,thousand['$X_{heights}$'].tolist()[i]+10
range_values=y_values[x_min:x_max]
binary=0
for value in range_values:
if value > thousand['Heights'].tolist()[i]:
binary=binary+1
if binary!=0:
not_peak_index.append(i)

C) Ok peak, you had your moment, now go back to sleep.

You may want that your peak decays to 0 at both ends of your range (i.e. a situation like the one below is not acceptable)

Not to be taken peak, class C)

Detect the peaks that actually go to 0 with this code:

not_zeros_index=[]
for i in range(len(thousand)):
y_values=img_data[thousand.Y.tolist()[i]].tolist()
x_min,x_max=thousand['$X_{heights}$'].tolist()[i]-10,thousand['$X_{heights}$'].tolist()[i]+10
if abs(y_values[x_min])>10 and abs(y_values[x_max]>10):
not_zeros_index.append(i)

We set 10 because it is an acceptable compromise not to delete a larger set of points, but of course you are the boss!

Conclusions:

SciPy (and Python libraries in general) is a really powerful library, but it needs to be used wisely as it can not know what we want if we don’t specify it. In this article 3 tips are given in order to clean your find_peaks resulting dataframe from unwanted peaks but of course, you may want to develop some more conditions on your urgencies :)

If you liked the article and you want to know more about Machine Learning, or you just want to ask me something you can:

A. Follow me on Linkedin, where I publish all my stories
B. Subscribe to my newsletter. It will keep you updated about new stories and give you the chance to text me to receive all the corrections or doubts you may have.
C. Become a referred member, so you won’t have any “maximum number of stories for the month” and you can read whatever I (and thousands of other Machine Learning and Data Science top writer) write about the newest technology available.

Ciao :)

--

--

Piero Paialunga
Analytics Vidhya

PhD in Aerospace Engineering at the University of Cincinnati. Machine Learning Engineer @ Gen Nine, Martial Artist, Coffee Drinker, from Italy.