Calculating Relative Strength Index using Pandas

Manjunath Reddy
7 min readJun 4, 2023

--

A momentum indicator called the Relative Strength Index (RSI) compares the current price to the average high and low prices over the course of a prior trading period. This indicator assists in identifying trend reversals, price declines, and the development of bullish or bearish markets by calculating overbought or oversold position.

It employs a standardisation formula to ensure that all values are placed on a scale of 0–100 for easy comparison; a smoothing function to calm the unpredictable oscillations inside momentum oscillators; and finally, it only takes the most recent periods’ data to determine the value for the following periods.

The Pandas library for Python is an incredible utility for data analysis. It is highly optimized for dealing with large datasets, comes with a dizzying array of built-in functions, and is used by many other analytical packages as an integral data handler.

import pandas as pd

# Load our sample CSV data as DataFrame
df = pd.read_csv('./data/wilder-rsi-data.csv', header=0).set_index(['period'])

# View the result
print(df.head())

price
period
1 54.80
2 56.80
3 57.85
4 59.85
5 60.57

# View summary
print(df.info())

<class 'pandas.core.frame.DataFrame'>
Int64Index: 38 entries, 1 to 38
Data columns (total 1 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 price 38 non-null float64
dtypes: float64(1)
memory usage: 608.0 bytes

Here, we have directly loaded our data into a Pandas DataFrame using the read_csv method. Then, we used the theset_index([‘period’])command to utilise the values in the first column as the index of our DataFrame, telling Pandas that the first line of our data is the header (index 0 being the first line).

Step 1: Calculate Price Differences

The biggest benefit of adopting Pandas is that we will no longer need to manually operate our sliding window. We first notice this as we compute the price variations between each daily price and the one before it:

# Calculate Price Differences
df['diff'] = df.diff(1)

# View Results
print(df)

price diff
period
1 54.80 NaN
2 56.80 2.00
3 57.85 1.05
4 59.85 2.00
5 60.57 0.72
... ... ...
34 58.27 -1.98
35 58.70 0.43
36 57.72 -0.98
37 58.10 0.38
38 58.20 0.10

[38 rows x 2 columns]

Here we see differences calculated for all but the first row in our data, as expected.

Step 2: Calculate Gains & Losses

Another good surprise comes when we utilise the clip() function that our DataFrame object provides to calculate our gains and losses. This function instructs the computer to “keep all values above or below a given threshold.” In our example, we established a 0 higher and 0 lower barrier for our gain and loss, respectively. The following lines of code perform this:

# Calculate Avg. Gains/Losses
df['gain'] = df['diff'].clip(lower=0).round(2)
df['loss'] = df['diff'].clip(upper=0).abs().round(2)

# View Result
print(df)

period
1 54.80 NaN NaN NaN
2 56.80 2.00 2.00 0.00
3 57.85 1.05 1.05 0.00
4 59.85 2.00 2.00 0.00
5 60.57 0.72 0.72 0.00
... ... ... ... ...
34 58.27 -1.98 0.00 1.98
35 58.70 0.43 0.43 0.00
36 57.72 -0.98 0.00 0.98
37 58.10 0.38 0.38 0.00
38 58.20 0.10 0.10 0.00

[38 rows x 4 columns]

Step 3: Calculate Average Gains & Losses

It will take some imagination to put Wilder’s RSI method into practise. Using the first n-periods (14 in this case) of our historical values, we will first produce a SMA value. After that, we’ll implement the WSM once again for those values, thus we’re going to use a placeholder of NaN. Let’s first obtain our SMA:

# Get initial Averages
df['avg_gain'] = df['gain'].rolling(window=window_length, min_periods=window_length).mean()[:window_length+1]
df['avg_loss'] = df['loss'].rolling(window=window_length, min_periods=window_length).mean()[:window_length+1]

# View Result
print(df)

# View first SMA value
print(df.iloc[window_length-1: window_length+2])

price diff gain loss avg_gain avg_loss
period
14 61.37 -1.43 0.00 1.43 NaN NaN
15 62.50 1.13 1.13 0.00 0.842857 0.292857
16 62.57 0.07 0.07 0.00 NaN NaN

Here, we can see that for the first window_length periods, a SMA value has been generated, and NaN values have been appended for all subsequent periods. The WSM average implementation is what is left. Sadly, Pandas doesn’t offer a built-in way for this, so we will have to manually create it in the code below:

# Get WMS averages
# Average Gains
for i, row in enumerate(df['avg_gain'].iloc[window_length+1:]):
df['avg_gain'].iloc[i + window_length + 1] =\
(df['avg_gain'].iloc[i + window_length] *
(window_length - 1) +
df['gain'].iloc[i + window_length + 1])\
/ window_length

# Average Losses
for i, row in enumerate(df['avg_loss'].iloc[window_length+1:]):
df['avg_loss'].iloc[i + window_length + 1] =\
(df['avg_loss'].iloc[i + window_length] *
(window_length - 1) +
df['loss'].iloc[i + window_length + 1])\
/ window_length

# View initial results
print(df[window_length-1:window_length+5])

price diff gain loss avg_gain avg_loss
period
14 61.37 -1.43 0.00 1.43 NaN NaN
15 62.50 1.13 1.13 0.00 0.842857 0.292857
16 62.57 0.07 0.07 0.00 0.787653 0.271939
17 60.80 -1.77 0.00 1.77 0.731392 0.378943
18 59.37 -1.43 0.00 1.43 0.679150 0.454019
19 60.35 0.98 0.98 0.00 0.700639 0.421589

To account for our average profits and losses, two more columns have been added in this table. These figures appear to be comparable so far to those calculated using our earlier pure Python approach. We then determine our RS values.

Step 4: Calculate the RS Value

This step merely entails adding a new column to our DataFrame and assigning the values obtained by dividing the average gains by the average losses from our previous computation to that column. This one-liner is shown in the following code:

# Calculate RS Values
df['rs'] = df['avg_gain'] / df['avg_loss']

Step 5: Calculate the RSI

Here, we apply the formula rsi = 100 — (100 / (1.0 + RS)) given by Wilder in New Concepts in Technical Trading Systems using our prior RS values. The following code causes the establishment of a new RSI column as a result:

# Calculate RSI
df['rsi'] = 100 - (100 / (1.0 + df['rs']))

# View Result
print(df)

price diff gain loss avg_gain avg_loss rs rsi
period
1 54.80 NaN NaN NaN NaN NaN NaN NaN
2 56.80 2.00 2.00 0.00 NaN NaN NaN NaN
3 57.85 1.05 1.05 0.00 NaN NaN NaN NaN
4 59.85 2.00 2.00 0.00 NaN NaN NaN NaN
5 60.57 0.72 0.72 0.00 NaN NaN NaN NaN
... ... ... ... ... ... ... ... ...
34 58.27 -1.98 0.00 1.98 0.485716 0.619412 0.784157 43.951108
35 58.70 0.43 0.43 0.00 0.481736 0.575169 0.837557 45.579922
36 57.72 -0.98 0.00 0.98 0.447327 0.604085 0.740503 42.545340
37 58.10 0.38 0.38 0.00 0.442518 0.560936 0.788891 44.099456
38 58.20 0.10 0.10 0.00 0.418052 0.520869 0.802605 44.524721

[38 rows x 8 columns]

Here, we can see the fruits of our previous efforts, with our data now including the RS and RSI columns. We can once again use the ability of Pandas to save to a CSV file now that we have our data.

Step 6: Save Data

The pd.to_csv() method of Pandas gives us a way to store our extended data, just as it gave us a way to load it. This approach can be altered to suit a variety of tastes, but in this case we’ll only change the output path and leave the rest as defaults:

# Save dataframe as CSV
df.to_csv('wilder-rsi-pandas-output.csv')

Unless it is written to a buffer object, this function returns a None object. A local file with the name wilder-rsi-pandas-output.csv will then be created and saved in our current working directory. Our computations are now complete, but to make sure nothing went wrong with our implementation, we still need to double-check our data.

Step 7: Validate Results

At first inspection, our RSI numbers seem to be very similar to those determined in our earlier implementation. We’ll revalidate our data to make sure. Thankfully, we can accomplish all of this in Pandas this time.

# Validation list
WILDER_RSI_VALUES = [
74.36, 74.55, 65.75, 59.68, 61.98, 66.44, 65.75, 67.0, 71.43,
70.5, 72.14, 67.95, 60.78, 55.56, 56.71, 49.49, 48.19, 52.38,
50.0, 43.5, 45.36, 42.53, 44.14, 44.75
]

# Load in the validation data + create NaN values preceding our first RSI calculations
v = pd.DataFrame(pd.concat([pd.Series(["NaN"] * (window_length)), pd.Series(WILDER_RSI)])).reset_index(level=0).drop(['index'], axis=1)
v.index = list(range(1, len(v) + 1)) # reindex starting with 0

# Calculate differences
df['diff_rsi'] = ((df['rsi'] - v.values).abs())
df['diff_pct'] = ((df['rsi'] - v.values) / v.values * 100).abs()

# Round off for easy comparison
df['diff_rsi'] = df['diff_rsi'].apply(lambda x: round(x, 2))
df['diff_pct'] = df['diff_pct'].apply(lambda x: round(x, 2))

# View Results
print(df)

price diff gain loss ... rs rsi diff_rsi diff_pct
period ...
15 62.50 1.13 1.13 0.00 ... 2.878049 74.213836 0.15 0.20
16 62.57 0.07 0.07 0.00 ... 2.896435 74.335516 0.21 0.29
17 60.80 -1.77 0.00 1.77 ... 1.930084 65.871286 0.12 0.18
18 59.37 -1.43 0.00 1.43 ... 1.495863 59.933704 0.25 0.43
19 60.35 0.98 0.98 0.00 ... 1.661902 62.432876 0.45 0.73
20 62.35 2.00 2.00 0.00 ... 2.026822 66.962047 0.52 0.79
21 62.17 -0.18 0.00 0.18 ... 1.957584 66.188617 0.44 0.67
22 62.55 0.38 0.38 0.00 ... 2.035249 67.053772 0.05 0.08
23 64.55 2.00 2.00 0.00 ... 2.475455 71.226792 0.20 0.28
24 64.37 -0.18 0.00 0.18 ... 2.374159 70.362986 0.14 0.19
25 65.30 0.93 0.93 0.00 ... 2.601844 72.236440 0.10 0.13
26 64.42 -0.88 0.00 0.88 ... 2.111859 67.864864 0.09 0.13
27 62.90 -1.52 0.00 1.52 ... 1.563986 60.998221 0.22 0.36
28 61.60 -1.30 0.00 1.30 ... 1.262352 55.798210 0.24 0.43
29 62.05 0.45 0.45 0.00 ... 1.334247 57.159636 0.45 0.79
30 60.05 -2.00 0.00 2.00 ... 0.992659 49.815793 0.33 0.66
31 59.70 -0.35 0.00 0.35 ... 0.946968 48.638097 0.45 0.93
32 60.90 1.20 1.20 0.00 ... 1.116919 52.761538 0.38 0.73
33 60.25 -0.65 0.00 0.65 ... 1.016177 50.401189 0.40 0.80
34 58.27 -1.98 0.00 1.98 ... 0.784157 43.951108 0.45 1.04
35 58.70 0.43 0.43 0.00 ... 0.837557 45.579922 0.22 0.48
36 57.72 -0.98 0.00 0.98 ... 0.740503 42.545340 0.02 0.04
37 58.10 0.38 0.38 0.00 ... 0.788891 44.099456 0.04 0.09
38 58.20 0.10 0.10 0.00 ... 0.802605 44.524721 0.23 0.50

Here, we can see that our RSI values are pretty similar to Wilder’s but with some deviation. These variations result from the fact that our Pandas computations were performed with greater accuracy than Wilder’s original data, which was rounded to the second decimal point.

--

--

Manjunath Reddy
0 Followers

Data Analytics | Data Science | Visualization. Looking to work with some interest datasets.