Calculating Relative Strength Index using Pandas

7 min readJun 4, 2023

A momentum indicator called the Relative Strength Index (RSI) compares the current price to the average high and low prices over the course of a prior trading period. This indicator assists in identifying trend reversals, price declines, and the development of bullish or bearish markets by calculating overbought or oversold position.

It employs a standardisation formula to ensure that all values are placed on a scale of 0–100 for easy comparison; a smoothing function to calm the unpredictable oscillations inside momentum oscillators; and finally, it only takes the most recent periods’ data to determine the value for the following periods.

The Pandas library for Python is an incredible utility for data analysis. It is highly optimized for dealing with large datasets, comes with a dizzying array of built-in functions, and is used by many other analytical packages as an integral data handler.

import pandas as pd

# Load our sample CSV data as DataFrame
df = pd.read_csv('./data/wilder-rsi-data.csv', header=0).set_index(['period'])

# View the result
print(df.head())

        price
period       
1       54.80
2       56.80
3       57.85
4       59.85
5       60.57

# View summary
print(df.info())

<class 'pandas.core.frame.DataFrame'>
Int64Index: 38 entries, 1 to 38
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   price   38 non-null     float64
dtypes: float64(1)
memory usage: 608.0 bytes

Here, we have directly loaded our data into a Pandas DataFrame using the read_csv method. Then, we used the theset_index([‘period’])command to utilise the values in the first column as the index of our DataFrame, telling Pandas that the first line of our data is the header (index 0 being the first line).

Step 1: Calculate Price Differences

The biggest benefit of adopting Pandas is that we will no longer need to manually operate our sliding window. We first notice this as we compute the price variations between each daily price and the one before it:

# Calculate Price Differences
df['diff'] = df.diff(1)

# View Results
print(df)

        price  diff
period             
1       54.80   NaN
2       56.80  2.00
3       57.85  1.05
4       59.85  2.00
5       60.57  0.72
...       ...   ...
34      58.27 -1.98
35      58.70  0.43
36      57.72 -0.98
37      58.10  0.38
38      58.20  0.10

[38 rows x 2 columns]

Here we see differences calculated for all but the first row in our data, as expected.

Step 2: Calculate Gains & Losses

Another good surprise comes when we utilise the clip() function that our DataFrame object provides to calculate our gains and losses. This function instructs the computer to “keep all values above or below a given threshold.” In our example, we established a 0 higher and 0 lower barrier for our gain and loss, respectively. The following lines of code perform this:

# Calculate Avg. Gains/Losses
df['gain'] = df['diff'].clip(lower=0).round(2)
df['loss'] = df['diff'].clip(upper=0).abs().round(2)

# View Result
print(df)

period                         
1       54.80   NaN   NaN   NaN
2       56.80  2.00  2.00  0.00
3       57.85  1.05  1.05  0.00
4       59.85  2.00  2.00  0.00
5       60.57  0.72  0.72  0.00
...       ...   ...   ...   ...
34      58.27 -1.98  0.00  1.98
35      58.70  0.43  0.43  0.00
36      57.72 -0.98  0.00  0.98
37      58.10  0.38  0.38  0.00
38      58.20  0.10  0.10  0.00

[38 rows x 4 columns]

Step 3: Calculate Average Gains & Losses

It will take some imagination to put Wilder’s RSI method into practise. Using the first n-periods (14 in this case) of our historical values, we will first produce a SMA value. After that, we’ll implement the WSM once again for those values, thus we’re going to use a placeholder of NaN. Let’s first obtain our SMA:

# Get initial Averages
df['avg_gain'] = df['gain'].rolling(window=window_length, min_periods=window_length).mean()[:window_length+1]
df['avg_loss'] = df['loss'].rolling(window=window_length, min_periods=window_length).mean()[:window_length+1]

# View Result
print(df)

# View first SMA value
print(df.iloc[window_length-1: window_length+2])

        price  diff  gain  loss  avg_gain  avg_loss
period                                             
14      61.37 -1.43  0.00  1.43       NaN       NaN
15      62.50  1.13  1.13  0.00  0.842857  0.292857
16      62.57  0.07  0.07  0.00       NaN       NaN

Here, we can see that for the first window_length periods, a SMA value has been generated, and NaN values have been appended for all subsequent periods. The WSM average implementation is what is left. Sadly, Pandas doesn’t offer a built-in way for this, so we will have to manually create it in the code below:

# Get WMS averages
# Average Gains
for i, row in enumerate(df['avg_gain'].iloc[window_length+1:]):
    df['avg_gain'].iloc[i + window_length + 1] =\
        (df['avg_gain'].iloc[i + window_length] *
         (window_length - 1) +
         df['gain'].iloc[i + window_length + 1])\
        / window_length

# Average Losses
for i, row in enumerate(df['avg_loss'].iloc[window_length+1:]):
    df['avg_loss'].iloc[i + window_length + 1] =\
        (df['avg_loss'].iloc[i + window_length] *
         (window_length - 1) +
         df['loss'].iloc[i + window_length + 1])\
        / window_length

# View initial results
print(df[window_length-1:window_length+5])

        price  diff  gain  loss  avg_gain  avg_loss
period                                             
14      61.37 -1.43  0.00  1.43       NaN       NaN
15      62.50  1.13  1.13  0.00  0.842857  0.292857
16      62.57  0.07  0.07  0.00  0.787653  0.271939
17      60.80 -1.77  0.00  1.77  0.731392  0.378943
18      59.37 -1.43  0.00  1.43  0.679150  0.454019
19      60.35  0.98  0.98  0.00  0.700639  0.421589

To account for our average profits and losses, two more columns have been added in this table. These figures appear to be comparable so far to those calculated using our earlier pure Python approach. We then determine our RS values.

Step 4: Calculate the RS Value

This step merely entails adding a new column to our DataFrame and assigning the values obtained by dividing the average gains by the average losses from our previous computation to that column. This one-liner is shown in the following code:

# Calculate RS Values
df['rs'] = df['avg_gain'] / df['avg_loss']

Step 5: Calculate the RSI

Here, we apply the formula rsi = 100 — (100 / (1.0 + RS)) given by Wilder in New Concepts in Technical Trading Systems using our prior RS values. The following code causes the establishment of a new RSI column as a result:

# Calculate RSI
df['rsi'] = 100 - (100 / (1.0 + df['rs']))

# View Result
print(df)

        price  diff  gain  loss  avg_gain  avg_loss        rs        rsi
period                                                                  
1       54.80   NaN   NaN   NaN       NaN       NaN       NaN        NaN
2       56.80  2.00  2.00  0.00       NaN       NaN       NaN        NaN
3       57.85  1.05  1.05  0.00       NaN       NaN       NaN        NaN
4       59.85  2.00  2.00  0.00       NaN       NaN       NaN        NaN
5       60.57  0.72  0.72  0.00       NaN       NaN       NaN        NaN
...       ...   ...   ...   ...       ...       ...       ...        ...
34      58.27 -1.98  0.00  1.98  0.485716  0.619412  0.784157  43.951108
35      58.70  0.43  0.43  0.00  0.481736  0.575169  0.837557  45.579922
36      57.72 -0.98  0.00  0.98  0.447327  0.604085  0.740503  42.545340
37      58.10  0.38  0.38  0.00  0.442518  0.560936  0.788891  44.099456
38      58.20  0.10  0.10  0.00  0.418052  0.520869  0.802605  44.524721

[38 rows x 8 columns]

Here, we can see the fruits of our previous efforts, with our data now including the RS and RSI columns. We can once again use the ability of Pandas to save to a CSV file now that we have our data.

Step 6: Save Data

The pd.to_csv() method of Pandas gives us a way to store our extended data, just as it gave us a way to load it. This approach can be altered to suit a variety of tastes, but in this case we’ll only change the output path and leave the rest as defaults:

# Save dataframe as CSV
df.to_csv('wilder-rsi-pandas-output.csv')

Unless it is written to a buffer object, this function returns a None object. A local file with the name wilder-rsi-pandas-output.csv will then be created and saved in our current working directory. Our computations are now complete, but to make sure nothing went wrong with our implementation, we still need to double-check our data.

Step 7: Validate Results

At first inspection, our RSI numbers seem to be very similar to those determined in our earlier implementation. We’ll revalidate our data to make sure. Thankfully, we can accomplish all of this in Pandas this time.

# Validation list
WILDER_RSI_VALUES = [
    74.36, 74.55, 65.75, 59.68, 61.98, 66.44, 65.75, 67.0, 71.43,
    70.5, 72.14, 67.95, 60.78, 55.56, 56.71, 49.49, 48.19, 52.38,
    50.0, 43.5, 45.36, 42.53, 44.14, 44.75
]

# Load in the validation data + create NaN values preceding our first RSI calculations
v = pd.DataFrame(pd.concat([pd.Series(["NaN"] * (window_length)), pd.Series(WILDER_RSI)])).reset_index(level=0).drop(['index'], axis=1)
v.index = list(range(1, len(v) + 1))  # reindex starting with 0

# Calculate differences
df['diff_rsi'] = ((df['rsi'] - v.values).abs())
df['diff_pct'] = ((df['rsi'] - v.values) / v.values * 100).abs()

# Round off for easy comparison
df['diff_rsi'] = df['diff_rsi'].apply(lambda x: round(x, 2))
df['diff_pct'] = df['diff_pct'].apply(lambda x: round(x, 2))

# View Results
print(df)

        price  diff  gain  loss  ...        rs        rsi  diff_rsi  diff_pct
period                           ...                                         
15      62.50  1.13  1.13  0.00  ...  2.878049  74.213836      0.15      0.20
16      62.57  0.07  0.07  0.00  ...  2.896435  74.335516      0.21      0.29
17      60.80 -1.77  0.00  1.77  ...  1.930084  65.871286      0.12      0.18
18      59.37 -1.43  0.00  1.43  ...  1.495863  59.933704      0.25      0.43
19      60.35  0.98  0.98  0.00  ...  1.661902  62.432876      0.45      0.73
20      62.35  2.00  2.00  0.00  ...  2.026822  66.962047      0.52      0.79
21      62.17 -0.18  0.00  0.18  ...  1.957584  66.188617      0.44      0.67
22      62.55  0.38  0.38  0.00  ...  2.035249  67.053772      0.05      0.08
23      64.55  2.00  2.00  0.00  ...  2.475455  71.226792      0.20      0.28
24      64.37 -0.18  0.00  0.18  ...  2.374159  70.362986      0.14      0.19
25      65.30  0.93  0.93  0.00  ...  2.601844  72.236440      0.10      0.13
26      64.42 -0.88  0.00  0.88  ...  2.111859  67.864864      0.09      0.13
27      62.90 -1.52  0.00  1.52  ...  1.563986  60.998221      0.22      0.36
28      61.60 -1.30  0.00  1.30  ...  1.262352  55.798210      0.24      0.43
29      62.05  0.45  0.45  0.00  ...  1.334247  57.159636      0.45      0.79
30      60.05 -2.00  0.00  2.00  ...  0.992659  49.815793      0.33      0.66
31      59.70 -0.35  0.00  0.35  ...  0.946968  48.638097      0.45      0.93
32      60.90  1.20  1.20  0.00  ...  1.116919  52.761538      0.38      0.73
33      60.25 -0.65  0.00  0.65  ...  1.016177  50.401189      0.40      0.80
34      58.27 -1.98  0.00  1.98  ...  0.784157  43.951108      0.45      1.04
35      58.70  0.43  0.43  0.00  ...  0.837557  45.579922      0.22      0.48
36      57.72 -0.98  0.00  0.98  ...  0.740503  42.545340      0.02      0.04
37      58.10  0.38  0.38  0.00  ...  0.788891  44.099456      0.04      0.09
38      58.20  0.10  0.10  0.00  ...  0.802605  44.524721      0.23      0.50

Here, we can see that our RSI values are pretty similar to Wilder’s but with some deviation. These variations result from the fact that our Pandas computations were performed with greater accuracy than Wilder’s original data, which was rounded to the second decimal point.