从Pandas数据框计算RSI指标?

nmpmafwu  于 2023-01-11  发布在  其他
关注(0)|答案(5)|浏览(365)

∮我的问题∮
我在Github上试了很多库,但是所有库都没有产生与TradingView匹配的结果,所以我按照这个link上的公式计算RSI指标,我用Excel计算,用TradingView整理结果,我知道这是绝对正确的但是,但是我没有找到用Pandas计算的方法。

公式

100
RSI = 100 - --------
             1 + RS

RS = Average Gain / Average Loss

The very first calculations for average gain and average loss are simple
14-period averages:

First Average Gain = Sum of Gains over the past 14 periods / 14.
First Average Loss = Sum of Losses over the past 14 periods / 14

The second, and subsequent, calculations are based on the prior averages
and the current gain loss:

Average Gain = [(previous Average Gain) x 13 + current Gain] / 14.
Average Loss = [(previous Average Loss) x 13 + current Loss] / 14.

预期结果

close   change     gain     loss     avg_gian    avg_loss        rs  \
0    4724.89      NaN      NaN      NaN          NaN         NaN       NaN   
1    4378.51  -346.38     0.00   346.38          NaN         NaN       NaN   
2    6463.00  2084.49  2084.49     0.00          NaN         NaN       NaN   
3    9838.96  3375.96  3375.96     0.00          NaN         NaN       NaN   
4   13716.36  3877.40  3877.40     0.00          NaN         NaN       NaN   
5   10285.10 -3431.26     0.00  3431.26          NaN         NaN       NaN   
6   10326.76    41.66    41.66     0.00          NaN         NaN       NaN   
7    6923.91 -3402.85     0.00  3402.85          NaN         NaN       NaN   
8    9246.01  2322.10  2322.10     0.00          NaN         NaN       NaN   
9    7485.01 -1761.00     0.00  1761.00          NaN         NaN       NaN   
10   6390.07 -1094.94     0.00  1094.94          NaN         NaN       NaN   
11   7730.93  1340.86  1340.86     0.00          NaN         NaN       NaN   
12   7011.21  -719.72     0.00   719.72          NaN         NaN       NaN   
13   6626.57  -384.64     0.00   384.64          NaN         NaN       NaN   
14   6371.93  -254.64     0.00   254.64   931.605000  813.959286  1.144535   
15   4041.32 -2330.61     0.00  2330.61   865.061786  922.291480  0.937948   
16   3702.90  -338.42     0.00   338.42   803.271658  880.586374  0.912201   
17   3434.10  -268.80     0.00   268.80   745.895111  836.887347  0.891273   
18   3813.69   379.59   379.59     0.00   719.730460  777.109680  0.926163   
19   4103.95   290.26   290.26     0.00   689.053999  721.601845  0.954895   
20   5320.81  1216.86  1216.86     0.00   726.754428  670.058856  1.084613   
21   8555.00  3234.19  3234.19     0.00   905.856968  622.197509  1.455899   
22  10854.10  2299.10  2299.10     0.00  1005.374328  577.754830  1.740140   

       rsi_14  
0         NaN  
1         NaN  
2         NaN  
3         NaN  
4         NaN  
5         NaN  
6         NaN  
7         NaN  
8         NaN  
9         NaN  
10        NaN  
11        NaN  
12        NaN  
13        NaN  
14  53.369848  
15  48.399038  
16  47.704239  
17  47.125561  
18  48.083322  
19  48.846358  
20  52.029461  
21  59.281719  
22  63.505515

我的代码

导入

import pandas as pd
import numpy as np

加载数据

df = pd.read_csv("rsi_14_test_data.csv")
close = df['close']
print(close)

0      4724.89
1      4378.51
2      6463.00
3      9838.96
4     13716.36
5     10285.10
6     10326.76
7      6923.91
8      9246.01
9      7485.01
10     6390.07
11     7730.93
12     7011.21
13     6626.57
14     6371.93
15     4041.32
16     3702.90
17     3434.10
18     3813.69
19     4103.95
20     5320.81
21     8555.00
22    10854.10
Name: close, dtype: float64

变更

计算每行的更改

change = close.diff(1)
print(change)

0         NaN
1     -346.38
2     2084.49
3     3375.96
4     3877.40
5    -3431.26
6       41.66
7    -3402.85
8     2322.10
9    -1761.00
10   -1094.94
11    1340.86
12    -719.72
13    -384.64
14    -254.64
15   -2330.61
16    -338.42
17    -268.80
18     379.59
19     290.26
20    1216.86
21    3234.19
22    2299.10
Name: close, dtype: float64

得与失
从变化中得到得失

is_gain, is_loss = change > 0, change < 0
gain, loss = change, -change
gain[is_loss] = 0
loss[is_gain] = 0
​
gain.name = 'gain'
loss.name = 'loss'
print(loss)

0         NaN
1      346.38
2        0.00
3        0.00
4        0.00
5     3431.26
6        0.00
7     3402.85
8        0.00
9     1761.00
10    1094.94
11       0.00
12     719.72
13     384.64
14     254.64
15    2330.61
16     338.42
17     268.80
18       0.00
19       0.00
20       0.00
21       0.00
22       0.00
Name: loss, dtype: float64

计算首个平均损益

之前n行的平均值

n = 14
avg_gain = change * np.nan
avg_loss = change * np.nan
​
avg_gain[n] = gain[:n+1].mean()
avg_loss[n] = loss[:n+1].mean()
​
avg_gain.name = 'avg_gain'
avg_loss.name = 'avg_loss'
​
avg_df = pd.concat([gain, loss, avg_gain, avg_loss], axis=1)
print(avg_df)

       gain     loss  avg_gain    avg_loss
0       NaN      NaN       NaN         NaN
1      0.00   346.38       NaN         NaN
2   2084.49     0.00       NaN         NaN
3   3375.96     0.00       NaN         NaN
4   3877.40     0.00       NaN         NaN
5      0.00  3431.26       NaN         NaN
6     41.66     0.00       NaN         NaN
7      0.00  3402.85       NaN         NaN
8   2322.10     0.00       NaN         NaN
9      0.00  1761.00       NaN         NaN
10     0.00  1094.94       NaN         NaN
11  1340.86     0.00       NaN         NaN
12     0.00   719.72       NaN         NaN
13     0.00   384.64       NaN         NaN
14     0.00   254.64   931.605  813.959286
15     0.00  2330.61       NaN         NaN
16     0.00   338.42       NaN         NaN
17     0.00   268.80       NaN         NaN
18   379.59     0.00       NaN         NaN
19   290.26     0.00       NaN         NaN
20  1216.86     0.00       NaN         NaN
21  3234.19     0.00       NaN         NaN
22  2299.10     0.00       NaN         NaN

平均增益和平均损耗的第一次计算是可以的,但我不知道如何应用pandas. core. window. rolling.来计算第二次和以后的平均增益和平均损耗,因为它们在许多行和不同列中,可能是这样的:

avg_gain[n] = (avg_gain[n-1]*13 + gain[n]) / 14

我的愿望-我的问题

  • 计算和使用技术指标的最佳方法是什么?
  • 在"Pandas风格"中完成上述代码。
  • 与Pandas相比,传统的循环编码方式是否会降低性能?
2ul0zpep

2ul0zpep1#

平均增益和损耗由递归公式计算,不能用numpy矢量化,但我们可以尝试找到一个 * 解析 *(即非递归)解来计算各个元素,然后用numpy实现这样的解。
将平均增益表示为y,当前增益表示为x,我们得到y[i] = a*y[i-1] + b*x[i],其中a = 13/14b = 1/14对应于n = 14。x1c 0d1x(抱歉的图片,只是太麻烦键入它)
这可以使用cumsum(rma =移动平均值)以numpy为单位高效计算:

import pandas as pd
import numpy as np

df = pd.DataFrame({'close':[4724.89, 4378.51,6463.00,9838.96,13716.36,10285.10,
                          10326.76,6923.91,9246.01,7485.01,6390.07,7730.93,
                          7011.21,6626.57,6371.93,4041.32,3702.90,3434.10,
                          3813.69,4103.95,5320.81,8555.00,10854.10]})
n = 14

def rma(x, n, y0):
    a = (n-1) / n
    ak = a**np.arange(len(x)-1, -1, -1)
    return np.r_[np.full(n, np.nan), y0, np.cumsum(ak * x) / ak / n + y0 * a**np.arange(1, len(x)+1)]

df['change'] = df['close'].diff()
df['gain'] = df.change.mask(df.change < 0, 0.0)
df['loss'] = -df.change.mask(df.change > 0, -0.0)
df['avg_gain'] = rma(df.gain[n+1:].to_numpy(), n, np.nansum(df.gain.to_numpy()[:n+1])/n)
df['avg_loss'] = rma(df.loss[n+1:].to_numpy(), n, np.nansum(df.loss.to_numpy()[:n+1])/n)
df['rs'] = df.avg_gain / df.avg_loss
df['rsi_14'] = 100 - (100 / (1 + df.rs))

df.round(2)的输出:

close   change     gain     loss  avg_gain  avg_loss    rs    rsi  rsi_14
0      4724.89      NaN      NaN      NaN       NaN       NaN   NaN    NaN     NaN
1      4378.51  -346.38     0.00   346.38       NaN       NaN   NaN    NaN     NaN
2      6463.00  2084.49  2084.49     0.00       NaN       NaN   NaN    NaN     NaN
3      9838.96  3375.96  3375.96     0.00       NaN       NaN   NaN    NaN     NaN
4     13716.36  3877.40  3877.40     0.00       NaN       NaN   NaN    NaN     NaN
5     10285.10 -3431.26     0.00  3431.26       NaN       NaN   NaN    NaN     NaN
6     10326.76    41.66    41.66     0.00       NaN       NaN   NaN    NaN     NaN
7      6923.91 -3402.85     0.00  3402.85       NaN       NaN   NaN    NaN     NaN
8      9246.01  2322.10  2322.10     0.00       NaN       NaN   NaN    NaN     NaN
9      7485.01 -1761.00     0.00  1761.00       NaN       NaN   NaN    NaN     NaN
10     6390.07 -1094.94     0.00  1094.94       NaN       NaN   NaN    NaN     NaN
11     7730.93  1340.86  1340.86     0.00       NaN       NaN   NaN    NaN     NaN
12     7011.21  -719.72     0.00   719.72       NaN       NaN   NaN    NaN     NaN
13     6626.57  -384.64     0.00   384.64       NaN       NaN   NaN    NaN     NaN
14     6371.93  -254.64     0.00   254.64    931.61    813.96  1.14  53.37   53.37
15     4041.32 -2330.61     0.00  2330.61    865.06    922.29  0.94  48.40   48.40
16     3702.90  -338.42     0.00   338.42    803.27    880.59  0.91  47.70   47.70
17     3434.10  -268.80     0.00   268.80    745.90    836.89  0.89  47.13   47.13
18     3813.69   379.59   379.59     0.00    719.73    777.11  0.93  48.08   48.08
19     4103.95   290.26   290.26     0.00    689.05    721.60  0.95  48.85   48.85
20     5320.81  1216.86  1216.86     0.00    726.75    670.06  1.08  52.03   52.03
21     8555.00  3234.19  3234.19     0.00    905.86    622.20  1.46  59.28   59.28
22    10854.10  2299.10  2299.10     0.00   1005.37    577.75  1.74  63.51   63.51

关于您最后一个关于性能的问题:python /panda中的显式循环很糟糕,尽可能避免它们。如果你不能,试试cython or numba
为了说明这一点,我将我的numpy解决方案与dimitris_ps的loop solution进行了一个小的比较:

import pandas as pd
import numpy as np
import timeit

mult = 1        # length of dataframe = 23 * mult
number = 1000   # number of loop for timeit

df0 = pd.DataFrame({'close':[4724.89, 4378.51,6463.00,9838.96,13716.36,10285.10,
                          10326.76,6923.91,9246.01,7485.01,6390.07,7730.93,
                          7011.21,6626.57,6371.93,4041.32,3702.90,3434.10,
                          3813.69,4103.95,5320.81,8555.00,10854.10] * mult })
n = 14

def rsi_np():
    # my numpy solution from above
    return df
    
def rsi_loop():
    # loop solution https://stackoverflow.com/a/57008625/3944322
    # without the wrong alternative calculation of df['avg_gain'][14]
    return df

df = df0.copy()
time_np = timeit.timeit('rsi_np()', globals=globals(), number = number) / 1000 * number

df = df0.copy()
time_loop = timeit.timeit('rsi_loop()', globals=globals(), number = number) / 1000 * number

print(f'rows\tnp\tloop\n{len(df0)}\t{time_np:.1f}\t{time_loop:.1f}')

assert np.allclose(rsi_np(), rsi_loop(), equal_nan=True)

结果(毫秒/循环):

rows    np    loop
23      4.9   9.2
230     5.0   112.3
2300    5.5   1122.7

因此,即使对于8行(第15...22行),循环解决方案所需的时间也是numpy解决方案的两倍。Numpy的伸缩性很好,而循环解决方案对于大型数据集不可行。

j7dteeu8

j7dteeu82#

有一个更简单的方法,包裹塔利布。

import talib   
close = df['close']
rsi = talib.RSI(close, timeperiod=14)

如果你想布林线去与您的RSI,这是很容易的。

upperBB, middleBB, lowerBB = talib.BBANDS(close, timeperiod=20, nbdevup=2, nbdevdn=2, matype=0)

您可以使用RSI的布林线来代替70和30的固定参考水平。

upperBBrsi, MiddleBBrsi, lowerBBrsi = talib.BBANDS(rsi, timeperiod=50, nbdevup=2, nbdevdn=2, matype=0)

最后,您可以使用%B钙化标准化RSI。

normrsi = (rsi - lowerBBrsi) / (upperBBrsi - lowerBBrsi)

关于Talib https://mrjbq7.github.io/ta-lib/的信息
信息在布林线https://www.BollingerBands.com

syqv5f0l

syqv5f0l3#

这里有一个选择。
我只会提到你的第二颗子弹

# libraries required
import pandas as pd
import numpy as np

# create dataframe
df = pd.DataFrame({'close':[4724.89, 4378.51,6463.00,9838.96,13716.36,10285.10,
                          10326.76,6923.91,9246.01,7485.01,6390.07,7730.93,
                          7011.21,6626.57,6371.93,4041.32,3702.90,3434.10,
                          3813.69,4103.95,5320.81,8555.00,10854.10]})

df['change'] = df['close'].diff(1) # Calculate change

# calculate gain / loss from every change
df['gain'] = np.select([df['change']>0, df['change'].isna()], 
                       [df['change'], np.nan], 
                       default=0) 
df['loss'] = np.select([df['change']<0, df['change'].isna()], 
                       [-df['change'], np.nan], 
                       default=0)

# create avg_gain /  avg_loss columns with all nan
df['avg_gain'] = np.nan 
df['avg_loss'] = np.nan

n = 14 # what is the window

# keep first occurrence of rolling mean
df['avg_gain'][n] = df['gain'].rolling(window=n).mean().dropna().iloc[0] 
df['avg_loss'][n] = df['loss'].rolling(window=n).mean().dropna().iloc[0]
# Alternatively
df['avg_gain'][n] = df.loc[:n, 'gain'].mean()
df['avg_loss'][n] = df.loc[:n, 'loss'].mean()

# This is not a pandas way, looping through the pandas series, but it does what you need
for i in range(n+1, df.shape[0]):
    df['avg_gain'].iloc[i] = (df['avg_gain'].iloc[i-1] * (n - 1) + df['gain'].iloc[i]) / n
    df['avg_loss'].iloc[i] = (df['avg_loss'].iloc[i-1] * (n - 1) + df['loss'].iloc[i]) / n

# calculate rs and rsi
df['rs'] = df['avg_gain'] / df['avg_loss']
df['rsi'] = 100 - (100 / (1 + df['rs'] ))
ttcibm8c

ttcibm8c4#

这是rsi代码,替换所有带有“aa”的内容:

import pandas as pd
rsi_period = 14
df = pd.Series(coinaalist)
chg = df.diff(1)
gain = chg.mask(chg<0,0)
loss = chg.mask(chg>0,0)
avg_gain = gain.ewm(com = rsi_period-1,min_periods=rsi_period).mean()
avg_loss = loss.ewm(com = rsi_period-1,min_periods=rsi_period).mean()
rs = abs(avg_gain / avg_loss)
crplaa = 100 - (100/(1+rs))
coinaarsi = crplaa.iloc[-1]
r7knjye2

r7knjye25#

如果要使用本地Pandas叫声计算时间序列的RSI,可以使用以下单行代码:

n=14
df['rsi14'] = 100 - (100 / (1 + df['Close'].diff(1).mask(df['Close'].diff(1) < 0, 0).ewm(alpha=1/n, adjust=False).mean() / df['Close'].diff(1).mask(df['Close'].diff(1) > 0, -0.0).abs().ewm(alpha=1/n, adjust=False).mean()))

它比 numpy 结果更快(毫秒/循环):

rows    np      loop    native
23      1.0     1.3     0.8
230     1.1     1.4     0.9
2300    1.1     1.3     0.9
23000   3.4     1.8     1.2

相关问题