优化pandas Dataframe 计算,无需循环遍历行

hec6srdp  于 2023-04-10  发布在  其他
关注(0)|答案(1)|浏览(143)

我使用Pandas计算资本后,每一个贸易如下,但不知道是否有任何其他方法来计算这没有循环通过每一行?提前感谢!!!

逻辑:

trade = 1: buy
trade = -1: sell

diff: <Open> price of sell order - <Open> price of buy order, only calculate after each sell
position: capital/<Open> price, resets after each sell
gain = diff * position, only calculate after each sell
gain_c = gain_c + gain
capital = capital + gain

我使用的代码:

for i in range(1, len(df_trades)):
    if df_trades.loc[i, 'TRADE'] != -1: # BUY order: set POSITION for buy order
        df_trades.loc[i, 'POSITION'] = df_trades.loc[i,'CAPITAL']//df_trades.loc[i,'<Open>']
        df_trades.loc[i,'GAIN'] = 0
        df_trades.loc[i,'GAIN_C'] = df_trades.loc[i -1,'GAIN_C']
    if df_trades.loc[i, 'TRADE'] == -1: # SELL order: recalculate Capital, gain, gain_c
        df_trades.loc[i, 'POSITION'] = 0
        df_trades.loc[i,'DIFF'] = df_trades.loc[i,'<Open>'] - df_trades.loc[i-1,'<Open>']            
        df_trades.loc[i,'GAIN'] = df_trades.loc[i,'DIFF'] * df_trades.loc[i-1,'POSITION']
        df_trades.loc[i,'GAIN_C'] = df_trades.loc[i-1,'GAIN_C'] + df_trades.loc[i,'GAIN']
        df_trades.loc[i,'CAPITAL'] = df_trades.loc[i-1,'CAPITAL'] + df_trades.loc[i,'GAIN']

预期输出:

<Open> TRADE    CAPITAL  POSITION      GAIN    GAIN_C  DIFF
  7.17   1.0  100000.00   13947.0      0.00      0.00   NaN
  7.20  -1.0  100418.41       0.0    418.41    418.41  0.03
  7.19   1.0  100418.41   13966.0      0.00    418.41   NaN
  8.96  -1.0  125138.23       0.0  24719.82  25138.23  1.77
  8.99   1.0  125138.23   13919.0      0.00  25138.23   NaN
vsdwdz23

vsdwdz231#

您可以定义一个函数,它有一个行输入[和输出],.apply(而不是使用for循环)跨列,如df_trades = df_trades.apply(calculate_capital, axis=1, from_df=df_trades),其中calculate_capital定义为

def calculate_capital(row, from_df):
    if row.name==0: return row
    prev_row = from_df.loc[row.name-1]
    if row['TRADE'] != -1: # BUY order: set POSITION for buy order
        row['POSITION'] = row['CAPITAL']//row['<Open>']
        row['GAIN'] = 0
        row['GAIN_C'] = prev_row['GAIN_C']
    if row['TRADE'] == -1: # SELL order: recalculate Capital, gain, gain_c
        row['POSITION'] = 0
        row['DIFF'] = row['<Close>'] - prev_row['<Open>']            
        row['GAIN'] = row['DIFF'] * prev_row['POSITION']
        row['GAIN_C'] = prev_row['GAIN_C'] + row['GAIN']
        row['CAPITAL'] = prev_row['CAPITAL'] + row['GAIN']
    return row

示例:如果我将示例数据重新创建为df_trades,并使用

df_trades = pd.DataFrame([l.split() for l in '''
  7.17   1.0  100000.00   13947.0      0.00      0.00   NaN
  7.20  -1.0  100418.41       0.0    418.41    418.41  0.03
  7.19   1.0  100418.41   13966.0      0.00    418.41   NaN
  8.96  -1.0  125138.23       0.0  24719.82  25138.23  1.77
  8.99   1.0  125138.23   13919.0      0.00  25138.23   NaN
'''.splitlines() if l.strip()], columns='''
  <Open> TRADE    CAPITAL  POSITION      GAIN    GAIN_C  DIFF
'''.split())
df_trades = df_trades.assign(**{'<Close>': 8})[[
    '<Open>','<Close>', 'TRADE','CAPITAL','POSITION', 'GAIN','GAIN_C','DIFF'
]].applymap(float)

(我添加并使用8 s填充了<Close>,因为您的循环片段指示应该有这样的列。)

以下结果的更改以绿色突出显示。

回想起来,我认为应用这个函数可能会更快:

def calculate_new_row(row):
    if row.name==0: return row
    if row['TRADE'] != -1: # BUY order: set POSITION for buy order
        row['POSITION'] = row['CAPITAL']//row['<Open>']
        row['GAIN'] = 0
        row['GAIN_C'] = row['GAIN_C_prev']
    if row['TRADE'] == -1: # SELL order: recalculate Capital, gain, gain_c
        row['POSITION'] = 0
        row['DIFF'] = row['<Close>'] - row['<Open>_prev']            
        row['GAIN'] = row['DIFF'] * row['POSITION_prev']
        row['GAIN_C'] = row['GAIN_C_prev'] + row['GAIN']
        row['CAPITAL'] = row['CAPITAL_prev'] + row['GAIN']
    return row

但是在将df_trades(通过.shift)连接到移位版本(view example)之后

df_trades = df_trades.join(
    df_trades.shift(), rsuffix='_prev').apply(
        calculate_new_row, axis=1)[df_trades.columns]

相关问题