pandas: create a DataFrame with the sum of the Amt column per timestamp

Posted by s4chpxco on 2023-06-28 in Other

I have a dataframe that looks like this:

  CCY Pair        Time   Amt
0   EURUSD  13/05/2023  1000
1   EURUSD  13/05/2023  2000
2   EURUSD  14/05/2023  3000
3   EURUSD  14/05/2023  5000
4   GBPEUR  15/05/2023  4000

I want to sum Amt for each value in the Time column, so that the dataframe looks like this:

  CCY Pair        Time   Amt  AmtSum
0   EURUSD  13/05/2023  1000    3000
1   EURUSD  14/05/2023  3000    8000
2   GBPEUR  15/05/2023  4000    4000

My code does not seem to compute the sums correctly:

df2= pd.to_datetime(df['Time'])

StartDate = df['Time']
EndDate= StartDate.iloc[::-1]

dfx=df
dfx['StartDate'] = StartDate
dfx['EndDate'] = EndDate

dfx['AmtSum'] = dfx.apply(lambda x: df.loc[(df.Time >= x.StartDate) & 
                                            (df.Time <= x.EndDate), 'Amt'].sum(), axis=1)

dxxyhpgq #1

You can try .groupby() + .agg():

x = df.groupby('Time').agg({'CCY Pair':'first', 'Amt': ['first', 'sum']})
x = x.droplevel(1, axis=1).reset_index()
print(x)

Prints:

         Time CCY Pair   Amt   Amt
0  13/05/2023   EURUSD  1000  3000
1  14/05/2023   EURUSD  3000  8000
2  15/05/2023   GBPEUR  4000  4000

Or, if you want to group by both Time and CCY Pair:

x = df.groupby(['Time', 'CCY Pair']).agg({'Amt': ['first', 'sum']})
x = x.droplevel(1, axis=1).reset_index()
print(x)
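
Both snippets above leave two columns named Amt after droplevel(). As a possible variation (a sketch, not part of the original answer), pandas named aggregation can label the output columns directly. The sample data below is copied from the question; the output names Amt and AmtSum are chosen to match the desired result.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "CCY Pair": ["EURUSD", "EURUSD", "EURUSD", "EURUSD", "GBPEUR"],
    "Time": ["13/05/2023", "13/05/2023", "14/05/2023", "14/05/2023", "15/05/2023"],
    "Amt": [1000, 2000, 3000, 5000, 4000],
})

# Named aggregation: each keyword becomes an output column name,
# mapped to a (source column, aggregation function) pair.
x = (
    df.groupby(["CCY Pair", "Time"], as_index=False)
      .agg(Amt=("Amt", "first"), AmtSum=("Amt", "sum"))
)
print(x)
```

Because every output column gets a unique name, no droplevel() step is needed and the columns can be selected by name afterwards.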

r7s23pms #2

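It looks like you want to group your dataframe by the 'CCY Pair' and 'Time' columns and then sum the 'Amt' column. You can do this with the groupby() function in pandas. Here is how to modify your code: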

import pandas as pd

# Your initial dataframe
data = {'CCY Pair': ['EURUSD', 'EURUSD', 'EURUSD', 'EURUSD', 'GBPEUR'],
        'Time': ['13/05/2023', '13/05/2023', '14/05/2023', '14/05/2023', '15/05/2023'],
        'Amt': [1000, 2000, 3000, 5000, 4000]}

df = pd.DataFrame(data)

# Convert 'Time' column to datetime format
df['Time'] = pd.to_datetime(df['Time'], format='%d/%m/%Y')

# Group by 'CCY Pair' and 'Time' columns and sum the 'Amt' column
df_sum = df.groupby(['CCY Pair', 'Time'], as_index=False).sum()

# Rename the columns
df_sum.columns = ['CCY Pair', 'Time', 'AmtSum']

# Merge the original dataframe with the summed dataframe
result = df.merge(df_sum, on=['CCY Pair', 'Time'])

# Drop the duplicate rows
result = result.drop_duplicates(subset=['CCY Pair', 'Time'])

# Reorder the columns
result = result[['CCY Pair', 'Time', 'Amt', 'AmtSum']]

print(result)

This code gives you the desired output:

  CCY Pair       Time   Amt  AmtSum
0   EURUSD 2023-05-13  1000    3000
2   EURUSD 2023-05-14  3000    8000
4   GBPEUR 2023-05-15  4000    4000
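
The group, rename, and merge steps can probably be shortened with groupby().transform(), which returns the group sum aligned to the original rows. The following is a minimal sketch of that alternative, not the answerer's code; it reuses the sample data from the question, and the final reset_index() is an optional extra that renumbers the rows 0, 1, 2.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "CCY Pair": ["EURUSD", "EURUSD", "EURUSD", "EURUSD", "GBPEUR"],
    "Time": ["13/05/2023", "13/05/2023", "14/05/2023", "14/05/2023", "15/05/2023"],
    "Amt": [1000, 2000, 3000, 5000, 4000],
})
df["Time"] = pd.to_datetime(df["Time"], format="%d/%m/%Y")

# transform("sum") broadcasts each group's total back onto every row
# of that group, so no separate merge step is required.
df["AmtSum"] = df.groupby(["CCY Pair", "Time"])["Amt"].transform("sum")

# Keep only the first row of each (CCY Pair, Time) group.
result = (
    df.drop_duplicates(subset=["CCY Pair", "Time"])
      .reset_index(drop=True)
)
print(result)
```

If every original row should keep its group total, drop the drop_duplicates() step and use df as it stands after the transform.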
