python 合并PandasDataFrame中特定列中具有相同ID值的特定行

5ktev3wc  于 2022-11-21  发布在  Python
关注(0)|答案(2)|浏览(354)

我有一个DataFramedf1,其中IDAmount在特定的Dates上。我试图对具有相同ID值的两个特定行的Amount求和。

df1:
    Date        ID      Amount
0   2022-01-02  1200    10.0
1   2022-01-02  1200    1.0
2   2022-01-02  1400    12.0
3   2022-01-02  1500    11.0
4   2022-01-03  1300    12.5
5   2022-01-03  1300    0.5
6   2022-01-03  1500    12.0

这将是所需的输出:

df1:
    Date        ID      Amount
0   2022-01-02  1200    11 <-- 10+1
1   2022-01-02  1200    0  <-- -1
2   2022-01-02  1400    12
3   2022-01-02  1500    11
4   2022-01-03  1300    13 <-- 12.5+0.5
5   2022-01-03  1300    0  <-- -0.5
6   2022-01-03  1500    12

我尝试使用np.where()来替换Amount,其中shiftedID值等于ID值。
再现性:

import pandas as pd
df1 = pd.DataFrame({
    'Date':['2022-01-02', '2022-01-02', '2022-01-02', '2022-01-02', '2022-01-03', '2022-01-03', '2022-01-03'],
    'ID':[1200, 1200, 1400, 1500, 1300, 1300, 1500],
    'Amount':[10, 1, 12, 11, 12.5, 0.5, 12]})

非常感谢您的帮助!

tzxcd3kk

tzxcd3kk1#

如果我没理解错您的问题,它看起来像是一个事务数据,您需要按[Date,ID]分组。
如果是这样,则可以通过以下方式实现:

df1["Amount"] = df1.groupby(["Date", "ID"])["Amount"].transform(lambda x: [x.sum() if i==0 else 0 for i,_ in enumerate(x)])

完整示例。我在数据集的末尾添加了一些额外的数据,只是为了测试超过2个条目的拐角条件:

import pandas as pd

df1 = pd.DataFrame({
    'Date':['2022-01-02', '2022-01-02', '2022-01-02', '2022-01-02', '2022-01-03', '2022-01-03', '2022-01-03', '2022-01-04', '2022-01-04', '2022-01-04'],
    'ID':[1200, 1200, 1400, 1500, 1300, 1300, 1500, 1500, 1500, 1500],
    'Amount':[10, 1, 12, 11, 12.5, 0.5, 12, 10, 3, 5]})

df1["Amount"] = df1.groupby(["Date", "ID"])["Amount"].transform(lambda x: [x.sum() if i==0 else 0 for i,_ in enumerate(x)])

print(df1)

[Out]:
         Date    ID  Amount
0  2022-01-02  1200    11.0
1  2022-01-02  1200     0.0
2  2022-01-02  1400    12.0
3  2022-01-02  1500    11.0
4  2022-01-03  1300    13.0
5  2022-01-03  1300     0.0
6  2022-01-03  1500    12.0
7  2022-01-04  1500    18.0
8  2022-01-04  1500     0.0
9  2022-01-04  1500     0.0
e3bfsja2

e3bfsja22#

让我们试试下面的代码:

Amount=[None]*len(df1)

for i in range(1, len(df1)):

    if df1['ID'][i] == df1['ID'][i-1]:

        Amount[i]   = df1['Amount'][i] - df1['Amount'][i]
        Amount[i-1] = df1['Amount'][i] + df1['Amount'][i-1]

    else:

        Amount[i] = df1['Amount'][i]

df1['Amount']=Amount

输出

>>> df1
        Date    ID  Amount
0   2022-01-02  1200    11.0
1   2022-01-02  1200    0.0
2   2022-01-02  1400    12.0
3   2022-01-02  1500    11.0
4   2022-01-03  1300    13.0
5   2022-01-03  1300    0.0
6   2022-01-03  1500    12.0

相关问题