pandas: Change column entries for each group/id based on a specific condition in Python

xiozqbni posted on 2023-04-04 in Python

I have the following DataFrame:

#Load the required libraries
import pandas as pd

#Create dataset
data = {'id': [1, 1, 1, 1, 1,1, 1, 1, 1, 1, 1,
               2, 2, 2, 2, 2, 2,
               3, 3, 3, 3, 3, 3,
               4, 4, 4, 4,
               5, 5, 5, 5, 5,5, 5, 5,5],
        'cycle': [1,2, 3, 4, 5,6,7,8,9,10,11,
                  1,2, 3,4,5,6,
                  1,2, 3, 4, 5,6,
                  1,2, 3, 4,
                  1,2, 3, 4, 5,6,7,8,9,],
        'Salary': [7, 7, 7,8,9,10,11,12,13,14,15,
                   4, 4, 4,4,5,6,
                   8,9,10,11,12,13,
                   8,9,10,11,
                   7, 7,9,10,11,12,13,14,15,],
        'Children': ['No', 'Yes', 'Yes', 'Yes', 'Yes', 'No','No', 'Yes', 'Yes', 'Yes', 'No',
                     'Yes', 'Yes', 'No', 'Yes', 'Yes', 'Yes', 
                     'No','Yes', 'Yes', 'No','No', 'Yes',
                     'Yes', 'No','Yes', 'Yes',
                      'No',  'Yes', 'No', 'No', 'Yes', 'Yes', 'Yes', 'Yes', 'No',],
        'Days': [123, 128, 66, 66, 120, 141, 52,96, 120, 141, 52,
                 96, 120,120, 141, 52,96,
                 15,123, 128, 66, 120, 141,
                 141,123, 128, 66,
                 123, 128, 66, 123, 128, 66, 120, 141, 52,],
        }

#Convert to dataframe
df = pd.DataFrame(data)
print("df = \n", df)

The DataFrame above looks like this:

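Here, each id has a different number of cycles, as given by the 'cycle' column.
id-1 has a maximum of 11 cycles.
id-2 has a maximum of 6 cycles.
id-3 has a maximum of 6 cycles.
id-4 has a maximum of 4 cycles.
id-5 has a maximum of 9 cycles.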
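The per-id maxima listed above can be checked directly with a group-wise maximum. This is a minimal sketch, assuming a trimmed-down frame that contains only the `id` and `cycle` columns; the names `sizes`, `check_df`, and `max_cycles` are illustrative and not from the original post.

```python
import pandas as pd

# Rebuild only the id and cycle columns, with the same group sizes as in the post.
sizes = {1: 11, 2: 6, 3: 6, 4: 4, 5: 9}
check_df = pd.DataFrame(
    [(i, c) for i, n in sizes.items() for c in range(1, n + 1)],
    columns=["id", "cycle"],
)

# One row per id: the largest cycle number seen in that group.
max_cycles = check_df.groupby("id")["cycle"].max()
print(max_cycles)
```

With the full `df` from the post, the same call would be `df.groupby('id')['cycle'].max()`.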
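I have a threshold on 'cycle'. Say cycle_threshold = 8.
If the maximum cycle for an id is < cycle_threshold, the 'Days' column stays unchanged for that id. Otherwise, its 'Days' column is set to 'NA'.
For example:
For id-2, the maximum cycle is 6, which is < 8, so the 'Days' column stays unchanged.
For id-1, however, the maximum cycle is 11, which is > 8, so the 'Days' column becomes 'NA'.
The result should look like this: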

Can anyone tell me how to do this in Python?

ht4b089n  #1

IIUC, you can get the maximum cycle per id and check whether it is greater than 8. If it is, mask the whole group as NaN:

df['Days'] = df['Days'].mask(df.groupby('id')['cycle'].transform('max').gt(8))
print(df)

    id  cycle  Salary Children   Days
0    1      1       7       No    NaN
1    1      2       7      Yes    NaN
2    1      3       7      Yes    NaN
3    1      4       8      Yes    NaN
4    1      5       9      Yes    NaN
5    1      6      10       No    NaN
6    1      7      11       No    NaN
7    1      8      12      Yes    NaN
8    1      9      13      Yes    NaN
9    1     10      14      Yes    NaN
10   1     11      15       No    NaN
11   2      1       4      Yes   96.0
12   2      2       4      Yes  120.0
13   2      3       4       No  120.0
14   2      4       4      Yes  141.0
15   2      5       5      Yes   52.0
16   2      6       6      Yes   96.0
17   3      1       8       No   15.0
18   3      2       9      Yes  123.0
19   3      3      10      Yes  128.0
20   3      4      11       No   66.0
21   3      5      12       No  120.0
22   3      6      13      Yes  141.0
23   4      1       8      Yes  141.0
24   4      2       9       No  123.0
25   4      3      10      Yes  128.0
26   4      4      11      Yes   66.0
27   5      1       7       No    NaN
28   5      2       7      Yes    NaN
29   5      3       9       No    NaN
30   5      4      10       No    NaN
31   5      5      11      Yes    NaN
32   5      6      12      Yes    NaN
33   5      7      13      Yes    NaN
34   5      8      14      Yes    NaN
35   5      9      15       No    NaN
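The reason `transform('max')` works as a mask here is that it returns one value per original row rather than one value per group. The following is a small sketch on a made-up frame (`toy`, `group_max`, and `masked_days` are illustrative names, and the numbers are not from the question):

```python
import pandas as pd

# Hypothetical toy data: id 1 reaches cycle 9, id 2 only reaches cycle 2.
toy = pd.DataFrame({
    "id":    [1, 1, 1, 2, 2],
    "cycle": [1, 2, 9, 1, 2],
    "Days":  [10, 20, 30, 40, 50],
})

# transform('max') repeats each group's maximum on every row of that group,
# so the result shares the index of the original frame.
group_max = toy.groupby("id")["cycle"].transform("max")
print(group_max.tolist())

# mask() replaces values with NaN wherever the condition is True.
masked_days = toy["Days"].mask(group_max.gt(8))
print(masked_days.tolist())
```

Because NaN is a float, the integer 'Days' column is upcast to float, which is why the remaining values print as 96.0, 120.0, and so on in the output above.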
pkwftd7m  #2

Use GroupBy.transform to get the maximum cycle per group, compare it with cycle_threshold, and set NaNs with Series.where:

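import numpy as np

cycle_threshold = 8

# True for rows whose id has a maximum cycle below the threshold
m1 = df.groupby('id')['cycle'].transform('max').lt(cycle_threshold)
# Keep 'Days' where m1 is True; set NaN elsewhere
df['Days'] = df['Days'].where(m1)

# Alternative: assign NaN directly to the rows that fail the condition
df.loc[~m1, 'Days'] = np.nan
print("df = \n", df)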
     id  cycle  Salary Children   Days
0    1      1       7       No    NaN
1    1      2       7      Yes    NaN
2    1      3       7      Yes    NaN
3    1      4       8      Yes    NaN
4    1      5       9      Yes    NaN
5    1      6      10       No    NaN
6    1      7      11       No    NaN
7    1      8      12      Yes    NaN
8    1      9      13      Yes    NaN
9    1     10      14      Yes    NaN
10   1     11      15       No    NaN
11   2      1       4      Yes   96.0
12   2      2       4      Yes  120.0
13   2      3       4       No  120.0
14   2      4       4      Yes  141.0
15   2      5       5      Yes   52.0
16   2      6       6      Yes   96.0
17   3      1       8       No   15.0
18   3      2       9      Yes  123.0
19   3      3      10      Yes  128.0
20   3      4      11       No   66.0
21   3      5      12       No  120.0
22   3      6      13      Yes  141.0
23   4      1       8      Yes  141.0
24   4      2       9       No  123.0
25   4      3      10      Yes  128.0
26   4      4      11      Yes   66.0
27   5      1       7       No    NaN
28   5      2       7      Yes    NaN
29   5      3       9       No    NaN
30   5      4      10       No    NaN
31   5      5      11      Yes    NaN
32   5      6      12      Yes    NaN
33   5      7      13      Yes    NaN
34   5      8      14      Yes    NaN
35   5      9      15       No    NaN
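The two answers give the same result on the question's data, but they are not strictly equivalent: `.gt(8)` masks only groups whose maximum is above 8, while `.lt(cycle_threshold)` keeps only groups whose maximum is below 8. They would differ for an id whose maximum cycle is exactly 8, which does not occur in the sample data. The sketch below illustrates this on a made-up frame; the helper `mask_days_by_max_cycle` and the `toy` data are hypothetical and not part of either answer.

```python
import pandas as pd

def mask_days_by_max_cycle(frame, cycle_threshold=8):
    """Return a copy with 'Days' set to NaN for ids whose max cycle is not below the threshold."""
    keep = frame.groupby("id")["cycle"].transform("max").lt(cycle_threshold)
    out = frame.copy()  # leave the caller's frame untouched
    out["Days"] = out["Days"].where(keep)
    return out

# Hypothetical toy data: id 1 peaks at exactly 8, id 2 at 7, id 3 at 9.
toy = pd.DataFrame({
    "id":    [1, 1, 2, 2, 3, 3],
    "cycle": [1, 8, 1, 7, 1, 9],
    "Days":  [10, 20, 30, 40, 50, 60],
})

# Strict reading of the question: keep 'Days' only when max cycle < threshold.
strict = mask_days_by_max_cycle(toy)

# The .gt(8) variant: mask 'Days' only when max cycle > 8.
loose = toy["Days"].mask(toy.groupby("id")["cycle"].transform("max").gt(8))

print(strict["Days"].tolist())
print(loose.tolist())
```

Under the question's wording ("< cycle_threshold stays unchanged, otherwise NA"), the `.lt(cycle_threshold)` form appears to be the closer match; use `.le(...)` or `.gt(...)` instead if a maximum equal to the threshold should be kept.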
