pandas: Fill a new DataFrame column using conditions after groupby, using more than 2 columns and 2 rows

Posted by zu0ti5jz on 2024-01-04 in Other


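I have the table below. I need to group by Col1 and check whether Col2 contains Y within each group.
If it does, create a new column Col4 and fill every row of that group with the Col3 value of the Y row; if it does not, fill Col4 with each row's own Col3 value.

| index |Col1| Col2| Col3|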
| --|--|--|--|
| 0 | 1 |X| ABC|
| 1 | 1 |Y| XX|
| 2 | 1 |X| QW|
| 3 | 2 |X| VB|
| 4 | 2 |X| AY|
| 5 | 3 |X| MM|
| 6 | 3 |X| YY|
| 7 | 3 |Y| XX|

Desired table:

| index |Col1| Col2| Col3|New column|
| --|--|--|--|--|
| 0 | 1 |X| ABC| XX|
| 1 | 1 |Y| XX| XX|
| 2 | 1 |X| QW| XX|
| 3 | 2 |X| VB| VB|
| 4 | 2 |X| AY| AY|
| 5 | 3 |X| MM| XX|
| 6 | 3 |X| YY| XX|
| 7 | 3 |Y| XX| XX|

pandas

Source: https://stackoverflow.com/questions/77606262/fill-new-dataframe-column-using-conditions-after-groupby-using-more-than-2-colum

2 Answers


a11xaf1n1#

You can use groupby with transform after masking the values you do not want:

df['New_Col'] = (df['Col3'].mask(df['Col2'] != 'Y')
                           .groupby(df['Col1'])
                           .transform('first')
                           .fillna(df['Col3']))

Output:

>>> df
   index  Col1 Col2 Col3 New_Col
0      0     1    X  ABC      XX
1      1     1    Y   XX      XX
2      2     1    X   QW      XX
3      3     2    X   VB      VB
4      4     2    X   AY      AY
5      5     3    X   MM      XX
6      6     3    X   YY      XX
7      7     3    Y   XX      XX

Step by step:

# Hide values
>>> out = df['Col3'].mask(df['Col2'] != 'Y')
0    NaN
1     XX
2    NaN
3    NaN
4    NaN
5    NaN
6    NaN
7     XX
Name: Col3, dtype: object

# Get the first non-null value for each group ('first' skips NaN)
>>> out = out.groupby(df['Col1']).transform('first')
0      XX
1      XX
2      XX
3    None
4    None
5      XX
6      XX
7      XX
Name: Col3, dtype: object

# Fill missing values to default (Col3)
>>> out = out.fillna(df['Col3'])
0    XX
1    XX
2    XX
3    VB
4    AY
5    XX
6    XX
7    XX
Name: Col3, dtype: object
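
The three steps above can be put together as a self-contained sketch. The DataFrame literal is rebuilt from the question's table, and `fill_from_y_row` is a hypothetical helper name introduced only for illustration; it assumes the column names `Col1`, `Col2`, `Col3` from the question.

```python
import pandas as pd

# Input table rebuilt from the question (the 'index' column is omitted).
df = pd.DataFrame({
    "Col1": [1, 1, 1, 2, 2, 3, 3, 3],
    "Col2": ["X", "Y", "X", "X", "X", "X", "X", "Y"],
    "Col3": ["ABC", "XX", "QW", "VB", "AY", "MM", "YY", "XX"],
})


def fill_from_y_row(frame: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: Col3 of the group's Y row, else the row's own Col3."""
    # Step 1: hide Col3 wherever Col2 is not 'Y'.
    y_only = frame["Col3"].mask(frame["Col2"] != "Y")
    # Step 2: broadcast the first non-null value of each Col1 group to all its rows.
    per_group = y_only.groupby(frame["Col1"]).transform("first")
    # Step 3: groups without any Y row are still missing; fall back to Col3.
    return per_group.fillna(frame["Col3"])


df["New_Col"] = fill_from_y_row(df)
print(df)
```

If a group has several Y rows, this sketch takes the first one in row order, because `'first'` returns the first non-null value in the group.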


gdx19jrr2#

Filter only the Y rows with boolean indexing, map Col1 onto them with Series.map, and finally replace the unmatched values with Series.fillna:

s = df[df['Col2'].eq('Y')].set_index('Col1')['Col3']

df['Col4'] = df['Col1'].map(s).fillna(df['Col3'])
print (df)
   index  Col1 Col2 Col3 Col4
0      0     1    X  ABC   XX
1      1     1    Y   XX   XX
2      2     1    X   QW   XX
3      3     2    X   VB   VB
4      4     2    X   AY   AY
5      5     3    X   MM   XX
6      6     3    X   YY   XX
7      7     3    Y   XX   XX
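
A runnable sketch of the same mapping idea, assuming the question's table. The names `lookup` and the uniqueness check are additions for illustration and are not part of the answer. As far as I know, `Series.map` raises an error when the lookup Series has a duplicated index, so this variant assumes at most one Y row per `Col1` group.

```python
import pandas as pd

# Input table rebuilt from the question (the 'index' column is omitted).
df = pd.DataFrame({
    "Col1": [1, 1, 1, 2, 2, 3, 3, 3],
    "Col2": ["X", "Y", "X", "X", "X", "X", "X", "Y"],
    "Col3": ["ABC", "XX", "QW", "VB", "AY", "MM", "YY", "XX"],
})

# Lookup Series: Col1 value -> Col3 of that group's Y row.
lookup = df.loc[df["Col2"].eq("Y")].set_index("Col1")["Col3"]

# Assumption made explicit: one Y row per group, so the lookup index is unique.
if not lookup.index.is_unique:
    raise ValueError("more than one Y row in at least one Col1 group")

# Groups missing from the lookup map to NaN and fall back to their own Col3.
df["Col4"] = df["Col1"].map(lookup).fillna(df["Col3"])
print(df)
```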

If the matched values can themselves be NaN, modify the solution:

print (df)
   index  Col1 Col2 Col3
0      0     1    X  ABC
1      1     1    Y  NaN
2      2     1    X   QW
3      3     2    X   VB
4      4     2    X   AY
5      5     3    X   MM
6      6     3    X   YY
7      7     3    Y   XX

import numpy as np

s = df[df['Col2'].eq('Y')].set_index('Col1')['Col3']

df['Col4'] = np.where(df['Col1'].isin(s.index), df['Col1'].map(s), df['Col3'])
print (df)
   index  Col1 Col2 Col3 Col4
0      0     1    X  ABC  NaN
1      1     1    Y  NaN  NaN
2      2     1    X   QW  NaN
3      3     2    X   VB   VB
4      4     2    X   AY   AY
5      5     3    X   MM   XX
6      6     3    X   YY   XX
7      7     3    Y   XX   XX
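
To see why the `fillna` version needs changing here, the following sketch compares both variants on the NaN table above. The names `naive`, `strict`, `lookup` and `has_y` are illustrative only. It assumes the same table with the Y row of group 1 set to NaN.

```python
import numpy as np
import pandas as pd

# Same table as above: the Y row of group 1 has a missing Col3.
df = pd.DataFrame({
    "Col1": [1, 1, 1, 2, 2, 3, 3, 3],
    "Col2": ["X", "Y", "X", "X", "X", "X", "X", "Y"],
    "Col3": ["ABC", np.nan, "QW", "VB", "AY", "MM", "YY", "XX"],
})

lookup = df.loc[df["Col2"].eq("Y")].set_index("Col1")["Col3"]

# fillna cannot tell "group has no Y row" apart from "Y row holds NaN",
# so group 1 silently falls back to its own Col3 values.
naive = df["Col1"].map(lookup).fillna(df["Col3"])

# Deciding by group membership keeps the NaN for group 1.
has_y = df["Col1"].isin(lookup.index)
strict = pd.Series(
    np.where(has_y, df["Col1"].map(lookup), df["Col3"]), index=df.index
)

print(pd.DataFrame({"naive": naive, "strict": strict}))
```

Which behavior is wanted depends on whether a missing value in the Y row should propagate to the group or be treated as "no match".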

EDIT: If there are multiple Y values per group, here is a solution that picks a random Y value per group with DataFrameGroupBy.sample:

m = df['Col2'].eq('Y')

s = df[m].groupby(df['Col1']).sample(1).set_index('Col1')['Col3']
df['Col4'] = df['Col3'].where(m).fillna(df['Col1'].map(s)).fillna(df['Col3'])

print (df)
   index  Col1 Col2 Col3 Col4
0      0     1    Y  ABC  ABC
1      1     1    Y   XX   XX
2      2     1    X   QW  ABC
3      3     2    X   VB   VB
4      4     2    X   AY   AY
5      5     3    X   MM   XX
6      6     3    X   YY   XX
7      7     3    Y   XX   XX
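
Because `sample` is random, the value assigned to the non-Y row of group 1 can change between runs. A minimal sketch, assuming the table above, that passes `random_state` to make the pick repeatable. The seed value `0` and the names `is_y` and `picked` are arbitrary choices for illustration, and grouping by the column name `"Col1"` is used here instead of the Series.

```python
import pandas as pd

# Table from the edit above: group 1 has two Y rows.
df = pd.DataFrame({
    "Col1": [1, 1, 1, 2, 2, 3, 3, 3],
    "Col2": ["Y", "Y", "X", "X", "X", "X", "X", "Y"],
    "Col3": ["ABC", "XX", "QW", "VB", "AY", "MM", "YY", "XX"],
})

is_y = df["Col2"].eq("Y")

# One randomly chosen Y row per group; random_state fixes the choice across runs.
picked = (
    df[is_y]
    .groupby("Col1")
    .sample(n=1, random_state=0)
    .set_index("Col1")["Col3"]
)

# Y rows keep their own Col3, other rows take the picked value of their group,
# and groups without any Y row fall back to their own Col3.
df["Col4"] = (
    df["Col3"].where(is_y)
    .fillna(df["Col1"].map(picked))
    .fillna(df["Col3"])
)
print(df)
```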
