Pandas: forward-fill missing values within each group defined by two columns

Posted by 4xy9mtcn on 2023-01-28 in Other

I have data with the columns ['color', 'fruit', 'date', 'value'], where 'color' and 'fruit' are the grouping levels.

import numpy as np
import pandas as pd

data = pd.DataFrame({'color': ['Green', 'Green', 'Green', 'Green', 'Red', 'Red'],
                     'fruit': ['Banana', 'Banana', 'Apple', 'Apple', 'Banana', 'Apple'],
                     'date': ['2011-01-01', '2011-01-02', '2011-01-01', '2011-01-02', '2011-02-01', '2011-02-01'],
                     'value': [1, np.nan, np.nan, 2, 3, np.nan]})

Output:

   color   fruit        date  value
0  Green  Banana  2011-01-01    1.0
1  Green  Banana  2011-01-02    NaN
2  Green   Apple  2011-01-01    NaN
3  Green   Apple  2011-01-02    2.0
4    Red  Banana  2011-02-01    3.0
5    Red   Apple  2011-02-01    NaN

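I need to fill 'value' for the dates on which we have no data. The fill must stay within each ['color', 'fruit'] group.
I tried df = df.groupby(['color', 'fruit', 'date'])['value'].mean().replace(to_replace=0, method='ffill'), but this spills values over into the next [color, fruit] group.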
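The spill-over is easy to reproduce. The sketch below does not rerun the exact replace call above. Instead it assumes, for illustration only, a plain ungrouped ffill on the 'value' column, which shows the same kind of leak across group boundaries. The name spilled is introduced here and is not part of the original post.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    "color": ["Green", "Green", "Green", "Green", "Red", "Red"],
    "fruit": ["Banana", "Banana", "Apple", "Apple", "Banana", "Apple"],
    "date": ["2011-01-01", "2011-01-02", "2011-01-01",
             "2011-01-02", "2011-02-01", "2011-02-01"],
    "value": [1, np.nan, np.nan, 2, 3, np.nan],
})

# Ungrouped forward fill: every NaN takes the value of the previous row,
# regardless of which (color, fruit) group that row belongs to.
spilled = data["value"].ffill()
print(spilled.tolist())  # [1.0, 1.0, 1.0, 2.0, 3.0, 3.0]
```

Row 2 (Green, Apple) picks up 1.0 from (Green, Banana), and row 5 (Red, Apple) picks up 3.0 from (Red, Banana). Neither should be filled.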

Expected Output:

   color   fruit        date  value
0  Green  Banana  2011-01-01    1.0
1  Green  Banana  2011-01-02    1.0
2  Green   Apple  2011-01-01    NaN
3  Green   Apple  2011-01-02    2.0
4    Red  Banana  2011-02-01    3.0
5    Red   Apple  2011-02-01    NaN
Answer #1 (y1aodyip)

You can use GroupBy.cumcount together with pandas.Series.ffill:

m = data.groupby(["color", "fruit"]).cumcount().astype(bool)

data["value"] = data["value"].ffill().where(m, data["value"])

Or, as @Mustafa Aydın pointed out, simply use GroupBy.ffill:

data["value"] = data.groupby(["color", "fruit"])["value"].ffill()

Output:

print(data)

   color   fruit        date  value
0  Green  Banana  2011-01-01    1.0
1  Green  Banana  2011-01-02    1.0
2  Green   Apple  2011-01-01    NaN
3  Green   Apple  2011-01-02    2.0
4    Red  Banana  2011-02-01    3.0
5    Red   Apple  2011-02-01    NaN
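
The two approaches agree on the sample data, but they are not equivalent in general. The cumcount mask only protects the first row of each group, so a group that starts with two or more consecutive NaN values can still receive a value from the previous group. It also relies on each group's rows being contiguous. The sketch below uses a small hypothetical frame (the column name group and the variables masked and grouped are made up for illustration) to show the difference:

```python
import numpy as np
import pandas as pd

# Hypothetical data: group B starts with two consecutive NaNs.
df = pd.DataFrame({
    "group": ["A", "B", "B"],
    "value": [1.0, np.nan, np.nan],
})

# Approach 1: global ffill, then restore the original value on the
# first row of each group (where cumcount is 0, so the mask is False).
m = df.groupby("group").cumcount().astype(bool)
masked = df["value"].ffill().where(m, df["value"])

# Approach 2: forward fill inside each group only.
grouped = df.groupby("group")["value"].ffill()

print(masked.tolist())   # [1.0, nan, 1.0]  second B row leaks from group A
print(grouped.tolist())  # [1.0, nan, nan]  nothing crosses the group boundary
```

For that reason GroupBy.ffill is probably the safer choice whenever a group may begin with more than one missing value.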
