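I'm trying to fill in missing values per group in a DataFrame with the columns area_code, shop_name, item_name, week_date and sales_amount. For each group, I need to fill in sales_amount for 52 weeks of data. I need to create two separate columns, one with ffill (df.ffill()) and one with bfill (df.bfill()), and then add the two new columns and divide by 2, i.e. (ffill + bfill) / 2, to get the result.
Before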
area_code  shop_name      item_name     week_date  sales_amount
101        Global Market  Mango Fruits  6/3/2018   5.13
101        Global Market  Mango Fruits  6/10/2018  nan
101        Global Market  Mango Fruits  6/17/2018  7.13
101        Global Market  Chips         6/3/2018   5
101        Global Market  Chips         6/10/2018  nan
102        Global Market  Mango Fruits  6/3/2018   10.34
102        Global Market  Mango Fruits  6/10/2018  nan
102        Global Market  Chips         6/10/2018  nan
102        Global Market  Chips         6/17/2018  nan
102        Global Market  Chips         6/24/2018  nan
102        Global Market  Potato        6/24/2018  nan
After
area_code  shop_name      item_name     week_date  sales_amount
101        Global Market  Mango Fruits  6/3/2018   5.13
101        Global Market  Mango Fruits  6/10/2018  6.13
101        Global Market  Mango Fruits  6/17/2018  7.13
101        Global Market  Chips         6/3/2018   5
101        Global Market  Chips         6/10/2018  5
102        Global Market  Mango Fruits  6/3/2018   10.34
102        Global Market  Mango Fruits  6/10/2018  10.34
102        Global Market  Chips         6/10/2018  Value available before this week for this group
102        Global Market  Chips         6/17/2018  Value available before this week for this group
102        Global Market  Chips         6/24/2018  Value available before this week for this group
102        Global Market  Potato        6/24/2018  Value available before this week for this group
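A minimal pandas sketch of the (ffill + bfill) / 2 approach described above, assuming the groups are defined by area_code, shop_name and item_name (the question does not say so explicitly) and that rows should be ordered by week_date within each group. It rebuilds the "Before" sample, creates the ffill and bfill columns per group, and averages them. Because NaN propagates through addition, rows that have a known value on only one side (for example 101 / Chips / 6/10/2018) come out as NaN here rather than 5.

```python
import numpy as np
import pandas as pd

# Rebuild the "Before" sample from the question.
df = pd.DataFrame({
    "area_code": [101, 101, 101, 101, 101, 102, 102, 102, 102, 102, 102],
    "shop_name": ["Global Market"] * 11,
    "item_name": ["Mango Fruits", "Mango Fruits", "Mango Fruits", "Chips", "Chips",
                  "Mango Fruits", "Mango Fruits", "Chips", "Chips", "Chips", "Potato"],
    "week_date": pd.to_datetime(
        ["6/3/2018", "6/10/2018", "6/17/2018", "6/3/2018", "6/10/2018", "6/3/2018",
         "6/10/2018", "6/10/2018", "6/17/2018", "6/24/2018", "6/24/2018"],
        format="%m/%d/%Y"),
    "sales_amount": [5.13, np.nan, 7.13, 5, np.nan, 10.34,
                     np.nan, np.nan, np.nan, np.nan, np.nan],
})

keys = ["area_code", "shop_name", "item_name"]  # assumed group keys
df = df.sort_values(keys + ["week_date"])       # fill in date order within each group

g = df.groupby(keys)["sales_amount"]
df["ffill"] = g.ffill()  # last known value earlier in the same group
df["bfill"] = g.bfill()  # next known value later in the same group
# Plain average: NaN whenever either side is missing.
df["filled"] = (df["ffill"] + df["bfill"]) / 2
print(df[["area_code", "item_name", "week_date", "sales_amount", "filled"]])
```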
For example:
Week 1  10
Week 2  nan
Week 3  nan
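"Value available before this week for this group" means that the values for week 2 and week 3 are the same as the value for week 1. Otherwise, if weeks 1 and 3 both have data, week 2 is filled from ffill/bfill. If neither case applies, just fill each group with its ffill or bfill value.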
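The rule above (average when there is a value on both sides, otherwise take whichever side exists) maps onto a row-wise mean of the ffill and bfill columns, since DataFrame.mean skips NaN by default. A sketch on a small hypothetical frame: group A is the Week 1 to 3 example, group B mirrors the 101 / Mango Fruits rows, and group C has no data at all; the `group` and `week` column names are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: A = the Week 1..3 example, B = like 101 / Mango Fruits,
# C = a group with no data at all.
df = pd.DataFrame({
    "group": ["A", "A", "A", "B", "B", "B", "C", "C"],
    "week": [1, 2, 3, 1, 2, 3, 1, 2],
    "sales_amount": [10, np.nan, np.nan, 5.13, np.nan, 7.13, np.nan, np.nan],
})

g = df.sort_values(["group", "week"]).groupby("group")["sales_amount"]
both = pd.DataFrame({"ffill": g.ffill(), "bfill": g.bfill()})

# mean(axis=1) skips NaN: the average when both sides exist,
# the single available value when only one side exists, NaN when neither does.
df["filled"] = both.mean(axis=1)
print(df)
```

Applied to the full sample, the same row-wise mean gives 5 for 101 / Chips / 6/10/2018 and 10.34 for 102 / Mango Fruits / 6/10/2018, matching the "After" table, while groups with no data at all stay NaN.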
How do I iterate over the DataFrame?
How do I iterate over each group and fill in the values?
I tried it, but had no luck.
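On the iteration question: looping over `df.groupby(keys)` yields `(key, sub-DataFrame)` pairs, so an explicit loop works, but `GroupBy.ffill` and `GroupBy.bfill` already operate group by group without a Python-level loop. This sketch uses the 101 rows of the sample (shop_name omitted because it is constant) and assumes area_code and item_name as the group keys; it checks that the loop and the vectorized version agree.

```python
import numpy as np
import pandas as pd

# Trimmed sample: the area_code 101 rows from the question.
df = pd.DataFrame({
    "area_code": [101, 101, 101, 101, 101],
    "item_name": ["Mango Fruits", "Mango Fruits", "Mango Fruits", "Chips", "Chips"],
    "sales_amount": [5.13, np.nan, 7.13, 5.0, np.nan],
})
keys = ["area_code", "item_name"]  # assumed group keys

# Explicit loop: each iteration yields the group key and that group's rows.
parts = []
for _, grp in df.groupby(keys):
    s = grp["sales_amount"]
    both = pd.concat([s.ffill(), s.bfill()], axis=1)
    parts.append(grp.assign(filled=both.mean(axis=1)))  # NaN-skipping average
looped = pd.concat(parts).sort_index()

# Vectorized equivalent: GroupBy.ffill/bfill respect group boundaries already.
g = df.groupby(keys)["sales_amount"]
vectorized = pd.concat([g.ffill(), g.bfill()], axis=1).mean(axis=1)
print(looped.assign(vectorized=vectorized))
```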
The weekly data I need to fill starts on June 3, 2018 and ends on June 3, 2019.
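To get one row per week over that period, one option is to cross every existing group with every week and left-merge the known data onto that grid, so weeks without a row show up as NaN before filling. This assumes week_date values are Sundays (6/3/2018 is a Sunday) and the same group keys as before; the single-group frame is a hypothetical excerpt. Note that the Sundays from 2018-06-03 through 2019-06-03 number 53, not 52. `merge(how="cross")` requires pandas 1.2 or later.

```python
import numpy as np
import pandas as pd

keys = ["area_code", "shop_name", "item_name"]  # assumed group keys

# Every Sunday from 2018-06-03 up to 2019-06-03; the last one is 2019-06-02,
# so this gives 53 dates (use periods=52 instead of an end date for exactly 52).
weeks = pd.date_range("2018-06-03", "2019-06-03", freq="W-SUN")

# Hypothetical excerpt: one group with known sales on 6/3/2018 and 6/17/2018 only.
df = pd.DataFrame({
    "area_code": [101, 101],
    "shop_name": ["Global Market", "Global Market"],
    "item_name": ["Mango Fruits", "Mango Fruits"],
    "week_date": weeks[[0, 2]],
    "sales_amount": [5.13, 7.13],
})

# Cross every existing group with every week, then left-merge the known values;
# weeks with no matching row get NaN in sales_amount.
grid = df[keys].drop_duplicates().merge(pd.DataFrame({"week_date": weeks}), how="cross")
full = grid.merge(df, on=keys + ["week_date"], how="left")
print(full.head())
```

The resulting `full` frame can then be filled per group with ffill/bfill as in the question.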
Pandas: filling missing values by mean in each group
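For comparison, the question titled above fills gaps with the group mean rather than the neighboring values: `GroupBy.transform("mean")` returns each group's mean (NaN skipped) aligned to the original rows, which `fillna` can then use. This uses every week in the group, so it generally differs from (ffill + bfill) / 2; on this small hypothetical excerpt the two happen to agree.

```python
import numpy as np
import pandas as pd

# Hypothetical excerpt with a single grouping column for brevity.
df = pd.DataFrame({
    "item_name": ["Mango Fruits"] * 3 + ["Chips"] * 2,
    "sales_amount": [5.13, np.nan, 7.13, 5.0, np.nan],
})

# transform("mean") broadcasts each group's mean back onto every row of that group.
group_mean = df.groupby("item_name")["sales_amount"].transform("mean")
df["filled"] = df["sales_amount"].fillna(group_mean)
print(df)
```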
1 Answer
Forward fill and backward fill are very easy in pandas. I had the same requirement before and followed the approach described here:
https://johnpaton.net/posts/forward-fill-spark/