pandas: add a column that counts the consecutive weeks in which a specific condition occurs

fquxozlt posted on 2023-05-12 in Other

I have the df below and would like to add the column "Desired Outcome", which counts the number of consecutive weeks that a promotion_type has been active in a specific store for a specific product code. Any suggestions on how to proceed? Thanks

| store_code | product_code | date | promotion_type | Desired Outcome |
| --------------|--------------|--------------|--------------|--------------|
| 1 | 222 | 2021-01-03 | Promo descuento | 1 |
| 1 | 222 | 2021-02-28 | Promo cabecera | 1 |
| 1 | 232 | 2021-03-21 | Promo multicompra | 1 |
| 1 | 296 | 2021-01-17 | Promo descuento | 1 |
| 1 | 296 | 2021-01-24 | Promo descuento | 2 |
| 1 | 296 | 2021-01-31 | Promo descuento | 3 |
| 1 | 296 | 2021-02-07 | Promo descuento | 4 |
| 1 | 382 | 2021-02-07 | Promo descuento | 1 |
| 1 | 608 | 2021-01-10 | Promo descuento | 1 |
| 1 | 608 | 2021-01-17 | Promo descuento | 2 |
| 1 | 612 | 2021-01-03 | Promo descuento | 1 |
| 1 | 612 | 2021-01-31 | Promo descuento | 1 |
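
To try the answers below, the table can be rebuilt as a DataFrame. This is only a sketch: the column names follow the ones used in the answers, and the `expected` list is a helper name introduced here to hold the asker's desired values for comparison.

```python
import pandas as pd

# Sample data transcribed from the question's table.
df = pd.DataFrame(
    {
        "store_code": [1] * 12,
        "product_code": [222, 222, 232, 296, 296, 296, 296, 382, 608, 608, 612, 612],
        # Parse the dates so that differences come out as Timedelta values.
        "date": pd.to_datetime(
            [
                "2021-01-03", "2021-02-28", "2021-03-21", "2021-01-17",
                "2021-01-24", "2021-01-31", "2021-02-07", "2021-02-07",
                "2021-01-10", "2021-01-17", "2021-01-03", "2021-01-31",
            ]
        ),
        "promotion_type": [
            "Promo descuento", "Promo cabecera", "Promo multicompra",
            "Promo descuento", "Promo descuento", "Promo descuento",
            "Promo descuento", "Promo descuento", "Promo descuento",
            "Promo descuento", "Promo descuento", "Promo descuento",
        ],
    }
)

# The values the asker expects, kept separately for checking a solution.
expected = [1, 1, 1, 1, 2, 3, 4, 1, 1, 2, 1, 1]
```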

3bygqnnd #1

If I understood the question correctly, this should solve your problem:

    import pandas as pd

    def add_desired_outcome(df):
        # Assumes 'date' is datetime64 and rows are sorted by date within each group
        keys = ['store_code', 'product_code', 'promotion_type']
        # Difference in days between consecutive dates within each group
        date_diff = df.groupby(keys)['date'].diff().dt.days
        # True where the gap is not exactly 7 days, i.e. a new run of weeks starts
        not_consecutive = date_diff.ne(7)
        # Cumulative sum within each group gives every run of consecutive weeks its own id
        consecutive_groups = not_consecutive.astype(int).groupby([df[k] for k in keys]).cumsum()
        # Cumulative count within each (group, run) pair, plus 1, is the desired outcome
        desired_outcome = df.groupby(keys + [consecutive_groups.rename('run')]).cumcount().add(1)
        # Add the desired outcome column to the DataFrame
        df['Desired Outcome'] = desired_outcome
        return df
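
As a quick sanity check of the idea behind this answer (start a new run whenever the gap to the previous row of the same group is not exactly 7 days), here is a minimal self-contained sketch on made-up data that contains a break. The function name `count_consecutive_weeks` and the toy rows are hypothetical and not part of the question.

```python
import pandas as pd

def count_consecutive_weeks(df, keys, date_col="date"):
    """Return the 1-based position of each row within its run of weekly dates."""
    # Gap in days to the previous row of the same group (NaN on the first row).
    gap = df.groupby(keys)[date_col].diff().dt.days
    # A run starts wherever the gap is anything other than exactly 7 days.
    starts = gap.ne(7).astype(int)
    # Number the runs within each group.
    run_id = starts.groupby([df[k] for k in keys]).cumsum().rename("run_id")
    # Position of each row inside its (group, run) pair.
    return df.groupby(keys + [run_id]).cumcount() + 1

toy = pd.DataFrame(
    {
        "store_code": [1] * 5,
        "product_code": [296] * 5,
        "promotion_type": ["Promo descuento"] * 5,
        # Two consecutive weeks, a two-week gap, then three consecutive weeks.
        "date": pd.to_datetime(
            ["2021-01-03", "2021-01-10", "2021-01-24", "2021-01-31", "2021-02-07"]
        ),
    }
)
toy["Desired Outcome"] = count_consecutive_weeks(
    toy, ["store_code", "product_code", "promotion_type"]
)
print(toy["Desired Outcome"].tolist())  # [1, 2, 1, 2, 3]
```

The count restarts at 1 after the gap, which is the behaviour the question's last two rows (product 612) call for.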

I hope this helps!

kxeu7u2r #2

First, let's copy the data we are going to use into another DataFrame. Let's also create a column of zeros to use later:

    >>> df_data = df[["date", "product_code", "promotion_type"]].copy()
    >>> df_data["zeros"] = 0
    >>> df_data
             date  product_code     promotion_type  zeros
    0  2021-01-03           222    Promo descuento      0
    1  2021-02-28           222     Promo cabecera      0
    2  2021-03-21           232  Promo multicompra      0
    3  2021-01-17           296    Promo descuento      0
    4  2021-01-24           296    Promo descuento      0
    5  2021-01-31           296    Promo descuento      0
    6  2021-02-07           296    Promo descuento      0
    7  2021-02-07           382    Promo descuento      0
    8  2021-01-10           608    Promo descuento      0
    9  2021-01-17           608    Promo descuento      0
    10 2021-01-03           612    Promo descuento      0
    11 2021-01-31           612    Promo descuento      0

To make it easy to compute differences between rows, let's convert the "promotion_type" column to numbers:

    >>> df_data["promotion_type"] = df_data["promotion_type"].rank(method='dense')
    >>> df_data
             date  product_code  promotion_type  zeros
    0  2021-01-03           222             2.0      0
    1  2021-02-28           222             1.0      0
    2  2021-03-21           232             3.0      0
    3  2021-01-17           296             2.0      0
    4  2021-01-24           296             2.0      0
    5  2021-01-31           296             2.0      0
    6  2021-02-07           296             2.0      0
    7  2021-02-07           382             2.0      0
    8  2021-01-10           608             2.0      0
    9  2021-01-17           608             2.0      0
    10 2021-01-03           612             2.0      0
    11 2021-01-31           612             2.0      0
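
The numeric conversion is only there so that `diff()` can flag a change of promotion. As an alternative sketch (not this answer's approach), the same flag can be obtained by comparing each row with the previous row of its group via `shift()`, which leaves the original strings untouched. The small frame below is illustrative only.

```python
import pandas as pd

df_data = pd.DataFrame(
    {
        "product_code": [222, 222, 296, 296],
        "promotion_type": [
            "Promo descuento", "Promo cabecera",
            "Promo descuento", "Promo descuento",
        ],
    }
)

# Previous promotion within the same product (missing on each product's first row).
previous = df_data.groupby("product_code")["promotion_type"].shift()

# True only when the promotion is unchanged from the previous row of the group.
same_promo = df_data["promotion_type"].eq(previous)
print(same_promo.tolist())  # [False, False, False, True]
```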

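Since you want to do this *for each product*, you need to use groupby. We can then call diff() on the result to get the difference between consecutive rows within each group.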

    >>> groups = df_data.groupby("product_code")
    >>> group_diff = groups.diff()
    >>> group_diff
          date  promotion_type  zeros
    0      NaT             NaN    NaN
    1  56 days            -1.0    0.0
    2      NaT             NaN    NaN
    3      NaT             NaN    NaN
    4   7 days             0.0    0.0
    5   7 days             0.0    0.0
    6   7 days             0.0    0.0
    7      NaT             NaN    NaN
    8      NaT             NaN    NaN
    9   7 days             0.0    0.0
    10     NaT             NaN    NaN
    11 28 days             0.0    0.0
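
Note that `diff()` on a groupby compares each row with the row stored immediately before it in the same group, so the 7-day gaps above rely on the rows already being ordered by date within each product. If that ordering is not guaranteed, a sketch like the following sorts first. The unsorted toy rows are made up for illustration.

```python
import pandas as pd

unsorted = pd.DataFrame(
    {
        "product_code": [296, 296, 296],
        "date": pd.to_datetime(["2021-01-31", "2021-01-17", "2021-01-24"]),
    }
)

# Without sorting, the gaps reflect storage order, not calendar order.
raw_gaps = unsorted.groupby("product_code")["date"].diff()

# Sort by product and date first; diff() then measures week-to-week gaps.
ordered = unsorted.sort_values(["product_code", "date"])
weekly_gaps = ordered.groupby("product_code")["date"].diff()

print(raw_gaps.dt.days.tolist())     # [nan, -14.0, 7.0]
print(weekly_gaps.dt.days.tolist())  # [nan, 7.0, 7.0]
```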

Now, note that we only want to increment df["Desired Outcome"] for rows where group_diff["date"] is 7 days and group_diff["promotion_type"] is zero. Let's set the zeros column of df_data to 1 for those rows:

    >>> filt = (group_diff["date"] == pd.Timedelta(7, 'd')) & (group_diff["promotion_type"] == 0)
    >>> df_data.loc[filt, "zeros"] = 1
    >>> df_data
             date  product_code  promotion_type  zeros
    0  2021-01-03           222             2.0      0
    1  2021-02-28           222             1.0      0
    2  2021-03-21           232             3.0      0
    3  2021-01-17           296             2.0      0
    4  2021-01-24           296             2.0      1
    5  2021-01-31           296             2.0      1
    6  2021-02-07           296             2.0      1
    7  2021-02-07           382             2.0      0
    8  2021-01-10           608             2.0      0
    9  2021-01-17           608             2.0      1
    10 2021-01-03           612             2.0      0
    11 2021-01-31           612             2.0      0

Finally, let's group df_data again, this time taking the cumulative sum within each group:

    >>> desired_outcome = df_data[["product_code", "zeros"]].groupby("product_code").cumsum()
    >>> desired_outcome
        zeros
    0       0
    1       0
    2       0
    3       0
    4       1
    5       2
    6       3
    7       0
    8       0
    9       1
    10      0
    11      0

Note that this is the desired outcome, except that we need to add one:

    >>> df["Desired Outcome"] = desired_outcome + 1

This leaves us with the desired result:

        store_code  product_code       date     promotion_type  Desired Outcome
    0            1           222 2021-01-03    Promo descuento                1
    1            1           222 2021-02-28     Promo cabecera                1
    2            1           232 2021-03-21  Promo multicompra                1
    3            1           296 2021-01-17    Promo descuento                1
    4            1           296 2021-01-24    Promo descuento                2
    5            1           296 2021-01-31    Promo descuento                3
    6            1           296 2021-02-07    Promo descuento                4
    7            1           382 2021-02-07    Promo descuento                1
    8            1           608 2021-01-10    Promo descuento                1
    9            1           608 2021-01-17    Promo descuento                2
    10           1           612 2021-01-03    Promo descuento                1
    11           1           612 2021-01-31    Promo descuento                1
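
One caveat worth keeping in mind: a plain cumulative sum per product keeps counting after a break, so a product promoted in two consecutive weeks, then again in two consecutive weeks after a pause, would come out as 1, 2, 2, 3 rather than 1, 2, 1, 2. The sample data has no such case, so the result above is unaffected. A possible adjustment, sketched under the assumption that the count should restart after any break, is to number the runs and count within each run. The toy frame and the names `continues`, `no_reset`, `run_id` and `with_reset` are illustrative.

```python
import pandas as pd

df_data = pd.DataFrame(
    {
        "product_code": [612, 612, 612, 612],
        "promotion_type": ["Promo descuento"] * 4,
        # Two consecutive weeks, a three-week pause, two consecutive weeks.
        "date": pd.to_datetime(
            ["2021-01-03", "2021-01-10", "2021-01-31", "2021-02-07"]
        ),
    }
)

by_product = df_data.groupby("product_code")
# A row continues a run when it falls exactly 7 days after the previous row
# of the same product and carries the same promotion.
week_later = by_product["date"].diff().eq(pd.Timedelta(days=7))
same_promo = by_product["promotion_type"].shift().eq(df_data["promotion_type"])
continues = week_later & same_promo

# Cumulative sum of the flags per product, never reset (as in the steps above).
no_reset = continues.astype(int).groupby(df_data["product_code"]).cumsum() + 1

# Adjusted sketch: every non-continuing row opens a new run, then count in-run.
run_id = (~continues).astype(int).groupby(df_data["product_code"]).cumsum()
with_reset = (
    df_data.groupby(["product_code", run_id.rename("run_id")]).cumcount() + 1
)

print(no_reset.tolist())    # [1, 2, 2, 3]
print(with_reset.tolist())  # [1, 2, 1, 2]
```

If several stores are present, adding "store_code" to the grouping keys would keep the runs of different stores apart, as the question asks for a count per store and product.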