pandas: adding a column to count the consecutive weeks a specific condition happens

Posted by fquxozlt on 2023-05-12 in Other

I have the df below and would like to add a "Desired Outcome" column that counts the number of consecutive weeks a promotion_type has been active in a specific store for a specific product code. Any suggestions on how to proceed? Thanks

| store_code | product_code | date | promotion_type | Desired Outcome |
| --------------|--------------|--------------|--------------|--------------|
| 1 | 222 | 2021-01-03 | Promo descuento | 1 |
| 1 | 222 | 2021-02-28 | Promo cabecera | 1 |
| 1 | 232 | 2021-03-21 | Promo multicompra | 1 |
| 1 | 296 | 2021-01-17 | Promo descuento | 1 |
| 1 | 296 | 2021-01-24 | Promo descuento | 2 |
| 1 | 296 | 2021-01-31 | Promo descuento | 3 |
| 1 | 296 | 2021-02-07 | Promo descuento | 4 |
| 1 | 382 | 2021-02-07 | Promo descuento | 1 |
| 1 | 608 | 2021-01-10 | Promo descuento | 1 |
| 1 | 608 | 2021-01-17 | Promo descuento | 2 |
| 1 | 612 | 2021-01-03 | Promo descuento | 1 |
| 1 | 612 | 2021-01-31 | Promo descuento | 1 |
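
To experiment with the question's data, the table can be rebuilt as a DataFrame. This is only a sketch: the column names follow those used in the answers, and the `expected` column name is a hypothetical label for the asker's desired outcome.

```python
import pandas as pd

# Sample rows from the question: store, product, date, promotion type, expected count.
rows = [
    (1, 222, "2021-01-03", "Promo descuento", 1),
    (1, 222, "2021-02-28", "Promo cabecera", 1),
    (1, 232, "2021-03-21", "Promo multicompra", 1),
    (1, 296, "2021-01-17", "Promo descuento", 1),
    (1, 296, "2021-01-24", "Promo descuento", 2),
    (1, 296, "2021-01-31", "Promo descuento", 3),
    (1, 296, "2021-02-07", "Promo descuento", 4),
    (1, 382, "2021-02-07", "Promo descuento", 1),
    (1, 608, "2021-01-10", "Promo descuento", 1),
    (1, 608, "2021-01-17", "Promo descuento", 2),
    (1, 612, "2021-01-03", "Promo descuento", 1),
    (1, 612, "2021-01-31", "Promo descuento", 1),
]
df = pd.DataFrame(
    rows,
    columns=["store_code", "product_code", "date", "promotion_type", "expected"],
)
# Parse the dates so that row-to-row differences come out as Timedelta values.
df["date"] = pd.to_datetime(df["date"])
```

Keeping the expected values in their own column makes it easy to compare any candidate solution against them afterwards.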

pandas

Source: https://stackoverflow.com/questions/76221944/adding-a-column-to-count-consecutive-weeks-an-specific-condition-happens

2 Answers

3bygqnnd1#

If I understand the question correctly, this should solve your problem:

import pandas as pd

def add_desired_outcome(df):
    # Assumes df['date'] already has a datetime64 dtype
    keys = ['store_code', 'product_code', 'promotion_type']

    # Sort so that the dates within each group are in chronological order
    df = df.sort_values(keys + ['date'])

    # Difference in days between consecutive dates within each group
    date_diff = df.groupby(keys)['date'].diff().dt.days

    # True where a new run starts: the gap is not exactly 7 days
    # (the first row of each group has a NaN gap, which also counts)
    not_consecutive = date_diff.ne(7)

    # Cumulative sum within each group gives every run of consecutive weeks its own id
    consecutive_groups = not_consecutive.groupby([df[k] for k in keys]).cumsum()

    # Cumulative count within each run, plus 1, is the desired outcome
    df['Desired Outcome'] = df.groupby(keys + [consecutive_groups]).cumcount().add(1)

    return df

I hope this helps!
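
As a self-contained illustration of the idea described in this answer (diff the dates, flag gaps that are not 7 days, number the runs, count within each run), here is a minimal sketch. The helper name `count_consecutive_weeks` and its parameters are hypothetical, and it assumes the DataFrame has a unique index and a datetime `date` column.

```python
import pandas as pd

def count_consecutive_weeks(df, keys, date_col="date"):
    """Return a 1-based running count of rows spaced exactly 7 days apart within each key group."""
    ordered = df.sort_values(keys + [date_col])
    # Gap in days to the previous row of the same group (NaN for the first row).
    gap = ordered.groupby(keys)[date_col].diff().dt.days
    # A run starts wherever the gap is not exactly one week; NaN != 7 is True.
    starts = gap.ne(7)
    # Rows are sorted by group, so a global cumulative sum yields a unique id per run.
    streak_id = starts.cumsum()
    counts = ordered.groupby(streak_id).cumcount() + 1
    # Re-align to the caller's row order (assumes a unique index).
    return counts.reindex(df.index)

df = pd.DataFrame({
    "store_code": [1] * 6,
    "product_code": [296, 296, 296, 296, 612, 612],
    "promotion_type": ["Promo descuento"] * 6,
    "date": pd.to_datetime([
        "2021-01-17", "2021-01-24", "2021-01-31", "2021-02-07",
        "2021-01-03", "2021-01-31",
    ]),
})
df["Desired Outcome"] = count_consecutive_weeks(
    df, ["store_code", "product_code", "promotion_type"]
)
print(df)
```

Returning a Series aligned to the original index, rather than a re-sorted DataFrame, lets the caller assign the result without changing the row order.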

Answered 2023-05-12

kxeu7u2r2#

First, let's copy the data we are going to use into another DataFrame. Let's also create a column of zeros to use later:

>>> df_data = df[["date", "product_code", "promotion_type"]].copy()
>>> df_data["zeros"] = 0
         date  product_code     promotion_type  zeros
0  2021-01-03           222    Promo descuento      0
1  2021-02-28           222     Promo cabecera      0
2  2021-03-21           232  Promo multicompra      0
3  2021-01-17           296    Promo descuento      0
4  2021-01-24           296    Promo descuento      0
5  2021-01-31           296    Promo descuento      0
6  2021-02-07           296    Promo descuento      0
7  2021-02-07           382    Promo descuento      0
8  2021-01-10           608    Promo descuento      0
9  2021-01-17           608    Promo descuento      0
10 2021-01-03           612    Promo descuento      0
11 2021-01-31           612    Promo descuento      0

To make it easy to compute differences between rows, let's convert the "promotion_type" column to numbers:

>>> df_data["promotion_type"] = df_data["promotion_type"].rank(method='dense')
         date  product_code  promotion_type  zeros
0  2021-01-03           222             2.0      0
1  2021-02-28           222             1.0      0
2  2021-03-21           232             3.0      0
3  2021-01-17           296             2.0      0
4  2021-01-24           296             2.0      0
5  2021-01-31           296             2.0      0
6  2021-02-07           296             2.0      0
7  2021-02-07           382             2.0      0
8  2021-01-10           608             2.0      0
9  2021-01-17           608             2.0      0
10 2021-01-03           612             2.0      0
11 2021-01-31           612             2.0      0
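
The numeric ranks are only used to test whether the promotion type changed from one row to the next. As an alternative sketch (not part of this answer), the labels can be compared directly with the previous row of the same product via `shift()`; the variable names `previous_type` and `same_type` and the small sample frame are illustrative assumptions.

```python
import pandas as pd

df_data = pd.DataFrame({
    "product_code": [222, 222, 296, 296],
    "promotion_type": [
        "Promo descuento", "Promo cabecera",
        "Promo descuento", "Promo descuento",
    ],
})

# Previous row's promotion type within the same product (missing for the first row of a group).
previous_type = df_data.groupby("product_code")["promotion_type"].shift()
# True only where a previous row exists and carries the same label.
same_type = df_data["promotion_type"].eq(previous_type)
print(same_type.tolist())
```

This avoids relying on a rank difference of exactly zero and keeps the original labels intact.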

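Since you want to do this *for each product*, you need to use groupby. We can then call diff() on the grouped data to get the differences between consecutive rows within each group.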

>>> groups = df_data.groupby("product_code")
>>> group_diff = groups.diff()
      date  promotion_type  zeros
0      NaT             NaN    NaN
1  56 days            -1.0    0.0
2      NaT             NaN    NaN
3      NaT             NaN    NaN
4   7 days             0.0    0.0
5   7 days             0.0    0.0
6   7 days             0.0    0.0
7      NaT             NaN    NaN
8      NaT             NaN    NaN
9   7 days             0.0    0.0
10     NaT             NaN    NaN
11 28 days             0.0    0.0

Now, note that we only want to increment the value of df["Desired Outcome"] for rows where group_diff["date"] is 7 days and group_diff["promotion_type"] is zero. Let's set the zeros column of df_data to 1 for those rows:

>>> filt = (group_diff["date"] == pd.Timedelta(7, 'd')) & (group_diff["promotion_type"] == 0)
>>> df_data.loc[filt, "zeros"] = 1
         date  product_code  promotion_type  zeros
0  2021-01-03           222             2.0      0
1  2021-02-28           222             1.0      0
2  2021-03-21           232             3.0      0
3  2021-01-17           296             2.0      0
4  2021-01-24           296             2.0      1
5  2021-01-31           296             2.0      1
6  2021-02-07           296             2.0      1
7  2021-02-07           382             2.0      0
8  2021-01-10           608             2.0      0
9  2021-01-17           608             2.0      1
10 2021-01-03           612             2.0      0
11 2021-01-31           612             2.0      0

Finally, let's group df_data again, this time taking the cumulative sum within each group:

>>> desired_outcome = df_data[["product_code", "zeros"]].groupby("product_code").cumsum()
    zeros
0       0
1       0
2       0
3       0
4       1
5       2
6       3
7       0
8       0
9       1
10      0
11      0

Note that this is the desired outcome, except that we need to add one:

>>> df["Desired Outcome"] = desired_outcome + 1

This leaves us with the desired result:

    store_code  product_code       date     promotion_type  Desired Outcome
0            1           222 2021-01-03    Promo descuento                1
1            1           222 2021-02-28     Promo cabecera                1
2            1           232 2021-03-21  Promo multicompra                1
3            1           296 2021-01-17    Promo descuento                1
4            1           296 2021-01-24    Promo descuento                2
5            1           296 2021-01-31    Promo descuento                3
6            1           296 2021-02-07    Promo descuento                4
7            1           382 2021-02-07    Promo descuento                1
8            1           608 2021-01-10    Promo descuento                1
9            1           608 2021-01-17    Promo descuento                2
10           1           612 2021-01-03    Promo descuento                1
11           1           612 2021-01-31    Promo descuento                1

Answered 2023-05-12
