python GroupBy Pandas计数连续零

u4dcyp6a  于 2023-03-06  发布在  Python
关注(0)|答案(3)|浏览(178)

我的输入看起来像下面的df。
我需要按列(A,B)分组,计算每个组中连续零的数量/长度,并写入新列“Zero_count”

Input:
A    B  DATE      hour  measure     
A10  1  1/1/2014    0   0       
A10  1  1/1/2014    1   0       
A10  1  1/1/2014    2   0       
A10  1  1/1/2014    3   0       
A10  2  1/1/2014    4   0       
A10  2  1/1/2014    5   1       
A10  2  1/1/2014    6   2       
A10  3  1/1/2014    7   0       
A11  1  1/1/2014    8   0       
A11  1  1/1/2014    9   0       
A11  1  1/1/2014    10  2       
A11  1  1/1/2014    11  0       
A11  1  1/1/2014    12  0       
A12  2  1/1/2014    13  1       
A12  2  1/1/2014    14  3       
A12  2  1/1/2014    15  0       
A12  4  1/1/2014    16  5       
A12  4  1/1/2014    17  0       
A12  6  1/1/2014    18  0

我尝试使用“groupby”技术来获取组,但我正在寻找组内连续零计数。我尝试使用lambda函数,但它计数零的总数,而我对重复连续零感兴趣。我希望我的输出如下所示:

Output
A    B  DATE      hour  measure Consec_zero_count
A10  1  1/1/2014    0   0       4
A10  1  1/1/2014    1   0       4
A10  1  1/1/2014    2   0       4
A10  1  1/1/2014    3   0       4
A10  2  1/1/2014    4   0       1
A10  2  1/1/2014    5   1       0
A10  2  1/1/2014    6   2       0
A10  3  1/1/2014    7   0       1
A11  1  1/1/2014    8   0       2
A11  1  1/1/2014    9   0       2
A11  1  1/1/2014    10  2       0
A11  1  1/1/2014    11  0       2
A11  1  1/1/2014    12  0       2
A12  2  1/1/2014    13  1       0
A12  2  1/1/2014    14  3       0
A12  2  1/1/2014    15  0       1
A12  4  1/1/2014    16  5       0
A12  4  1/1/2014    17  0       1
A12  6  1/1/2014    18  0       1

任何线索将不胜感激。提前感谢!

lx0bsm1f

lx0bsm1f1#

通过按ne!=)将shift艾德值与cumsum进行比较,为连续值的唯一组创建帮助器Series。然后将groupbytransformsize进行比较。最后一个筛选器值仅适用于0numpy.where

g = df['measure'].ne(df['measure'].shift()).cumsum()
counts = df.groupby(['A','B', g])['measure'].transform('size')
df['Consec_zero_count'] = np.where(df['measure'].eq(0), counts, 0)
print (df)
      A  B      DATE  hour  measure  Consec_zero_count
0   A10  1  1/1/2014     0        0                  4
1   A10  1  1/1/2014     1        0                  4
2   A10  1  1/1/2014     2        0                  4
3   A10  1  1/1/2014     3        0                  4
4   A10  2  1/1/2014     4        0                  1
5   A10  2  1/1/2014     5        1                  0
6   A10  2  1/1/2014     6        2                  0
7   A10  3  1/1/2014     7        0                  1
8   A11  1  1/1/2014     8        0                  2
9   A11  1  1/1/2014     9        0                  2
10  A11  1  1/1/2014    10        2                  0
11  A11  1  1/1/2014    11        0                  2
12  A11  1  1/1/2014    12        0                  2
13  A12  2  1/1/2014    13        1                  0
14  A12  2  1/1/2014    14        3                  0
15  A12  2  1/1/2014    15        0                  1
16  A12  4  1/1/2014    16        5                  0
17  A12  4  1/1/2014    17        0                  1
18  A12  6  1/1/2014    18        0                  1
4zcjmb1e

4zcjmb1e2#

与@jezrael的回答相似,但逻辑略有不同:

df.loc[df.measure.eq(0), 'Consec_zero_count'] = (df.groupby(['A','B', df.measure.ne(0).cumsum()])
                                                  .measure.transform(lambda x: x[x.eq(0)].size))

df['Consec_zero_count'] = df['Consec_zero_count'].fillna(0).astype(int)

>>> df
      A  B      DATE  hour  measure  Consec_zero_count
0   A10  1  1/1/2014     0        0                  4
1   A10  1  1/1/2014     1        0                  4
2   A10  1  1/1/2014     2        0                  4
3   A10  1  1/1/2014     3        0                  4
4   A10  2  1/1/2014     4        0                  1
5   A10  2  1/1/2014     5        1                  0
6   A10  2  1/1/2014     6        2                  0
7   A10  3  1/1/2014     7        0                  1
8   A11  1  1/1/2014     8        0                  2
9   A11  1  1/1/2014     9        0                  2
10  A11  1  1/1/2014    10        2                  0
11  A11  1  1/1/2014    11        0                  2
12  A11  1  1/1/2014    12        0                  2
13  A12  2  1/1/2014    13        1                  0
14  A12  2  1/1/2014    14        3                  0
15  A12  2  1/1/2014    15        0                  1
16  A12  4  1/1/2014    16        5                  0
17  A12  4  1/1/2014    17        0                  1
18  A12  6  1/1/2014    18        0                  1
i5desfxk

i5desfxk3#

def function1(dd:pd.DataFrame):
    col1=dd['measure'].ne(0).cumsum()
    return dd.groupby(col1)['measure'].transform(lambda ss:ss.loc[ss.eq(0)].size)

df1.groupby(['A','B']).apply(function1).droplevel([0,1])\
    .pipe(lambda ss:df1.assign(Consec_zero_count=ss.mask(df1['measure'].ne(0),0)))

输出:

A  B      DATE  hour  measure  Consec_zero_count
0   A10  1  1/1/2014     0        0                  4
1   A10  1  1/1/2014     1        0                  4
2   A10  1  1/1/2014     2        0                  4
3   A10  1  1/1/2014     3        0                  4
4   A10  2  1/1/2014     4        0                  1
5   A10  2  1/1/2014     5        1                  0
6   A10  2  1/1/2014     6        2                  0
7   A10  3  1/1/2014     7        0                  1
8   A11  1  1/1/2014     8        0                  2
9   A11  1  1/1/2014     9        0                  2
10  A11  1  1/1/2014    10        2                  0
11  A11  1  1/1/2014    11        0                  2
12  A11  1  1/1/2014    12        0                  2
13  A12  2  1/1/2014    13        1                  0
14  A12  2  1/1/2014    14        3                  0
15  A12  2  1/1/2014    15        0                  1
16  A12  4  1/1/2014    16        5                  0
17  A12  4  1/1/2014    17        0                  1
18  A12  6  1/1/2014    18        0                  1

相关问题