pandas: get the number of rows between the current row and the previous/next positive value

Posted by lnxxn5zx on 2023-01-01 in Other

I have the following DataFrame:

   feature
0        1
1        0
2        0
3        0
4        0
5        1
6        0
7        1

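I want to create two columns holding the number of rows between the current row and the previous and next positive values. The output DataFrame should look like this: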

   feature    previous_feat        next_feat
0        1               NA                5
1        0                1                4
2        0                2                3
3        0                3                2
4        0                4                1
5        1                5                2
6        0                1                1
7        1                2               NA

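I have tried combinations of the shift and mask methods, but without success. Note that it does not matter to me whether the result is a row count or an index difference. The same goes for the NA values: they can be either NA or 0.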

import pandas as pd

df = pd.DataFrame({"feature": [1, 0, 0, 0, 0, 1, 0, 1]})

# df["previous_feat"] = df.shift().mask(df["feature"] != 0)
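
For reference, the expected output above can be restated as a plain loop over index differences. This is only a sketch for checking results, not one of the answers below; the helper name `distances_to_positives` is made up for illustration, and the nullable `Int64` dtype is just one assumed way to hold the missing values.

```python
import pandas as pd

def distances_to_positives(values):
    """Return (previous, next) lists of index distances to the nearest
    strictly earlier / strictly later positive entry (None if there is none)."""
    n = len(values)
    prev_dist = [None] * n
    next_dist = [None] * n

    last = None  # index of the most recent positive seen so far
    for i, v in enumerate(values):
        if last is not None:
            prev_dist[i] = i - last
        if v > 0:
            last = i

    nxt = None  # index of the nearest positive below the current row
    for i in range(n - 1, -1, -1):
        if nxt is not None:
            next_dist[i] = nxt - i
        if values[i] > 0:
            nxt = i

    return prev_dist, next_dist

df = pd.DataFrame({"feature": [1, 0, 0, 0, 0, 1, 0, 1]})
prev_dist, next_dist = distances_to_positives(df["feature"].tolist())

# Nullable integers keep the gaps as <NA> instead of forcing floats.
df["previous_feat"] = pd.array(prev_dist, dtype="Int64")
df["next_feat"] = pd.array(next_dist, dtype="Int64")
print(df)
```

A loop like this is slow on large frames, but it is handy for validating a vectorized solution on small samples.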

#1 9fkzdhlc

You can use groupby.cumcount together with boolean masks:

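# form groups: g1 counts the 1s from the bottom up, g2 from the top down
g1 = df.loc[::-1, 'feature'].eq(1).cumsum()
g2 = df['feature'].eq(1).cumsum()

# m1 marks the first 1 (no previous positive), m2 the last 1 (no next positive)
m1 = g2.eq(1) & df['feature'].eq(1)
m2 = g1.eq(1) & df['feature'].eq(1)

# position within each group, counted from 1; masked rows become NaN
df['previous_feat'] = df.groupby(g1).cumcount().add(1).mask(m1)
df['next_feat'] = df[::-1].groupby(g2).cumcount().add(1).mask(m2)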

Output:

   feature  previous_feat  next_feat
0        1            NaN        5.0
1        0            1.0        4.0
2        0            2.0        3.0
3        0            3.0        2.0
4        0            4.0        1.0
5        1            5.0        2.0
6        0            1.0        1.0
7        1            2.0        NaN
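
To see why this works, it can help to look at the two group keys side by side. The sketch below rebuilds them on the same sample data under the assumed names `g_bwd` and `g_fwd` (corresponding to `g1` and `g2` above); the `keys` frame is only there for inspection.

```python
import pandas as pd

df = pd.DataFrame({"feature": [1, 0, 0, 0, 0, 1, 0, 1]})
is_one = df["feature"].eq(1)

# g_fwd (g2 above): running count of 1s from the top,
# so each group is a 1 followed by the zeros after it.
g_fwd = is_one.cumsum()

# g_bwd (g1 above): running count of 1s from the bottom,
# so each group is a run of zeros followed by the 1 that ends it.
g_bwd = is_one[::-1].cumsum().sort_index()

keys = pd.DataFrame({"feature": df["feature"], "g_bwd": g_bwd, "g_fwd": g_fwd})
print(keys)
```

Counting from the top inside each `g_bwd` group gives the distance to the previous 1, and counting from the bottom inside each `g_fwd` group gives the distance to the next 1.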

#2 tcomlyy6

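# rows after each 1, up to and including the next 1, share a group id
groups = df["feature"].shift().cumsum()
# count forward within each group; the first row has no group (its key is NaN)
df["previous_feat"] = df["feature"].groupby(groups).cumcount().add(1)
# count backward within each group, then move every value up one row
df["next_feat"] = df["feature"].groupby(groups).cumcount(ascending=False).add(1).shift(-1)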

Output:

   feature  previous_feat  next_feat
0        1            NaN        5.0
1        0            1.0        4.0
2        0            2.0        3.0
3        0            3.0        2.0
4        0            4.0        1.0
5        1            5.0        2.0
6        0            1.0        1.0
7        1            2.0        NaN
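
Both outputs come back as floats because of the NaN entries. Since the question accepts either NA or 0 for the missing values, one possible follow-up is to cast the result columns. The sketch below starts from the values shown in the output tables; the names `out`, `as_nullable` and `as_zero` are assumptions made for illustration.

```python
import pandas as pd

nan = float("nan")
# Result columns as shown in the output tables above.
out = pd.DataFrame({
    "previous_feat": [nan, 1.0, 2.0, 3.0, 4.0, 5.0, 1.0, 2.0],
    "next_feat": [5.0, 4.0, 3.0, 2.0, 1.0, 2.0, 1.0, nan],
})

# Option 1: keep the gaps as missing, using pandas' nullable integer dtype.
as_nullable = out.astype("Int64")

# Option 2: replace the gaps with 0 and use plain integers.
as_zero = out.fillna(0).astype("int64")

print(as_nullable)
print(as_zero)
```

Option 1 keeps "no previous/next positive" distinguishable from a real distance, while option 2 is simpler if 0 can never occur as a genuine value.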
