Select rows in a grouped DataFrame before the row that does not satisfy a condition (python)

Posted by gtlvzcf8 on 2023-01-04 in Python


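I have a DataFrame with several features. I want to group it by the "id" feature. Then, for each group, I want to find the row where the "speed" value is greater than a threshold and select all rows before it.
For example, with a threshold of 1.5 for "speed", my input is:
| id | speed | ... |
| --- | --- | --- |
| 1 | 1.2 | ... |
| 1 | 1.9 | ... |
| 1 | 1.0 | ... |
| 5 | 0.9 | ... |
| 5 | 1.3 | ... |
| 5 | 3.5 | ... |
| 5 | 0.4 | ... |
My desired output is:
| id | speed | ... |
| --- | --- | --- |
| 1 | 1.2 | ... |
| 5 | 0.9 | ... |
| 5 | 1.3 | ... |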

python

Source: https://stackoverflow.com/questions/74992701/select-rows-in-group-by-dataframe-before-the-row-which-not-satisfies-a-condition

2 Answers


zazmityj1#

This should give you the result you want:

import pandas as pd

# Create sample data
df = pd.DataFrame({'id':[1, 1, 1, 5, 5, 5, 5],
'speed':[1.2, 1.9, 1.0, 0.9, 1.3, 9.5, 0.4]
})
df

[The filtering code for this answer is missing from the source.]

Output:

id   speed
0   1   1.2
4   5   1.3
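
The output above contains only the row directly before each above-threshold row (index 3 from the question's desired output is absent). As a rough sketch, and not necessarily the code this answer used, a per-group `shift(-1)` reproduces that result on the sample data. The names `threshold`, `next_speed`, and `result` are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 5, 5, 5, 5],
                   'speed': [1.2, 1.9, 1.0, 0.9, 1.3, 9.5, 0.4]})
threshold = 1.5  # value taken from the question

# Speed of the following row within the same id (NaN for the last row of each id)
next_speed = df.groupby('id')['speed'].shift(-1)

# Keep rows whose successor exceeds the threshold; NaN compares as False
result = df[next_speed.gt(threshold)]
print(result)
```

This selects one row per above-threshold row, so it only matches the question's goal when a single row precedes the exceedance.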

Posted 2023-01-04

yzuktlbb2#

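It took me an hour to figure out, but I got what you need. You need to REVERSE the DataFrame and use .cumsum() (cumulative sum) within each id group to find the rows that sit before a speed at or above your threshold. Then drop the rows whose speed reaches the threshold, along with the rows that have no such speed after them. Finally, reverse the DataFrame back: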

import pandas as pd

# Create sample data
df = pd.DataFrame({'id':[1, 1, 1, 5, 5, 5, 5],
'speed':[1.2, 1.9, 1.0, 0.9, 1.3, 9.5, 0.4]
})

# Reverse the dataframe
df = df.iloc[::-1]

thre = 1.5
# Flag rows whose speed is at or above the threshold (>=)
df = df.assign(ge=df.speed.ge(thre))

# Cumulative sum of the flags within each id, in reversed order: a value above 0
# means a flagged row occurs at or after this row in the original order
df.insert(0, 'beforethre', df.groupby('id')['ge'].cumsum())

# Keep rows that are not flagged themselves and that precede a flagged row,
# then drop the helper columns
df = df[~df['ge'] & df['beforethre'].gt(0)].drop(['ge', 'beforethre'], axis=1)

# Reverse back the dataframe
df = df.iloc[::-1]

# Voilà!
df

Output:

id   speed
0   1   1.2
3   5   0.9
4   5   1.3
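
The reverse-and-cumsum approach above keeps every non-flagged row that has a flagged row somewhere after it, so with several exceedances in one id it keeps rows before the last one, not only before the first. A more compact sketch, assuming the goal is strictly "rows before the first speed greater than the threshold", runs the cumulative sum forwards instead. The function name `rows_before_first_exceedance` and its `drop_groups_without` parameter are hypothetical, introduced only for illustration.

```python
import pandas as pd

def rows_before_first_exceedance(df, threshold, drop_groups_without=False):
    """Keep, per id, the rows that come before the first speed > threshold."""
    above = df['speed'].gt(threshold)
    # Running count of above-threshold rows within each id; 0 means none seen yet
    before_first = above.groupby(df['id']).cumsum().eq(0)
    if drop_groups_without:
        # Optionally discard ids that never exceed the threshold,
        # as the reverse-and-cumsum answer does
        before_first = before_first & above.groupby(df['id']).transform('any')
    return df[before_first]

df = pd.DataFrame({'id': [1, 1, 1, 5, 5, 5, 5],
                   'speed': [1.2, 1.9, 1.0, 0.9, 1.3, 9.5, 0.4]})
result = rows_before_first_exceedance(df, 1.5)
print(result)
```

On the sample data this returns the rows with index 0, 3, and 4. Note that it uses a strict `>` comparison as worded in the question, whereas the answer above uses `>=`; the two differ only when a speed equals the threshold exactly.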

Posted 2023-01-04
