Given this DataFrame:
col_a col_b
0 b 2022-01-01
1 a 2022-01-02
2 c 2022-10-03
3 b 2022-10-01
4 a 2022-10-03
5 c 2022-10-02
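I want to:
- Group by col_a
- Within each group, sort the rows by col_b
- Then order the groups by their minimum col_b
So the first row should belong to the col_a group that has the earliest value in col_b.
Expected output: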
col_a col_b
0 b 2022-01-01 # b has first min value col_b --> at the start of df
1 b 2022-10-01 # the next value is the sorted next value of group b.
2 a 2022-01-02 # a has second min value col_b --> second in df order
3 a 2022-10-03
4 c 2022-10-03
5 c 2022-10-02
I can group by col_a and sort by col_b within each group, but I cannot order the DataFrame by each group's minimum col_b:
import pandas as pd
df = pd.DataFrame({
    "col_a": ['b', 'a', 'c', 'b', 'a', 'c'],
    'col_b': pd.to_datetime(["2022/01/01", "2022/01/02", "2022/10/03", "2022/10/01", "2022/10/03", "2022/10/02"])
})
desired_df = pd.DataFrame({
    "col_a": ['b', 'b', 'a', 'a', 'c', 'c'],
    'col_b': pd.to_datetime(["2022/01/01", "2022/10/01", "2022/01/02", "2022/10/03", "2022/10/03", "2022/10/02"])
})
print(df.groupby('col_a').apply(lambda x: x.sort_values('col_b')))  # this is not working
1 Answer
You can use a temporary column together with groupby.transform, or alternatively numpy.lexsort.
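The answer's original code did not survive on this page, so the following is only a minimal sketch of how the two suggestions might look, not the answerer's exact code. The names group_min, out_transform and out_lexsort are illustrative assumptions. Adding col_a (or its integer codes) as a tie-breaker is also an assumption, made so that groups sharing the same minimum stay contiguous.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col_a": ["b", "a", "c", "b", "a", "c"],
    "col_b": pd.to_datetime([
        "2022/01/01", "2022/01/02", "2022/10/03",
        "2022/10/01", "2022/10/03", "2022/10/02",
    ]),
})

# Approach 1: temporary column holding each group's minimum col_b.
# Sort by that minimum, then by group label (tie-breaker), then by col_b.
out_transform = (
    df.assign(group_min=df.groupby("col_a")["col_b"].transform("min"))
      .sort_values(["group_min", "col_a", "col_b"])
      .drop(columns="group_min")      # the helper column is no longer needed
      .reset_index(drop=True)
)

# Approach 2: numpy.lexsort. The LAST key is the primary sort key,
# so the keys are listed from least to most significant.
group_min = df.groupby("col_a")["col_b"].transform("min")
group_codes = pd.factorize(df["col_a"])[0]  # integer label per group (tie-breaker)
order = np.lexsort((
    df["col_b"].to_numpy(),   # least significant: order within a group
    group_codes,              # keeps groups contiguous if minima tie
    group_min.to_numpy(),     # most significant: group's earliest col_b
))
out_lexsort = df.iloc[order].reset_index(drop=True)

print(out_transform)
```

With the sample data, both variants put the groups in the order b, a, c. Inside group c the rows come out as 2022-10-02 followed by 2022-10-03, which follows the stated rule of sorting by col_b within each group; the expected-output listing in the question shows those two rows the other way around.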