Pandas: i) group by col_a, ii) sort by col_b, iii) sort the df by each group's minimum of col_b

gblwokeq  posted on 2023-02-06  in Other

Given a DataFrame:

  col_a      col_b
0     b 2022-01-01
1     a 2022-01-02
2     c 2022-10-03
3     b 2022-10-01
4     a 2022-10-03
5     c 2022-10-02

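I want to:

  • group by col_a
  • within each group, sort the values by col_b
  • then order the groups by their minimum col_b

So the first rows should belong to the col_a group that has the earliest value in col_b.
Expected output:

  col_a      col_b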
0     b 2022-01-01 # b has first min value  col_b --> at the start of df 
1     b 2022-10-01 # the next value is the sorted next value of group b. 
2     a 2022-01-02 # a has second min value col_b --> second in df order 
3     a 2022-10-03
4     c 2022-10-02
5     c 2022-10-03

I can group by col_a and sort by col_b within each group, but I can't sort the df by each group's minimum col_b.

import pandas as pd
df = pd.DataFrame({
    "col_a": ['b', 'a', 'c', 'b', 'a', 'c'],
    "col_b": pd.to_datetime(["2022/01/01", "2022/01/02", "2022/10/03", "2022/10/01", "2022/10/03", "2022/10/02"])
})

desired_df = pd.DataFrame({
    "col_a": ['b', 'b', 'a', 'a', 'c', 'c'],
    "col_b": pd.to_datetime(["2022/01/01", "2022/10/01", "2022/01/02", "2022/10/03", "2022/10/02", "2022/10/03"])
})

print(df.groupby('col_a').apply(lambda x: x.sort_values('col_b')))  # this is not working: the groups are not ordered by their minimum col_b
Answer #1 (lndjwyie)

You can use a temporary column together with groupby.transform:

out = (df
   .assign(key=df.groupby('col_a')['col_b'].transform('min'))
   .sort_values(by=['key', 'col_a', 'col_b'])
   .drop(columns='key')
)
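
To see why this works, it can help to look at the temporary column before it is dropped. The sketch below rebuilds the question's frame so it runs on its own; the variable name `with_key` is introduced here purely for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "col_a": ["b", "a", "c", "b", "a", "c"],
    "col_b": pd.to_datetime(["2022/01/01", "2022/01/02", "2022/10/03",
                             "2022/10/01", "2022/10/03", "2022/10/02"]),
})

# transform('min') broadcasts each group's minimum back onto every row of that group
with_key = df.assign(key=df.groupby("col_a")["col_b"].transform("min"))
print(with_key)

# Sorting by key first orders the groups; col_a breaks ties between groups
# that share the same minimum; col_b orders the rows inside each group.
out = with_key.sort_values(by=["key", "col_a", "col_b"]).drop(columns="key")
print(out)
```

Including col_a as the second sort key matters only when two groups have the same minimum: without it, their rows could interleave.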

Or use numpy.lexsort:

import numpy as np

out = df.iloc[np.lexsort([df['col_b'], df['col_a'],
                          df.groupby('col_a')['col_b'].transform('min')])]
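
Note that `np.lexsort` treats the last key in the sequence as the primary sort key, which is why the group minimum is listed last. A minimal illustration with made-up arrays (`primary` and `secondary` are hypothetical names, not part of the question's data):

```python
import numpy as np

secondary = np.array([3, 1, 2, 0])
primary = np.array([1, 0, 1, 0])

# Keys are listed from least to most significant: the LAST key is the primary one.
order = np.lexsort([secondary, primary])
print(order)  # → [3 1 2 0]
```

Rows with `primary == 0` (positions 1 and 3) come first, ordered by `secondary`, followed by the rows with `primary == 1`.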

Output:

  col_a      col_b
0     b 2022-01-01
3     b 2022-10-01
1     a 2022-01-02
4     a 2022-10-03
5     c 2022-10-02
2     c 2022-10-03
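
The result keeps the original row labels (0, 3, 1, ...), whereas the expected output in the question is numbered 0 to 5. If a fresh index is wanted, one option is to append `reset_index(drop=True)`; the following is a sketch that rebuilds the frame so it runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    "col_a": ["b", "a", "c", "b", "a", "c"],
    "col_b": pd.to_datetime(["2022/01/01", "2022/01/02", "2022/10/03",
                             "2022/10/01", "2022/10/03", "2022/10/02"]),
})

out = (df
   .assign(key=df.groupby("col_a")["col_b"].transform("min"))
   .sort_values(by=["key", "col_a", "col_b"])
   .drop(columns="key")
   .reset_index(drop=True)  # discard the old labels and renumber from 0
)
print(out)
```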
