Pandas groupby.ngroup() in index order?

Posted by 8oomwypt on 2023-01-11 in Other


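The Pandas groupby "ngroup" function labels each group in "group" order, that is, by sorted group key.
I am looking for similar behavior, but the labels need to be assigned in original (index) order. How can I do this efficiently (it will happen often, with large arrays) in Pandas and numpy?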

> df = pd.DataFrame(
          {"A": [9,8,7,8,9]},
          index=list("abcde"))
   A
a  9
b  8
c  7
d  8
e  9

> df.groupby("A").ngroup()
a    2
b    1
c    0
d    1
e    2

# LOOKING FOR ###################
a    0
b    1
c    2
d    1
e    0

How can I get the desired output using a 1-D numpy array?

arr = np.array([9, 8, 7, 8, 9])
# looking for [0,1,2,1,0]

pandas

Source: https://stackoverflow.com/questions/63985569/pandas-groupby-ngroup-in-index-order

3 Answers


jgwigjjp #1

Perhaps a better approach is factorize:

df['A'].factorize()[0]

Output:

array([0, 1, 2, 1, 0])
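
The question also asks about plain numpy arrays and about keeping the original index. As a minimal sketch, assuming the top-level `pd.factorize` function is acceptable for array input, the same call covers both cases; the variable names `codes`, `uniques`, and `labels` are illustrative only:

```python
import numpy as np
import pandas as pd

# Plain numpy input: pd.factorize returns (codes, uniques).
arr = np.array([9, 8, 7, 8, 9])
codes, uniques = pd.factorize(arr)
# codes numbers each value by order of first appearance;
# uniques holds the distinct values in that same order.

# DataFrame input: wrap the codes in a Series to keep the original index,
# which mirrors the shape of the ngroup() result.
df = pd.DataFrame({"A": [9, 8, 7, 8, 9]}, index=list("abcde"))
labels = pd.Series(df["A"].factorize()[0], index=df.index)
print(labels)
```

Wrapping the codes in a Series is only needed when the result has to line up with the DataFrame index; for array work the raw `codes` array is enough.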

2023-01-11

ygya80vv #2

You can use np.unique:

In [105]: a = np.array([9,8,7,8,9])

In [106]: u,idx,tags = np.unique(a, return_index=True, return_inverse=True)

In [107]: idx.argsort().argsort()[tags]
Out[107]: array([0, 1, 2, 1, 0])
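
The double argsort is terse, so here is a step-by-step sketch of what each intermediate array holds. It uses a different input array, chosen for illustration only, so that the sorted order and the order of first appearance clearly differ; the name `ranks` is not from the answer:

```python
import numpy as np

a = np.array([30, 10, 30, 20, 10])

# u:    sorted unique values
# idx:  position of the first occurrence of each value in u
# tags: for each element of a, its position in u (the "sorted" group label)
u, idx, tags = np.unique(a, return_index=True, return_inverse=True)

# argsort of an argsort yields ranks: ranks[k] is where u[k] falls
# when the unique values are ordered by first appearance.
ranks = idx.argsort().argsort()

# Translate each sorted label into its first-appearance label.
result = ranks[tags]
print(u, idx, tags, ranks, result)
```

Here 30 appears first, so it gets label 0 even though it is the largest value.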

2023-01-11

jk9hmnmh #3

You can pass sort=False to groupby():

df.groupby('A', sort=False).ngroup()

a    0
b    1
c    2
d    1
e    0
dtype: int64

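As far as I know, numpy has no direct equivalent of groupby. For a pure numpy version, you can use numpy.unique() to get the unique values. numpy.unique() can optionally return the inverse, essentially an index array that recreates the input array, but it sorts the unique values first. The result is therefore the same as with a regular (sorted) pandas.groupby() command.
To work around this, capture the index of the first occurrence of each unique value. Sort those indexes and use them to index the original array, which gives the unique values in their original order. Create a dictionary mapping each unique value to its group number, then use that dictionary to convert the values in the array to their group numbers.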

import numpy as np

arr = np.array([9, 8, 7, 8, 9])

_, i = np.unique(arr, return_index=True)  # get the indexes of the first occurrence of each unique value
groups = arr[np.sort(i)]  # sort the indexes and retrieve the values from the array so that they are in the array order
m = {value:ngroup for ngroup, value in enumerate(groups)}  # create a mapping of value:groupnumber
np.vectorize(m.get)(arr)  # use vectorize to create a new array using m

array([0, 1, 2, 1, 0])
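
Since the question mentions large arrays, it may be worth checking that the approaches in this thread agree on more than five elements. The following is a rough sanity check, not a benchmark; the random input, its size, the seed, and the variable names are all assumptions made for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical test data: 1000 integers drawn from 50 possible values.
rng = np.random.default_rng(0)
arr = rng.integers(0, 50, size=1000)

# groupby with sort=False, as in this answer.
via_groupby = pd.DataFrame({"A": arr}).groupby("A", sort=False).ngroup().to_numpy()

# factorize, as in answer #1.
via_factorize = pd.factorize(arr)[0]

# np.unique with a double argsort, as in answer #2.
u, idx, tags = np.unique(arr, return_index=True, return_inverse=True)
via_unique = idx.argsort().argsort()[tags]

print(np.array_equal(via_groupby, via_factorize))
print(np.array_equal(via_groupby, via_unique))
```

If speed matters, timing each variant on representative data is more reliable than guessing; note that np.vectorize is essentially a Python-level loop, so the dictionary version above is likely to be the slowest on large inputs.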

2023-01-11
