pandas: how to groupby a NumPy ndarray and return the first row of each group, without sorting

Posted by 8yparm6h on 2023-03-06 in Other

I have an ndarray:

[[1 1]
 [0 2]
 [0 3]
 [1 4]
 [1 5]
 [0 6]
 [1 7]]

I want a reduced result like this:

[[1 1]
 [0 2]
 [1 4]
 [0 6]
 [1 7]]

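The resulting ndarray should contain the first row of each group. A group is a run of consecutive rows with the same value in column 0, which is either 0 or 1.
A similar problem was solved in the thread "Is there any numpy group by function?", but that solution sorts by key, which doesn't work in my case: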

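import numpy as np

l1 = [1,0,0,1,1,0,1]
l2 = [1,2,3,4,5,6,7]
a = np.array([l1, l2]).T
print(a)
# np.unique sorts the keys and returns one first-occurrence index per distinct value
values, indexes = np.unique(a[:, 0], return_index=True)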
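To see why the sorted-key approach fails here, this sketch runs the np.unique call from the question and prints what it returns. np.unique reports one first-occurrence index per distinct key, so with keys that are only 0 or 1 it can never produce more than two rows, however many runs there are.

```python
import numpy as np

l1 = [1, 0, 0, 1, 1, 0, 1]
l2 = [1, 2, 3, 4, 5, 6, 7]
a = np.array([l1, l2]).T

# np.unique sorts the distinct keys and reports only the FIRST occurrence
# of each one, so separate runs of the same key collapse together
values, indexes = np.unique(a[:, 0], return_index=True)
print(values)    # [0 1]
print(indexes)   # [1 0]

# restoring the original order still gives one row per distinct key,
# not one row per run
print(a[np.sort(indexes)])
```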

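In pandas this can be done as follows (a solution from Stack Overflow; I don't remember who posted it, sorry for the missing link):

import pandas as pd
df = pd.DataFrame(a, columns=['c0', 'c1'])  # assumed column names
m1 = ( df['c0'] != df['c0'].shift(1)).cumsum()  # label each run of equal c0 values
df = df.groupby([df['c0'], m1]).head(1)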
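A minimal end-to-end sketch of the pandas route, assuming the columns are named c0 and c1 (the question only names c0). It builds the DataFrame from the ndarray, applies the shift/cumsum trick, and converts back with to_numpy(). The run label is renamed to "run" here only so that the two grouping keys have distinct names.

```python
import numpy as np
import pandas as pd

l1 = [1, 0, 0, 1, 1, 0, 1]
l2 = [1, 2, 3, 4, 5, 6, 7]
a = np.array([l1, l2]).T

# assumed column names; the question only refers to 'c0'
df = pd.DataFrame(a, columns=['c0', 'c1'])

# a new run starts wherever c0 differs from the previous row;
# cumsum turns those start flags into run labels 1, 2, 3, ...
m1 = (df['c0'] != df['c0'].shift(1)).cumsum().rename('run')

# head(1) keeps the first row of each (c0, run) group, in original row order
out = df.groupby([df['c0'], m1]).head(1).to_numpy()
print(out)
```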

How can I do this with NumPy?
Thanks for your solutions.
EDIT:
While mozway was writing a solution, I came up with this:

import numpy as np

l1 = [1,0,0,1,1,0,1]
l2 = [1,2,3,4,5,6,7]
a = np.array([l1, l2]).T

print("solution")
# "shift" for numpy: prepend NaN and drop the last element of column 0
# (np.nan, not np.NaN: the NaN alias was removed in NumPy 2.0)
arr3 = np.array([np.nan])
arr4 = np.array(a[ :-1, 0])
arr5 = np.concatenate([arr3, arr4])
print('arr5')
print(arr5)
# add the shifted column (this upcasts the whole array to float)
a = np.c_[ a, arr5 ]

# True where column 0 differs from the shifted column
dif_col = a[:, 0] != a[:, 2]
# add the difference column
a = np.c_[ a, dif_col ]
# keep only the rows where the value changed
mask = (a[:, 3] == True)
a = a[mask, :]
# remove the two helper columns
a = np.delete(a, 2, 1)
a = np.delete(a, 2, 1)
print(a)

Output:

[[1. 1.]
 [0. 2.]
 [1. 4.]
 [0. 6.]
 [1. 7.]]

What do you think?
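The shift-and-compare idea in the edit can also be written with slices instead of helper columns. This sketch compares each key with the one before it and keeps the first row unconditionally; because no NaN column is concatenated, the result keeps the integer dtype instead of being upcast to float.

```python
import numpy as np

l1 = [1, 0, 0, 1, 1, 0, 1]
l2 = [1, 2, 3, 4, 5, 6, 7]
a = np.array([l1, l2]).T

# row i starts a run if its column-0 value differs from row i-1
mask = np.empty(len(a), dtype=bool)
mask[:1] = True                        # the first row always starts a run
mask[1:] = a[1:, 0] != a[:-1, 0]       # compare each key with the previous one

out = a[mask]                          # integer dtype is preserved
print(out)
```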

odopli94 (answer #1)

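You can compute the indices where the value changes:

# positions i where a[i+1, 0] differs from a[i, 0]
idx = np.where(np.diff(a[:, 0])!=0)[0]

# row 0 always starts a group; every other group starts at idx+1
out = a[np.r_[0, idx+1]]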

Output:

array([[1, 1],
       [0, 2],
       [1, 4],
       [0, 6],
       [1, 7]])
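A sketch that wraps the np.diff approach into a reusable function. The name first_row_of_each_run and the key_col parameter are hypothetical, added only for illustration; the explicit check for an empty array is there because indexing row 0 of an empty array would raise IndexError.

```python
import numpy as np


def first_row_of_each_run(arr, key_col=0):
    """Hypothetical helper: first row of every run of equal values in arr[:, key_col]."""
    if len(arr) == 0:
        return arr[:0]                       # nothing to group; keep shape (0, ncols)
    key = arr[:, key_col]
    # positions i where key[i + 1] != key[i]
    idx = np.flatnonzero(np.diff(key) != 0)
    # row 0 always starts a run; every other run starts at idx + 1
    return arr[np.r_[0, idx + 1]]


a = np.array([[1, 1], [0, 2], [0, 3], [1, 4], [1, 5], [0, 6], [1, 7]])
print(first_row_of_each_run(a))
```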
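Minimum per group

I initially misunderstood and thought you wanted the minimum of each group. For that, pass the same group start indices to np.minimum.reduceat: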

idx = np.where(np.diff(a[:, 0])!=0)[0]
out = np.minimum.reduceat(a, np.r_[0, idx+1], axis=0)

Example:

l1 = [1,1,0,0,1,1,0,1]
l2 = [1,0,3,2,4,5,6,7]
a = np.array([l1, l2]).T

idx = np.where(np.diff(a[:, 0])!=0)[0]
out = np.minimum.reduceat(a, np.r_[0, idx+1], axis=0)

array([[1, 0],
       [0, 2],
       [1, 4],
       [0, 6],
       [1, 7]])
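One way to sanity-check np.minimum.reduceat is to compare it against a plain loop. This sketch uses itertools.groupby, which, like the index computation above, only groups consecutive equal keys, and takes the column-wise minimum of each run.

```python
import itertools

import numpy as np

l1 = [1, 1, 0, 0, 1, 1, 0, 1]
l2 = [1, 0, 3, 2, 4, 5, 6, 7]
a = np.array([l1, l2]).T

idx = np.where(np.diff(a[:, 0]) != 0)[0]
out = np.minimum.reduceat(a, np.r_[0, idx + 1], axis=0)

# reference: itertools.groupby groups only *consecutive* equal keys,
# so a per-run, column-wise minimum written as a loop should match
expected = []
for _, rows in itertools.groupby(a.tolist(), key=lambda row: row[0]):
    rows = list(rows)
    expected.append([min(col) for col in zip(*rows)])

print(out.tolist() == expected)   # True
```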
Sorting within each group

Use lexsort:

group = np.r_[0, np.cumsum(np.diff(a[:, 0])!=0)]
out = a[np.lexsort(np.c_[a[:, 1:], group].T)]

Example:

l1 = [1,1,0,0,1,1,0,1]
l2 = [1,0,3,2,4,5,6,7]
a = np.array([l1, l2]).T

group = np.r_[0, np.cumsum(np.diff(a[:, 0])!=0)]
# array([0, 0, 1, 1, 2, 2, 3, 4])

out = a[np.lexsort(np.c_[a[:, 1:], group].T)]

array([[1, 0],
       [1, 1],
       [0, 2],
       [0, 3],
       [1, 4],
       [1, 5],
       [0, 6],
       [1, 7]])
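Note that np.minimum.reduceat takes the minimum of each column independently, so with more than two columns the result can mix values from different rows. Assuming you want the whole row with the smallest column-1 value per run, this sketch combines the lexsort ordering with the "first row of each group" idea: after sorting, each run is still contiguous, so its first row is the minimum row. The keys are passed to np.lexsort as a tuple, which is equivalent to the np.c_[...].T form above; np.lexsort treats the last key as the primary one.

```python
import numpy as np

l1 = [1, 1, 0, 0, 1, 1, 0, 1]
l2 = [1, 0, 3, 2, 4, 5, 6, 7]
a = np.array([l1, l2]).T

# run label for each row: increments whenever column 0 changes
group = np.r_[0, np.cumsum(np.diff(a[:, 0]) != 0)]

# last key is primary: sort by run label, then by column 1 within a run
order = np.lexsort((a[:, 1], group))
sorted_a = a[order]
sorted_group = group[order]

# each run is still contiguous after sorting, so its first row is the
# row with the smallest column-1 value in that run
starts = np.r_[0, np.flatnonzero(np.diff(sorted_group)) + 1]
print(sorted_a[starts])
```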
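iyzzxitl (answer #2)

Another possible solution, based on numpy.roll:

# True where column 0 differs from the previous row
m = a[:, 0] != np.roll(a[:,0], 1)
# np.roll wraps around, so row 0 is compared with the last row; force it to True
m[0] = True
a[m, :]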

Output:

array([[1, 1],
       [0, 2],
       [1, 4],
       [0, 6],
       [1, 7]])
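The m[0] = True line matters because np.roll is circular: after rolling, position 0 holds the last key. In this data both the first and last keys are 1, so without that line the first row would be dropped, as this sketch shows.

```python
import numpy as np

l1 = [1, 0, 0, 1, 1, 0, 1]
l2 = [1, 2, 3, 4, 5, 6, 7]
a = np.array([l1, l2]).T

key = a[:, 0]
# np.roll is circular: element 0 of the rolled array is the LAST key
m = key != np.roll(key, 1)
print(m[0])        # False here, because the first and last keys are both 1

m[0] = True        # the first row always starts a run
print(a[m])
```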
