pytorch 如何从numpy数组中获取前K个值的索引

f1tvaqid 于 2023-02-08 发布在其他

关注(0)|答案(3)|浏览(258)

假设我有来自Pytorch或Keras预测的概率，结果是softmax函数

from scipy.special import softmax
probs = softmax(np.random.randn(20,10),1) # 20 instances and 10 class probabilities
probs

我想从这个numpy数组中找到前5个索引，我所要做的就是在结果上运行一个循环，类似于：

for index in top_5_indices:
    if index in result:
        print('Found')

如果我的结果在前5名，我会得到。
Pytorch有top-k功能，我看过numpy.argpartition，但我不知道如何做到这一点？

pytorch

来源：https://stackoverflow.com/questions/65038206/how-to-get-indices-of-top-k-values-from-a-numpy-array

3条答案

按热度按时间

4xrmg8kj1#

numpy中的argpartition（a，k）函数将输入数组a的索引重新排列在第k个最小元素周围，这样所有较小元素的索引都在左边结束，所有较大元素的索引都在右边结束。时间复杂度为O（n）。
所以你可以得到5个最大元素的指数，如下所示：

np.argpartition(probs,-5)[-5:]

赞(0）回复(0）举报 2023-02-08

nuypyhwy2#

稍微贵一点，但是argsort可以：

idx = np.argsort(probs, axis=1)[:,-5:]

如果我们谈论的是pytorch：

probs = torch.from_numpy(softmax(np.random.randn(20,10),1))

values, idx = torch.topk(probs, k=5, axis=-1)

赞(0）回复(0）举报 2023-02-08

n6lpvg4x3#

现有的答案是正确的，但我想对它们进行扩展，以提供一个自包含函数，其行为与纯numpy的torch.topk完全相同。
下面是这个函数（我已经在内联中包含了指令）：

def topk(array, k, axis=-1, sorted=True):
    # Use np.argpartition is faster than np.argsort, but do not return the values in order
    # We use array.take because you can specify the axis
    partitioned_ind = (
        np.argpartition(array, -k, axis=axis)
        .take(indices=range(-k, 0), axis=axis)
    )
    # We use the newly selected indices to find the score of the top-k values
    partitioned_scores = np.take_along_axis(array, partitioned_ind, axis=axis)
    
    if sorted:
        # Since our top-k indices are not correctly ordered, we can sort them with argsort
        # only if sorted=True (otherwise we keep it in an arbitrary order)
        sorted_trunc_ind = np.flip(
            np.argsort(partitioned_scores, axis=axis), axis=axis
        )
        
        # We again use np.take_along_axis as we have an array of indices that we use to
        # decide which values to select
        ind = np.take_along_axis(partitioned_ind, sorted_trunc_ind, axis=axis)
        scores = np.take_along_axis(partitioned_scores, sorted_trunc_ind, axis=axis)
    else:
        ind = partitioned_ind
        scores = partition_scores
    
    return scores, ind

要验证正确性，您可以对torch进行测试：

import torch
import numpy as np

x = np.random.randn(50, 50, 10, 10)

axis = 2  # Change this to any axis and it'll be fine

val_np, ind_np = topk(x, k=10, axis=axis)

val_pt, ind_pt = torch.topk(torch.tensor(x), k=10, dim=axis)

print("Values are same:", np.all(val_np == val_pt.numpy()))
print("Indices are same:", np.all(ind_np == ind_pt.numpy()))

为了清楚起见，np.take_along_axis是用于访问更高维度中的原始值的recommended to be used with np.argpartition。
np.argpartition比np.argsort快，因为它不对整个数组排序。This answer声称它使用O(n)而不是'O（n log

赞(0）回复(0）举报 2023-02-08

我来回答

pytorch 如何从numpy数组中获取前K个值的索引

3条答案

相关问题

热门标签

最新问答