Pandas DataFrame: get the 3 max values in each row and their column names

Posted by qxsslcnc 12 months ago in Other


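There are many examples on the forum of how to find the maximum value of a row together with the corresponding column name. Some examples are here or here.
What I want to do is a specific modification of those examples. My frame looks like this, with all columns numbered from left to right (this order is very important):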

x_1 x_2 x_3 x_4 x_5 x_6 x_7 x_8 x_9 x_10
  0   0   1   2   2   0   0   0   0    0
  4   4   0   4   4   1   0   0   0    0
  0   0   1   2   3   0   0   0   0    0

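Now I want to create 6 new columns at the end of each row, holding the column names and the values of the 3 largest entries in the row.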

x_1 x_2 x_3 x_4 x_5 x_6 x_7 x_8 x_9 x_10 Max1 ValMax1 Max2 ValMax2 Max3 ValMax3
  0   0   1   2   2   0   0   0   0    0
  4   4   0   4   4   1   0   0   0    0
  0   0   1   2   3   0   0   0   0    0

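If a row has more than one maximum value (like the value 2 in the first row), I want to save in column Max1 the column name with the smallest index. In this case the second-largest value is also 2, but the corresponding column has a greater index. This means that each "Max(y)" column must hold exactly one column name. This is the main condition. Likewise, if a row has more than 3 maximum values, only the 3 column names with the smallest indices need to be saved. So the final output should look like this DF: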

x_1 x_2 x_3 x_4 x_5 x_6 x_7 x_8 x_9 x_10 Max1 ValMax1 Max2 ValMax2 Max3 ValMax3
  0   0   1   2   2   0   0   0   0    0  x_4       2  x_5       2  x_3       1
  4   4   0   4   4   1   0   0   0    0  x_1       4  x_2       4  x_4       4
  0   0   1   2   3   0   0   0   0    0  x_5       3  x_4       2  x_3       1

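To summarize the result: in the first row 4 < 5, so x_4 comes first (the second 2 sits in the very next column anyway). In the second row 1 < 2 < 4 < 5, but we only have 3 output slots, so x_5 is missing from the final result. In the third row the indices play no role, because all the top values in the row are strictly different. This is also the main condition.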
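The tie-breaking rule above can be written down as a small reference check. The following is only a sketch under assumptions: the sample frame is rebuilt from the table in the question, and `top_n_reference` is a hypothetical helper name that does not appear in the original post. It sorts by value descending and, on ties, by column position ascending.

```python
import pandas as pd

# Sample frame from the question; the left-to-right column order matters.
df = pd.DataFrame(
    [[0, 0, 1, 2, 2, 0, 0, 0, 0, 0],
     [4, 4, 0, 4, 4, 1, 0, 0, 0, 0],
     [0, 0, 1, 2, 3, 0, 0, 0, 0, 0]],
    columns=[f"x_{i}" for i in range(1, 11)],
)

def top_n_reference(row, n=3):
    """Hypothetical helper: (column name, value) pairs for the n largest
    entries, largest value first, leftmost column first on ties."""
    pairs = [(pos, int(val)) for pos, val in enumerate(row)]
    # Sort key: value descending, then column position ascending.
    pairs.sort(key=lambda pv: (-pv[1], pv[0]))
    return [(row.index[pos], val) for pos, val in pairs[:n]]

reference = [top_n_reference(row) for _, row in df.iterrows()]
```

This loop is slow on large frames; it is meant only as a plain statement of the expected behaviour that faster approaches can be compared against.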

pandas

Source: https://stackoverflow.com/questions/77718744/pandas-dataframe-get-3-max-values-in-the-row-and-their-column-names

5 Answers


lvmkulzt1#

For an efficient approach, vectorize the operation with numpy's argpartition and indexing:

import numpy as np

N = 3

# convert to arrays
# and reverse to preserve order
# of min index in case of a tie
cols = df.columns.to_numpy()[::-1]
a = df.loc[:, ::-1].to_numpy()

# get the top N indices
idx = np.argpartition(a, -N)[:, :-N-1:-1]

# get the top names 
names = cols[idx]

# get the top values
values = np.take_along_axis(a, idx, axis=1)
# or
values = a[np.arange(len(a))[:,None], idx]

# assign to new columns
df[[f'{x}{i+1}' for i in range(N) for x in ['Max', 'ValMax']]
  ] = (np.dstack([names,  values])
         .reshape(len(df), -1)
       )

Output:

   x_1  x_2  x_3  x_4  x_5  x_6  x_7  x_8  x_9  x_10  Max1  ValMax1  Max2  ValMax2  Max3  ValMax3
0    0    0    1    2    2    0    0    0    0     0   x_4        2   x_5        2   x_3        1
1    4    4    0    4    4    1    0    0    0     0   x_1        4   x_2        4   x_4        4
2    0    0    1    2    3    0    0    0    0     0   x_5        3   x_4        2   x_3        1
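One caveat: NumPy documents the order of elements inside an np.argpartition partition as undefined, so the ordering of the top N and the handling of ties are not guaranteed in general. A hedged alternative, which is not part of the original answer, is a stable argsort on the negated values. The sketch below assumes numeric data where negation is safe, rebuilds the sample frame so that it runs on its own, and uses `out` as an illustrative name.

```python
import numpy as np
import pandas as pd

# Sample frame from the question, rebuilt so the sketch is self-contained.
df = pd.DataFrame(
    [[0, 0, 1, 2, 2, 0, 0, 0, 0, 0],
     [4, 4, 0, 4, 4, 1, 0, 0, 0, 0],
     [0, 0, 1, 2, 3, 0, 0, 0, 0, 0]],
    columns=[f"x_{i}" for i in range(1, 11)],
)

N = 3
cols = df.columns.to_numpy()
a = df.to_numpy()

# Stable ascending sort of -a: largest values come first, and tied values
# keep their original left-to-right column order.
idx = np.argsort(-a, axis=1, kind="stable")[:, :N]

names = cols[idx]                             # column names of the top N
values = np.take_along_axis(a, idx, axis=1)   # matching values

# Attach the interleaved Max/ValMax columns to a copy of the frame.
out = df.copy()
for i in range(N):
    out[f"Max{i + 1}"] = names[:, i]
    out[f"ValMax{i + 1}"] = values[:, i]
```

A full argsort costs O(m log m) per row instead of the roughly linear cost of a partition, which is usually acceptable when the number of columns is small.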


iswrvxsc2#

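It seems to make more sense to do this in NumPy and then get the column names at the end.
I wrote a function that you can use to get the top n indices of an array. It works by calling np.nanargmax, then masking the values it found as NaN before calling it again. (There may be a better way to do this, but it is the first thing that came to mind.)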

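import numpy as np

def argmax_n(arr: np.ndarray, n: int, axis=None):
    # work on a float copy so found values can be masked with NaN
    arr = arr.astype('float')
    argmaxes = []
    for _ in range(n):
        # argmax returns the first occurrence, so ties go to the smallest index
        argmax = np.nanargmax(arr, axis=axis, keepdims=True)
        argmaxes.append(argmax)
        # np.nan rather than np.NAN, which was removed in NumPy 2.0
        np.put_along_axis(arr, argmax, np.nan, axis=axis)
    return argmaxes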

Usage:

a = df.to_numpy()
argmax_3 = argmax_n(a, 3, axis=1)

Then you can build the DataFrame you want and, if needed, .join it with the original DataFrame.

max_data = {}
for i, arg in enumerate(argmax_3, start=1):
    max_data[f'Max{i}'] = df.columns[arg.flatten()]
    max_data[f'ValMax{i}'] = np.take_along_axis(a, arg, axis=1).flatten()

pd.DataFrame(max_data)

   Max1  ValMax1  Max2  ValMax2  Max3  ValMax3
0   x_4        2   x_5        2   x_3        1
1   x_1        4   x_2        4   x_4        4
2   x_5        3   x_4        2   x_3        1
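The .join step mentioned above is not shown in the answer. A minimal end-to-end sketch could look like the following; it assumes NumPy 1.22 or later (for the keepdims argument of np.nanargmax), rebuilds the sample frame from the question, and uses `result` as an illustrative name.

```python
import numpy as np
import pandas as pd

# Sample frame from the question, rebuilt so the sketch is self-contained.
df = pd.DataFrame(
    [[0, 0, 1, 2, 2, 0, 0, 0, 0, 0],
     [4, 4, 0, 4, 4, 1, 0, 0, 0, 0],
     [0, 0, 1, 2, 3, 0, 0, 0, 0, 0]],
    columns=[f"x_{i}" for i in range(1, 11)],
)

def argmax_n(arr, n, axis=None):
    """Indices of the n largest entries, found by repeated masked argmax."""
    arr = arr.astype(float)  # float copy so entries can be masked with NaN
    argmaxes = []
    for _ in range(n):
        argmax = np.nanargmax(arr, axis=axis, keepdims=True)
        argmaxes.append(argmax)
        np.put_along_axis(arr, argmax, np.nan, axis=axis)
    return argmaxes

a = df.to_numpy()
max_data = {}
for i, arg in enumerate(argmax_n(a, 3, axis=1), start=1):
    max_data[f"Max{i}"] = df.columns[arg.ravel()]
    max_data[f"ValMax{i}"] = np.take_along_axis(a, arg, axis=1).ravel()

# Reuse the original index so the join aligns row by row.
result = df.join(pd.DataFrame(max_data, index=df.index))
```

Passing index=df.index matters when the original frame does not use the default 0..n-1 index, because join aligns on index labels.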


vc9ivgsu3#

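Use the code block below. It first creates a copy of the DataFrame, df_copy, in which the column names are replaced by their numeric positions (as you mentioned, the order matters). It then applies a function to each row to get the indices of the 3 largest values. These indices are then mapped back to the original column names. Finally, it fetches the values of those columns and reorders the columns as expected.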

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'x_1': [0, 4, 0],
    'x_2': [0, 4, 0],
    'x_3': [1, 0, 1],
    'x_4': [2, 4, 2],
    'x_5': [2, 4, 3],
    'x_6': [0, 1, 0],
    'x_7': [0, 0, 0],
    'x_8': [0, 0, 0],
    'x_9': [0, 0, 0],
    'x_10': [0, 0, 0]
})

# Create a copy of the dataframe and replace column names with their corresponding numeric index
df_copy = df.copy()
df_copy.columns = np.arange(len(df.columns))

# Apply a function to each row (axis=1) to get the indices of the top 3 max values
df[['Max1', 'Max2', 'Max3']] = df_copy.apply(lambda row: row.nlargest(3).index, axis=1, result_type='expand')

# Map the numeric indices back to column names
# (applymap is deprecated since pandas 2.1; DataFrame.map is its replacement there)
df[['Max1', 'Max2', 'Max3']] = df[['Max1', 'Max2', 'Max3']].applymap(lambda x: df.columns[int(x)])

# Get the values
df[['ValMax1', 'ValMax2', 'ValMax3']] = df.apply(lambda row: [row[row['Max1']], row[row['Max2']], row[row['Max3']]], axis=1, result_type='expand')

# Reorder the columns
column_order = ['x_1', 'x_2', 'x_3', 'x_4', 'x_5', 'x_6', 'x_7', 'x_8', 'x_9', 'x_10', 'Max1', 'ValMax1', 'Max2', 'ValMax2', 'Max3', 'ValMax3']
df = df[column_order]
df

Result (as expected):

x_1 x_2 x_3 x_4 x_5 x_6 x_7 x_8 x_9 x_10 Max1 ValMax1 Max2 ValMax2 Max3 ValMax3
  0   0   1   2   2   0   0   0   0    0  x_4       2  x_5       2  x_3       1
  4   4   0   4   4   1   0   0   0    0  x_1       4  x_2       4  x_4       4
  0   0   1   2   3   0   0   0   0    0  x_5       3  x_4       2  x_3       1


33qvvth14#

You can also try something like this:

import pandas as pd

# Using @canaytore dataframe setup
df = pd.DataFrame({
    'x_1': [0, 4, 0],
    'x_2': [0, 4, 0],
    'x_3': [1, 0, 1],
    'x_4': [2, 4, 2],
    'x_5': [2, 4, 3],
    'x_6': [0, 1, 0],
    'x_7': [0, 0, 0],
    'x_8': [0, 0, 0],
    'x_9': [0, 0, 0],
    'x_10': [0, 0, 0]
})

n = 4 #Top N values
dfr = df.T.rank(method='first', ascending=False)\
          .stack().astype('int')\
          .rename('place').loc[lambda x: x<=n]\
          .reset_index()\
          .pivot(index='level_1', columns='place', values='level_0')\
          .add_prefix('Max')

idx = dfr.stack().reset_index(level=0).to_numpy().tolist()

dfv = df.stack().loc[idx]
dfv = pd.DataFrame(dfv.to_numpy().reshape(-1,n), 
                   columns=[f'Max{i}Value' for i in range(1,n+1)])

df_out = pd.concat([df, pd.concat([dfr, dfv], axis=1).sort_index(axis=1)], axis=1)

print(df_out)

Output:

   x_1  x_2  x_3  x_4  x_5  x_6  x_7  x_8  x_9  x_10  Max1  Max1Value  Max2  Max2Value  Max3  Max3Value  Max4  Max4Value
0    0    0    1    2    2    0    0    0    0     0   x_4          2   x_5          2   x_3          1   x_1          0
1    4    4    0    4    4    1    0    0    0     0   x_1          4   x_2          4   x_4          4   x_5          4
2    0    0    1    2    3    0    0    0    0     0   x_5          3   x_4          2   x_3          1   x_1          0


evrscar25#

You can try rank:

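# rank each row ascending; with 10 columns, rank >= 8 marks the 3 largest entries
rnk = df.rank(method = 'first',axis=1)>=8
# the boolean mask picks the marked values row by row, in column order
value = df.to_numpy()[rnk].reshape(3,-1)
# join the marked column names per row, then split them back into lists
name = rnk.dot(rnk.columns+',').str[:-1].str.split(',')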

To create the df you can use the same approach as mozway.
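As far as I can tell, the ascending rank above has two side effects: with method='first' the later of two tied columns gets the higher rank, so the >= 8 mask keeps the rightmost ties, and the mask returns names and values in column order rather than ordered by value. A hedged variant, not from the original answer, ranks in descending order so that rank k directly identifies the k-th output slot. The sketch rebuilds the sample frame and uses `out` as an illustrative name.

```python
import numpy as np
import pandas as pd

# Sample frame from the question, rebuilt so the sketch is self-contained.
df = pd.DataFrame(
    [[0, 0, 1, 2, 2, 0, 0, 0, 0, 0],
     [4, 4, 0, 4, 4, 1, 0, 0, 0, 0],
     [0, 0, 1, 2, 3, 0, 0, 0, 0, 0]],
    columns=[f"x_{i}" for i in range(1, 11)],
)

N = 3
# Descending rank per row; method='first' breaks ties by order of appearance,
# so the leftmost tied column receives the smaller (better) rank.
r = df.rank(method="first", axis=1, ascending=False).astype(int).to_numpy()

a = df.to_numpy()
cols = df.columns.to_numpy()
rows = np.arange(len(df))

out = df.copy()
for k in range(1, N + 1):
    # Each row holds every rank exactly once, so argmax finds the single
    # column position whose rank equals k.
    pos = (r == k).argmax(axis=1)
    out[f"Max{k}"] = cols[pos]
    out[f"ValMax{k}"] = a[rows, pos]
```

Because the rank is computed per row, this also avoids hard-coding the threshold 8, which depends on the frame having exactly 10 columns.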
