How to get the minimum value from a nested list column in Pandas? Why does numpy.min() fail where numpy.mean() works?

Posted by 0yycz8jy on 2022-11-10 in Other


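I have a small piece of code that I need to modify, and I cannot work out exactly why np.mean() works in this case while np.min() does not when a Pandas column consists of nested lists. Maybe someone here can clarify?
The following code works perfectly: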

import pandas as pd
import numpy as np

def transformation(custom_df):
    dic = dict(zip(custom_df['customers'], custom_df['values']))
    custom_df['values'] = np.where(custom_df['values'].isna() & (custom_df['valid_neighbors'] >= 1),
                                   custom_df['neighbors'].apply(
                                       lambda row: np.mean([dic[v] for v in row if dic.get(v)])),
                                   custom_df['values'])
    return custom_df

customers = [1, 2, 3, 4, 5, 6]
values = [np.nan, np.nan, 10, np.nan, 11, 12]
neighbors = [[6], [3], [], [3, 5], [6], [5]]
vn = [1, 1, 0, 2, 1, 1]
df2 = pd.DataFrame({'customers': customers, 'values': values, 'neighbors': neighbors, 'valid_neighbors': vn})

   customers  values neighbors  valid_neighbors
0          1     NaN       [6]                1
1          2     NaN       [3]                1
2          3    10.0        []                0
3          4     NaN    [3, 5]                2
4          5    11.0       [6]                1
5          6    12.0       [5]                1

df2 = transformation(df2)

The result is:

   customers  values neighbors  valid_neighbors
0          1    12.0       [6]                1
1          2    10.0       [3]                1
2          3    10.0        []                0
3          4    10.5    [3, 5]                2
4          5    11.0       [6]                1
5          6    12.0       [5]                1

However, if I change np.mean() to np.min() in the transformation() function, it raises a ValueError, which makes me wonder why this does not happen when calling np.mean():

ValueError: zero-size array to reduction operation minimum which has no identity

I would like to know which condition I am failing to meet, and what I can do to get the expected result, which would be:

   customers  values neighbors  valid_neighbors
0          1    12.0       [6]                1
1          2    10.0       [3]                1
2          3    10.0        []                0
3          4    10.0    [3, 5]                2
4          5    11.0       [6]                1
5          6    12.0       [5]                1

numpy

Source: https://stackoverflow.com/questions/74342298/how-to-get-the-minimum-value-from-a-nested-list-column-on-pandas-why-numpy-min

3 Answers


mftmpeh8 #1

Use the following code to get the result:

df3 = df2.set_index('customers')
df2['values'].fillna(df2['neighbors'].apply(lambda x: df3.loc[x, 'values'].mean()))

Output (mean):

0   12.00
1   10.00
2   10.00
3   10.50
4   11.00
5   12.00
Name: values, dtype: float64

You can change mean to min:

df2['values'].fillna(df2['neighbors'].apply(lambda x: df3.loc[x, 'values'].min()))

Output (min):

0   12.00
1   10.00
2   10.00
3   10.00
4   11.00
5   12.00
Name: values, dtype: float64

Assign the result back to the values column to get the desired outcome.
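As a minimal sketch of that last step, the snippet below rebuilds the question's frame and writes the filled column back. The names `lookup` and `neighbor_min` are introduced here for illustration only. The approach avoids the error because pandas' `Series.min()` returns `nan` for an empty selection instead of raising, so the row with an empty neighbors list is harmless.

```python
import numpy as np
import pandas as pd

# Same data as in the question (valid_neighbors is not needed here).
df2 = pd.DataFrame({
    'customers': [1, 2, 3, 4, 5, 6],
    'values': [np.nan, np.nan, 10, np.nan, 11, 12],
    'neighbors': [[6], [3], [], [3, 5], [6], [5]],
})

# Hypothetical helper name: a Series of values indexed by customer id.
lookup = df2.set_index('customers')['values']

# Minimum over each row's neighbors; an empty list selects nothing,
# and Series.min() of an empty selection is nan rather than an error.
neighbor_min = df2['neighbors'].apply(lambda ids: lookup.loc[ids].min())

# fillna returns a new Series, so assign it back explicitly.
df2['values'] = df2['values'].fillna(neighbor_min)

print(df2['values'].tolist())
```

Note that `fillna` only touches rows that were missing, so customer 3 keeps its original value even though its computed neighbor minimum is `nan`.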

Answered 2022-11-10

mspsb9vt #2

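Your neighbors column contains an empty list. np.min raises an error on an empty list, whereas np.mean accepts one and returns nan (with a RuntimeWarning).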

import numpy as np

print(np.mean([])) 

# Output

# nan

print(np.min([])) 

# Throws error

# ValueError: zero-size array to reduction operation minimum which has no identity
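The phrase "has no identity" in the error message points at the underlying difference, which the sketch below tries to illustrate. A sum can start from 0, so the mean of nothing becomes 0/0 and yields `nan`, but a minimum has no natural starting value. NumPy's `np.min` accepts an `initial` argument to supply one; choosing `np.inf` here is an assumption made for illustration, not something the answer proposes.

```python
import warnings
import numpy as np

empty = np.array([], dtype=float)

# sum has an identity element (0), so reducing an empty array is fine.
empty_sum = np.sum(empty)

# mean is sum / count = 0 / 0, reported as nan plus a RuntimeWarning.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", RuntimeWarning)
    empty_mean = np.mean(empty)

# min has no identity element, so the empty reduction raises ValueError.
try:
    np.min(empty)
    raised = False
except ValueError:
    raised = True

# Supplying a starting value makes the empty case well defined.
empty_min = np.min(empty, initial=np.inf)

print(empty_sum, empty_mean, raised, empty_min)
```

Whether `inf` is an acceptable placeholder depends on what the downstream code does with it; for filling missing values, `nan` is usually the safer choice.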

Answered 2022-11-10

bxpogfeg #3

It is better to update the transformation function so that it handles empty arrays in the neighbors column. Here is a workaround that may work.

def transformation(custom_df):
    dic = dict(zip(custom_df['customers'], custom_df['values']))
    custom_df['values'] = np.where(custom_df['values'].isna() & (custom_df['valid_neighbors'] >= 1),
                                   custom_df['neighbors'].apply(
                                       lambda row: np.min([dic[v] for v in row if dic.get(v)]) if len(row) else 0),
                                   custom_df['values'])
    return custom_df
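One caveat with the workaround above: it checks `len(row)` before filtering, so a non-empty neighbors list whose entries are all filtered out would still reach `np.min` with an empty list, and the placeholder `0` could be mistaken for a real value. The variant below is a sketch under those assumptions, not the answerer's code. The helper `neighbor_min` is a hypothetical name, and it returns `nan` whenever no usable neighbor value exists.

```python
import numpy as np
import pandas as pd

def neighbor_min(row, lookup):
    """Smallest known value among the neighbors, or nan if there is none."""
    vals = [lookup[v] for v in row if v in lookup and not pd.isna(lookup[v])]
    # Check the filtered list, not the raw row, before taking the minimum.
    return min(vals) if vals else np.nan

def transformation(custom_df):
    lookup = dict(zip(custom_df['customers'], custom_df['values']))
    fill = custom_df['neighbors'].apply(lambda row: neighbor_min(row, lookup))
    needs_fill = custom_df['values'].isna() & (custom_df['valid_neighbors'] >= 1)
    custom_df['values'] = np.where(needs_fill, fill, custom_df['values'])
    return custom_df

df2 = pd.DataFrame({
    'customers': [1, 2, 3, 4, 5, 6],
    'values': [np.nan, np.nan, 10, np.nan, 11, 12],
    'neighbors': [[6], [3], [], [3, 5], [6], [5]],
    'valid_neighbors': [1, 1, 0, 2, 1, 1],
})
df2 = transformation(df2)
print(df2)
```

Unlike the `if dic.get(v)` filter, this version also keeps a legitimate neighbor value of 0 and skips neighbors whose own value is missing.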

Answered 2022-11-10
