pandas DataFrame: fastest way to create a new column (a list) from the values of multiple columns

au9on6nz posted on 2023-02-11 in Other

I have a DataFrame like this:
| Name | Food | Sport |
|---|---|---|
| Tom | Paella | Tennis, Basketball |
| Nick | Chicken | Basketball |
| Tony | Chicken | Football |
| Maria | Chicken | Basketball |
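I want to create a new column containing a list of the current columns' values, like this:
| Name | Food | Sport | listcolumn |
|---|---|---|---|
| Tom | Paella | Tennis, Basketball | [Tom, Paella, Tennis, Basketball] |
| Nick | Chicken | Basketball | [Nick, Chicken, Basketball] |
| Tony | Chicken | Football | [Tony, Chicken, Football] |
| Maria | Chicken | Basketball | [Maria, Chicken, Basketball] |
This is how I currently compute and add the new column: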

import pandas as pd

data = {'Name': ['Tom', 'nick', 'krish', 'jack'],
        'Food': ['Paella', 'Chicken', 'Chicken', 'Chicken'],
        'Sport': ['Tennis, Basketball', 'Basketball', 'Football', 'Tennis']}

df = pd.DataFrame(data)

def df_prepare(data):
    # Each .str.split(",") yields a Series of lists; "+" concatenates
    # those lists row by row to build the new column.
    return (data.fillna('0')
                .rename(columns={'Sport': 'Courses'})
                .assign(listcolumn=lambda df: df['Name'].str.split(",") +
                                              df['Food'].str.split(",") +
                                              df['Courses'].str.split(",")))

dataframe_done = df_prepare(df)

Is there an alternative, faster way to create the new column? This is just an example DataFrame; the real DataFrame has thousands of rows.
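Since the question is about speed, it helps to compare the approaches on your own data. A minimal benchmarking sketch, assuming a synthetic frame made by repeating the question's four rows 1,000 times (the repeat count and the function names are hypothetical). It first checks that each approach returns the same lists, then times each one with the standard-library `timeit`; actual numbers will depend on your pandas version and data.

```python
import timeit

import pandas as pd

# Sample data from the question, repeated to get a larger frame.
# The repeat factor (1,000 -> 4,000 rows) is an arbitrary assumption.
base = pd.DataFrame({
    'Name': ['Tom', 'nick', 'krish', 'jack'],
    'Food': ['Paella', 'Chicken', 'Chicken', 'Chicken'],
    'Sport': ['Tennis, Basketball', 'Basketball', 'Football', 'Tennis'],
})
big = pd.concat([base] * 1_000, ignore_index=True)


def split_and_add(df):
    # Approach from the question: split each column, then add the lists row-wise.
    return df['Name'].str.split(',') + df['Food'].str.split(',') + df['Sport'].str.split(',')


def numpy_comprehension(df):
    # Join each row's values from a NumPy array, then split once.
    return pd.Series([','.join(x).split(',') for x in df.to_numpy()], index=df.index)


def iterrows_comprehension(df):
    # Same join/split, but iterating with DataFrame.iterrows().
    return pd.Series([','.join(row).split(',') for _, row in df.iterrows()], index=df.index)


candidates = [split_and_add, numpy_comprehension, iterrows_comprehension]

# Make sure every candidate produces identical lists before comparing speed.
reference = split_and_add(big).tolist()
for fn in candidates:
    assert fn(big).tolist() == reference, fn.__name__

for fn in candidates:
    seconds = timeit.timeit(lambda: fn(big), number=3) / 3
    print(f'{fn.__name__}: {seconds * 1000:.1f} ms per call')
```

Note that with the question's data every approach keeps the leading space in `' Basketball'`, because `'Tennis, Basketball'` is split on `','` only.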


Answer #1 (gdx19jrr)

Use a list comprehension:

df['listcolumn'] = [','.join(row).split(',') for idx, row in df.iterrows()]
print(df)

# Output
    Name     Food              Sport                         listcolumn
0    Tom   Paella  Tennis,Basketball  [Tom, Paella, Tennis, Basketball]
1   Nick  Chicken         Basketball        [Nick, Chicken, Basketball]
2   Tony  Chicken           Football          [Tony, Chicken, Football]
3  Maria  Chicken         Basketball       [Maria, Chicken, Basketball]

If there are only a few columns, you can simply write:

df['listcolumn'] = (df['Name'] + ',' + df['Food'] + ',' + df['Sport']).str.split(',')
print(df)

# Output
    Name     Food              Sport                         listcolumn
0    Tom   Paella  Tennis,Basketball  [Tom, Paella, Tennis, Basketball]
1   Nick  Chicken         Basketball        [Nick, Chicken, Basketball]
2   Tony  Chicken           Football          [Tony, Chicken, Football]
3  Maria  Chicken         Basketball       [Maria, Chicken, Basketball]
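The string-concatenation version above can be generalized to any list of columns with `Series.str.cat`, which joins a Series element-wise with a list of other Series using `sep`. A minimal sketch, assuming all selected columns hold strings (the helper name `list_column` is hypothetical):

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Tom', 'Nick', 'Tony', 'Maria'],
    'Food': ['Paella', 'Chicken', 'Chicken', 'Chicken'],
    'Sport': ['Tennis,Basketball', 'Basketball', 'Football', 'Basketball'],
})


def list_column(frame, cols, sep=','):
    # Hypothetical helper: combine several string columns into one list column.
    first, *rest = cols
    if rest:
        # Join the first column with the others row by row, separated by sep.
        combined = frame[first].str.cat([frame[c] for c in rest], sep=sep)
    else:
        combined = frame[first]
    # Split the combined string back into a Python list per row.
    return combined.str.split(sep)


df['listcolumn'] = list_column(df, ['Name', 'Food', 'Sport'])
print(df['listcolumn'].tolist())
# [['Tom', 'Paella', 'Tennis', 'Basketball'], ['Nick', 'Chicken', 'Basketball'], ['Tony', 'Chicken', 'Football'], ['Maria', 'Chicken', 'Basketball']]
```

With the default `na_rep=None`, a row that has a missing value in any of the columns comes out as NaN, so either fill missing values first (as the question does with `fillna('0')`) or pass `na_rep`.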

Answer #2 (ecbunoof)

For better performance, convert the values to a NumPy array and use a list comprehension with join and split:

df['listcolumn'] = [','.join(x).split(',') for x in df.to_numpy()]
print(df)
    Name     Food              Sport                         listcolumn
0    Tom   Paella  Tennis,Basketball  [Tom, Paella, Tennis, Basketball]
1   Nick  Chicken         Basketball        [Nick, Chicken, Basketball]
2   Tony  Chicken           Football          [Tony, Chicken, Football]
3  Maria  Chicken         Basketball       [Maria, Chicken, Basketball]
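The question's own sample data has a space after the comma (`'Tennis, Basketball'`), so splitting on `','` alone leaves `' Basketball'` with a leading space. A minimal variant of the NumPy comprehension that strips each piece, assuming surrounding whitespace is never meaningful in these columns:

```python
import pandas as pd

# Sample data from the question (note the space in 'Tennis, Basketball').
df = pd.DataFrame({
    'Name': ['Tom', 'nick', 'krish', 'jack'],
    'Food': ['Paella', 'Chicken', 'Chicken', 'Chicken'],
    'Sport': ['Tennis, Basketball', 'Basketball', 'Football', 'Tennis'],
})

# Join each row, split on commas, and strip whitespace around every piece.
df['listcolumn'] = [
    [part.strip() for part in ','.join(row).split(',')]
    for row in df[['Name', 'Food', 'Sport']].to_numpy()
]
print(df['listcolumn'].tolist())
# [['Tom', 'Paella', 'Tennis', 'Basketball'], ['nick', 'Chicken', 'Basketball'], ['krish', 'Chicken', 'Football'], ['jack', 'Chicken', 'Tennis']]
```

Selecting the columns explicitly before `to_numpy()` also keeps the comprehension from picking up `listcolumn` (or any non-string column) if the code is run a second time on the same frame.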

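Answer #3 (ecfdbz9o)

Another possible solution. Note that joining with ', ' and then splitting on ',' leaves a leading space on every element after the first, which is visible in the output: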

df['listcolumn'] = df.apply(lambda x: ', '.join(x), axis=1).str.split(',')

Output:

    Name     Food               Sport                            listcolumn
0    Tom   Paella  Tennis, Basketball  [Tom,  Paella,  Tennis,  Basketball]
1   nick  Chicken          Basketball         [nick,  Chicken,  Basketball]
2  krish  Chicken            Football          [krish,  Chicken,  Football]
3   jack  Chicken              Tennis             [jack,  Chicken,  Tennis]

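Answer #4 (mspsb9vt)

You can use the apply method to create the new column; with axis=1, each row is passed to the function as a pd.Series. Here is an example: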

df['listcolumn'] = df.apply(lambda x: [x['Name'], x['Food'], *x['Sport'].split(',')], axis=1)

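This lambda builds a new list from the values in the Name, Food and Sport columns, splitting Sport on commas. apply calls the lambda on each row of the DataFrame and collects the results into a new Series, which is then assigned to the new column.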
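`apply` with `axis=1` constructs a pandas Series for every row before calling the lambda, which adds per-row overhead. A sketch of the same logic as a plain list comprehension over `zip` of the three columns, which avoids creating those row objects; treat any speedup as something to confirm with a timer on your own data:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Tom', 'Nick', 'Tony', 'Maria'],
    'Food': ['Paella', 'Chicken', 'Chicken', 'Chicken'],
    'Sport': ['Tennis,Basketball', 'Basketball', 'Football', 'Basketball'],
})

# Reference result from the apply/lambda version, computed on the original columns.
via_apply = df.apply(lambda x: [x['Name'], x['Food'], *x['Sport'].split(',')], axis=1)

# Same logic: Name and Food stay whole, only Sport is split on commas.
# zip walks the three columns in parallel, so no per-row Series is built.
df['listcolumn'] = [
    [name, food, *sport.split(',')]
    for name, food, sport in zip(df['Name'], df['Food'], df['Sport'])
]

# Both versions should produce identical lists.
assert df['listcolumn'].tolist() == via_apply.tolist()
print(df['listcolumn'].tolist())
```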
