Pandas - combine column values into a list in a new column

Posted by h5qlskok on 2023-08-01 in Other


I have a pandas DataFrame t:

import numpy as np
import pandas as pd

d = [['hello', 1, 'GOOD', 'long.kw'],
     [1.2, 'chipotle', np.nan, 'bingo'],
     ['various', np.nan, 3000, 123.456]]
t = pd.DataFrame(data=d, columns=['A','B','C','D'])

It looks like this:

print(t)
         A         B     C        D
0    hello         1  GOOD  long.kw
1      1.2  chipotle   NaN    bingo
2  various       NaN  3000  123.456

I am trying to create a new column that holds a list of the values in A, B, C, and D, so that it looks like this:

t['combined']                                             

Out[125]: 
0        [hello, 1, GOOD, long.kw]
1        [1.2, chipotle, nan, bingo]
2        [various, nan, 3000, 123.456]
Name: combined, dtype: object

I am trying this code:

t['combined'] = t.apply(lambda x: list([x['A'],
                                        x['B'],
                                        x['C'],
                                        x['D']]),axis=1)

It returns the following error:

ValueError: Wrong number of items passed 4, placement implies 1

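What confuses me is that if I leave out one of the columns I want in the list (or add another column to the DataFrame that I do not put in the list), my code works.
For example, running the following code: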

t['combined'] = t.apply(lambda x: list([x['A'],
                                        x['B'],
                                        x['D']]),axis=1)

returns this, which would be perfect if I only wanted 3 columns:

print(t)
         A         B     C        D                 combined
0    hello         1  GOOD  long.kw      [hello, 1, long.kw]
1      1.2  chipotle   NaN    bingo   [1.2, chipotle, bingo]
2  various       NaN  3000  123.456  [various, nan, 123.456]

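I have no idea why asking for a 'combined' list built from all of the columns in the DataFrame raises an error, while building the 'combined' list from all but one of the columns works as expected.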

pandas

Source: https://stackoverflow.com/questions/43898035/pandas-combine-column-values-into-a-list-in-a-new-column

3 Answers


thigvfpy1#

Try this:

t['combined']= t.values.tolist()

t
Out[50]: 
         A         B     C        D                       combined
0    hello         1  GOOD  long.kw      [hello, 1, GOOD, long.kw]
1     1.20  chipotle   NaN    bingo    [1.2, chipotle, nan, bingo]
2  various       NaN  3000   123.46  [various, nan, 3000, 123.456]
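One thing to watch for: t.values covers every column, so running the line above a second time would also pull the existing combined column into the new lists. A minimal sketch that names the source columns explicitly (source_cols is just an illustrative name, and to_numpy() is used here as the newer spelling of .values):

```python
import numpy as np
import pandas as pd

d = [['hello', 1, 'GOOD', 'long.kw'],
     [1.2, 'chipotle', np.nan, 'bingo'],
     ['various', np.nan, 3000, 123.456]]
t = pd.DataFrame(data=d, columns=['A', 'B', 'C', 'D'])

# Illustrative name: the columns whose values should end up in each list.
source_cols = ['A', 'B', 'C', 'D']

# Select only those columns, convert to a 2-D array, then to nested lists.
# Re-running this line gives the same result because 'combined' is excluded.
t['combined'] = t[source_cols].to_numpy().tolist()
t['combined'] = t[source_cols].to_numpy().tolist()

print(t['combined'].iloc[0])
```

Each cell of combined is a plain Python list with one entry per source column, NaN included.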


hwazgwia2#

Another way is to call list() on the underlying NumPy array:

t['combined_arr'] = list(t.values)

Note that this produces a slightly different column than .tolist() does. As shown below, tolist() creates nested lists, while list() creates a list of arrays.

t['combined_list'] = t[['A', 'B']].values.tolist()
t['combined_arr'] = list(t[['A', 'B']].values)

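# Look the cells up by column label so the result does not depend on column order
t.loc[0, 'combined_list']  # ['hello', 1]
t.loc[0, 'combined_arr']   # array(['hello', 1], dtype=object)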

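Depending on the use case, keeping the ndarray type can sometimes be useful.
If you want to combine the columns without the NaN values, the fastest way is to loop over the rows and check for NaN as you go. Because NaN != NaN, the fastest check is to test whether a value is equal to itself.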

t['combined'] = [[e for e in row if e==e] for row in t.values.tolist()]

         A     B     C        D                     combined
0    hello   1.0  GOOD  long.kw  [hello, 1.0, GOOD, long.kw]
1      1.2  10.0   NaN    bingo           [1.2, 10.0, bingo]  <-- no NaN
2  various   NaN  3000  123.456     [various, 3000, 123.456]  <-- no NaN
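To make the self-equality trick concrete, here is a small sketch on a plain Python list (the row is copied from the question's data). One caveat I am adding, not something from the answer above: None is equal to itself, so this check does not filter it out.

```python
import numpy as np

nan = float('nan')

# NaN is the only common value that compares unequal to itself.
print(nan == nan)        # False
print(np.nan == np.nan)  # False

row = ['various', np.nan, 3000, 123.456]

# Keep only the elements that are equal to themselves, i.e. everything but NaN.
kept = [e for e in row if e == e]
print(kept)              # ['various', 3000, 123.456]

# Caveat: None == None is True, so None survives this filter.
kept_with_none = [e for e in ['a', None, np.nan] if e == e]
print(kept_with_none)    # ['a', None]
```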

A more thorough check is to use isnan from the built-in math module.

import math
t['combined'] = [[e for e in row if not (isinstance(e, float) and math.isnan(e))] for row in t.values.tolist()]
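If the data may also contain None, one alternative (my suggestion, not part of the answer above) is pd.isna, which treats both NaN and None as missing. This sketch assumes every cell is a scalar; pd.isna returns an array when given a list or array, so it would not work as a filter condition for nested values.

```python
import numpy as np
import pandas as pd

# Small illustrative rows containing both kinds of missing value.
rows = [['hello', None, 'GOOD'],
        [1.2, np.nan, 'bingo']]

# pd.isna(scalar) returns True for NaN and None, False for ordinary values.
cleaned = [[e for e in row if not pd.isna(e)] for row in rows]

print(cleaned)  # [['hello', 'GOOD'], [1.2, 'bingo']]
```

It is likely slower than the e == e comparison, since it is a function call per element, so it is a trade of speed for coverage.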

To combine the non-NaN values of specific columns only, select the columns first:

cols = ['A', 'B']
t['combined'] = [[e for e in row if e==e] for row in t[cols].values.tolist()]
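The pieces above can be wrapped into one small helper. This is only a sketch: combine_columns and its drop_nan flag are hypothetical names I made up for illustration, not a pandas API.

```python
import numpy as np
import pandas as pd

def combine_columns(df, cols, drop_nan=False):
    """Return one list per row holding the values of `cols` (hypothetical helper)."""
    # Nested Python lists, one inner list per row of the selected columns.
    rows = df[list(cols)].to_numpy().tolist()
    if drop_nan:
        # NaN is not equal to itself, so `e == e` filters it out.
        return [[e for e in row if e == e] for row in rows]
    return rows

d = [['hello', 1, 'GOOD', 'long.kw'],
     [1.2, 'chipotle', np.nan, 'bingo'],
     ['various', np.nan, 3000, 123.456]]
t = pd.DataFrame(data=d, columns=['A', 'B', 'C', 'D'])

with_nan = combine_columns(t, ['A', 'B'])
without_nan = combine_columns(t, ['A', 'B'], drop_nan=True)
print(without_nan)  # [['hello', 1], [1.2, 'chipotle'], ['various']]
```

The returned list can be assigned directly, for example t['combined'] = combine_columns(t, ['A', 'B'], drop_nan=True).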


qnakjoqk3#

Here is a way that keeps the NaN values:

t.assign(combined = pd.Series(d))

Output:

         A         B     C        D                       combined
0    hello         1  GOOD  long.kw      [hello, 1, GOOD, long.kw]
1      1.2  chipotle   NaN    bingo    [1.2, chipotle, nan, bingo]
2  various       NaN  3000  123.456  [various, nan, 3000, 123.456]

Here is a way without the NaN values:

t.assign(combined = t.stack().groupby(level=0).agg(list))

Output:

         A         B     C        D                   combined
0    hello         1  GOOD  long.kw  [hello, 1, GOOD, long.kw]
1      1.2  chipotle   NaN    bingo     [1.2, chipotle, bingo]
2  various       NaN  3000  123.456   [various, 3000, 123.456]
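The stack() version relies on stack() discarding missing values. As far as I know, that behavior depends on the pandas version and on the future_stack option, so treat it as an assumption to check against your installation. A sketch that drops the NaN values explicitly per row instead (slower on large frames, since it iterates row by row):

```python
import numpy as np
import pandas as pd

d = [['hello', 1, 'GOOD', 'long.kw'],
     [1.2, 'chipotle', np.nan, 'bingo'],
     ['various', np.nan, 3000, 123.456]]
t = pd.DataFrame(data=d, columns=['A', 'B', 'C', 'D'])

# For each row (a Series), drop the missing entries and convert to a list.
lists = [row.dropna().tolist() for _, row in t.iterrows()]

# Wrap in a Series aligned to t's index so that ragged lists are stored as-is.
result = t.assign(combined=pd.Series(lists, index=t.index))

print(result['combined'].tolist())
```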
