pandas 获取单元格满足条件的列名列表,并将其存储在df中每行的单独列中

b1uwtaje  于 2023-01-19  发布在  其他
关注(0)|答案(2)|浏览(157)

我有一个df,结构类似下面的数据,有更多的列和更多的行。如何得到预期的结果。我的代码有一个错误-我解释为列表不能保存在df单元格?如何避免循环,如果可能的话?

data = [[0, 0, 1], [0, 1, 0], [1, 0, 0], [1, 0, 1], [0, 0, 0]]

df = pd.DataFrame(data, columns=["choice_a", "choice_b", "choice_c"])

Expected result

       choice_a  choice_b    choice_c  choices
index                  
0        0       0           1         ['c']
1        0       1           0         ['b']
2        1       0           0         ['a']
3        1.      0.          1         ['a','b']
4        0.      0.          0         NA

我的代码

df['choices']=0
for i in np.arange(df.shape[0]):
    choice_list = []
    for j in np.arange(len(df.columns)):
        if df.iloc[i,j]==1:
            choice_list.append(df.columns[j].split('_')[1])
    df.iloc[i,4]=choice_list

我收到的错误

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/var/folders/65/3mqr9fpn37jf2xt2pxbcgp_w0000gn/T/ipykernel_1513/2279334138.py in <module>
      5         if main_dataset.iloc[i,j]==1:
      6             choice_list.append(main_dataset.columns[j].split('_')[1])
----> 7     main_dataset.iloc[i,5]=choice_list

~/opt/anaconda3/lib/python3.9/site-packages/pandas/core/indexing.py in __setitem__(self, key, value)
    714 
    715         iloc = self if self.name == "iloc" else self.obj.iloc
--> 716         iloc._setitem_with_indexer(indexer, value, self.name)
    717 
    718     def _validate_key(self, key, axis: int):

~/opt/anaconda3/lib/python3.9/site-packages/pandas/core/indexing.py in _setitem_with_indexer(self, indexer, value, name)
   1689         if take_split_path:
   1690             # We have to operate column-wise
-> 1691             self._setitem_with_indexer_split_path(indexer, value, name)
   1692         else:
   1693             self._setitem_single_block(indexer, value, name)

~/opt/anaconda3/lib/python3.9/site-packages/pandas/core/indexing.py in _setitem_with_indexer_split_path(self, indexer, value, name)
   1744                     return self._setitem_with_indexer((pi, info_axis[0]), value[0])
   1745 
-> 1746                 raise ValueError(
   1747                     "Must have equal len keys and value "
   1748                     "when setting with an iterable"

ValueError: Must have equal len keys and value when setting with an iterable
afdcj2ne

afdcj2ne1#

使用列表解析,过滤器列名按_拆分为新列,最后,如有必要,将空列表替换为缺失值并添加Series.where

cols = df.columns.str.split('_').str[1].to_numpy()
df['choices'] = [list(cols[x == 1]) for x in df.to_numpy()]

df['choices'] = df['choices'].where(df['choices'].astype(bool))
print (df)
   choice_a  choice_b  choice_c choices
0         0         0         1     [c]
1         0         1         0     [b]
2         1         0         0     [a]
3         1         0         1  [a, c]
4         0         0         0     NaN
cwdobuhd

cwdobuhd2#

另一种可能的解决方案:

df['choices'] = df.apply(lambda x: df.columns[x == 1].str.replace(
    r'^.*_', '', regex=True).values if x.any() else np.nan, axis=1)

输出:

choice_a  choice_b  choice_c choices
0         0         0         1     [c]
1         0         1         0     [b]
2         1         0         0     [a]
3         1         0         1  [a, c]
4         0         0         0     NaN

相关问题