pandas: create binary columns from occurrence counts after groupby

Posted by dauxcl2d on 2023-08-01 in Other

An empty df with the specific columns of interest (col1 to col5):

```python
import numpy as np
import pandas as pd

dfw_columns = pd.DataFrame({
    "col1": [],
    "col2": [],
    "col3": [],
    "col4": [],
    "col5": []
})
```

and a df with the actual entries:

```python
df = pd.DataFrame({
    "Name": ["abc", "abc", "abc", "def", "def", "ghi", "ghi"],
    "colids": ["col1", "col33", np.nan, "col5", "col1", "col2", np.nan]
})
```


For each Name, fill dfw_columns with 1 if that colid occurs for the Name in df and 0 otherwise.
Desired output (after filling the empty dfw_columns):

```python
desireddf = pd.DataFrame({
    "Name": ["abc", "def", "ghi"],
    "col1": [1, 1, 0],
    "col2": [0, 0, 1],
    "col3": [0, 0, 0],
    "col4": [0, 0, 0],
    "col5": [0, 1, 0]
})
print(desireddf)
```
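A minimal sketch of the groupby route the title describes, not taken from either answer below. It one-hot encodes `colids` with `pd.get_dummies` (which ignores NaN by default), takes the per-`Name` maximum so every cell is 1 or 0, and then reindexes to the columns of `dfw_columns`. The intermediate names `dummies`, `per_name` and `result` are illustrative only.

```python
import numpy as np
import pandas as pd

# Empty frame that only defines the columns of interest (from the question)
dfw_columns = pd.DataFrame({"col1": [], "col2": [], "col3": [], "col4": [], "col5": []})

df = pd.DataFrame({
    "Name": ["abc", "abc", "abc", "def", "def", "ghi", "ghi"],
    "colids": ["col1", "col33", np.nan, "col5", "col1", "col2", np.nan],
})

# One 0/1 indicator column per distinct colid; NaN rows become all zeros
dummies = pd.get_dummies(df["colids"], dtype=int)

# Collapse to one row per Name: 1 if the colid occurs at least once, else 0
per_name = dummies.groupby(df["Name"]).max()

# Keep only the columns of interest in their order (drops col33, adds col3/col4 as 0)
result = (
    per_name.reindex(columns=dfw_columns.columns, fill_value=0)
    .reset_index()
)
print(result)
```

Because `max` is used instead of `sum`, a Name that references the same colid several times still gets 1 rather than a count.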

Answer #1 (mbskvtky)

IIUC, you can use `pd.crosstab` + `.reindex`:

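```python
# df is the frame from the question
cols_of_interest = ['col1', 'col2', 'col3', 'col4', 'col5']
# crosstab counts each (Name, colids) pair and skips NaN colids;
# reindex keeps only the columns of interest and fills missing ones (col3, col4) with 0
out = pd.crosstab(df['Name'], df['colids']).reindex(columns=cols_of_interest, fill_value=0)
print(out)
```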

Prints:

```
colids  col1  col2  col3  col4  col5
Name
abc        1     0     0     0     0
def        1     0     0     0     1
ghi        0     1     0     0     0
```
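`pd.crosstab` counts occurrences rather than flagging them, so a Name that lists the same colid twice gets a 2 in that cell. The sample data has no such repeat, but if the columns must stay binary, one option is to clip the counts at 1. The frame `df_dup` below is hypothetical data with a repeated ("abc", "col1") pair, added only to show the effect.

```python
import pandas as pd

cols_of_interest = ["col1", "col2", "col3", "col4", "col5"]

# Hypothetical data: "abc" references col1 twice
df_dup = pd.DataFrame({
    "Name": ["abc", "abc", "def", "ghi"],
    "colids": ["col1", "col1", "col5", "col2"],
})

# Raw counts: abc/col1 is 2 here
counts = pd.crosstab(df_dup["Name"], df_dup["colids"])

binary = (
    counts.reindex(columns=cols_of_interest, fill_value=0)
    .clip(upper=1)  # any count of 1 or more becomes 1
)
print(binary)
```

`.gt(0).astype(int)` gives the same 0/1 result and may read more explicitly as "occurs at least once".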

Answer #2 (ruoxqz4g)

Use `pivot` as follows:

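```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["abc", "abc", "abc", "def", "def", "ghi", "ghi"],
    "colids": ["col1", "col3", np.nan, "col5", "col1", "col2", np.nan]
})
# Drop rows without a colid and mark each remaining (Name, colids) pair with 1
df = df.dropna().assign(value=1)
# One row per Name, one column per colid; absent pairs become NaN, then 0
print(df.pivot(index='Name', columns='colids', values='value').fillna(0))
```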

The result is below (note that df contains no col4 entry, so col4 does not appear as a column):

```
colids  col1  col2  col3  col5
Name
abc      1.0   0.0   1.0   0.0
def      1.0   0.0   0.0   1.0
ghi      0.0   1.0   0.0   0.0
```
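Two limits of the `pivot` version: `DataFrame.pivot` raises a `ValueError` when a (Name, colids) pair occurs more than once, and its output lacks col4 and contains floats. A sketch that addresses all three swaps in `pivot_table` with `aggfunc="max"`, reindexes to the columns of interest, and casts to int. The extra ("abc", "col1") row is hypothetical data added only to show that duplicates no longer fail.

```python
import numpy as np
import pandas as pd

cols_of_interest = ["col1", "col2", "col3", "col4", "col5"]

# Answer 2's data plus one hypothetical duplicate ("abc", "col1") row
df = pd.DataFrame({
    "Name": ["abc", "abc", "abc", "abc", "def", "def", "ghi", "ghi"],
    "colids": ["col1", "col1", "col3", np.nan, "col5", "col1", "col2", np.nan],
})

out = (
    df.dropna(subset=["colids"])
    .assign(value=1)
    # pivot_table aggregates duplicate (Name, colids) pairs instead of raising
    .pivot_table(index="Name", columns="colids", values="value",
                 aggfunc="max", fill_value=0)
    # add absent columns such as col4 and drop any not of interest
    .reindex(columns=cols_of_interest, fill_value=0)
    .astype(int)
)
print(out)
```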
