pandas: How can I create a column containing a JSON list when rows share the same value in another column?

h22fl7wq  posted on 2022-12-16  in Other

I have a pandas DataFrame like this:

buyer_id    car      color   year
john        ferrari  yellow  2022
eric        ferrari  red     2022
john        mercedes black   1990
victoria    audi     yellow  2017
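
To reproduce the example, the table above can be built as a DataFrame. This is a minimal sketch; the variable name `df` is an assumption chosen to match the answers, and the `duplicated` check is only there to show which rows share a buyer.

```python
import pandas as pd

# Sample data copied from the table above; the name `df` is an assumption
df = pd.DataFrame({
    "buyer_id": ["john", "eric", "john", "victoria"],
    "car": ["ferrari", "ferrari", "mercedes", "audi"],
    "color": ["yellow", "red", "black", "yellow"],
    "year": [2022, 2022, 1990, 2017],
})

# Rows whose buyer_id also appears on at least one other row
dupes = df[df["buyer_id"].duplicated(keep=False)]
print(dupes)
```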

I want to create a new column, "identical", in which each row contains a JSON list:

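  • If the buyer appears on only one row in 'buyer_id', the list has a single element:

[{'car': ..., 'color': ..., 'year': ...}]

  • If the same buyer appears on multiple rows in 'buyer_id', the list has one element per row:

[{'car': 'ferrari', 'color': 'yellow', 'year': 2022}, {'car': 'mercedes', 'color': 'black', 'year': 1990}]

Expected output: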

buyer_id   car      color   year  identical
john       ferrari  yellow  2022  [{'car': 'ferrari', 'color': 'yellow', 'year': 2022}, {'car': 'mercedes', 'color': 'black', 'year': 1990}]
eric       ferrari  red     2022  [{'car': 'ferrari', 'color': 'red', 'year': 2022}]
john       mercedes black   1990  [{'car': 'ferrari', 'color': 'yellow', 'year': 2022}, {'car': 'mercedes', 'color': 'black', 'year': 1990}]
victoria   audi     yellow  2017  [{'car': 'audi', 'color': 'yellow', 'year': 2017}]

I don't know how to do this with pandas, or whether it is even possible.

pvabu6sv  #1

You can use GroupBy.apply together with to_json and its orient="records" parameter:

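# Build one JSON string per buyer from that buyer's rows (buyer_id itself is dropped)
s = (df.groupby('buyer_id')
       .apply(lambda g: g.drop('buyer_id', axis=1)
                         .to_json(orient='records'))
    )
# Attach each buyer's string to every row belonging to that buyer
df2 = df.merge(s.rename('identical'), left_on='buyer_id', right_index=True)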

Or, modifying df in place:

s = (df.set_index('buyer_id')
       .groupby(level='buyer_id')
       .apply(lambda g: g.to_json(orient='records'))
    )
df['identical'] = df['buyer_id'].map(s)

Output:

   buyer_id       car   color  year                                                                                        identical
0      john   ferrari  yellow  2022  [{"car":"ferrari","color":"yellow","year":2022},{"car":"mercedes","color":"black","year":1990}]
1      eric   ferrari     red  2022                                                    [{"car":"ferrari","color":"red","year":2022}]
2      john  mercedes   black  1990  [{"car":"ferrari","color":"yellow","year":2022},{"car":"mercedes","color":"black","year":1990}]
3  victoria      audi  yellow  2017                                                    [{"car":"audi","color":"yellow","year":2017}]
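
Note that to_json returns a string, so each cell of "identical" in this answer is a JSON-encoded str rather than a Python list. The following is a self-contained sketch of the same idea, not the answerer's exact code: it selects the value columns on the groupby (an assumption made here so the snippet does not depend on how a given pandas version treats the grouping column inside apply), and it uses json.loads to parse a cell back. The name `value_cols` is illustrative.

```python
import json
import pandas as pd

df = pd.DataFrame({
    "buyer_id": ["john", "eric", "john", "victoria"],
    "car": ["ferrari", "ferrari", "mercedes", "audi"],
    "color": ["yellow", "red", "black", "yellow"],
    "year": [2022, 2022, 1990, 2017],
})

value_cols = ["car", "color", "year"]  # assumed: every column except buyer_id

# One JSON string per buyer, indexed by buyer_id
s = df.groupby("buyer_id")[value_cols].apply(lambda g: g.to_json(orient="records"))

# Broadcast each buyer's string back onto that buyer's rows
df["identical"] = df["buyer_id"].map(s)

# Each cell is a str; json.loads turns it back into a list of dicts
first = df.loc[0, "identical"]
records = json.loads(first)
print(type(first).__name__, len(records))
```

If the column is meant to be written to a file or database as text, keeping the string form is probably fine; if it will be processed further in Python, parsing it as shown is usually more convenient.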

bf1o4zei  #2

Try this:

to_dict = lambda x: x.to_dict('records')
df['identical'] = df['buyer_id'].map(df.set_index('buyer_id') \
                                       .groupby('buyer_id').apply(to_dict))
print(df)

# Output
   buyer_id       car   color  year                                                                                                   identical
0      john   ferrari  yellow  2022  [{'car': 'ferrari', 'color': 'yellow', 'year': 2022}, {'car': 'mercedes', 'color': 'black', 'year': 1990}]
1      eric   ferrari     red  2022                                                          [{'car': 'ferrari', 'color': 'red', 'year': 2022}]
2      john  mercedes   black  1990  [{'car': 'ferrari', 'color': 'yellow', 'year': 2022}, {'car': 'mercedes', 'color': 'black', 'year': 1990}]
3  victoria      audi  yellow  2017                                                          [{'car': 'audi', 'color': 'yellow', 'year': 2017}]

To export the column as JSON, you can use:

>>> print(df['identical'].to_json(orient='records', indent=2))
[
  [
    {
      "car":"ferrari",
      "color":"yellow",
      "year":2022
    },
    {
      "car":"mercedes",
      "color":"black",
      "year":1990
    }
  ],
  [
    {
      "car":"ferrari",
      "color":"red",
      "year":2022
    }
  ],
  [
    {
      "car":"ferrari",
      "color":"yellow",
      "year":2022
    },
    {
      "car":"mercedes",
      "color":"black",
      "year":1990
    }
  ],
  [
    {
      "car":"audi",
      "color":"yellow",
      "year":2017
    }
  ]
]
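
Unlike to_json, to_dict('records') leaves real Python lists of dicts in the column, so a cell can be indexed directly and serialized on its own when needed. A minimal sketch of that difference, assuming the value columns are selected explicitly on the groupby; the names `value_cols`, `lists`, and `cell` are illustrative and not part of the answer above.

```python
import json
import pandas as pd

df = pd.DataFrame({
    "buyer_id": ["john", "eric", "john", "victoria"],
    "car": ["ferrari", "ferrari", "mercedes", "audi"],
    "color": ["yellow", "red", "black", "yellow"],
    "year": [2022, 2022, 1990, 2017],
})

value_cols = ["car", "color", "year"]  # assumed: every column except buyer_id

# One list of dicts per buyer, indexed by buyer_id
lists = df.groupby("buyer_id")[value_cols].apply(lambda g: g.to_dict("records"))
df["identical"] = df["buyer_id"].map(lists)

# Cells are plain Python lists, so they can be indexed like any list
cell = df.loc[2, "identical"]
print(cell[1]["car"])

# Serialize a single cell with the standard library if a JSON string is needed
print(json.dumps(cell))
```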

6vl6ewon  #3

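import pandas as pd

def function1(dd: pd.DataFrame):
    # dd holds one buyer's rows; iloc[:, 1:] skips the first column (buyer_id)
    return dd.assign(identical=dd.iloc[:, 1:].to_json(orient="records"))

# df1 is the DataFrame from the question
df1.groupby('buyer_id').apply(function1)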

buyer_id   car      color   year  identical
john       ferrari  yellow  2022  [{'car': 'ferrari', 'color': 'yellow', 'year': 2022}, {'car': 'mercedes', 'color': 'black', 'year': 1990}]
eric       ferrari  red     2022  [{'car': 'ferrari', 'color': 'red', 'year': 2022}]
john       mercedes black   1990  [{'car': 'ferrari', 'color': 'yellow', 'year': 2022}, {'car': 'mercedes', 'color': 'black', 'year': 1990}]
victoria   audi     yellow  2017  [{'car': 'audi', 'color': 'yellow', 'year': 2017}]
