pandas: Need to identify value using two other columns

Posted by ppcbkaq5 on 2023-01-24 in Other

I have a DataFrame called data that looks like this:
| org_id | commit_date | commit_amt |
| --- | --- | --- |
| 123 | 2020-06-01 | 50000 |
| 123 | 2020-06-01 | 50000 |
| 123 | 2021-06-01 | 60000 |
| 234 | 2019-07-01 | 30000 |
| 234 | 2020-07-01 | 40000 |
| 234 | 2021-07-01 | 50000 |
I want the DataFrame to look like this:
| org_id | date_1 | date_2 | date_3 | amt_1 | amt_2 | amt_3 |
| --- | --- | --- | --- | --- | --- | --- |
| 123 | 2020-06-01 | 2021-06-01 | 2022-06-01 | 50000 | 50000 | 60000 |
| 234 | 2019-07-01 | 2020-07-01 | 2021-07-01 | 30000 | 40000 | 50000 |
I got the date columns and the org_id column with the following:

import pandas as pd
dates = data.groupby('org_id').apply(lambda x: x['commit_date'].unique())  # get all unique commit_date values for each org_id
dates = dates.apply(pd.Series)  # put each unique commit_date into its own column; NaN if the org_id doesn't have enough commit_dates
c_dates = pd.DataFrame()  # create an empty dataframe
# I had to specify each column because the dates df was too hard to work with
c_dates['org_id'] = dates.index
c_dates['date_1'] = dates[0].values.tolist()
c_dates['date_2'] = dates[1].values.tolist()
c_dates['date_3'] = dates[2].values.tolist()

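I don't know how to get the amt_1, amt_2, and amt_3 columns. I can't just repeat the date-column code, because that would miss the repeated 50000 for org_id 123. And because the c_dates DataFrame doesn't have the same length as the original data DataFrame, I can't simply compare c_dates against data.
Exciting update! I haven't fully solved my problem yet, but I've made some progress: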
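
To illustrate why reusing the `unique()` approach loses a value, here is a minimal sketch. The sample frame is an assumption: it takes the three dates per org_id from the desired output table rather than from any real data.

```python
import pandas as pd

# Assumed sample data, using the dates shown in the desired output table
data = pd.DataFrame({
    "org_id": [123, 123, 123, 234, 234, 234],
    "commit_date": ["2020-06-01", "2021-06-01", "2022-06-01",
                    "2019-07-01", "2020-07-01", "2021-07-01"],
    "commit_amt": [50000, 50000, 60000, 30000, 40000, 50000],
})

# Number of commitments per org_id (three each in this sample)
n_rows = data.groupby("org_id").size()

# unique() collapses the two 50000 commitments of org_id 123 into one,
# so only two amounts remain for three commitments
unique_amts = data.groupby("org_id")["commit_amt"].unique()
```

Here `n_rows.loc[123]` is 3 while `unique_amts.loc[123]` holds only two values, which is the mismatch described above.
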

dates = data.groupby(['org_id','commit_amt']).apply(lambda x: x['commit_date'].unique())  # get all unique commit_date values for each (org_id, commit_amt) pair
dates = dates.apply(pd.Series)  # put each unique commit_date into its own column; NaN if the group doesn't have enough commit_dates

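This gives me the data I need, but not in the format I want. The result looks like this:
| org_id | commit_amt | 0 | 1 |
| --- | --- | --- | --- |
| 123 | 50000 | 2020-06-01 | 2021-06-01 |
| 123 | 60000 | 2022-06-01 | |
| 234 | 30000 | 2019-07-01 | |
| 234 | 40000 | 2020-07-01 | |
| 234 | 50000 | 2021-07-01 | |
I'd appreciate any help getting to the format I want. Ultimately, I want to be able to take the difference between amt_1 and amt_2, and so on.
Hope this makes sense.
Thanks to the hero who edited this post and thereby taught me how to make a table!
Exciting news!! I have solved my problem!!!
Long story short, the function I needed was unstack. I'm tired right now, but tomorrow I'll edit this with the solution! w00t!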
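
The promised solution is not included in the post, so the following is only a sketch of one way `unstack` could produce the wide layout, not the asker's actual code. The sample frame (with dates taken from the desired output table), the helper column `seq`, and the name `amt_diff_1_2` are assumptions made for illustration.

```python
import pandas as pd

# Assumed sample data, using the dates shown in the desired output table
data = pd.DataFrame({
    "org_id": [123, 123, 123, 234, 234, 234],
    "commit_date": ["2020-06-01", "2021-06-01", "2022-06-01",
                    "2019-07-01", "2020-07-01", "2021-07-01"],
    "commit_amt": [50000, 50000, 60000, 30000, 40000, 50000],
})

wide = (
    # Number the rows 1, 2, 3, ... within each org_id so repeated amounts are kept
    data.assign(seq=data.groupby("org_id").cumcount() + 1)
        .rename(columns={"commit_date": "date", "commit_amt": "amt"})
        .set_index(["org_id", "seq"])
        # Move the per-org sequence number from the index into the columns
        .unstack("seq")
)

# Flatten the (name, seq) column MultiIndex into date_1, ..., amt_3
wide.columns = [f"{name}_{n}" for name, n in wide.columns]
wide = wide.reset_index()

# Difference between consecutive amounts, as mentioned in the question
wide["amt_diff_1_2"] = wide["amt_2"] - wide["amt_1"]
```

Numbering rows with `cumcount()` instead of calling `unique()` is what keeps the second 50000 for org_id 123. The sketch relies on the rows already being in date order within each org_id; otherwise a `sort_values(["org_id", "commit_date"])` beforehand would be needed.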

pandas

Source: https://stackoverflow.com/questions/75191335/need-to-identify-value-using-two-other-columns

1 answer

e0bqpujr1#

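I think you can use pandas.pivot() to reshape your dates. The one catch with pivot() is that it doesn't allow duplicate entries, so I think you should first drop the duplicate rows and then use pivot.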

data = data.drop_duplicates()
data.pivot(index='org_id', columns=['commit_amt'], values=['commit_date'])
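
As a hedged illustration of the duplicate problem: with sample data that uses the dates from the question's desired output (an assumption), org_id 123 has two distinct rows with 50000, so `drop_duplicates()` removes nothing and pivoting on `commit_amt` raises a `ValueError`. One possible workaround, sketched below, is to pivot on a per-org counter instead; the column name `n` is made up for this example.

```python
import pandas as pd

# Assumed sample data, using the dates shown in the desired output table
data = pd.DataFrame({
    "org_id": [123, 123, 123, 234, 234, 234],
    "commit_date": ["2020-06-01", "2021-06-01", "2022-06-01",
                    "2019-07-01", "2020-07-01", "2021-07-01"],
    "commit_amt": [50000, 50000, 60000, 30000, 40000, 50000],
})

# The (org_id=123, commit_amt=50000) pair occurs twice with different dates,
# so pivot cannot place both dates into a single cell
try:
    data.drop_duplicates().pivot(index="org_id", columns="commit_amt", values="commit_date")
    pivot_failed = False
except ValueError:
    pivot_failed = True

# Pivot on a running counter per org_id instead of on the amount
numbered = data.assign(n=data.groupby("org_id").cumcount() + 1)
wide = numbered.pivot(index="org_id", columns="n", values=["commit_date", "commit_amt"])
```

The result has a two-level column index such as `("commit_amt", 2)`, which can be flattened to names like `amt_2` if needed.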

2023-01-24
