pandas: add missing values for a pair

6kkfgxo0 posted on 2023-11-15 in Other

I have a DataFrame:

    import pandas as pd
    from pandas import Timestamp

    df = pd.DataFrame([
        {'Date': Timestamp('2023-01-01 00:00:00'), 'Sex': 'M', 'Value': 11, 'Target': 5, 'A': 48},
        {'Date': Timestamp('2023-01-01 00:00:00'), 'Sex': 'F', 'Value': 25, 'Target': 7, 'A': 20},
        {'Date': Timestamp('2023-01-10 00:00:00'), 'Sex': 'M', 'Value': 45, 'Target': 6, 'A': 20},
        {'Date': Timestamp('2023-01-10 00:00:00'), 'Sex': 'F', 'Value': 5, 'Target': 2, 'A': 16},
        {'Date': Timestamp('2023-01-20 00:00:00'), 'Sex': 'M', 'Value': 10, 'Target': 8, 'A': 30},
        {'Date': Timestamp('2023-01-20 00:00:00'), 'Sex': 'M', 'Value': 1, 'Target': 18, 'A': 3},
    ])

and I would like to add the missing row for Date 2023-01-20, Sex F, with 0 in Value, Target and A. Expected result:

            Date Sex  Value  Target   A
    0 2023-01-01   M     11       5  48
    1 2023-01-01   F     25       7  20
    2 2023-01-10   M     45       6  20
    3 2023-01-10   F      5       2  16
    4 2023-01-20   M     10       8  30
    5 2023-01-20   M      1      18   3
    6 2023-01-20   F      0       0   0

daupos2t #1

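You can perform two merges: a cross merge that builds every combination of date and M/F, then a left merge with the original data, which adds the missing combinations.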

    out = (df[['Date']]
           .drop_duplicates()
           # cross merge: every date paired with both sexes
           .merge(pd.Series(['M', 'F'], name='Sex'), how='cross')
           # left merge: attach the data, then fill the gaps with 0
           .merge(df, how='left').fillna(0).convert_dtypes()
           )

Alternatively, use complete from pyjanitor:

    import janitor  # pyjanitor; importing it registers DataFrame.complete
    out = df.complete('Date', {'Sex': ['M', 'F']}, fill_value=0)


Output:

            Date Sex  Value  Target   A
    0 2023-01-01   M     11       5  48
    1 2023-01-01   F     25       7  20
    2 2023-01-10   M     45       6  20
    3 2023-01-10   F      5       2  16
    4 2023-01-20   M     10       8  30
    5 2023-01-20   M      1      18   3
    6 2023-01-20   F      0       0   0
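One caveat with the two-merge approach is that fillna(0) also overwrites any NaN that was already present in the data. The following is a minimal sketch, not part of the original answer, that uses the indicator option of merge to fill only the rows that were actually added. The names combos, added and value_cols are illustrative, and the sample frame is rebuilt here only so the snippet runs on its own.

```python
import pandas as pd

# Sample data mirroring the question (the duplicate 2023-01-20 / M row is kept).
df = pd.DataFrame({
    "Date": pd.to_datetime(["2023-01-01", "2023-01-01", "2023-01-10",
                            "2023-01-10", "2023-01-20", "2023-01-20"]),
    "Sex": ["M", "F", "M", "F", "M", "M"],
    "Value": [11, 25, 45, 5, 10, 1],
    "Target": [5, 7, 6, 2, 8, 18],
    "A": [48, 20, 20, 16, 30, 3],
})

# Every (Date, Sex) combination, built with a cross merge (pandas >= 1.2).
combos = df[["Date"]].drop_duplicates().merge(
    pd.Series(["M", "F"], name="Sex"), how="cross"
)

# indicator=True adds a "_merge" column; "left_only" marks combinations
# that had no matching row in df, i.e. the rows being added.
merged = combos.merge(df, how="left", on=["Date", "Sex"], indicator=True)
added = merged["_merge"] == "left_only"

# Fill only the newly added rows, leaving any pre-existing NaN untouched.
value_cols = ["Value", "Target", "A"]
merged.loc[added, value_cols] = 0
out = merged.drop(columns="_merge")

print(out[added])  # the rows that were added
```

The value columns stay float after this step because the merge introduced NaN before the fill; convert_dtypes(), as used above, can bring them back to integer types.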


Another option is to build a new DataFrame holding every combination and run an outer/left merge (similar to what complete does internally):

    index = pd.MultiIndex.from_product([df.Date.unique(),
                                        df.Sex.unique()],
                                       names=['Date', 'Sex'])
    index = pd.DataFrame([], index=index)
    index.merge(df, on=['Date', 'Sex'], how='left').fillna(0)

            Date Sex  Value  Target     A
    0 2023-01-01   M   11.0     5.0  48.0
    1 2023-01-01   F   25.0     7.0  20.0
    2 2023-01-10   M   45.0     6.0  20.0
    3 2023-01-10   F    5.0     2.0  16.0
    4 2023-01-20   M   10.0     8.0  30.0
    5 2023-01-20   M    1.0    18.0   3.0
    6 2023-01-20   F    0.0     0.0   0.0
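The values come back as floats because the left merge introduces NaN, which upcasts the integer columns before fillna runs. A possible way to get the original integer dtypes back, sketched here under the assumption that the value columns had no missing values to begin with, is to cast each column to the dtype it has in the original frame. The names combos, value_cols and orig_dtypes are illustrative.

```python
import pandas as pd

# Sample data mirroring the question.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2023-01-01", "2023-01-01", "2023-01-10",
                            "2023-01-10", "2023-01-20", "2023-01-20"]),
    "Sex": ["M", "F", "M", "F", "M", "M"],
    "Value": [11, 25, 45, 5, 10, 1],
    "Target": [5, 7, 6, 2, 8, 18],
    "A": [48, 20, 20, 16, 30, 3],
})

# All (Date, Sex) combinations as a plain two-column frame.
index = pd.MultiIndex.from_product(
    [df["Date"].unique(), df["Sex"].unique()], names=["Date", "Sex"]
)
combos = index.to_frame(index=False)

value_cols = ["Value", "Target", "A"]
orig_dtypes = df[value_cols].dtypes.to_dict()  # dtypes before the merge

# Left merge (NaN for missing pairs), fill only the value columns,
# then cast each of them back to its original dtype.
out = (combos.merge(df, on=["Date", "Sex"], how="left")
             .fillna({col: 0 for col in value_cols})
             .astype(orig_dtypes))

print(out.dtypes)
```

Filling via a per-column dict keeps the fill restricted to the value columns, so a NaN in some other column would not silently become 0.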

vsmadaxz #2

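You can build the new index, reindex with fill_value=0, and then reset the index. Note that reindex needs the (Date, Sex) pairs to be unique; the output below is for data with a single 2023-01-20 / M row: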

    new_idx = pd.MultiIndex.from_product(
        [df.Date.unique(), df.Sex.unique()], names=["Date", "Sex"]
    )
    df = df.set_index(["Date", "Sex"]).reindex(new_idx, fill_value=0).reset_index()
    print(df)

Prints:

            Date Sex  Value  Target   A
    0 2023-01-01   M     11       5  48
    1 2023-01-01   F     25       7  20
    2 2023-01-10   M     45       6  20
    3 2023-01-10   F      5       2  16
    4 2023-01-20   M     10       8  30
    5 2023-01-20   F      0       0   0
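When the data contains repeated (Date, Sex) pairs, as the frame in the question does, reindex cannot be applied directly, since as far as I know it raises an error on a non-unique index. A minimal sketch of one workaround, offered as an assumption rather than as part of this answer, is to compute the missing combinations with MultiIndex.difference and append zero-filled rows for them. The names keys, present, missing and filler are illustrative.

```python
import pandas as pd

# Sample data mirroring the question, including the repeated 2023-01-20 / M pair.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2023-01-01", "2023-01-01", "2023-01-10",
                            "2023-01-10", "2023-01-20", "2023-01-20"]),
    "Sex": ["M", "F", "M", "F", "M", "M"],
    "Value": [11, 25, 45, 5, 10, 1],
    "Target": [5, 7, 6, 2, 8, 18],
    "A": [48, 20, 20, 16, 30, 3],
})

keys = ["Date", "Sex"]
full_idx = pd.MultiIndex.from_product(
    [df["Date"].unique(), df["Sex"].unique()], names=keys
)

# Combinations that never occur in df. The set difference does not
# require the existing pairs to be unique.
present = pd.MultiIndex.from_frame(df[keys])
missing = full_idx.difference(present)

# One zero-filled row per missing combination, appended to the original rows.
filler = missing.to_frame(index=False, name=keys).assign(Value=0, Target=0, A=0)
out = pd.concat([df, filler], ignore_index=True)

print(out)
```

Because no NaN is ever introduced, the value columns keep their integer dtype, and the original rows keep their order with the new rows appended at the end.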

atmip9wb #3

    new_idx = pd.MultiIndex.from_product(
        [df['Date'].unique(), df['Sex'].unique()], names=["Date", "Sex"]
    )
    df1 = pd.DataFrame(index=new_idx).reset_index()
    # Outer merge on the (Date, Sex) index keeps existing rows and adds the missing pairs.
    df2 = pd.merge(df.set_index(["Date", "Sex"]), df1.set_index(["Date", "Sex"]),
                   how="outer", left_index=True, right_index=True).reset_index()
    df2 = df2.fillna(0)  # fillna returns a new frame, so the result must be assigned
    print(df2)

Prints:

            Date Sex  Value  Target     A
    0 2023-01-01   F   25.0     7.0  75.0
    1 2023-01-01   M   11.0     5.0  90.0
    2 2023-01-10   F    5.0     2.0  95.0
    3 2023-01-10   M   45.0     6.0  36.0
    4 2023-01-20   F    0.0     0.0   0.0
    5 2023-01-20   M   10.0     8.0  43.0
    6 2023-01-20   M    1.0    18.0   3.0
