pandas: add missing values for pairs

Asked by 6kkfgxo0 on 2023-11-15, in Other

I have a DataFrame:

[{'Date': Timestamp('2023-01-01 00:00:00'),'Sex':'M', 'Value':11, 'Target':5, 'A':48},
{'Date': Timestamp('2023-01-01 00:00:00'),'Sex':'F', 'Value':25, 'Target':7, 'A':20},
{'Date': Timestamp('2023-01-10 00:00:00'),'Sex':'M', 'Value':45, 'Target':6, 'A':20},
{'Date': Timestamp('2023-01-10 00:00:00'),'Sex':'F', 'Value':5, 'Target':2, 'A':16},
{'Date': Timestamp('2023-01-20 00:00:00'),'Sex':'M', 'Value':10, 'Target':8, 'A':30},
{'Date': Timestamp('2023-01-20 00:00:00'),'Sex':'M', 'Value':1, 'Target':18, 'A':3}]

and I would like to fill in the missing pair (Date: 2023-01-20, Sex: F) with 0 for Value, Target and A. Expected result:

Date    Sex Value   Target  A
0   2023-01-01  M   11  5   48
1   2023-01-01  F   25  7   20
2   2023-01-10  M   45  6   20
3   2023-01-10  F   5   2   16
4   2023-01-20  M   10  8   30
5   2023-01-20  M   1   18  3
6   2023-01-20  F   0   0   0

Answer #1 (daupos2t)

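You can run merge twice: first a cross merge of the unique dates with M/F to build every combination, then a left merge of the original data onto those combinations, which adds the missing ones.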

out = (df[['Date']]
 .drop_duplicates()
 .merge(pd.Series(['M', 'F'], name='Sex'), how='cross')
 .merge(df, how='left').fillna(0).convert_dtypes()
)
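For anyone who wants to run the two-merge idea end to end, here is a minimal self-contained sketch. It rebuilds the sample frame from the question, and the names `combos` and `out` are only illustrative. A one-column DataFrame stands in for the named Series, and `how="cross"` assumes pandas 1.2 or newer.

```python
import pandas as pd

# Sample data from the question; the pair (2023-01-20, F) is missing.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2023-01-01", "2023-01-01", "2023-01-10",
                            "2023-01-10", "2023-01-20", "2023-01-20"]),
    "Sex": ["M", "F", "M", "F", "M", "M"],
    "Value": [11, 25, 45, 5, 10, 1],
    "Target": [5, 7, 6, 2, 8, 18],
    "A": [48, 20, 20, 16, 30, 3],
})

# Step 1: every (Date, Sex) combination via a cross merge.
combos = (
    df[["Date"]]
    .drop_duplicates()
    .merge(pd.DataFrame({"Sex": ["M", "F"]}), how="cross")
)

# Step 2: left-merge the original rows onto the combinations. Pairs without
# a match get NaN, which fillna replaces with 0; convert_dtypes then moves
# the float columns to nullable integer dtypes.
out = (
    combos.merge(df, on=["Date", "Sex"], how="left")
    .fillna(0)
    .convert_dtypes()
)

print(out)
```

Because the left merge keeps every matching row from `df`, the duplicated 2023-01-20 / M pair survives as two rows.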

Alternatively, use complete from pyjanitor (imported as janitor):

import janitor

out = df.complete('Date', {'Sex': ['M', 'F']}, fill_value=0)


Output:

Date Sex  Value  Target   A
0 2023-01-01   M     11       5  48
1 2023-01-01   F     25       7  20
2 2023-01-10   M     45       6  20
3 2023-01-10   F      5       2  16
4 2023-01-20   M     10       8  30
5 2023-01-20   M      1      18   3
6 2023-01-20   F      0       0   0


Another option is to build a new DataFrame holding every combination and run an outer/left merge (similar to what complete does internally):

index = pd.MultiIndex.from_product([df.Date.unique(), 
                                    df.Sex.unique()], 
                                    names = ['Date', 'Sex'])
index = pd.DataFrame([], index = index)
index.merge(df, on=['Date','Sex'],how='left').fillna(0)

Output:

        Date Sex  Value  Target     A
0 2023-01-01   M   11.0     5.0  48.0
1 2023-01-01   F   25.0     7.0  20.0
2 2023-01-10   M   45.0     6.0  20.0
3 2023-01-10   F    5.0     2.0  16.0
4 2023-01-20   M   10.0     8.0  30.0
5 2023-01-20   M    1.0    18.0   3.0
6 2023-01-20   F    0.0     0.0   0.0
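
The float columns in the output above come from the NaN values that the left merge introduces before `fillna(0)` runs. One way to get the original integer dtypes back is sketched below. It assumes every column can safely return to its original dtype after filling, and the names `full`, `merged` and `restored` are illustrative.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2023-01-01", "2023-01-01", "2023-01-10",
                            "2023-01-10", "2023-01-20", "2023-01-20"]),
    "Sex": ["M", "F", "M", "F", "M", "M"],
    "Value": [11, 25, 45, 5, 10, 1],
    "Target": [5, 7, 6, 2, 8, 18],
    "A": [48, 20, 20, 16, 30, 3],
})

# All (Date, Sex) combinations as a plain two-column frame.
full = pd.MultiIndex.from_product(
    [df["Date"].unique(), df["Sex"].unique()], names=["Date", "Sex"]
).to_frame(index=False)

# Left merge plus fillna: the numeric columns are float64 at this point.
merged = full.merge(df, on=["Date", "Sex"], how="left").fillna(0)
print(merged.dtypes)

# Cast each column back to the dtype it had in the original frame.
restored = merged.astype(df.dtypes.to_dict())
print(restored.dtypes)
```

`convert_dtypes()` is the alternative used in the first snippet of this answer; it yields pandas' nullable `Int64` rather than the original `int64`.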

Answer #2 (vsmadaxz)

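You can construct a new index, reindex with fill_value as the default for the new rows, and then reset the index. Note that reindex needs unique (Date, Sex) labels, so this only works when no pair is duplicated; the output below is for the data without the second 2023-01-20 / M row: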

new_idx = pd.MultiIndex.from_product(
    [df.Date.unique(), df.Sex.unique()], names=["Date", "Sex"]
)
df = df.set_index(["Date", "Sex"]).reindex(new_idx, fill_value=0).reset_index()

print(df)

Prints:

Date Sex  Value  Target   A
0 2023-01-01   M     11       5  48
1 2023-01-01   F     25       7  20
2 2023-01-10   M     45       6  20
3 2023-01-10   F      5       2  16
4 2023-01-20   M     10       8  30
5 2023-01-20   F      0       0   0
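
If the frame does contain a duplicated pair, as with the two 2023-01-20 / M rows in the question, reindexing is not an option because the index labels are no longer unique. A possible workaround, sketched here under that assumption, is to compute only the missing pairs and append zero-filled rows for them. The names `existing`, `wanted`, `missing` and `filler` are illustrative.

```python
import pandas as pd

# Sample data from the question, including the duplicated 2023-01-20 / M pair.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2023-01-01", "2023-01-01", "2023-01-10",
                            "2023-01-10", "2023-01-20", "2023-01-20"]),
    "Sex": ["M", "F", "M", "F", "M", "M"],
    "Value": [11, 25, 45, 5, 10, 1],
    "Target": [5, 7, 6, 2, 8, 18],
    "A": [48, 20, 20, 16, 30, 3],
})

keys = ["Date", "Sex"]
existing = pd.MultiIndex.from_frame(df[keys])
wanted = pd.MultiIndex.from_product(
    [df["Date"].unique(), df["Sex"].unique()], names=keys
)

# Pairs that should exist but do not appear in the data.
missing = wanted.difference(existing)

# One zero-filled row per missing pair, appended to the untouched original rows.
filler = missing.to_frame(index=False).assign(Value=0, Target=0, A=0)
out = (
    pd.concat([df, filler], ignore_index=True)
    .sort_values("Date", kind="stable")  # stable sort: new rows land after existing ones per date
    .reset_index(drop=True)
)

print(out)
```

Since no NaN is ever introduced, the integer columns keep their dtype and the original rows are left untouched.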

Answer #3 (atmip9wb)

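import pandas as pd

# Build every (Date, Sex) combination as an empty frame indexed by the pairs.
new_idx = pd.MultiIndex.from_product(
    [df['Date'].unique(), df['Sex'].unique()], names=["Date", "Sex"]
)
df1 = pd.DataFrame(index=new_idx)

# Outer-merge on the index so missing pairs show up as NaN rows.
df2 = pd.merge(df.set_index(["Date", "Sex"]), df1, how="outer",
               left_index=True, right_index=True).reset_index()
df2 = df2.fillna(0)  # fillna returns a new frame, so the result must be assigned

print(df2)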

Prints:

        Date Sex  Value  Target     A
0 2023-01-01   F   25.0     7.0  20.0
1 2023-01-01   M   11.0     5.0  48.0
2 2023-01-10   F    5.0     2.0  16.0
3 2023-01-10   M   45.0     6.0  20.0
4 2023-01-20   F    0.0     0.0   0.0
5 2023-01-20   M   10.0     8.0  30.0
6 2023-01-20   M    1.0    18.0   3.0
