根据Pandas中的Map找出缺失值

efzxgjgh  于 2022-12-09  发布在  其他
关注(0)|答案(3)|浏览(143)

我有2个数据集:

df_1.head(4)

region  postal_code
Adrar   1000
Broko   5633
Conan   4288
Cymus   7435

df_2.head(4)

Name     Charges   region    postal_code   Revenue    
Lia        HG       Pintol    4522           345
Joss       PX       Inend     7455           142
Amph       CT                 5633           148
Andrew     UY       Liven     9033           147

第二个数据集在“region”列中有许多缺失值......但我们可以使用第一个数据集通过匹配postal_code的值来获得这些缺失值......例如,在df_2的第三行中,“region”列缺失,但通过将其相应的postal_code与df_1进行匹配,我们可以找到它的区域为“Broko”......有人能建议如何对其进行编码吗

vyswwuz2

vyswwuz21#

您可以使用布尔索引和map

m = df2['region'].isna()
df2.loc[m, 'region'] = (df2.loc[m, 'postal_code']
                           .map(df1.set_index('postal_code')['region'])
                        )

另一种效率较低的方法可能是:

df2['region'] = (df2['region']
                 .fillna(df2['postal_code']
                         .map(df1.set_index('postal_code')['region']))
                )

# or in place
df2['region'].update(df2['postal_code']
                     .map(df1.set_index('postal_code')['region']))

输出量:

Name Charges  region  postal_code  Revenue
0     Lia      HG  Pintol         4522      345
1    Joss      PX   Inend         7455      142
2    Amph      CT   Broko         5633      148
3  Andrew      UY   Liven         9033      147
z9smfwbn

z9smfwbn2#

示例

data1 = {'region': {0: 'Adrar', 1: 'Broko', 2: 'Conan', 3: 'Cymus'},
         'postal_code': {0: 1000, 1: 5633, 2: 4288, 3: 7435}}
data2 = {'Name': {0: 'Lia', 1: 'Joss', 2: 'Amph', 3: 'Andrew'},
         'Charges': {0: 'HG', 1: 'PX', 2: 'CT', 3: 'UY'},
         'region': {0: 'Pintol', 1: 'Inend', 2: None, 3: 'Liven'},
         'postal_code': {0: 4522, 1: 7455, 2: 5633, 3: 9033},
         'Revenue': {0: 345, 1: 142, 2: 148, 3: 147}}
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)

代码

使用mapfillna

mapper = df1.set_index('postal_code')['region']
df2.assign(region=df2['region'].fillna(df2['postal_code'].map(mapper)))

实验结果:

Name    Charges region  postal_code Revenue
0   Lia     HG      Pintol  4522        345
1   Joss    PX      Inend   7455        142
2   Amph    CT      Broko   5633        148
3   Andrew  UY      Liven   9033        147
bq3bfh9z

bq3bfh9z3#

试试这个:

mapper = dict(df1.values[:, ::-1])
df2['region'] = df2['region'].combine_first(df2['postal_code'].replace(mapper))

相关问题