Pandas merge and join not picking up correct values

Posted by disho6za on 2023-02-20 in Other


The join is not working. Sample data and code are below.
Lookup file:

helper        time  Loc   FUEL     Rep    KM
0.1|A100|A    0.1   A100  100.00%  -3.93  659
0.1|A200|A    0.1   A200  100.00%  -4.49  628
0.22|A100|B   0.22  A100  90.00%   -1.49  511
...

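After importing the lookup file, I run the following command to strip whitespace, because I previously got a KeyError. My guess was that some of the column names contained spaces.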

dflookup.columns = dflookup.columns.to_series().apply(lambda x: x.strip())

The main file:

time     user        loc    dist   flightKM  loc2      helper1       
0.1      PilotL1     A100   A      140       A200      0.1|A200|A  
0.22     PilotL2     B100   B      230       A100      0.22|A100|B 
...

Expected output for the main df:

time    user      loc    dist   flightKM  loc2   helper1      Rep2    FUEL2    
0.1     PilotL1   A100   A      140       A200   0.1|A200|A   -3.93  100%
0.22    PilotL2   B100   B      230       A100   0.22|A100|B  -1.49  90%
...

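I have tried several solutions from SO without success. Goal: match on the helper columns with a left (or right) join to add two columns (Rep, FUEL) from the lookup to dfmain.

Problem: I would like some hints for fixing the left join, because it does not bring all the correct "Rep" and "FUEL" values from the lookup into dfmain. I am open to a quick fix as well as tips for optimizing the code, since this is just a basic Python script with possibly ad hoc operations.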

Code:

import pandas as pd

# Remove spaces from the values in the Loc column
dfmain['Loc'] = dfmain['Loc'].str.replace(' ', '')

# Create a helper1 column in dfmain by concatenating columns, because
# left/right joins did not seem to allow multiple columns in the join operator
dfmain['helper1'] = dfmain[['time', 'loc2', 'dist']].apply(
    lambda x: '|'.join(x.dropna().astype(str)),
    axis=1
)

# Lookup merge
dfmain = pd.merge(
    left=dfmain,
    right=dflookup[['helper', 'Rep', 'FUEL']],
    left_on='helper1',
    right_on='helper',
    how='left')

# Tidy up (the original dropped from an undefined big_df; dfmain is meant)
dfmain = dfmain.rename(columns={'Rep': 'Rep2', 'FUEL': 'FUEL2'})
dfmain = dfmain.drop(columns=['helper'])
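When a left merge leaves values empty, it can help to see exactly which keys failed to match. The following is a minimal sketch, not the poster's actual data: the two frames are hypothetical miniatures of the samples above, and the trailing space in `dist` is an assumption added to reproduce a mismatch. `indicator=True` adds a `_merge` column, and `repr` makes hidden whitespace visible.

```python
import pandas as pd

# Hypothetical miniature of the lookup file from the question.
dflookup = pd.DataFrame({
    "helper": ["0.1|A100|A", "0.1|A200|A", "0.22|A100|B"],
    "Rep": [-3.93, -4.49, -1.49],
    "FUEL": ["100.00%", "100.00%", "90.00%"],
})

# Hypothetical miniature of the main file; the trailing space in the
# second dist value is an assumption used to provoke a failed match.
dfmain = pd.DataFrame({
    "time": [0.1, 0.22],
    "loc2": ["A200", "A100"],
    "dist": ["A", "B "],
})

# Build the key the same way the question does.
dfmain["helper1"] = dfmain[["time", "loc2", "dist"]].apply(
    lambda row: "|".join(row.dropna().astype(str)), axis=1
)

# indicator=True records whether each row was found in the lookup.
merged = dfmain.merge(
    dflookup, left_on="helper1", right_on="helper",
    how="left", indicator=True,
)

# Keys present only on the left side are the ones that did not match.
unmatched = merged.loc[merged["_merge"] == "left_only", "helper1"]
print(unmatched.map(repr).tolist())
```

Printing the keys through `repr` shows the quotes around each string, so a stray space at the end of a key is easy to spot.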

For ease of review:

import numpy as np
import pandas as pd

print("minimum reproducible code and dataset")

dflookup = pd.DataFrame([('falcon', 'bird', 100),
                         ('parrot', 'bird', 50),
                         ('lion', 'mammal', 50),
                         ('monkey', 'mammal', 100)],
                        columns=['type', 'class', 'years'],
                        index=[0, 2, 3, 1])

dfmain = pd.DataFrame([('Target', 'falcon', 'bird', 389.0),
                       ('Shout', 'parrot', 'bird', 24.0),
                       ('Roar', 'lion', 'mammal', 80.5),
                       ('Jump', 'monkey', 'mammal', np.nan),
                       ('Sing', 'parrot', 'bird', 72.0)],
                      columns=['name', 'type', 'class', 'max_speed'],
                      index=[0, 2, 3, 1, 2])
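On this toy dataset, a concatenated helper key is not strictly needed: `merge` accepts a list of columns for `on` (or for `left_on` and `right_on`). The sketch below reuses the same rows and pulls `years` from the lookup by matching on both `type` and `class`. The custom index from the question is omitted here because a column-based merge does not use it, and `validate="many_to_one"` is an optional extra check, not something the question used.

```python
import numpy as np
import pandas as pd

dflookup = pd.DataFrame(
    [("falcon", "bird", 100),
     ("parrot", "bird", 50),
     ("lion", "mammal", 50),
     ("monkey", "mammal", 100)],
    columns=["type", "class", "years"],
)

dfmain = pd.DataFrame(
    [("Target", "falcon", "bird", 389.0),
     ("Shout", "parrot", "bird", 24.0),
     ("Roar", "lion", "mammal", 80.5),
     ("Jump", "monkey", "mammal", np.nan),
     ("Sing", "parrot", "bird", 72.0)],
    columns=["name", "type", "class", "max_speed"],
)

# Match on two columns at once; every dfmain row is kept (how="left").
# validate raises an error if the lookup has duplicate (type, class) keys.
merged = dfmain.merge(
    dflookup,
    on=["type", "class"],
    how="left",
    validate="many_to_one",
)
print(merged)
```

Matching on the original columns keeps each key component in its own dtype, so formatting differences introduced by string concatenation cannot cause mismatches.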

pandas

Source: https://stackoverflow.com/questions/75482636/pandas-merge-and-join-not-picking-up-correct-values

1 Answer


u59ebvdq1#

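I have now found a solution, in case others who are starting out with Python run into the same problem.
Problem: the dist column apparently contained a single space, which made the helper1 join key fail to match.
Worst of all, for reasons I do not understand, the following command did nothing to remove that space from the original dist column.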

dfmain.columns = dfmain.columns.to_series().apply(lambda x: x.strip())

So the space had to be removed from that specific column with the following command before creating helper1.

dfmain['dist'] = dfmain['dist'].str.replace(' ', '')
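Note that `str.replace(' ', '')` removes every space, including any inside a value, whereas `str.strip()` removes only leading and trailing whitespace. A minimal sketch of cleaning all key columns before building the helper, using a hypothetical frame whose stray spaces are assumptions for illustration:

```python
import pandas as pd

# Hypothetical main frame; the stray spaces are assumed for illustration.
dfmain = pd.DataFrame({
    "time": [0.1, 0.22],
    "loc2": [" A200", "A100"],
    "dist": ["A ", " B"],
})

# Strip leading/trailing whitespace from the values of each text key column.
for col in ["loc2", "dist"]:
    dfmain[col] = dfmain[col].str.strip()

# Only now build the concatenated join key.
dfmain["helper1"] = dfmain[["time", "loc2", "dist"]].apply(
    lambda row: "|".join(row.dropna().astype(str)), axis=1
)
print(dfmain["helper1"].tolist())
```

Applying the same cleanup to the columns behind the lookup's `helper` key would keep both sides of the merge consistent.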

If anyone knows why the first line, using .apply(lambda ...), did not strip the spaces as expected, please let me know in an answer or comment.
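A likely explanation is that `dfmain.columns` holds the column labels, so stripping it cleans the header names only and never touches the values stored in the cells. A small sketch with a hypothetical one-column frame shows the difference:

```python
import pandas as pd

# Hypothetical frame: both the column label and the values carry spaces.
df = pd.DataFrame({" dist ": ["A ", " B"]})

# Same expression as in the answer: it operates on the column labels.
df.columns = df.columns.to_series().apply(lambda x: x.strip())
labels_after = df.columns.tolist()     # the label is now clean
values_after = df["dist"].tolist()     # the cell values are unchanged

# Stripping the values requires the .str accessor on the column itself.
df["dist"] = df["dist"].str.strip()
values_clean = df["dist"].tolist()

print(labels_after, values_after, values_clean)
```

So the header cleanup and the value cleanup are two separate steps, and the join key depends on the second one.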

Answered 2023-02-20
