pandas 按组保留两列值匹配的行,否则按组保留第一行

cld4siwp  于 2022-11-20  发布在  其他
关注(0)|答案(1)|浏览(160)

我想在df 1上左连接df 2,然后按组保留匹配的行,如果没有匹配的组,那么我想保留组的第一行,以获得df 3(所需的结果)。我希望你们能帮助我找到最佳解决方案。
下面是我创建两个 Dataframe 的代码和所需的结果。

import pandas as pd
import numpy as np

market = ['SP', 'SP', 'SP']
underlying = ['TSLA', 'GOOG', 'MSFT']
  
# DF1
df = pd.DataFrame(list(zip(market, underlying)),
               columns=['market', 'underlying'])

market2 = ['SP', 'SP', 'SP', 'SP', 'SP']
underlying2 = [None, 'TSLA', 'GBX', 'GBM', 'GBS']
client2 = [17, 12, 100, 21, 10]

# DF2
df2 = pd.DataFrame(list(zip(market2, underlying2, client2)),
               columns=['market', 'underlying', 'client'])

market3 = ['SP', 'SP', 'SP']
underlying3 = ['TSLA', 'GOOG', 'MSFT']
client3 = [12, 17, 17]

# Desired
df3 = pd.DataFrame(list(zip(market3, underlying3, client3)),
               columns =['market', 'underlying', 'client'])

# This works but feels sub optimal
df3 = pd.merge(df, 
               df2,
               how='left',
               on=['market', 'underlying'])
df3 = pd.merge(df3, 
               df2,
               how='left',
               on=['market'])
df3 = df3.drop_duplicates(['market', 'underlying_x'])
df3['client'] = df3['client_x'].combine_first(df3['client_y'])
df3 = df3.drop(labels=['underlying_y', 'client_x', 'client_y'], axis=1)
df3 = df3.rename(columns={'underlying_x': 'underlying'})

希望你们能帮忙,谢谢!

ht4b089n

ht4b089n1#

存储第一个值(如果market中的每个值都是“SP”,则可能不需要groupby),合并并填充第一个值:

fill_value = df2.groupby('market').client.first()

# if you are interested in filtering for None:
fill_value = df2.set_index('market').loc[lambda df: df.underlying.isna(), 'client']

(df
.merge(
    df2, 
    on = ['market', 'underlying'], 
    how = 'left')
.set_index('market')
.fillna({'client':fill_value}, downcast='infer')
)

       underlying  client
market                   
SP           TSLA      12
SP           GOOG      17
SP           MSFT      17

相关问题