pandas 基于另一列值查找两列的最佳组合

uidvcgyl  于 2023-01-28  发布在  其他
关注(0)|答案(2)|浏览(135)

我的 Dataframe 如下所示

index       Client  Manager   Score
0             1        1      0.89
1             1        2      0.78
2             1        3      0.65
3             2        1      0.91
4             2        2      0.77
5             2        3      0.97
6             3        1      0.35
7             3        2      0.61
8             3        3      0.81
9             4        1      0.69
10            4        2      0.22
11            4        3      0.93
12            5        1      0.78
13            5        2      0.55
14            5        3      0.44
15            6        1      0.64
16            6        2      0.99
17            6        3      0.22

预期输出如下所示

index  Client   Manager  Score
0        1        1      0.89
1        2        3      0.97
2        3        2      0.61
3        4        3      0.93
4        5        1      0.78
5        6        2      0.99

我们有3个经理和6个客户。我希望每个经理有2个客户基于最高分。每个经理应该只有唯一的客户,所以如果一个客户是好的两个经理,我们需要采取第二个最好的分数,以此类推。我可以有你的建议吗?谢谢你提前。

von4xj4u

von4xj4u1#

df = df.drop("index", axis=1)
df = df.sort_values("Score").iloc[::-1,:]
df

selected_client = []
selected_manager = []
selected_df = []
iter_rows = df.iterrows()

for i,d in iter_rows:
    client = int(d.to_frame().loc[["Client"],[i]].values[0][0])
    manager = int(d.to_frame().loc[["Manager"],[i]].values[0][0]) 

    if client not in selected_client and selected_manager.count(manager) != 2:            
        selected_client.append(client)
        selected_manager.append(manager)

        selected_df.append(d)

result = pd.concat(selected_df, axis=1, sort=False)
print(result.T)
nbewdwxp

nbewdwxp2#

试试这个:

df = df.sort_values('Score',ascending = False) #sort values to prioritize high scores

d = {i:[] for i in df['Manager']} #create an empty dictionary to fill in the client/manager pairs

n = 2 #set number of clients per manager
for c,m in zip(df['Client'],df['Manager']): #iterate over client and manager pairs
    if len(d.get(m))<n and c not in [c2 for i in d.values() for c2,m2 in i]: #if there are not already two pairs, and if the client has not already been added, append the pair to the list
        d.get(m).append((c,m))
    else:
        pass

ndf = pd.merge(df,pd.DataFrame([k for v in d.values() for k in v],columns = ['Client','Manager'])).sort_values('Client') #filter for just the pairs found above.

输出:

index  Client  Manager  Score
3      0       1        1   0.89
1      5       2        3   0.97
5      7       3        2   0.61
2     11       4        3   0.93
4     12       5        1   0.78
0     16       6        2   0.99

相关问题