Create a dictionary from a pandas column, conditioned on the values of another column

Posted by cxfofazt on 2024-01-04 in Other

I have a tennis dataset that looks like this:

import pandas as pd

tourney_id = ['French Open 2018','French Open 2018','Wimbledon 2018','Wimbledon 2018','Australian Open 2019','Australian Open 2019','US Open 2019','US Open 2019']
player_name = ['Novak Djokovic','Roger Federer','Andy Murray','Rafael Nadal','John Isner','Novak Djokovic','Andy Murray','Roger Federer']
match_num = [103, 103, 217, 217, 104, 104, 243, 243]

df = pd.DataFrame(list(zip(tourney_id, player_name, match_num)),
            columns =['TournamentID','Name','MatchID'])

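I want to create a dictionary whose keys are players and whose values are lists of players as well (their opponents). Based on my dataset, it would look like this: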

{'Novak Djokovic': ['Roger Federer','John Isner'],
 'Roger Federer': ['Novak Djokovic','Andy Murray'],
 'Andy Murray': ['Rafael Nadal','Roger Federer'],
 'Rafael Nadal': ['Andy Murray'],
 'John Isner': ['Novak Djokovic']}


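In other words, I want to find the players who share the same values in both TournamentID and MatchID.
My last attempt was df.set_index(['TournamentID','MatchID'])['Name'].to_dict(), but that does not give me what I want.
Can anyone point me in the right direction?
Thank you.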

jdzmm42g1#

Use set operations:

out = {}

for _, g in df.groupby(['TournamentID', 'MatchID'])['Name']:
    for n in g:
        out.setdefault(n, set()).update(set(g)-{n})

Output:

{'Novak Djokovic': {'John Isner', 'Roger Federer'},
 'Roger Federer': {'Andy Murray', 'Novak Djokovic'},
 'John Isner': {'Novak Djokovic'},
 'Andy Murray': {'Rafael Nadal', 'Roger Federer'},
 'Rafael Nadal': {'Andy Murray'}}
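
If lists in order of first appearance are preferred, as in the expected output from the question, one possible variant is sketched below. It rebuilds the question's df, passes sort=False to groupby so that matches are visited in their original row order, and uses opponents as an illustrative name that does not appear in the original answer.

```python
import pandas as pd

# Sample data copied from the question.
tourney_id = ['French Open 2018', 'French Open 2018', 'Wimbledon 2018', 'Wimbledon 2018',
              'Australian Open 2019', 'Australian Open 2019', 'US Open 2019', 'US Open 2019']
player_name = ['Novak Djokovic', 'Roger Federer', 'Andy Murray', 'Rafael Nadal',
               'John Isner', 'Novak Djokovic', 'Andy Murray', 'Roger Federer']
match_num = [103, 103, 217, 217, 104, 104, 243, 243]

df = pd.DataFrame({'TournamentID': tourney_id, 'Name': player_name, 'MatchID': match_num})

opponents = {}  # illustrative name: player -> list of opponents
# sort=False keeps the groups in the order their keys first appear in df.
for _, g in df.groupby(['TournamentID', 'MatchID'], sort=False)['Name']:
    names = list(g)
    for n in names:
        seen = opponents.setdefault(n, [])
        for other in names:
            # Skip the player themselves and opponents already recorded.
            if other != n and other not in seen:
                seen.append(other)

print(opponents)
```

Unlike the set-based version, this keeps a stable order, at the cost of a linear membership check for each opponent.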


You can also use networkx to build a graph of each player's opponents, then iterate over the nodes to get their direct neighbors:

import networkx as nx

G = nx.compose_all(nx.complete_graph(set(g)) for _, g in
                   df.groupby(['TournamentID', 'MatchID'])['Name'])

out = {n: list(nx.neighbors(G, n)) for n in G}


Output:

{'Novak Djokovic': ['Roger Federer', 'John Isner'],
 'Roger Federer': ['Novak Djokovic', 'Andy Murray'],
 'John Isner': ['Novak Djokovic'],
 'Andy Murray': ['Rafael Nadal', 'Roger Federer'],
 'Rafael Nadal': ['Andy Murray']}


Graph illustration: [figure not available]

5f0d552i2#

Here is how you can do it:

import pandas as pd

tourney_id = ['French Open 2018','French Open 2018','Wimbledon 2018','Wimbledon 2018','Australian Open 2019','Australian Open 2019','US Open 2019','US Open 2019']
player_name = ['Novak Djokovic','Roger Federer','Andy Murray','Rafael Nadal','John Isner','Novak Djokovic','Andy Murray','Roger Federer']
match_num = [103, 103, 217, 217, 104, 104, 243, 243]

df = pd.DataFrame(list(zip(tourney_id, player_name, match_num)),
                columns=['TournamentID', 'Name', 'MatchID'])

# Create a DataFrame with pairs of players in each match
df_pairs = pd.merge(df, df, how='inner', on=['TournamentID', 'MatchID'])
df_pairs = df_pairs[df_pairs['Name_x'] != df_pairs['Name_y']]  # Remove rows where a player is paired with themselves

# Group by each player and aggregate the opponents into a list
opponents_dict = df_pairs.groupby('Name_x')['Name_y'].agg(list).to_dict()

print(opponents_dict)

Output:

{
  "Andy Murray": [
    "Rafael Nadal",
    "Roger Federer"
  ],
  "John Isner": [
    "Novak Djokovic"
  ],
  "Novak Djokovic": [
    "Roger Federer",
    "John Isner"
  ],
  "Rafael Nadal": [
    "Andy Murray"
  ],
  "Roger Federer": [
    "Novak Djokovic",
    "Andy Murray"
  ]
}

jdzmm42g3#

Use a collections.defaultdict object (accumulating the players who met in the same match):

from collections import defaultdict

plays = defaultdict(list)
for _, (n1, n2) in df.groupby(['TournamentID','MatchID'])['Name']:
    plays[n1].append(n2)
    plays[n2].append(n1)

plays = dict(plays)
print(plays)
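
Note that the tuple unpacking above assumes every match has exactly two rows; a group of any other size would raise a ValueError. A minimal sketch of a more tolerant variant, assuming the question's df and using itertools.permutations (a swapped-in technique, not part of the original answer), might look like this:

```python
from collections import defaultdict
from itertools import permutations

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'TournamentID': ['French Open 2018', 'French Open 2018', 'Wimbledon 2018', 'Wimbledon 2018',
                     'Australian Open 2019', 'Australian Open 2019', 'US Open 2019', 'US Open 2019'],
    'Name': ['Novak Djokovic', 'Roger Federer', 'Andy Murray', 'Rafael Nadal',
             'John Isner', 'Novak Djokovic', 'Andy Murray', 'Roger Federer'],
    'MatchID': [103, 103, 217, 217, 104, 104, 243, 243],
})

plays = defaultdict(list)
for _, names in df.groupby(['TournamentID', 'MatchID'])['Name']:
    # permutations(names, 2) yields every ordered pair taken from two
    # different rows of the group, so a single-row group simply yields nothing.
    for player, opponent in permutations(names, 2):
        plays[player].append(opponent)

plays = dict(plays)
print(plays)
```

For two-player matches this produces the same pairs as the loop above; for larger groups every player is paired with every other player in the group.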

5lhxktic4#

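# Each group is already the 'Name' Series, so call tolist() on it directly.
# Note: this maps each (TournamentID, MatchID) pair to the players in that match,
# not each player to their opponents.
opponents_dict = {key: group.tolist()
                  for key, group in df.groupby(['TournamentID', 'MatchID'])['Name']
                  if len(group) > 1}
print(opponents_dict)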
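
The comprehension above is keyed by match rather than by player. As a rough sketch, assuming the question's df, such a match-to-players mapping could be turned into the player-to-opponents dictionary the question asks for as follows; by_match and opponents are illustrative names only.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'TournamentID': ['French Open 2018', 'French Open 2018', 'Wimbledon 2018', 'Wimbledon 2018',
                     'Australian Open 2019', 'Australian Open 2019', 'US Open 2019', 'US Open 2019'],
    'Name': ['Novak Djokovic', 'Roger Federer', 'Andy Murray', 'Rafael Nadal',
             'John Isner', 'Novak Djokovic', 'Andy Murray', 'Roger Federer'],
    'MatchID': [103, 103, 217, 217, 104, 104, 243, 243],
})

# Step 1: (TournamentID, MatchID) -> list of players in that match.
by_match = {key: group.tolist()
            for key, group in df.groupby(['TournamentID', 'MatchID'])['Name']
            if len(group) > 1}

# Step 2: invert into player -> list of opponents.
opponents = {}
for players in by_match.values():
    for p in players:
        opponents.setdefault(p, []).extend(q for q in players if q != p)

print(opponents)
```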
