pandas: How do I convert value_counts() output to a DataFrame?

x8goxv8g  posted on 2023-04-28  in Other

I have a dataframe like this:

match       team1       team2       winner
1            MI           KKR        MI
2            DD           CSK        DD
3            RCB          DC         RCB.....

What I want to compute is how many times each team has beaten the other in the tournament. For example, MI vs KKR:
MI: 10
KKR: 5
So I wrote a function like this:

def comparator(team1):
    mt1=matches[((matches['team1']==team1)|(matches['team2']==team1))]
    teams=['MI','KKR','RCB','DC','CSK','RR','DD','GL','KXIP','SRH','RPS','KTK','PW']
    teams.remove(team1)
    opponents=teams.copy()
    for i in opponents:
        mt2=mt1[(((mt1['team1']==i)|(mt1['team2']==i)))&((mt1['team1']==team1)|(mt1['team2']==team1))].winner.value_counts()
        print(mt2)
comparator('MI')

Inside the function, mt2 prints the correct number of wins for team1 and for each opponent. The output looks like this:

MI     13
KKR     5
Name: winner, dtype: int64
MI     11
RCB     8
Name: winner, dtype: int64

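The values are correct, but the format is not what I need. I want to convert the output above into a DataFrame.
I tried appending the values to a list, but that does not work because the trailing line "Name: winner, dtype: int64" gets added to the list as well.
How can I convert this to a DataFrame?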

zte4gxcn1#

I think you need one of the following.
If you need the index as a column, add Series.reset_index:

mask = (((mt1['team1']==i)|(mt1['team2']==i)))&((mt1['team1']==team1)|(mt1['team2']==team1))
mt2 = mt1.loc[mask, 'winner'].value_counts().reset_index()

Or, if you need to convert the Series to a one-column DataFrame, add Series.to_frame:

mask = (((mt1['team1']==i)|(mt1['team2']==i)))&((mt1['team1']==team1)|(mt1['team2']==team1))
mt2 = mt1.loc[mask, 'winner'].value_counts().to_frame()

Also, it is better to use loc with a boolean mask and name the column explicitly.
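To address the original problem of collecting the per-opponent counts, the pieces above can be combined as in the following minimal sketch. The sample data, the function name `head_to_head`, and the column names `opponent` and `wins` are assumptions made for illustration; they are not from the question.

```python
import pandas as pd

# Hypothetical sample data in the same shape as the question's `matches` frame.
matches = pd.DataFrame({
    "team1":  ["MI", "KKR", "MI", "RCB", "MI"],
    "team2":  ["KKR", "MI", "RCB", "MI", "KKR"],
    "winner": ["MI", "KKR", "MI", "RCB", "MI"],
})

def head_to_head(matches, team1):
    """Return one DataFrame with a row per (opponent, winner) pair."""
    # Keep only the matches that involve team1.
    mt1 = matches[(matches["team1"] == team1) | (matches["team2"] == team1)]
    # Every other team that team1 has actually played.
    opponents = sorted((set(mt1["team1"]) | set(mt1["team2"])) - {team1})

    frames = []
    for opp in opponents:
        mask = (mt1["team1"] == opp) | (mt1["team2"] == opp)
        counts = mt1.loc[mask, "winner"].value_counts()
        # Name the index and the values explicitly so the resulting
        # column names do not depend on the pandas version.
        frame = counts.rename_axis("winner").reset_index(name="wins")
        frame.insert(0, "opponent", opp)
        frames.append(frame)
    # Assumes team1 has played at least one match; pd.concat raises on an empty list.
    return pd.concat(frames, ignore_index=True)

result = head_to_head(matches, "MI")
print(result)
```

Each value_counts() result becomes a small DataFrame before it is stored, so the "Name: ..., dtype: ..." footer never enters the picture: that footer is only part of how a Series is printed, not part of its data.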

w80xi6nr2#

You can simplify the lookup a little, or at least make it more readable:

import pandas as pd

def my_comp(df, team):
    matches_with_team = df[(df[['team1', 'team2']] == team).any(axis=1)]
    # Union (|), not symmetric difference (^): an opponent can appear in both columns
    other_teams = sorted((set(matches_with_team['team1']) | set(matches_with_team['team2'])) - {team})
    comparison_df = pd.DataFrame(index=other_teams, columns=['wins', 'losses'])
    comparison_df.index.name = 'opponent'
    for opponent in other_teams:
        matches_against_opponents = matches_with_team[(matches_with_team[['team1', 'team2']] == opponent).any(axis=1)]
        winners = matches_against_opponents['winner'].value_counts().reindex([team, opponent])
        # print(winners)
        comparison_df.loc[opponent] = [winners[team], winners[opponent]]
    return comparison_df.fillna(0).astype(int)

my_comp(df, 'MI')

          wins  losses
opponent
KKR        1.0       0

Now you can build one large DataFrame that covers all the results:

all_teams = sorted(set(df['team1']) | set(df['team2']))

all_teams

['CSK', 'DC', 'DD', 'KKR', 'MI', 'RCB']

When run with this input:

      team1 team2 winner
match
1        MI   KKR     MI
2        DD   CSK     DD
3       RCB    DC    RCB
4       RCB   CSK    RCB

pd.concat((my_comp(df, team) for team in all_teams), keys=all_teams).groupby(level=[0, 1]).sum()

              wins  losses
    opponent
CSK DD           0       1
    RCB          0       1
DC  RCB          0       1
DD  CSK          1       0
KKR MI           0       1
MI  KKR          1       0
RCB CSK          1       0
    DC           1       0
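For comparison, the same wins/losses table can also be built without a per-team loop. The sketch below is an alternative, not the approach this answer uses. It assumes every match has a winner that is one of the two teams (no ties or abandoned games), and the names `loser` and `table` are introduced here only for illustration.

```python
import pandas as pd

# The four-match sample input shown above.
df = pd.DataFrame(
    {
        "team1":  ["MI", "DD", "RCB", "RCB"],
        "team2":  ["KKR", "CSK", "DC", "CSK"],
        "winner": ["MI", "DD", "RCB", "RCB"],
    },
    index=pd.Index([1, 2, 3, 4], name="match"),
)

# The loser is whichever of team1/team2 is not the winner
# (assumption: the winner is always one of the two teams).
loser = df["team2"].where(df["winner"] == df["team1"], df["team1"])

# Count each (team, opponent) pair once from the winner's side...
wins = (pd.DataFrame({"team": df["winner"], "opponent": loser})
        .groupby(["team", "opponent"]).size().rename("wins"))
# ...and once from the loser's side.
losses = (pd.DataFrame({"team": loser, "opponent": df["winner"]})
          .groupby(["team", "opponent"]).size().rename("losses"))

# Align both counts on (team, opponent); pairs missing on one side become 0.
table = pd.concat([wins, losses], axis=1).fillna(0).astype(int).sort_index()
print(table)
```

On the sample input this yields the same eight (team, opponent) rows as the table above.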

xuo3flqw3#

This is not exactly an answer to your question, but how to convert value_counts output to a DataFrame was my problem too. The simplest way I found is:

pd.DataFrame(df['class'].value_counts()).reset_index()
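One caveat: the column names produced by a bare reset_index() after value_counts() differ between pandas versions (older releases give `index` and the original column name, while pandas 2.x gives the original column name and `count`). A minimal sketch that pins the names explicitly, using a made-up `class` column:

```python
import pandas as pd

# Hypothetical data; the column is called `class` only to mirror the snippet above.
df = pd.DataFrame({"class": ["a", "b", "a", "c", "a", "b"]})

counts = (
    df["class"]
    .value_counts()
    .rename_axis("class")        # name of the column holding the distinct values
    .reset_index(name="count")   # name of the column holding the counts
)
print(counts)
```

Naming both columns this way keeps downstream code such as counts["count"] working regardless of the installed pandas version.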
