pandas: create a column based on values in separate rows

Posted by eiee3dmh on 2023-02-11 in Other

Here is an example of my DataFrame:

import pandas as pd

df = pd.DataFrame([['Arsenal FC', '', ''],
                   ['In', 'Age', 'Nat.'],
                   ['Leandro Trossard', 28, 'Belgium'],
                   ['Jakub Kiwior', 22, 'Poland'],
                   ['Jorginho', 32, 'Italy'],
                   ['Chelsea FC', '', ''],
                   ['In', 'Age', 'Nat.'],
                   ['Enzo Fernández ', 22, 'Argentina'],
                   ['Mykhaylo Mudryk', 22, 'Ukraine'],
                  ], columns=['Player', 'Age', 'Nat.'])

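I want to create a new column "Club" that takes the string value in the "Player" cell of each club row and attaches it to the players listed below it.
The tricky part is assigning the right club to the right players.
This is the output I want: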

df = pd.DataFrame([['In', 'Age', 'Nat.'],
                   ['Leandro Trossard', 28, 'Belgium', 'Arsenal FC'],
                   ['Jakub Kiwior', 22, 'Poland', 'Arsenal FC'],
                   ['Jorginho', 32, 'Italy', 'Arsenal FC'],
                   ['In', 'Age', 'Nat.'],
                   ['Enzo Fernández ', 22, 'Argentina', 'Chelsea FC'],
                   ['Mykhaylo Mudryk', 22, 'Ukraine', 'Chelsea FC'],
                  ], columns=['Player', 'Age', 'Nat.', 'Club'])

I couldn't find any other questions related to this one. Is this possible in Python?


Answer #1 by atmip9wb

One option is to use boolean masks with mask and ffill:

# which rows are NOT club-name rows (i.e. Age is not an empty string)?
m1 = df['Age'].ne('')
# which rows are not internal "In / Age / Nat." headers?
m2 = df['Player'].ne('In')

out = df[m1&m2].assign(Club=df.loc[m2, 'Player'].mask(m1).ffill())

Output:

             Player Age       Nat.        Club
2  Leandro Trossard  28    Belgium  Arsenal FC
3      Jakub Kiwior  22     Poland  Arsenal FC
4          Jorginho  32      Italy  Arsenal FC
7   Enzo Fernández   22  Argentina  Chelsea FC
8   Mykhaylo Mudryk  22    Ukraine  Chelsea FC

Intermediates:

             Player     m1        mask       ffill
0        Arsenal FC  False  Arsenal FC  Arsenal FC
2  Leandro Trossard   True         NaN  Arsenal FC
3      Jakub Kiwior   True         NaN  Arsenal FC
4          Jorginho   True         NaN  Arsenal FC
5        Chelsea FC  False  Chelsea FC  Chelsea FC
7   Enzo Fernández    True         NaN  Chelsea FC
8   Mykhaylo Mudryk   True         NaN  Chelsea FC
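For reference, a self-contained sketch of the first variant. It rebuilds the sample frame from the question and assumes, as in that sample, that club rows have an empty Age cell and internal headers have 'In' in Player. The intermediate club Series corresponds to the ffill column in the table above.

```python
import pandas as pd

df = pd.DataFrame([['Arsenal FC', '', ''],
                   ['In', 'Age', 'Nat.'],
                   ['Leandro Trossard', 28, 'Belgium'],
                   ['Jakub Kiwior', 22, 'Poland'],
                   ['Jorginho', 32, 'Italy'],
                   ['Chelsea FC', '', ''],
                   ['In', 'Age', 'Nat.'],
                   ['Enzo Fernández ', 22, 'Argentina'],
                   ['Mykhaylo Mudryk', 22, 'Ukraine'],
                  ], columns=['Player', 'Age', 'Nat.'])

# True on every row that is NOT a club-name row (club rows have Age == '')
m1 = df['Age'].ne('')
# True on every row that is NOT an internal "In / Age / Nat." header
m2 = df['Player'].ne('In')

# mask hides every non-club name as NaN; ffill carries each club name downwards
club = df.loc[m2, 'Player'].mask(m1).ffill()

# keep only the player rows; assign aligns club on the index
out = df[m1 & m2].assign(Club=club)
print(out)
```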
Keeping the In/Age/Nat. rows
# which rows are NOT club-name rows (i.e. Age is not an empty string)?
m1 = df['Age'].ne('')
# which rows are not internal headers?
m2 = df['Player'].ne('In')

out = df[m1].assign(Club=df.loc[m2, 'Player'].mask(m1).ffill())

Output:

             Player  Age       Nat.        Club
1                In  Age       Nat.         NaN
2  Leandro Trossard   28    Belgium  Arsenal FC
3      Jakub Kiwior   22     Poland  Arsenal FC
4          Jorginho   32      Italy  Arsenal FC
6                In  Age       Nat.         NaN
7   Enzo Fernández    22  Argentina  Chelsea FC
8   Mykhaylo Mudryk   22    Ukraine  Chelsea FC
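A different technique, not used in either answer here, numbers each club's block of rows with a running count (cumsum over the club-row flags) and takes the first Player value in each block. This sketch assumes that club-name rows are exactly the rows whose Age cell is '' and that the frame starts with a club row; rows before the first club would otherwise form their own block.

```python
import pandas as pd

df = pd.DataFrame([['Arsenal FC', '', ''],
                   ['In', 'Age', 'Nat.'],
                   ['Leandro Trossard', 28, 'Belgium'],
                   ['Jakub Kiwior', 22, 'Poland'],
                   ['Jorginho', 32, 'Italy'],
                   ['Chelsea FC', '', ''],
                   ['In', 'Age', 'Nat.'],
                   ['Enzo Fernández ', 22, 'Argentina'],
                   ['Mykhaylo Mudryk', 22, 'Ukraine'],
                  ], columns=['Player', 'Age', 'Nat.'])

# assumption: a club-name row is any row whose Age cell is the empty string
is_club = df['Age'].eq('')

# cumsum turns the flags into block numbers: 1 for Arsenal's rows, 2 for Chelsea's
block = is_club.cumsum()

# first Player value in each block is the club name; hide it on the "In" header rows
df['Club'] = (df.groupby(block)['Player'].transform('first')
                .mask(df['Player'].eq('In')))

# drop the club-name rows themselves, keep the In/Age/Nat. rows
out = df[~is_club]
print(out)
```

The mask call only exists to leave the In rows without a club, as in the desired output from the question; drop it if those rows should be labelled too.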

Answer #2 by vdzxcuhz

Edit:

import pandas as pd

df = pd.DataFrame([['Arsenal FC', '', ''],
                   ['In', 'Age', 'Nat.'],
                   ['Leandro Trossard', 28, 'Belgium'],
                   ['Jakub Kiwior', 22, 'Poland'],
                   ['Jorginho', 32, 'Italy'],
                   ['Chelsea FC', '', ''],
                   ['In', 'Age', 'Nat.'],
                   ['Enzo Fernández ', 22, 'Argentina'],
                   ['Mykhaylo Mudryk', 22, 'Ukraine'],
                  ], columns=['Player', 'Age', 'Nat.'])

clubs = []
current_club = None
for i, row in df.iterrows():
    if row['Player'] in ['Arsenal FC', 'Chelsea FC']:
        # club-name row: remember the club and label the row with it
        current_club = row['Player']
        clubs.append(current_club)
    elif row['Player'] == 'In':
        # internal header row: no club (shown as NaN)
        clubs.append(float('nan'))
    else:
        # player row: attach the most recent club
        clubs.append(current_club)

# exactly one value was appended per row, so the lengths match
df['Club'] = clubs

print(df)

Output:

             Player  Age       Nat.        Club
0        Arsenal FC                  Arsenal FC
1                In  Age       Nat.         NaN
2  Leandro Trossard   28    Belgium  Arsenal FC
3      Jakub Kiwior   22     Poland  Arsenal FC
4          Jorginho   32      Italy  Arsenal FC
5        Chelsea FC                  Chelsea FC
6                In  Age       Nat.         NaN
7   Enzo Fernández    22  Argentina  Chelsea FC
8   Mykhaylo Mudryk   22    Ukraine  Chelsea FC
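The loop above hard-codes the two club names. A sketch of the same loop that instead treats any row with an empty Age cell as a club row (an assumption about how the data is laid out, true for the sample) avoids the list; it still appends exactly one value per row, so clubs always has len(df) entries:

```python
import pandas as pd

df = pd.DataFrame([['Arsenal FC', '', ''],
                   ['In', 'Age', 'Nat.'],
                   ['Leandro Trossard', 28, 'Belgium'],
                   ['Jakub Kiwior', 22, 'Poland'],
                   ['Jorginho', 32, 'Italy'],
                   ['Chelsea FC', '', ''],
                   ['In', 'Age', 'Nat.'],
                   ['Enzo Fernández ', 22, 'Argentina'],
                   ['Mykhaylo Mudryk', 22, 'Ukraine'],
                  ], columns=['Player', 'Age', 'Nat.'])

clubs = []
current_club = None
for _, row in df.iterrows():
    if row['Age'] == '':
        # assumed club-name row: start a new club and label this row with it
        current_club = row['Player']
        clubs.append(current_club)
    elif row['Player'] == 'In':
        # internal header row: no club
        clubs.append(float('nan'))
    else:
        # player row: attach the most recent club
        clubs.append(current_club)

# one value per row, so assigning the list cannot raise a length mismatch
assert len(clubs) == len(df)
df['Club'] = clubs
print(df)
```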

Edit 2: multiple club names

# 'Other Club 1' ... 'Other Club n' stand for the remaining club names
clubs = ['Arsenal FC', 'Chelsea FC', 'Other Club 1', 'Other Club 2', ..., 'Other Club n']

df['Club'] = ''
club = ''
for index, row in df.iterrows():
    if row['Player'] in clubs:
        # club-name row: remember it for the rows that follow
        club = row['Player']
    else:
        # any other row (including the In/Age/Nat. headers): tag it with the current club
        df.at[index, 'Club'] = club

# club-name rows (and anything before the first club) still have Club == '', so drop them
df = df[df['Club'] != ''].reset_index(drop=True)
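The per-row loop can also be replaced by a vectorized sketch built from isin, where and ffill. The short clubs list below is a hypothetical stand-in for the full list of club names:

```python
import pandas as pd

df = pd.DataFrame([['Arsenal FC', '', ''],
                   ['In', 'Age', 'Nat.'],
                   ['Leandro Trossard', 28, 'Belgium'],
                   ['Jakub Kiwior', 22, 'Poland'],
                   ['Jorginho', 32, 'Italy'],
                   ['Chelsea FC', '', ''],
                   ['In', 'Age', 'Nat.'],
                   ['Enzo Fernández ', 22, 'Argentina'],
                   ['Mykhaylo Mudryk', 22, 'Ukraine'],
                  ], columns=['Player', 'Age', 'Nat.'])

# hypothetical list of known club names; extend it with the real ones
clubs = ['Arsenal FC', 'Chelsea FC']

is_club = df['Player'].isin(clubs)
# keep Player only on club rows (NaN elsewhere), then carry each name downwards
df['Club'] = df['Player'].where(is_club).ffill()
# drop the club-name rows, as the loop version does
df = df[~is_club].reset_index(drop=True)
print(df)
```

One difference from the loop: rows that appear before the first club name get NaN in Club instead of being dropped.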
