pandas: remove numbers from a DataFrame column

Posted by h5qlskok on 2023-05-27 in Other

I have a DataFrame in pretty good shape, formatted the way I want it. It is built from a college football schedule that I scraped from ESPN:

from datetime import datetime

import pandas as pd

year = datetime.today().year-1

url = 'https://www.espn.com/college-football/team/schedule/_/id/213/season/'+str(year)

schedule = pd.read_html(url)[0][[0,1]]
schedule.columns = ["Date", "Opponent"]

remove_list = ["DATE","Regular","Bowl"]

schedule = schedule[~schedule["Date"].str.contains('|'.join(remove_list))].reset_index(drop = True)
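# Literal replacements: "*" is a regex metacharacter, so pass regex=False explicitly
schedule['Opponent'] = schedule['Opponent'].str.replace("vs", '', regex=False).str.replace("*", '', regex=False).str.replace("@", '', regex=False).str.strip()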

date_list = (schedule['Date'].str[5:]+', '+str(year))

final_date_list = []
for d in date_list:
  d = datetime.strptime(d, '%b %d, %Y')
  d = datetime.strftime(d, '%Y%d%m')
  final_date_list.append(d)

# Assign the converted dates directly; no intermediate column or DataFrame is needed
schedule['Date'] = final_date_list

schedule

However, what I want to do is remove the numbers from the Opponent column of the current table:

Date    Opponent
0   20210409    12 Wisconsin
1   20211109    Ball State
2   20211809    22 Auburn
3   20212509    Villanova
4   20210210    Indiana
5   20210910    3 Iowa
6   20212310    Illinois
7   20213010    5 Ohio State
8   20210611    Maryland
9   20211311    6 Michigan
10  20212011    Rutgers
11  20212711    12 Michigan State
12  20210101    21 Arkansas

hjqgdpho1#

You can use str.replace to remove the leading digits and the whitespace that follows them:

df['Opponent'] = df['Opponent'].str.replace(r'^\d+\s+', '', regex=True)

Output (for the sample data):

Date        Opponent
0   20210409       Wisconsin
1   20211109      Ball State
2   20211809          Auburn
3   20212509       Villanova
4   20210210         Indiana
5   20210910            Iowa
6   20212310        Illinois
7   20213010      Ohio State
8   20210611        Maryland
9   20211311        Michigan
10  20212011         Rutgers
11  20212711  Michigan State
12  20210101        Arkansas
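
For anyone who wants to try the pattern without scraping ESPN, here is a minimal self-contained sketch. It uses a few rows copied from the question's table. The leading space in the last entry is an assumption, added to show what might be left behind after "vs" or "@" is removed, and the pattern is slightly widened to `^\s*\d+\s+` so that it tolerates such a space. It is a variation on the answer above, not a replacement for it.

```python
import pandas as pd

# A few rows from the question's table. The leading space in " 3 Iowa"
# is an assumption, mimicking leftovers from stripping "vs" or "@".
df = pd.DataFrame({
    "Date": ["20210409", "20211109", "20212711", "20210910"],
    "Opponent": ["12 Wisconsin", "Ball State", "12 Michigan State", " 3 Iowa"],
})

# ^\s*  optional leading whitespace
# \d+   the ranking digits at the start of the string
# \s+   the whitespace between the ranking and the team name
df["Opponent"] = df["Opponent"].str.replace(r"^\s*\d+\s+", "", regex=True)

opponents = df["Opponent"].tolist()
print(opponents)
```

Because the pattern is anchored with `^`, only a number at the start of the string is removed, so digits elsewhere in a team name would be left alone. Unranked rows such as "Ball State" do not match and pass through unchanged.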
