How do I webscrape tables onto two different sheets of an Excel spreadsheet?

Posted by pinkon5k on 2023-04-07 in Other


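Here is my code for scraping the tables from these two links. It doesn't crash. "https://racing.hkjc.com/racing/information/English/Jockey/JockeyRanking.aspx" "https://racing.hkjc.com/racing/information/English/Trainers/TrainerRanking.aspx"
However, when I run it, the two tables seem to overwrite each other and end up on the same sheet instead of on separate sheets. Is there any way to fix this?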

import pandas as pd
from bs4 import BeautifulSoup
from playwright.sync_api import sync_playwright

def scrape_ranking(url, sheet_name):
    with sync_playwright() as pw:
        browser = pw.chromium.launch()
        page = browser.new_page()
        page.goto(url, wait_until="networkidle")
        soup = BeautifulSoup(page.content(), "html.parser")
        table = soup.select_one(".table_bd")

        if table is None:
            print("Table not found.")
        else:
            df = pd.read_html(str(table))[0]
            df.to_excel("hkjc.xlsx", sheet_name=sheet_name, index=True)

# Scrape TrainerRanking page
url_trainer = "https://racing.hkjc.com/racing/information/English/Trainers/TrainerRanking.aspx"
scrape_ranking(url_trainer, "TrainerRanking")

# Scrape JockeyRanking page
url_jockey = "https://racing.hkjc.com/racing/information/English/Jockey/JockeyRanking.aspx"
scrape_ranking(url_jockey, "JockeyRanking")

print("done")

excel

Source: https://stackoverflow.com/questions/75915842/how-do-i-webscrape-tables-onto-two-different-sheets-on-the-excel-spreadsheet

1 Answer


abithluo1#

Try using ExcelWriter in append mode:

df = pd.read_html(str(table))[0]
with pd.ExcelWriter("hkjc.xlsx",
                    engine="openpyxl",
                    mode='a', if_sheet_exists='new') as writer:
    df.to_excel(writer, sheet_name=sheet_name, index=True)

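If you use if_sheet_exists='replace' instead, an existing sheet named sheet_name will be overwritten; if_sheet_exists='overlay' writes into the existing sheet without clearing its old contents first.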
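One caveat with append mode is that mode='a' expects the workbook to exist already, so the very first call needs to create the file. Below is a minimal sketch of one way to handle that. The helper name write_sheet, the stand-in DataFrames, and the temporary file path are all made up for illustration and are not part of the original question; in the real script the DataFrames would come from pd.read_html.

```python
import os
import tempfile

import pandas as pd


def write_sheet(df, path, sheet_name):
    """Write df to its own sheet, keeping other sheets already in the workbook."""
    if os.path.exists(path):
        # Append mode keeps existing sheets; 'replace' overwrites only a
        # sheet that has the same name.
        writer_kwargs = {"mode": "a", "if_sheet_exists": "replace"}
    else:
        # mode='a' fails on a missing file, so create the workbook first.
        writer_kwargs = {"mode": "w"}
    with pd.ExcelWriter(path, engine="openpyxl", **writer_kwargs) as writer:
        df.to_excel(writer, sheet_name=sheet_name, index=True)


# Stand-in data (hypothetical), used here in place of the scraped tables.
trainers = pd.DataFrame({"Trainer": ["A", "B"], "Wins": [10, 8]})
jockeys = pd.DataFrame({"Jockey": ["C", "D"], "Wins": [12, 7]})

# A temporary path (an assumption for the demo) so no real file is touched.
path = os.path.join(tempfile.mkdtemp(), "hkjc_demo.xlsx")
write_sheet(trainers, path, "TrainerRanking")
write_sheet(jockeys, path, "JockeyRanking")

# List the sheets that ended up in the workbook.
sheet_names = pd.ExcelFile(path).sheet_names
print(sheet_names)
```

With 'replace', rerunning the script refreshes each sheet in place rather than piling up extra copies, which is probably what you want for a ranking table that gets scraped repeatedly.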

2023-04-07
