Python: Read all values of an HTML table with Selenium

Posted by vohkndzv on 2023-02-28 in Python

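I am trying to read every element of the HTML table below and convert it to a DataFrame, but none of the numeric values are captured by my get_attribute call. I have also tried .get_attribute('td'), .get_attribute('tr') and .get_attribute('outerHTML'), and I still get the result shown below. This is the code I am using: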

bond_totals_table = driver.find_element(By.XPATH,'/html/body/form[2]/table/tbody/tr/td/table/body').get_attribute('td')
bond_totals_table = pd.read_html(bond_totals_table, flavor = 'bs4')

0   Increment Number    Action  Current Acres   Add Delete  Acres for Calculation   Adjusted Amount Status  Bond?
1   NaN Existing Modify New Closed Reactivate Reconcile NaN NaN NaN NaN NaN ACT INA PH1 PH2 PH3 TRM Yes No
2   NaN Existing Modify New Closed Reactivate Reconcile NaN NaN NaN NaN NaN ACT INA PH1 PH2 PH3 TRM Yes No
3   NaN Existing Modify New Closed Reactivate Reconcile NaN NaN NaN NaN NaN ACT INA PH1 PH2 PH3 TRM Yes No
4   NaN Existing Modify New Closed Reactivate Reconcile NaN NaN NaN NaN NaN ACT INA PH1 PH2 PH3 TRM Yes No
5   NaN Existing Modify New Closed Reactivate Reconcile NaN NaN NaN NaN NaN ACT INA PH1 PH2 PH3 TRM Yes No
6   NaN Existing Modify New Closed Reactivate Reconcile NaN NaN NaN NaN NaN ACT INA PH1 PH2 PH3 TRM Yes No
7   NaN Existing Modify New Closed Reactivate Reconcile NaN NaN NaN NaN NaN ACT INA PH1 PH2 PH3 TRM Yes No

It looks like the table used to be editable but no longer is, and get_attribute is somehow not picking up the values displayed in the greyed-out cells.

Source: https://stackoverflow.com/questions/75583619/read-all-values-of-html-table-with-selenium

2 Answers

b1uwtaje1#

To read all the values of the HTML table, induce WebDriverWait for visibility_of_element_located(), target the <table> element itself, and extract its outerHTML, as follows:

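from io import StringIO
import pandas as pd
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# driver is the WebDriver from the question; wait for the table, then take its full markup.
bond_totals_table_data = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//html/body/form[2]/table/tbody/tr/td/table"))).get_attribute('outerHTML')
bond_totals_table = pd.read_html(StringIO(bond_totals_table_data))  # list of DataFrames
print(bond_totals_table)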
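
A note on the NaN columns: the output in the question lists every choice ("Existing Modify New Closed Reactivate Reconcile") in each row, which suggests, though the markup is not shown, that the cells hold form controls such as <select> and <input>. pd.read_html only reads text nodes, so an <input> contributes nothing and a <select> contributes all of its options. The sketch below is one possible workaround under that assumption, using only the standard library: it takes the outerHTML string and uses each input's value attribute and each select's selected option as the cell text. The names FormTableParser and parse_table are hypothetical, nested tables are not handled, and radio buttons or checkboxes are left untouched.

```python
from html.parser import HTMLParser


class FormTableParser(HTMLParser):
    """Collect table rows, using form-control values as cell text."""

    def __init__(self):
        super().__init__()
        self.rows = []            # finished rows, each a list of cell strings
        self._row = None          # cells of the row currently being parsed
        self._cell = None         # text fragments of the current cell
        self._in_option = False
        self._option_selected = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []
        elif tag == "input" and self._cell is not None:
            # Assumption: plain text/number inputs carry the displayed value.
            kind = (attrs.get("type") or "text").lower()
            if kind in ("text", "number"):
                self._cell.append(attrs.get("value") or "")
        elif tag == "option":
            self._in_option = True
            self._option_selected = "selected" in attrs

    def handle_endtag(self, tag):
        if tag == "option":
            self._in_option = False
        elif tag in ("td", "th") and self._cell is not None:
            self._row.append(" ".join(self._cell))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        text = data.strip()
        if self._cell is None or not text:
            return
        # Skip the text of options that are not selected.
        if self._in_option and not self._option_selected:
            return
        self._cell.append(text)


def parse_table(table_html):
    """Return the table as a list of rows (lists of strings)."""
    parser = FormTableParser()
    parser.feed(table_html)
    parser.close()
    return parser.rows
```

If the first row holds the headers, the result can be turned into a DataFrame with pd.DataFrame(rows[1:], columns=rows[0]). Keep in mind that outerHTML reflects HTML attributes, so a value typed into the page after it loaded may not appear; in that case reading each control through Selenium with get_attribute('value') is likely the more reliable route.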

Answered 2023-02-28

6pp0gazn2#

You can use Beautiful Soup together with pandas. Here is an example that reads a table from a CDC page:

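from io import StringIO
import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver

with webdriver.Firefox() as browser:
    browser.get("https://www.cdc.gov/nchs/nhis/shs/tables.htm")
    html = browser.page_source
    soup = BeautifulSoup(html, "html.parser")
    tbl = soup.select_one("#example")  # the table with id="example"
    df = pd.read_html(StringIO(str(tbl)))  # returns a list of DataFrames
    print(df[0])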

Answered 2023-02-28
