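I am trying to read all the elements of the HTML table below and convert them into a DataFrame, but none of the values are captured by my get_attribute call. I also tried .get_attribute('td'), .get_attribute('tr'), and .get_attribute('outerHTML'), but I still get the result below. I tried the following code: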
bond_totals_table = driver.find_element(By.XPATH,'/html/body/form[2]/table/tbody/tr/td/table/body').get_attribute('td')
bond_totals_table = pd.read_html(bond_totals_table, flavor = 'bs4')
0 Increment Number Action Current Acres Add Delete Acres for Calculation Adjusted Amount Status Bond?
1 NaN Existing Modify New Closed Reactivate Reconcile NaN NaN NaN NaN NaN ACT INA PH1 PH2 PH3 TRM Yes No
2 NaN Existing Modify New Closed Reactivate Reconcile NaN NaN NaN NaN NaN ACT INA PH1 PH2 PH3 TRM Yes No
3 NaN Existing Modify New Closed Reactivate Reconcile NaN NaN NaN NaN NaN ACT INA PH1 PH2 PH3 TRM Yes No
4 NaN Existing Modify New Closed Reactivate Reconcile NaN NaN NaN NaN NaN ACT INA PH1 PH2 PH3 TRM Yes No
5 NaN Existing Modify New Closed Reactivate Reconcile NaN NaN NaN NaN NaN ACT INA PH1 PH2 PH3 TRM Yes No
6 NaN Existing Modify New Closed Reactivate Reconcile NaN NaN NaN NaN NaN ACT INA PH1 PH2 PH3 TRM Yes No
7 NaN Existing Modify New Closed Reactivate Reconcile NaN NaN NaN NaN NaN ACT INA PH1 PH2 PH3 TRM Yes No
It looks like the table used to be editable but no longer is, and get_attribute somehow does not pick up the values displayed in the grayed-out cells.
2 Answers
b1uwtaje1#
To read all the values of the HTML table, induce WebDriverWait for visibility_of_element_located() targeting the <table> element and extract its outerHTML.
6pp0gazn2#
You can use Beautiful Soup with pandas, for instance to read a table from a CDC page.
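A rough sketch of combining Beautiful Soup with pandas for the table in the question. It rests on an assumption: the output in the question lists every option in each row and shows NaN elsewhere, which suggests the gray cells are disabled `<select>` and `<input>` controls rather than plain text. The sample markup and the helper names `cell_value` and `table_to_frame` are made up for illustration. Radio buttons and checkboxes are not handled.

```python
import pandas as pd
from bs4 import BeautifulSoup


def cell_value(cell):
    """Return the value a browser would show for one table cell."""
    select = cell.find("select")
    if select is not None:
        options = select.find_all("option")
        # Use the option marked `selected`; browsers fall back to the first one.
        chosen = next((o for o in options if o.has_attr("selected")), None)
        if chosen is None and options:
            chosen = options[0]
        return chosen.get_text(strip=True) if chosen is not None else None
    field = cell.find("input")
    if field is not None:
        # The displayed text of an <input> lives in its `value` attribute.
        return field.get("value")
    return cell.get_text(strip=True)


def table_to_frame(html):
    """Build a DataFrame from the first <table> in `html`, first row as header."""
    table = BeautifulSoup(html, "html.parser").find("table")
    rows = [
        [cell_value(c) for c in tr.find_all(["td", "th"], recursive=False)]
        for tr in table.find_all("tr")
    ]
    header, *body = rows
    return pd.DataFrame(body, columns=header)


# Hypothetical markup standing in for the real page.
sample = """
<table>
  <tr><th>Increment Number</th><th>Action</th><th>Current Acres</th></tr>
  <tr>
    <td><input type="text" value="1" disabled></td>
    <td><select disabled>
          <option>Existing</option><option selected>Modify</option>
        </select></td>
    <td><input type="text" value="12.5" disabled></td>
  </tr>
</table>
"""
df = table_to_frame(sample)
print(df)
```

In practice `html` could be the `outerHTML` string obtained through Selenium. If the page fills the controls with JavaScript, the attributes in that markup may not reflect the displayed values, and reading each element's `value` through Selenium would be needed instead.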