I am currently trying to get the hrefs from the following web page structure:
<div style="something"> # THIS IS THE MAIN DIV I CAN GET
  <div class="aegieogji"> # First ROW sub-div under the main div
    <div class="aegegaegeg"> # SUB-SUB-DIV
      <a class="egaiegeigaegeigaegge" href="link_I_need">Text</a> # First HREF
    <div class="eagegeg"> # SUB-SUB-DIV
      <a class="egaegegaegaeg" href="link_I_need">Text</a> # Second HREF
    <div class="agaeheahrhrahrhr"> # SUB-SUB-DIV
      <a class="arhrharhrahrah" href="link_I_need">Text</a> # Third HREF
  <div class="argagragragaw"> # Second ROW sub-div under the main div
    <div class="aarhrahrah"> # SUB-SUB-DIV
      <a class="arhahrhahr" href="link_I_need">Text</a> # First HREF
    <div class="ahrrahrae"> # SUB-SUB-DIV
      <a class="eagregargreg" href="link_I_need">Text</a> # Second HREF
    <div class="ergrgegaegr"> # SUB-SUB-DIV
      <a class="aegaegregrege" href="link_I_need">Text</a> # Third HREF
...
...
</div>
Using Python Selenium and ChromeDriver, I can read the main div "something":
main_elem = browser.find_element(By.XPATH, "/html/body/div[2]/div/div/div/div[1]/div/div/div/div[1]/div[1]/div[2]/section/main/article/div[2]/div/div[1]")
Now I am struggling to find the right Selenium call to get the href of every link in all the sub-sub-divs below it.
Do you know how I could get these easily? Thanks.
PS: I can see that the first sub-sub-div has the following XPath:
/html/body/div[2]/div/div/div/div[1]/div/div/div/div[1]/div[1]/div[2]/section/main/article/div[2]/div/div[1]/div[1]
and then the second one:
/html/body/div[2]/div/div/div/div[1]/div/div/div/div[1]/div[1]/div[2]/section/main/article/div[2]/div/div[1]/div[2]
and so on, while the XPath of a sub-sub-div in the second row is:
/html/body/div[2]/div/div/div/div[1]/div/div/div/div[1]/div[1]/div[2]/section/main/article/div[2]/div/div[2]/div[1]
So it is div[2] instead of div[1], and so on.
3 Answers
wixjitnu1#
Once you have the main (parent) element, you can fetch all of its child elements that carry an href attribute and read their values, as follows:
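A minimal sketch of this approach, assuming the parent element has already been located as in the question. The helper name `collect_hrefs` is hypothetical, and the locator string "xpath" is simply the value of Selenium's `By.XPATH`, written out so the snippet needs no imports of its own.

```python
def collect_hrefs(parent):
    """Return the href value of every <a> nested anywhere below `parent`."""
    # "xpath" is the string behind By.XPATH. The leading "." keeps the search
    # relative to `parent`; a bare "//a[@href]" would scan the whole page.
    anchors = parent.find_elements("xpath", ".//a[@href]")
    return [anchor.get_attribute("href") for anchor in anchors]
```

With the element from the question this would be called as `collect_hrefs(main_elem)`. Judging by the XPaths in the PS, `main_elem` may actually be the first row only (its XPath ends in div[1], and the second row is its sibling div[2]); if that is the case, locating the element one level up should cover every row.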
s5a0g9ez2#
To extract the values of all the href attributes, you have to induce WebDriverWait for visibility_of_all_elements_located(), and you can use one of the following locator strategies:
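A sketch of what such an explicit wait might look like with a CSS selector. The selector `div[style='something'] a[href]` is an assumption built from the placeholder markup in the question, the helper name `visible_link_hrefs` is hypothetical, and the 20-second timeout is arbitrary.

```python
def visible_link_hrefs(driver, css_selector="div[style='something'] a[href]", timeout=20):
    """Wait until all matching links are visible, then return their hrefs."""
    # Selenium is imported inside the function, so defining the helper
    # does not by itself require the package.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    # until() polls the condition and raises TimeoutException if the links
    # are not all visible within `timeout` seconds.
    links = WebDriverWait(driver, timeout).until(
        EC.visibility_of_all_elements_located((By.CSS_SELECTOR, css_selector))
    )
    return [link.get_attribute("href") for link in links]
```

The XPath strategy would follow the same pattern, with a locator such as `(By.XPATH, "//div[@style='something']//a[@href]")` in place of the CSS one.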
kb5ga3dv3#
Thank you very much for your help. I combined the two comments and found a solution for my case:
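One way the two answers could be combined, offered as a sketch rather than as the code the poster actually used: wait explicitly for the parent element, then search for links relative to it. The helper name `hrefs_under_parent` and its parameters are hypothetical; `parent_xpath` stands for the long absolute XPath from the question.

```python
def hrefs_under_parent(driver, parent_xpath, timeout=20):
    """Wait for the parent element, then collect the hrefs below it."""
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    # Step 1: explicit wait (the idea from the second answer), applied here
    # to the parent element instead of to the links themselves.
    parent = WebDriverWait(driver, timeout).until(
        EC.visibility_of_element_located((By.XPATH, parent_xpath))
    )
    # Step 2: relative search below the parent (the idea from the first answer).
    anchors = parent.find_elements(By.XPATH, ".//a[@href]")
    return [anchor.get_attribute("href") for anchor in anchors]
```

Waiting on the parent and then searching relative to it keeps the long absolute XPath in one place, so only `parent_xpath` needs updating if the page layout changes.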