Python Selenium: selecting all hrefs from a main <div>

kninwzqo  posted on 2023-01-26  in  Python
Followers (0) | Answers (3) | Views (113)

I am currently trying to get the hrefs from the following page structure:

<div style="something"> <!-- THIS IS THE MAIN DIV I CAN GET -->
    <div class="aegieogji"> <!-- First ROW sub-div under the main div -->
        <div class="aegegaegeg"> <!-- SUB-SUB-DIV -->
            <a class="egaiegeigaegeigaegge" href="link_I_need">Text</a> <!-- First HREF -->
        </div>
        <div class="eagegeg"> <!-- SUB-SUB-DIV -->
            <a class="egaegegaegaeg" href="link_I_need">Text</a> <!-- Second HREF -->
        </div>
        <div class="agaeheahrhrahrhr"> <!-- SUB-SUB-DIV -->
            <a class="arhrharhrahrah" href="link_I_need">Text</a> <!-- Third HREF -->
        </div>
    </div>

    <div class="argagragragaw"> <!-- Second ROW sub-div under the main div -->
        <div class="aarhrahrah"> <!-- SUB-SUB-DIV -->
            <a class="arhahrhahr" href="link_I_need">Text</a> <!-- First HREF -->
        </div>
        <div class="ahrrahrae"> <!-- SUB-SUB-DIV -->
            <a class="eagregargreg" href="link_I_need">Text</a> <!-- Second HREF -->
        </div>
        <div class="ergrgegaegr"> <!-- SUB-SUB-DIV -->
            <a class="aegaegregrege" href="link_I_need">Text</a> <!-- Third HREF -->
        </div>
    </div>
        ...
        ...
</div>

Using Python Selenium and ChromeDriver, I can read the main div "something":

main_elem = browser.find_element(By.XPATH, "/html/body/div[2]/div/div/div/div[1]/div/div/div/div[1]/div[1]/div[2]/section/main/article/div[2]/div/div[1]")

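Now I am struggling to find the right Selenium call to get the href of every link under all of the sub-sub-divs.
Do you know how I could get these easily? Thanks.
PS: I can see that the first sub-sub-div has the following XPath: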

/html/body/div[2]/div/div/div/div[1]/div/div/div/div[1]/div[1]/div[2]/section/main/article/div[2]/div/div[1]/div[1]

Then the second one:

/html/body/div[2]/div/div/div/div[1]/div/div/div/div[1]/div[1]/div[2]/section/main/article/div[2]/div/div[1]/div[2]

And so on, while the XPath of the first sub-sub-div in the second row is:

/html/body/div[2]/div/div/div/div[1]/div/div/div/div[1]/div[1]/div[2]/section/main/article/div[2]/div/div[2]/div[1]

So the row step is div[2] instead of div[1], and so on.
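The indexing pattern described above (row index in the second-to-last step, position within the row in the last step) can be written as a small helper. This is only a minimal sketch: BASE is a shortened, hypothetical placeholder rather than the real prefix from the page, and sub_sub_div_xpath is an illustrative name.

```python
# Hypothetical, shortened stand-in for the long absolute prefix in the question.
BASE = "/html/body/main/article/div[2]/div"


def sub_sub_div_xpath(row: int, col: int, base: str = BASE) -> str:
    """Return the absolute XPath of the sub-sub-div at 1-based (row, col)."""
    if row < 1 or col < 1:
        raise ValueError("XPath positions are 1-based")
    return f"{base}/div[{row}]/div[{col}]"


# Two rows with three sub-sub-divs each, as in the structure shown above.
paths = [sub_sub_div_xpath(r, c) for r in (1, 2) for c in (1, 2, 3)]
for path in paths:
    print(path)
```

Index-based absolute paths tend to break as soon as the page layout changes, so this is mostly useful for understanding the structure rather than for scraping it.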

wixjitnu (answer #1)

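Once you have the main (parent) element, you can find all of its descendant elements that have an href attribute and read the values like this: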

# [@href] tests for the href attribute; [href] would look for a child <href> element
children = main_elem.find_elements(By.XPATH, ".//a[@href]")
for child in children:
    href = child.get_attribute("href")
    print(href)
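The difference between the attribute test [@href] and the child-element test [href] can be checked without a browser. The sketch below uses the standard library's xml.etree.ElementTree, which supports both predicates, in place of Selenium; the markup and the href values are hypothetical stand-ins for the page in the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the page structure in the question.
SAMPLE = """
<div style="something">
  <div class="row1">
    <div><a class="x" href="/p/1">Text</a></div>
    <div><a class="x" href="/p/2">Text</a></div>
  </div>
  <div class="row2">
    <div><a class="x" href="/p/3">Text</a></div>
    <div><a class="x">No link</a></div>
  </div>
</div>
"""

main_elem = ET.fromstring(SAMPLE)

# [@href] is an attribute test: every <a> descendant that has an href attribute.
with_attr = [a.get("href") for a in main_elem.findall(".//a[@href]")]

# [href] is a child-element test: <a> elements that contain an <href> child tag.
with_child = main_elem.findall(".//a[href]")

print(with_attr)
print(len(with_child))
```

The leading dot matters in the same way in Selenium: it keeps the search relative to the parent element instead of the whole document.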

s5a0g9ez (answer #2)

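To extract the values of all the href attributes, induce WebDriverWait for visibility_of_all_elements_located() so that the elements have rendered before they are read. Either of the following locator strategies can be used:

  • Using CSS_SELECTOR: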
print([my_elem.get_attribute("href") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div[style='something'] div div>a")))])
  • Using XPATH:
print([my_elem.get_attribute("href") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@style='something']//div//div/a")))])
  • Note: you have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
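Once the wait has returned the list of elements, collecting the values is plain Python. Here is a minimal sketch of a helper that skips elements without an href and drops duplicates while keeping page order. unique_hrefs and FakeElement are illustrative names, not part of Selenium; the helper only assumes that each element has a get_attribute method, as Selenium's WebElement does.

```python
from typing import Iterable, List


def unique_hrefs(elements: Iterable) -> List[str]:
    """Collect href values in order, skipping missing values and duplicates."""
    seen = set()
    result = []
    for elem in elements:
        href = elem.get_attribute("href")
        if href and href not in seen:
            seen.add(href)
            result.append(href)
    return result


class FakeElement:
    """Stand-in for a WebElement so the helper can run without a browser."""

    def __init__(self, href):
        self._href = href

    def get_attribute(self, name):
        return self._href if name == "href" else None


demo = [
    FakeElement("https://example.com/p/1"),
    FakeElement(None),
    FakeElement("https://example.com/p/1"),
    FakeElement("https://example.com/p/2"),
]
print(unique_hrefs(demo))
```

With a real driver, the list returned by WebDriverWait(...).until(...) would be passed in place of demo.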

kb5ga3dv (answer #3)

Thank you very much for your help. I combined the two answers and found a solution for my case:

# read the main DIV with XPATH
...
# read all the <a> elements under the nested sub-divs
link_elems = element.find_elements(By.XPATH, './/div//div//div/a')
# retrieve the href
for link_elem in link_elems:
    # note: an XPath starting with '//' searches the whole document, not just link_elem
    sub_div = link_elem.find_elements(By.XPATH, '//a[starts-with(@href, "/p/")]')
    for sub in sub_div:
        post_href = sub.get_attribute("href")
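The "/p/" filter can also be applied in Python after the hrefs have been collected. One thing to keep in mind is that Selenium's get_attribute("href") usually returns the resolved absolute URL, while the XPath test on @href sees the raw attribute value, so the sketch below compares the URL path rather than the whole string. filter_post_links and the example.com URLs are hypothetical.

```python
from urllib.parse import urlparse


def filter_post_links(hrefs, prefix="/p/"):
    """Keep only the hrefs whose URL path starts with the given prefix."""
    return [h for h in hrefs if h and urlparse(h).path.startswith(prefix)]


# Hypothetical values: absolute URLs, a relative path, and a missing href.
hrefs = [
    "https://example.com/p/abc/",
    "https://example.com/explore/",
    "/p/xyz/",
    None,
]
posts = filter_post_links(hrefs)
print(posts)
```

Filtering once over a single list avoids running the same document-wide query again for every link.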
