Python Selenium: how to get the text from a span class

Posted by 8zzbczxx on 2022-12-13 in Python

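I want to use Selenium WebDriver to get the text from a span. Below is the code I tried, which returns no result.
The expected output is: Marketplace listings – 72
Code attempted: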

driver.find_element(By.XPATH,"//h2[@class='jxuftiz4 jwegzro5 hl4rid49 icdlwmnq gvxzyvdx aeinzg81']//span").get_attribute("innerText")

Here is the HTML:

<div class="bdao358l om3e55n1 g4tp4svg alzwoclg cqf1kptm jez8cy9q gvxzyvdx i0rxk2l3 laatuukc gjezrb0y abh4ulrg">
    <div class="om3e55n1 g4tp4svg bdao358l alzwoclg cqf1kptm jez8cy9q gvxzyvdx">
        <div class="bdao358l om3e55n1 g4tp4svg alzwoclg cqf1kptm cgu29s5g dnr7xe2t">
            <div class="bdao358l om3e55n1 g4tp4svg alzwoclg cqf1kptm jez8cy9q gvxzyvdx r227ecj6 gt60zsk1">
                <div class="bdao358l om3e55n1 g4tp4svg tccefgj0 ebnioo9u m733yx0p"></div>
            </div>
            <div class="bdao358l om3e55n1 g4tp4svg alzwoclg cqf1kptm jez8cy9q gvxzyvdx r227ecj6 gt60zsk1">
                <div class="om3e55n1 g4tp4svg bdao358l alzwoclg cqf1kptm jez8cy9q gvxzyvdx o9wcebwi h6ft4zvz">
                    <div class="bdao358l om3e55n1 g4tp4svg alzwoclg cqf1kptm cgu29s5g dnr7xe2t">
                        <div class="bdao358l om3e55n1 g4tp4svg alzwoclg cqf1kptm jez8cy9q gvxzyvdx">
                            <div class="alzwoclg cqf1kptm siwo0mpr gu5uzgus">
                                <div class="jroqu855 nthtkgg5">
                                    <h2 class="jxuftiz4 jwegzro5 hl4rid49 icdlwmnq gvxzyvdx aeinzg81" dir="auto">
                                        <span class="gvxzyvdx aeinzg81 t7p7dqev gh25dzvf exr7barw b6ax4al1 gem102v4 ncib64c9 mrvwc6qr sx8pxkcf f597kf1v cpcgwwas m2nijcs8 hxfwr5lz hpj0pwwo sggt6rq5 innypi6y pbevjfx6 ztn2w49o" dir="auto">
                                            <span class="b6ax4al1 lq84ybu9 hf30pyar om3e55n1 tr46kb4q" style="-webkit-box-orient: vertical; -webkit-line-clamp: 2; display: -webkit-box;">Marketplace listings – 72</span>
                                        </span>
                                    </h2>
                                </div>
                            </div>
                        </div>
                    </div>
                </div>
            </div>
        </div>
    </div>
</div>

wgxvkvu9 #1

Try the following XPath expression instead:

driver.find_element(By.XPATH,"//span[@dir='auto']/span").get_attribute("innerText")

I tested this expression against the sample HTML and confirmed that it works.
Output:

'Marketplace listings – 72'
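
For a quick offline sanity check of the shape of that XPath, the sketch below runs an equivalent expression against a trimmed copy of the sample HTML using the standard-library `xml.etree.ElementTree`. This is only a stand-in for the browser: ElementTree supports a subset of XPath and needs a leading `.`, the class attributes are omitted for brevity, and the variable names are illustrative. On the live page, `find_element` returns the first match, so a broad expression like `//span[@dir='auto']/span` may hit a different element if other spans share that structure.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the fragment from the question (class attributes omitted).
SAMPLE = """
<h2 dir="auto">
    <span dir="auto">
        <span style="display: -webkit-box;">Marketplace listings – 72</span>
    </span>
</h2>
"""

root = ET.fromstring(SAMPLE.strip())

# ElementTree equivalent of //span[@dir='auto']/span (relative paths start with ".").
inner = root.find(".//span[@dir='auto']/span")
listing_text = inner.text
print(listing_text)
```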

oalqel3c #2

Consider the relevant part of the given HTML:

<h2 class="jxuftiz4 jwegzro5 hl4rid49 icdlwmnq gvxzyvdx aeinzg81" dir="auto">
    <span class="gvxzyvdx aeinzg81 t7p7dqev gh25dzvf exr7barw b6ax4al1 gem102v4 ncib64c9 mrvwc6qr sx8pxkcf f597kf1v cpcgwwas m2nijcs8 hxfwr5lz hpj0pwwo sggt6rq5 innypi6y pbevjfx6 ztn2w49o" dir="auto">
        <span class="b6ax4al1 lq84ybu9 hf30pyar om3e55n1 tr46kb4q" style="-webkit-box-orient: vertical; -webkit-line-clamp: 2; display: -webkit-box;">Marketplace listings – 72</span>
    </span>
</h2>

To extract the text Marketplace listings – 72, ideally you should induce WebDriverWait for visibility_of_element_located(), and you can use either of the following locator strategies:

  • Using CSS_SELECTOR and the text attribute:
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "h2[dir='auto'] span[dir='auto'] > span[style*='webkit-box-orient']"))).text)
  • Using XPATH and get_attribute("innerHTML"):
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//h2[@dir='auto']/span[@dir='auto']/span[contains(@style, 'webkit-box-orient') and text()]"))).get_attribute("innerHTML"))

Note: you have to add the following imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

You can find a relevant discussion in How to retrieve the text of a WebElement using Selenium - Python
