Problem with the findElement method when web scraping with Selenium Python

slmsl1lt posted on 2023-02-12 in Python

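I am a student of architectural restoration and I am learning web scraping. I am collecting data on churches in Spain, and for this I am working with the Catastro website. The data collection works, but I am having trouble getting the src of the images.
Below is the part of my code that throws an error at the "# Get the URL of the image" step. When I open the page manually in the browser I can find the image, but I can't find a way to reach it with Selenium. Could this be because the element is nested inside a ::before?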

import requests
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup

# Start a webdriver session using Firefox
driver = webdriver.Firefox()

# Go to the website
driver.get("https://www1.sedecatastro.gob.es/Cartografia/mapa.aspx?refcat=9271101WJ9197A&from=OVCBusqueda&pest=rc&final=&RCCompleta=9271101WJ9197A0001BR&ZV=NO&ZR=NO&anyoZV=&tematicos=&anyotem=&del=2&mun=900")

# Wait until the map element is present and click on its center
map_element = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.XPATH, '//*[@id="map"]'))
)
driver.execute_script("arguments[0].scrollIntoView(true);", map_element)
map_element.click()

# Get the URL of the image
img_element = driver.find_element_by_xpath('//*[@id="ImgFachada0"]')

# Get the src attribute of the image element
img_src = img_element.get_attribute("src")

# Print the src of the image
print(img_src)

Answer #1 (jyztefdp)

Before executing the following code, you first need to switch to the frame that contains the image:

# Get the URL of the image
img_element = driver.find_element_by_xpath('//*[@id="ImgFachada0"]')

Solution: use the following code to switch to the frame, and then perform the other operations:

driver.switch_to.frame(driver.find_element(By.XPATH,"//div[@class='modal-content']//iframe"))
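When it is not obvious which frame holds an element, a small helper can search the main document first and then each top-level iframe in turn. The sketch below is hypothetical (find_in_any_frame is not part of Selenium). It assumes the iframes already exist when it runs (on Catastro the pop-up iframe only appears after the map has been clicked) and it does not descend into nested iframes. It relies on find_elements, which returns an empty list instead of raising, and on the plain strings "xpath" and "tag name", which are the values of By.XPATH and By.TAG_NAME.

```python
from typing import Any, Optional, Tuple

# A Selenium-style locator, e.g. (By.XPATH, "//*[@id='ImgFachada0']");
# By.XPATH is the string "xpath" and By.TAG_NAME is "tag name".
Locator = Tuple[str, str]


def find_in_any_frame(driver: Any, locator: Locator) -> Optional[Any]:
    """Hypothetical helper: search the main document, then each top-level <iframe>.

    If the element is found inside an iframe, the driver is left switched into
    that iframe (so you can keep working there); if nothing matches, the driver
    is switched back to the main document and None is returned.
    Nested iframes are not searched.
    """
    driver.switch_to.default_content()
    matches = driver.find_elements(*locator)  # empty list instead of an exception
    if matches:
        return matches[0]

    frames = driver.find_elements("tag name", "iframe")
    for frame in frames:
        # Go back to the top-level page before entering the next iframe.
        driver.switch_to.default_content()
        driver.switch_to.frame(frame)
        matches = driver.find_elements(*locator)
        if matches:
            return matches[0]

    driver.switch_to.default_content()
    return None
```

With a real driver you would call find_in_any_frame(driver, (By.XPATH, "//*[@id='ImgFachada0']")) after the map click has opened the pop-up, and read get_attribute("src") from the result if it is not None.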

Full working code for your reference:

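import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.maximize_window()
driver.implicitly_wait(20)
driver.get("https://www1.sedecatastro.gob.es/Cartografia/mapa.aspx?refcat=9271101WJ9197A&from=OVCBusqueda&pest=rc&final=&RCCompleta=9271101WJ9197A0001BR&ZV=NO&ZR=NO&anyoZV=&tematicos=&anyotem=&del=2&mun=900")
# Click the map to open the pop-up for this property (click() returns None, so nothing is assigned)
WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, "//*[@id='map']"))).click()
time.sleep(3)
# The facade image lives inside an iframe in the pop-up, so switch into it first
driver.switch_to.frame(driver.find_element(By.XPATH, "//div[@class='modal-content']//iframe"))
img_element = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, "//*[@id='ImgFachada0']")))
img_src = img_element.get_attribute("src")
print(img_src)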

Console output:

https://www1.sedecatastro.gob.es/Cartografia/FXCC/FotoFachada.aspx?refcat=9271101WJ9197A0001BR&del=2&mun=900&from=OVCListaBienes&captcha=bf9e5588d83361af1bffe7521e86dd68ea6a3f0b

Process finished with exit code 0
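The code above pauses with a fixed time.sleep(3) before switching frames. WebDriverWait(...).until(condition) instead calls the condition repeatedly until it returns something truthy or the timeout expires, which is what EC.frame_to_be_available_and_switch_to_it relies on. The sketch below is a simplified, hypothetical re-implementation for illustration only: wait_until and frame_available_and_switch are made-up names, and the real WebDriverWait also ignores certain exceptions while polling and raises Selenium's TimeoutException rather than the built-in TimeoutError.

```python
import time
from typing import Any, Callable, Tuple

Locator = Tuple[str, str]  # e.g. ("xpath", "//iframe[contains(@src, 'OVCListaBienes')]")


def frame_available_and_switch(locator: Locator) -> Callable[[Any], bool]:
    """Hypothetical condition: if an iframe matches `locator`, switch into it and return True."""
    def condition(driver: Any) -> bool:
        frames = driver.find_elements(*locator)
        if not frames:
            return False  # not rendered yet; the wait loop will try again
        driver.switch_to.frame(frames[0])
        return True
    return condition


def wait_until(driver: Any, condition: Callable[[Any], Any],
               timeout: float = 10.0, poll: float = 0.5) -> Any:
    """Call condition(driver) every `poll` seconds until it returns a truthy value."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition(driver)
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)
```

The advantage over a fixed sleep is that the script continues as soon as the iframe is present and fails with a clear timeout if it never appears, instead of always waiting three seconds and hoping that is enough.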

After you are done working inside the iframe, don't forget to switch back to the main page:

#To switch back from iframe
driver.switch_to.default_content()
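One way to make sure you always return to the main page, even when something inside the iframe raises an exception, is to wrap the switch in a context manager. This is a minimal sketch; the name inside_frame is hypothetical and not a Selenium API, and frame_reference can be anything driver.switch_to.frame accepts, such as the iframe WebElement.

```python
from contextlib import contextmanager
from typing import Any, Iterator


@contextmanager
def inside_frame(driver: Any, frame_reference: Any) -> Iterator[Any]:
    """Switch into an iframe for the duration of a `with` block.

    `frame_reference` is whatever driver.switch_to.frame accepts
    (a WebElement, an index, or a frame name/id). The driver is switched
    back to the top-level document even if the block raises.
    """
    driver.switch_to.frame(frame_reference)
    try:
        yield driver
    finally:
        driver.switch_to.default_content()


# Example usage with a real driver (shown as comments only):
# iframe = driver.find_element(By.XPATH, "//div[@class='modal-content']//iframe")
# with inside_frame(driver, iframe):
#     img_src = driver.find_element(By.ID, "ImgFachada0").get_attribute("src")
```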

**The iframe in the HTML, for your reference:** [screenshot]

Answer #2 (jvidinwx)

The desired <img> element is inside an <iframe>.

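Solution
To extract the value of the *src* attribute, you have to wait for the iframe to be available and switch to it, and then wait for the image to be visible: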

driver.get('https://www1.sedecatastro.gob.es/Cartografia/mapa.aspx?refcat=9271101WJ9197A&from=OVCBusqueda&pest=rc&final=&RCCompleta=9271101WJ9197A0001BR&ZV=NO&ZR=NO&anyoZV=&tematicos=&anyotem=&del=2&mun=900')
# Click the button matched by this selector first (it presumably dismisses a dialog shown on page load)
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button.btn.btn-sm.btn-sec-inverted"))).click()
# Wait for the map, scroll it into view and click it to open the property pop-up
map_element = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[@id='map']")))
driver.execute_script("arguments[0].scrollIntoView(true);", map_element)
map_element.click()
# Wait until the iframe whose src contains 'OVCListaBienes' exists, then switch into it
WebDriverWait(driver, 20).until(EC.frame_to_be_available_and_switch_to_it((By.XPATH, "//iframe[contains(@src, 'OVCListaBienes')]")))
# Inside the iframe: wait for the facade image and print its src
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//img[@id='ImgFachada0']"))).get_attribute("src"))
driver.quit()
  • **Note**: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
  • Console output:
https://www1.sedecatastro.gob.es/Cartografia/FXCC/FotoFachada.aspx?refcat=9271101WJ9197A0001BR&del=2&mun=900&from=OVCListaBienes&captcha=8a799d3f10ec7a9ec8f6937d450581bd75d2b750
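Both answers print a FotoFachada.aspx URL for the same property, and the two URLs differ only in the captcha query parameter. That suggests the captcha value is generated per request or session (an inference from these two outputs, not documented behaviour), so when storing one record per church it may be more robust to key it by the refcat parameter than by the full URL. A small sketch using only the standard library's urllib.parse; the function name query_params is hypothetical.

```python
from urllib.parse import parse_qs, urlparse

# The two src values printed by the answers above (same property, different runs).
FIRST = ("https://www1.sedecatastro.gob.es/Cartografia/FXCC/FotoFachada.aspx?"
         "refcat=9271101WJ9197A0001BR&del=2&mun=900&from=OVCListaBienes"
         "&captcha=bf9e5588d83361af1bffe7521e86dd68ea6a3f0b")
SECOND = ("https://www1.sedecatastro.gob.es/Cartografia/FXCC/FotoFachada.aspx?"
          "refcat=9271101WJ9197A0001BR&del=2&mun=900&from=OVCListaBienes"
          "&captcha=8a799d3f10ec7a9ec8f6937d450581bd75d2b750")


def query_params(src: str) -> dict:
    """Return the query string of an image src as a {name: first value} dict."""
    return {name: values[0] for name, values in parse_qs(urlparse(src).query).items()}


first, second = query_params(FIRST), query_params(SECOND)
# Names whose values differ between the two runs.
changed = sorted(name for name in first if first[name] != second.get(name))
print(first["refcat"])  # 9271101WJ9197A0001BR
print(changed)          # ['captcha']
```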

References

You can find a few relevant discussions in:

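  • Switch to an iframe through Selenium and Python
  • selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element while trying to click on the "Next" button using Selenium
  • Selenium in Python: NoSuchElementException: Message: no such element: Unable to locate element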
