Neither Selenium nor bs4 can find a div on the page

mum43rcc · asked on 2023-02-12

I'm trying to scrape a Craigslist results page, but neither bs4 nor Selenium can find the elements in the page, even though I can inspect them with the dev tools. The results are in list items with the class cl-search-result, but the returned soup doesn't seem to contain any of them.
This is my script so far. Even the soup that comes back doesn't match the HTML I see when inspecting with the dev tools. I expect the script to return 42 items, which is the number of search results.
Here is the script:

import time
import datetime
from collections import namedtuple
import selenium.webdriver as webdriver
from selenium.webdriver.firefox.service import Service
from selenium.webdriver.support.ui import Select
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.common.exceptions import ElementNotInteractableException
from bs4 import BeautifulSoup
import pandas as pd
import os

user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/109.0'
firefox_driver_path = os.path.join(os.getcwd(), 'geckodriver.exe')
firefox_service = Service(firefox_driver_path)
firefox_option = Options()
firefox_option.set_preference('general.useragent.override', user_agent)
browser = webdriver.Firefox(service=firefox_service, options=firefox_option)
browser.implicitly_wait(7)

url = 'https://baltimore.craigslist.org/search/sss#search=1~list~0~0'
browser.get(url)
soup = BeautifulSoup(browser.page_source, 'html.parser')
print(soup)
posts_html = soup.find_all('li', {'class': 'cl-search-result'})
print('Collected {0} listings'.format(len(posts_html)))

whlutmcx1#

The code below works for me; it prints: Collected 120 listings. The key difference is the sleep(3): Craigslist renders the search results with JavaScript, so page_source has to be read after the page has had time to finish rendering.

from time import sleep
from selenium import webdriver
from bs4 import BeautifulSoup

url = 'https://baltimore.craigslist.org/search/sss#search=1~list~0~0'
browser = webdriver.Chrome()
browser.get(url)
sleep(3)  # give the JavaScript-rendered results time to load
soup = BeautifulSoup(browser.page_source, 'html.parser')
posts_html = soup.find_all('li', {'class': 'cl-search-result'})
print('Collected {0} listings'.format(len(posts_html)))
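Note that the find_all selector itself is fine; the problem is only that the li elements are not in page_source until the JavaScript has rendered them. A minimal sketch demonstrating this, where rendered_html is a made-up snippet standing in for what browser.page_source looks like after rendering:

```python
from bs4 import BeautifulSoup

# Hypothetical snippet standing in for browser.page_source *after* the
# JavaScript search results have rendered.
rendered_html = """
<ul class="cl-search-results">
  <li class="cl-search-result">posting 1</li>
  <li class="cl-search-result">posting 2</li>
</ul>
"""

soup = BeautifulSoup(rendered_html, 'html.parser')
posts_html = soup.find_all('li', {'class': 'cl-search-result'})
print('Collected {0} listings'.format(len(posts_html)))  # Collected 2 listings
```

A fixed sleep(3) works but is fragile; Selenium's WebDriverWait with an expected_conditions check on the same locator is a more robust way to wait for the results to appear.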
