Neither Selenium nor bs4 can find a div in the page

mum43rcc posted on 2023-02-12 in Other

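I'm trying to scrape a Craigslist results page, but neither bs4 nor Selenium can find the elements in the page, even though I can inspect them with the dev tools. The results are in list items with the class cl-search-result, but the soup that comes back doesn't seem to contain any of them.
Even the returned soup looks different from the HTML I see when inspecting with the dev tools. I expect the script to return 42 items, which is the number of search results.
Here is my script so far: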

import time
import datetime
from collections import namedtuple
import selenium.webdriver as webdriver
from selenium.webdriver.firefox.service import Service
from selenium.webdriver.support.ui import Select
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.common.exceptions import ElementNotInteractableException
from bs4 import BeautifulSoup
import pandas as pd
import os
user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/109.0'
firefox_driver_path = os.path.join(os.getcwd(), 'geckodriver.exe')
firefox_service = Service(firefox_driver_path)
firefox_option = Options()
firefox_option.set_preference('general.useragent.override', user_agent)
browser = webdriver.Firefox(service=firefox_service, options=firefox_option)
browser.implicitly_wait(7)
url = 'https://baltimore.craigslist.org/search/sss#search=1~list~0~0'
browser.get(url)
soup = BeautifulSoup(browser.page_source, 'html.parser') 
print(soup)
posts_html = soup.find_all('li', {'class': 'cl-search-result'})

print('Collected {0} listings'.format(len(posts_html)))

selenium

Source: https://stackoverflow.com/questions/75393827/both-selenium-and-bs4-cannot-find-div-in-page

1 answer

whlutmcx1#

The following code works for me. It prints: Collected 120 listings

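from time import sleep

from bs4 import BeautifulSoup
from selenium import webdriver

url = 'https://baltimore.craigslist.org/search/sss#search=1~list~0~0'
browser = webdriver.Chrome()
browser.get(url)
sleep(3)  # pause so the results have time to load before page_source is read
soup = BeautifulSoup(browser.page_source, 'html.parser')
posts_html = soup.find_all('li', {'class': 'cl-search-result'})
print('Collected {0} listings'.format(len(posts_html)))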
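
A fixed sleep(3) works here, but it depends on how quickly the page finishes loading. Below is a minimal sketch of an alternative, assuming the listings are added to the page by JavaScript after the initial load (which would be consistent with the empty soup in the question). It uses Selenium's explicit wait, WebDriverWait, to poll until at least one li.cl-search-result element exists. Note that implicitly_wait only affects element lookups such as find_elements, not reads of page_source. The names collect_listings, summarize and RESULT_SELECTOR are hypothetical, and the 15-second timeout is an arbitrary choice.

```python
# CSS selector for one search result, taken from the class named in the question.
RESULT_SELECTOR = 'li.cl-search-result'


def collect_listings(browser, url, timeout=15):
    """Load url, wait until results are in the DOM, and return them as WebElements."""
    # Imported inside the function so that importing this module does not
    # require Selenium to be installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    browser.get(url)
    # Polls until at least one matching element is present, then returns all
    # matches; raises TimeoutException if none appear within `timeout` seconds.
    return WebDriverWait(browser, timeout).until(
        EC.presence_of_all_elements_located((By.CSS_SELECTOR, RESULT_SELECTOR))
    )


def summarize(count):
    """Format the same summary line that the scripts above print."""
    return 'Collected {0} listings'.format(count)
```

With a driver already created, for example browser = webdriver.Chrome(), this could be called as items = collect_listings(browser, url) followed by print(summarize(len(items))). If BeautifulSoup is still wanted for parsing, browser.page_source can be read after the wait returns, since by then the results are in the DOM.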

Posted 2023-02-12
