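I'm trying to scrape a Craigslist results page, but neither bs4 nor Selenium can find the elements on the page, even though I can inspect them with the dev tools. The results are in list items with the class cl-search-result, but the returned soup doesn't seem to contain any of them.
It looks as though even the returned soup differs from the HTML I see when inspecting with the dev tools. I expect this script to return 42 items, which is the number of search results.
Here is my script so far: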
import time
import datetime
from collections import namedtuple
import selenium.webdriver as webdriver
from selenium.webdriver.firefox.service import Service
from selenium.webdriver.support.ui import Select
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.common.exceptions import ElementNotInteractableException
from bs4 import BeautifulSoup
import pandas as pd
import os
user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/109.0'
firefox_driver_path = os.path.join(os.getcwd(), 'geckodriver.exe')
firefox_service = Service(firefox_driver_path)
firefox_option = Options()
firefox_option.set_preference('general.useragent.override', user_agent)
browser = webdriver.Firefox(service=firefox_service, options=firefox_option)
browser.implicitly_wait(7)
url = 'https://baltimore.craigslist.org/search/sss#search=1~list~0~0'
browser.get(url)
soup = BeautifulSoup(browser.page_source, 'html.parser')
print(soup)
posts_html = soup.find_all('li', {'class': 'cl-search-result'})
print('Collected {0} listings'.format(len(posts_html)))
1 Answer
The code below works for me; it prints: Collected 120 listings
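
As a separate illustration (not the answerer's code), here is a minimal sketch of one way to approach this. It assumes the listings are inserted by JavaScript after the initial page load, which would explain why the soup differs from what the dev tools show. The helper names count_listings and wait_for_listings are hypothetical, and the sketch uses only the standard library rather than bs4 so that it can be run without a browser. It polls a callable that returns the page source until at least one li with the class cl-search-result appears or a timeout expires.

```python
import time
from html.parser import HTMLParser


class ListingCounter(HTMLParser):
    """Counts <li> elements whose class list contains a target class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag != "li":
            return
        # A valueless class attribute is reported as None, hence the `or ""`.
        classes = (dict(attrs).get("class") or "").split()
        if self.css_class in classes:
            self.count += 1


def count_listings(html, css_class="cl-search-result"):
    """Return the number of matching <li> elements in an HTML string."""
    parser = ListingCounter(css_class)
    parser.feed(html)
    return parser.count


def wait_for_listings(get_html, timeout=15.0, poll=0.5):
    """Poll get_html() until listings appear or the timeout expires.

    get_html is any zero-argument callable returning the current page
    source. Returns the last listing count seen (0 on timeout).
    """
    deadline = time.monotonic() + timeout
    while True:
        found = count_listings(get_html())
        if found or time.monotonic() >= deadline:
            return found
        time.sleep(poll)
```

With the script in the question, this could be called as wait_for_listings(lambda: browser.page_source) right after browser.get(url). Note that it returns as soon as any listing is present, so the count may be lower than the final total if the page is still filling in. Selenium's own WebDriverWait with a presence condition on the li.cl-search-result selector is the more usual way to express the same wait.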