python-3.x: How to scrape only the review text from Airbnb with Selenium?

Posted by blmhpbnm on 2023-04-08 in Python

I am trying to extract only the reviews from an Airbnb web page.
I tried the following code:

import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))

airbnb_url = 'https://www.airbnb.co.in/rooms/605371928419351152?adults=1&category_tag=Tag%3A677&children=0&enable_m3_private_room=false&infants=0&pets=0&search_mode=flex_destinations_search&check_in=2023-04-09&check_out=2023-04-14&federated_search_id=da4d5c1e-7ad2-4539-8658-5f27dde826f8&source_impression_id=p3_1680264622_sNnLDFQJLlbBR4%2Fw'
driver.get(airbnb_url)
driver.maximize_window()
time.sleep(5)
page=BeautifulSoup(driver.page_source, 'lxml')
print(page)
print('**********************************************')
containers = page.select('[data-review-id] span.dir-ltr')
for container in containers:
    print(container.text)
    print('**********************************************')

Expected output:

This was a great find for us. Akash was our host for the stay. He was quite proactive in sharing with us instructions/information pertaining to the stay, followed-through diligently with the requisite actions, and was very responsive to the queries/additional-requests placed by us. Coming to the property itself - it's exactly as it looks in the pictures and perhaps prettier at this time of the year when there is a chill in the weather. The views from the living room and the 1st floor rooms are amazing. We couldnt try the pool owing to the chilly weather but it's kept quite clean and so is the garden area. It's around 40 mins drive from most of the to-visit places in/around nashik. The place is quite secure and the staff stays at the property.Coming to the people, Sonu and Shobha did their best to make us feel-at-home. Jeetendra was very keen to make this stay a foodie's delight with his expansive menu and the barbeque. And ofcourse we loved hanging out with Cocktail. Thank you all!!
**********************************************
We had an amazing  stay. The staff was very competent and made our stay fabulous. We will highly recommend this place to our family and friends.
**********************************************
We really enjoyed two days at this property. The villa as well as the entire property are well maintained. The view of lake as well as mountains are breathtaking. Very helpful care takers and food food.

Currently there is no output at all.

Answer #1, by 6ljaweal

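You have to open the modal that contains all the reviews so that your selector can find them, because the reviews are loaded and rendered dynamically.
To find the trigger that opens the modal, you can use the reviews anchor at the top and locate it with a CSS selector or any other strategy.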
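The selector `button[aria-label^="Rated"]` uses `^=`, which means "attribute value begins with". The sketch below reproduces that rule offline with Python's standard-library `html.parser`, so no browser is needed. It runs against a hypothetical markup snippet. The `aria-label` values are placeholders, not Airbnb's real labels, which may differ and change over time. The sketch only mimics the prefix rule for `<button>` tags and is not a CSS engine.

```python
from html.parser import HTMLParser


class PrefixButtonFinder(HTMLParser):
    """Collect aria-label values of <button> tags whose aria-label starts with a prefix.

    Mimics the CSS selector button[aria-label^="<prefix>"] on a static snippet.
    """

    def __init__(self, prefix):
        super().__init__()
        self.prefix = prefix
        self.matches = []

    def handle_starttag(self, tag, attrs):
        if tag != "button":
            return
        # attrs is a list of (name, value) pairs; value is None for valueless attributes
        label = dict(attrs).get("aria-label")
        # CSS ^= means "attribute value begins with"
        if label is not None and label.startswith(self.prefix):
            self.matches.append(label)


# Hypothetical markup: placeholder labels, not Airbnb's actual page structure
SNIPPET = """
<button aria-label="Share">Share</button>
<button aria-label="Rated (placeholder text)">reviews</button>
<button aria-label="Save">Save</button>
"""

finder = PrefixButtonFinder("Rated")
finder.feed(SNIPPET)
print(finder.matches)  # ['Rated (placeholder text)']
```

Only the button whose label starts with the prefix is selected. If the site renames that label, the selector silently matches nothing.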
Note that you could still use the `time` module, but I recommend taking a closer look at `WebDriverWait`:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
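The difference between `time.sleep(5)` and an explicit wait can be modelled in a few lines of plain Python. The sketch below uses a simplified, hypothetical `wait_until` helper; it is not Selenium's implementation.

- It polls a condition until the condition returns something truthy, then hands that value back.
- Returning the value is also why the answer can chain `.click()` onto `WebDriverWait(...).until(...)`.
- Selenium's own `WebDriverWait` raises `TimeoutException` rather than the built-in `TimeoutError` used here.

```python
import time


def wait_until(condition, timeout=20.0, poll_frequency=0.5):
    """Hypothetical, simplified model of an explicit wait (not Selenium's code).

    Calls `condition()` repeatedly until it returns a truthy value, which is
    then returned; raises TimeoutError if `timeout` seconds elapse first.
    """
    end = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value
        if time.monotonic() >= end:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll_frequency)


calls = {"n": 0}


def reviews_loaded():
    # Hypothetical stand-in for "are the review elements rendered yet?":
    # it starts returning a result on the third poll
    calls["n"] += 1
    return ["review text"] if calls["n"] >= 3 else []


result = wait_until(reviews_loaded, timeout=5, poll_frequency=0.01)
print(result)  # ['review text']
```

A fixed `time.sleep(5)` always blocks for five seconds, and that may still be too short on a slow connection. The polling version returns as soon as the condition holds and fails loudly when it never does.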
Example
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from webdriver_manager.chrome import ChromeDriverManager

driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))

airbnb_url = 'https://www.airbnb.co.in/rooms/605371928419351152?adults=1&category_tag=Tag%3A677&children=0&enable_m3_private_room=false&infants=0&pets=0&search_mode=flex_destinations_search&check_in=2023-04-09&check_out=2023-04-14&federated_search_id=da4d5c1e-7ad2-4539-8658-5f27dde826f8&source_impression_id=p3_1680264622_sNnLDFQJLlbBR4%2Fw'
driver.get(airbnb_url)
driver.maximize_window()
# Open the reviews modal: its trigger is the button whose aria-label starts with "Rated"
WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, 'button[aria-label^="Rated"]'))).click()
# Wait until the dynamically rendered review containers are visible
WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, '[data-review-id]')))

# Parse the page source only after the reviews are in the DOM
page = BeautifulSoup(driver.page_source, 'lxml')
print('**********************************************')
containers = page.select('[data-review-id] span.dir-ltr')
for container in containers:
    print(container.text)
    print('**********************************************')
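You can reproduce offline why the original selector printed nothing. The sketch below is a rough standard-library stand-in for `page.select('[data-review-id] span.dir-ltr')`. It runs against two hypothetical snippets: one before the reviews modal is opened, with no `data-review-id` containers, and one after. Keep in mind:

- The markup and class names are assumptions for illustration, not Airbnb's real page.
- The parser assumes well-formed HTML without unclosed void tags such as `<br>`.

```python
from html.parser import HTMLParser


class ReviewTextParser(HTMLParser):
    """Rough stdlib stand-in for the CSS selector '[data-review-id] span.dir-ltr'.

    Collects the text of <span> elements with class "dir-ltr" that have an
    ancestor carrying a data-review-id attribute. Assumes well-formed markup
    (every start tag has a matching end tag).
    """

    def __init__(self):
        super().__init__()
        self.stack = []        # one (opens_review, is_target_span) pair per open element
        self.review_depth = 0  # number of open ancestors with data-review-id
        self.span_depth = 0    # > 0 while inside a matching span
        self._buf = []
        self.texts = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        opens_review = "data-review-id" in attrs
        classes = (attrs.get("class") or "").split()
        # Descendant combinator: the span must sit inside an already-open review container
        is_target = tag == "span" and self.review_depth > 0 and "dir-ltr" in classes
        if opens_review:
            self.review_depth += 1
        if is_target:
            if self.span_depth == 0:
                self._buf = []
            self.span_depth += 1
        self.stack.append((opens_review, is_target))

    def handle_endtag(self, tag):
        if not self.stack:
            return
        opens_review, is_target = self.stack.pop()
        if is_target:
            self.span_depth -= 1
            if self.span_depth == 0:
                self.texts.append("".join(self._buf).strip())
        if opens_review:
            self.review_depth -= 1

    def handle_data(self, data):
        if self.span_depth > 0:
            self._buf.append(data)


def extract_reviews(html):
    parser = ReviewTextParser()
    parser.feed(html)
    parser.close()
    return parser.texts


# Hypothetical markup before the modal is opened: no review containers yet
BEFORE = '<div><button aria-label="Rated (placeholder)">reviews</button></div>'

# Hypothetical markup after the modal has rendered its reviews
AFTER = """
<div role="dialog">
  <div data-review-id="1"><span class="dir dir-ltr">First review text</span></div>
  <div data-review-id="2"><span class="dir dir-ltr">Second review text</span></div>
  <span class="dir-ltr">Not inside a review container</span>
</div>
"""

print(extract_reviews(BEFORE))  # []
print(extract_reviews(AFTER))   # ['First review text', 'Second review text']
```

Suppose `driver.page_source` is captured before the modal renders. It then resembles the first case, which is consistent with the empty output described in the question.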
Example 2

You don't need BeautifulSoup for this; you can also extract the text directly with Selenium:

...
driver.get(airbnb_url)
driver.maximize_window()
# Open the reviews modal first
WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, 'button[aria-label^="Rated"]'))).click()
print('**********************************************')
# until() returns the list of visible elements, so iterate over it directly
for e in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, '[data-review-id] span.dir-ltr'))):
    print(e.text)
    print('**********************************************')
...
Output
**********************************************
This was a great find for us. Akash was our host for the stay. He was quite proactive in sharing with us instructions/information pertaining to the stay, followed-through diligently with the requisite actions, and was very responsive to the queries/additional-requests placed by us. Coming to the property itself - it's exactly as it looks in the pictures and perhaps prettier at this time of the year when there is a chill in the weather. The views from the living room and the 1st floor rooms are amazing. We couldnt try the pool owing to the chilly weather but it's kept quite clean and so is the garden area. It's around 40 mins drive from most of the to-visit places in/around nashik. The place is quite secure and the staff stays at the property.Coming to the people, Sonu and Shobha did their best to make us feel-at-home. Jeetendra was very keen to make this stay a foodie's delight with his expansive menu and the barbeque. And ofcourse we loved hanging out with Cocktail. Thank you all!!
**********************************************
We had an amazing  stay. The staff was very competent and made our stay fabulous. We will highly recommend this place to our family and friends.
**********************************************
We really enjoyed two days at this property. The villa as well as the entire property are well maintained. The view of lake as well as mountains are breathtaking. Very helpful care takers and food food.
**********************************************
