How do I extract a text element with Selenium in Python?

Posted by xoefb8l8 on 2022-11-24 in Python
Followers (0) | Answers (4) | Views (218)

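Please consider the following.

I am using Selenium to scrape the App Store page https://apps.apple.com/us/app/bank-of-america-private-bank/id1096813830
I am trying to extract the review text "As subject matter experts, our team is very engaging..."
I tried finding the elements by class name: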

review_ratings = driver.find_elements_by_class_name('we-truncate we-truncate--multi-line we-truncate--interactive ember-view we-customer-review__body')
review_ratingsList = []
for e in review_ratings:
    review_ratingsList.append(e.get_attribute('innerHTML'))
review_ratings

But it returns an empty list: []
Is something wrong with the code? Or is there a better solution?

Answer #1 (n3schb8v)

Use Requests and Beautiful Soup:

import requests
from bs4 import BeautifulSoup

url = 'https://apps.apple.com/us/app/bank-of-america-private-bank/id1096813830'

res = requests.get(url)
soup = BeautifulSoup(res.text,'lxml')
item = soup.select_one("blockquote > p").text
print(item)

Output:

As subject matter experts, our team is very engaging and focused on our near and long term financial health!

Answer #2 (k2arahey)

You can use WebDriverWait to wait until the elements are visible and then read their text. Make sure you pick a good Selenium locator.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

#...

wait = WebDriverWait(driver, 5)
review_ratings = wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, ".we-customer-review")))
for review_rating in review_ratings:
    # The find_element_by_* helpers were removed in Selenium 4.3; use find_element(By..., ...) instead.
    stars = review_rating.find_element(By.CSS_SELECTOR, ".we-star-rating").get_attribute("aria-label")
    title = review_rating.find_element(By.CSS_SELECTOR, "h3").text
    review = review_rating.find_element(By.CSS_SELECTOR, "p").text
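
A likely reason the call in the question returns an empty list is that a class-name locator expects a single class name, while the string passed there holds five space-separated classes. One way around this, sketched below, is to turn the class string into a compound CSS selector. The helper name `classes_to_css_selector` is hypothetical and not part of Selenium; the commented-out `find_elements` line assumes a live `driver` as in the question.

```python
def classes_to_css_selector(class_attr: str) -> str:
    """Turn a space-separated class attribute into a compound CSS selector."""
    names = class_attr.split()
    if not names:
        raise ValueError("expected at least one class name")
    # ".a.b.c" matches elements that carry all of the classes a, b and c.
    return "".join(f".{name}" for name in names)


# Class string copied from the question.
question_classes = (
    "we-truncate we-truncate--multi-line we-truncate--interactive "
    "ember-view we-customer-review__body"
)
selector = classes_to_css_selector(question_classes)
print(selector)

# Assumed usage with a live driver (not executed here):
#   driver.find_elements(By.CSS_SELECTOR, selector)
```

Matching on all five classes is brittle, because some of them look framework-generated. A single descriptive class such as `.we-customer-review__body` is usually a sturdier choice.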

Answer #3 (fgw7neuy)

Combine Selenium with Beautiful Soup.
Using the web driver:

from bs4 import BeautifulSoup
from selenium import webdriver

browser = webdriver.Chrome()
url = "https://apps.apple.com/us/app/bank-of-america-private-bank/id1096813830"
browser.get(url)
innerHTML = browser.execute_script("return document.body.innerHTML")

bs = BeautifulSoup(innerHTML, 'html.parser')

bs.blockquote.p.text

Output:

Out[22]: 'As subject matter experts, our team is very engaging and focused on our near and long term financial health!'

Answer #4 (tvmytwxo)

Use WebDriverWait with presence_of_all_elements_located and the following CSS selector.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://apps.apple.com/us/app/bank-of-america-private-bank/id1096813830")
review_ratings = WebDriverWait(driver, 20).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, '.we-customer-review__body p[dir="ltr"]')))
review_ratingsList = []
for e in review_ratings:
    review_ratingsList.append(e.get_attribute('innerHTML'))
print(review_ratingsList)

Output:

['As subject matter experts, our team is very engaging and focused on our near and long term financial health!', 'Very much seems to be an unfinished app. Can’t find secure message alert. Or any alerts for that matter. Most of my client team is missing from the “send to” list. I have other functions very useful, when away from my computer.']
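
The loop above collects `innerHTML`, which is raw markup, so a review that contains nested tags or character entities would come back with them intact. If plain text is wanted, reading each element's `.text` property in Selenium is the simplest route. As an alternative, the minimal sketch below cleans an `innerHTML` string with the standard library only. The names `_TextCollector` and `inner_html_to_text` are hypothetical, and the sample strings are invented for illustration.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects text nodes and ignores every tag."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &amp; before handle_data runs.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def inner_html_to_text(fragment: str) -> str:
    """Strip tags, decode entities and collapse runs of whitespace."""
    collector = _TextCollector()
    collector.feed(fragment)
    collector.close()  # flush any text still buffered by the parser
    return " ".join("".join(collector.parts).split())


# Invented sample, not taken from the App Store page.
print(inner_html_to_text("near and <em>long</em>   term &amp; beyond"))
# near and long term & beyond
```

With the list from the answer, this could be applied as `[inner_html_to_text(s) for s in review_ratingsList]`.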
