How do I scrape Google's quick answer box?

af7jpaap  posted on 2022-11-08  in  PyCharm

I want to scrape Google's quick answer box (such as the highlighted text):

I have already looked through the other questions asked on this site, but they did not help. How can I do this?

cngwdvgl  #1

I think this may help you. It prints the gold rate that Google shows for the search:

import requests
from bs4 import BeautifulSoup

headers = {
    'User-agent':
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}

r = requests.get('https://www.google.com/search?q=gold+rate+india&safe=active&rlz=1C1GCEB_enIN960IN960&ei=9qksYc76FeeS4-EP-8iQ8AY&oq=gold+rate+india&gs_lcp=Cgdnd3Mtd2l6EAMyCAgAEIAEELEDMgUIABCABDIFCAAQgAQyBQgAEIAEMgUIABCABDIHCAAQsQMQCjIFCAAQgAQyBQgAEIAEMgUIABCABDIHCAAQsQMQCjoHCAAQRxCwAzoKCAAQsAMQQxCLAzoNCAAQsAMQyQMQQxCLAzoQCAAQgAQQsQMQyQMQRhCAAjoKCAAQgAQQsQMQCjoNCAAQgAQQsQMQgwEQCjoLCAAQgAQQsQMQgwE6BQgAELEDOgcIABCABBAKOgsIABCABBCxAxDJAzoFCAAQkgM6CgguEIAEELEDEAo6CAgAELEDEIMBOgoIABCxAxCDARAKOhMILhCxAxCDARDHARDRAxBDEJMCOgcIABCxAxBDOgQIABBDOgYIABAKEEM6CggAELEDEIMBEEM6CQgAEMkDEAoQQzoICAAQsQMQkQI6CAguEIAEELEDOgUIABCRAjoOCAAQsQMQgwEQyQMQkQI6CwgAELEDEMkDEJECSgUIOhIBMUoFCDwSATNKBAhBGABQgytY4oQBYN2GAWgFcAJ4BIABiQiIAYRBkgEQMC43LjEwLjQuNC4wLjEuMZgBAKABAbABAMgBCrgBAsABAQ&sclient=gws-wiz&ved=0ahUKEwjOza2jvNjyAhVnyTgGHXskBG4Q4dUDCA8&uact=5', headers=headers)
soup = BeautifulSoup(r.text, 'lxml')

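# 'vlzY6d' is the class that held the answer value when this was written;
# Google's class names can change, so guard against a missing element.
result = soup.find('div', class_='vlzY6d')
print(result.text if result else 'Answer box not found')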

rekjcdws  #2

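The Beautiful Soup library is well suited to this task. To locate the element, use the select_one() method, which takes a CSS selector to search for. To get the element you need, target the div with the .kno-rdesc class and select the span tag inside it. The resulting selector is: .kno-rdesc span. The method returns an HTML element (or None if nothing matches). To extract the text from that element, use its text attribute.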

Here is a code snippet that uses the approach described above:

result = soup.select_one(".kno-rdesc span").text
print(result)
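
As a rough offline illustration of the selector described above, the sketch below runs select_one() against a small hand-written HTML string instead of a live results page. The markup in SAMPLE_HTML and the helper name extract_description are assumptions made for this example only; real Google markup may differ and can change at any time. The built-in "html.parser" is used here so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that only imitates the structure described above;
# it is not a copy of a real Google results page.
SAMPLE_HTML = """
<div class="kno-rdesc">
  <h3>Description</h3>
  <span>Example description text.</span>
  <span><a href="#">Wikipedia</a></span>
</div>
"""


def extract_description(html):
    """Return the text of the first span inside .kno-rdesc, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    # select_one() returns the first match for the CSS selector, or None.
    element = soup.select_one(".kno-rdesc span")
    return element.get_text(strip=True) if element is not None else None


print(extract_description(SAMPLE_HTML))
print(extract_description("<div>no panel here</div>"))
```

Checking for None before reading the text avoids an AttributeError when the page has no matching element, for example when the request was blocked or the markup changed.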

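Also, make sure you send a User-Agent request header that looks like a "real" user visit. The default User-Agent sent by requests is python-requests, so the website can tell that the request most likely comes from a script.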
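
To see the difference without sending anything over the network, a minimal sketch along these lines prints the default User-Agent and then prepares (but does not send) a request with a browser-like one. The BROWSER_UA string is just an example value, and using a Session here is one possible choice, not something the answer requires.

```python
import requests

# What requests sends when no User-Agent is supplied, e.g. "python-requests/2.x.y".
print(requests.utils.default_user_agent())

# Example browser-like value; any current browser string could be used instead.
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36"
)

session = requests.Session()
session.headers.update({"User-Agent": BROWSER_UA})

# prepare_request() merges the session headers into the request
# without actually sending it.
prepared = session.prepare_request(
    requests.Request("GET", "https://www.google.com/search", params={"q": "Narendra Modi"})
)

print(prepared.headers["User-Agent"])
print(prepared.url)
```

Inspecting the prepared request is a cheap way to confirm which headers and query string would actually go out.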
Code and full example in the online IDE:

from bs4 import BeautifulSoup
import requests, lxml

# https://docs.python-requests.org/en/master/user/quickstart/#passing-parameters-in-urls

params = {
    "q": "Narendra Modi",
    "hl": "en",  # language
    "gl": "us"   # country of the search, US -> USA
}

# https://docs.python-requests.org/en/master/user/quickstart/#custom-headers

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36",
}

html = requests.get("https://www.google.com/search", params=params, headers=headers, timeout=30)
soup = BeautifulSoup(html.text, "lxml")

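# select_one() returns None when nothing matches (markup changed or request blocked)
element = soup.select_one(".kno-rdesc span")
result = element.text if element else None
print(result)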

Output:

Narendra Damodardas Modi is an Indian politician serving as the 14th and current prime minister of India since 2014. Modi was the chief minister of Gujarat from 2001 to 2014 and is the Member of Parliament from Varanasi.

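Alternatively, you can use the Google Organic Results API from SerpApi. It is a paid API with a free plan.
The difference is that it bypasses blocks from Google and other search engines, so the end user does not have to figure out how to do that or maintain the parser, and only needs to think about what data to retrieve.
Example code to integrate: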

from serpapi import GoogleSearch
import os

params = {
  # https://docs.python.org/3/library/os.html#os.getenv
  "api_key": os.getenv("API_KEY"),  # your serpapi api key
  "engine": "google",               # search engine
  "q": "Narendra Modi"              # search query
  # other parameters
}

search = GoogleSearch(params)  # where data extraction happens on the SerpApi backend
result_dict = search.get_dict()    # JSON -> Python dict

result = result_dict["knowledge_graph"]["description"]
print(result)

Output:

Narendra Damodardas Modi is an Indian politician serving as the 14th and current prime minister of India since 2014. Modi was the chief minister of Gujarat from 2001 to 2014 and is the Member of Parliament from Varanasi.
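
Not every query produces a knowledge graph, so indexing the result with ["knowledge_graph"]["description"] can raise a KeyError. A defensive lookup might look like the sketch below. The helper name get_description and both sample dictionaries are hypothetical; only the "knowledge_graph" and "description" keys come from the code above.

```python
def get_description(result_dict):
    """Return the knowledge graph description, or None when it is missing."""
    # Fall back to an empty dict if the key is absent or its value is None.
    knowledge_graph = result_dict.get("knowledge_graph") or {}
    return knowledge_graph.get("description")


# Hypothetical response fragments, shaped like the dict used above.
with_panel = {"knowledge_graph": {"description": "Example description text."}}
without_panel = {"organic_results": []}

print(get_description(with_panel))
print(get_description(without_panel))
```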

Disclaimer: I work for SerpApi.
