python selenium web scraper

y0u0uwnf posted on 2023-06-28 in Python

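I wrote a simple web scraper that reads EAN codes from an Excel file, searches for those products on the website, and collects their prices. I have one big problem. When I open the first browser instance, I accept the cookies and set the location of the store I want to scrape: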

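import time

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()  # assumption: the post does not show how the driver was created
driver.get("https://www.castorama.pl")
# The cookie banner lives in an iframe, so switch into it before clicking
driver.switch_to.frame(driver.find_element(By.CLASS_NAME, "truste_popframe"))
time.sleep(1)
driver.find_element(By.CLASS_NAME, "call").click()
driver.switch_to.default_content()
# Enter the postcode to select the local store
driver.find_element(By.CLASS_NAME, "_1a41e483").send_keys("38-500")
time.sleep(1)
driver.find_element(By.XPATH, "//span[normalize-space()='Dodaj']").click()
results = []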

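This works fine, but the problem starts when I call a search function in a loop: the store location is no longer set and has to be set again every time.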

def getdata(symbol):
    driver.get(f"https://www.castorama.pl/search?term={symbol}")
  
    name = WebDriverWait(driver,2).until(
        EC.presence_of_element_located((By.ID, "product-title"))
        )
    price = WebDriverWait(driver,2).until(
        EC.presence_of_element_located((By.CLASS_NAME, "_5d34bd7a"))
        )
    records = {
      'ean': symbol,
      'cena': price.text,
      'name': name.text
    }
    
    return records

Is there a way to solve this?


k4aesqcs1#

Instead of Selenium, you can use their paginated API to get the results directly as JSON:

import requests

params = {
    "include": "content",
    "page[number]": "1",
    "page[size]": "24",
    "searchTerm": "farba",  # <-- the search term
    "storeId": 1543         # <-- add store id here
}

headers = {
    'Authorization': 'Atmosphere atmosphere_app_id=kingfisher-EPgbIZbIpBzipu0bKltAFm1xler30LKmaF4vJH96'
}

api_url = 'https://api.kingfisher.com/v2/mobile/products/CAPL'

for params['page[number]'] in range(1, 2):  # <-- increase number of pages here
    data = requests.get(api_url, params=params, headers=headers).json()
    for r in data['data']:
        a = r['attributes']
        print(f'{a["pricing"]["currentUnitPrice"]["amountIncTax"]:>8} {a["pricing"]["currencyCode"]:<4} {a["name"]}')

Prints:

   30.79 PLN  Farba Dulux EasyCare jasny spokój 2,5 l
    7.25 PLN  Farba Dekoral Ściany i Sufity 10 l + 20% gratis
    29.4 PLN  Farba Dulux EasyCare idealne cappuccino 5 l
     4.7 PLN  Farba biała mat 10 l
    20.0 PLN  Farba Dulux Ściany i Sufity najpopularniejszy szary 5 l
   25.99 PLN  Farba Dekoral Voice of Color miętowy pastelowy 2,5 l
    11.8 PLN  Farba Dulux Premium White 10 l
    29.4 PLN  Farba Dulux EasyCare designerski szary 5 l
   37.99 PLN  Farba Dulux EasyCare Kuchnia i Łazienka biały 2,5 l
   37.99 PLN  Farba Dulux EasyCare Kuchnia karmelowe latte 2,5 l
    14.8 PLN  Farba lateksowa Tikkurila Super White 10 l
   37.99 PLN  Farba Dulux EasyCare Kuchnia i Łazienka złoty pieprz 2,5 l

...and so on.
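
If you want the results as data rather than printed lines, the same request can be wrapped in two small functions. This is only a sketch: the names `extract_rows`, `fetch_all`, and `max_pages` are made up for illustration, the 10-second timeout is arbitrary, and stopping at the first empty `data` list is an assumption about how the API signals the last page. The URL, header, and JSON keys are the ones used above.

```python
API_URL = "https://api.kingfisher.com/v2/mobile/products/CAPL"
HEADERS = {
    "Authorization": "Atmosphere atmosphere_app_id=kingfisher-EPgbIZbIpBzipu0bKltAFm1xler30LKmaF4vJH96"
}


def extract_rows(payload):
    """Turn one decoded API response into a list of flat dicts."""
    rows = []
    for item in payload.get("data", []):
        a = item["attributes"]
        pricing = a["pricing"]
        rows.append(
            {
                "name": a["name"],
                "price": pricing["currentUnitPrice"]["amountIncTax"],
                "currency": pricing["currencyCode"],
            }
        )
    return rows


def fetch_all(search_term, store_id, max_pages=3, page_size=24):
    """Collect rows page by page (hypothetical helper)."""
    import requests  # imported here so extract_rows() can be used without it

    params = {
        "include": "content",
        "page[size]": str(page_size),
        "searchTerm": search_term,
        "storeId": store_id,
    }
    rows = []
    for page in range(1, max_pages + 1):
        params["page[number]"] = str(page)
        resp = requests.get(API_URL, params=params, headers=HEADERS, timeout=10)
        resp.raise_for_status()  # fail loudly on HTTP errors
        page_rows = extract_rows(resp.json())
        if not page_rows:  # assumption: an empty page means no more results
            break
        rows.extend(page_rows)
    return rows
```

Calling `fetch_all("farba", 1543)` should then return a list of dicts with `name`, `price`, and `currency` keys, which is easier to pass on to pandas than formatted strings.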

EDIT: To search by code:

import requests

# store_id=1593 - warsaw
# store_id=1543 - wroclaw
store_id = 1543
product_id = '5908305642893'

params = {"searchTerm": f"{product_id}_CAPL", "storeId": store_id}

api_url = "https://api.kingfisher.com/v2/mobile/products/CAPL"
headers = {
    "Authorization": "Atmosphere atmosphere_app_id=kingfisher-EPgbIZbIpBzipu0bKltAFm1xler30LKmaF4vJH96"
}

data = requests.get(api_url, params=params, headers=headers).json()
a = data["data"][0]["attributes"]
print(
    f'{a["pricing"]["currentUnitPrice"]["amountIncTax"]:>8} {a["pricing"]["currencyCode"]:<4} {a["name"]}'
)

Prints:

     0.3 PLN  Ziemia uniwersalna 50 l

rsaldnfx2#

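Thanks. I added a function and a loop that look up every barcode row from the Excel file. Now I'd just like a hint on how to save the results to an Excel file. In the previous version of this program I used pandas, but I have no experience with JSON yet.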

import requests
import pandas as pd

# Read the EAN codes from the 'ean' column of the Excel file
df = pd.read_excel('eanCS.xlsx', dtype="int64")
mylist = df['ean'].tolist()

store_id = 2793
product_id = mylist

def getdata(symbol):
    # Look up a single EAN code in the selected store and print the result
    params = {"searchTerm": f"{symbol}_CAPL", "storeId": store_id}
    data = requests.get(api_url, params=params, headers=headers).json()
    a = data["data"][0]["attributes"]
    print(
        f'{a["pricing"]["currentPrice"]["amountIncTax"]:>4} {a["pricing"]["currencyCode"]:<4} {a["ean"]:<4} {a["name"]:<4} '
    )

api_url = "https://api.kingfisher.com/v2/mobile/products/CAPL"
headers = {
    "Authorization": "Atmosphere atmosphere_app_id=kingfisher-EPgbIZbIpBzipu0bKltAFm1xler30LKmaF4vJH96"
}

for item in mylist:
    getdata(item)
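
One possible way to get the results into Excel is to return a dict per product instead of printing, collect the dicts in a list, and let pandas write them out. The sketch below is untested against the live API. The helper names `to_record` and `save_prices` and the output columns are made up for illustration, the JSON keys (including `currentPrice`) are taken from the code above, and a product with no match is assumed to come back as an empty `data` list. Reading and writing `.xlsx` files with pandas also needs the `openpyxl` package installed.

```python
API_URL = "https://api.kingfisher.com/v2/mobile/products/CAPL"
HEADERS = {
    "Authorization": "Atmosphere atmosphere_app_id=kingfisher-EPgbIZbIpBzipu0bKltAFm1xler30LKmaF4vJH96"
}


def to_record(symbol, payload):
    """Build one spreadsheet row from a decoded API response."""
    items = payload.get("data") or []
    if not items:
        # Assumption: no match yields an empty list; keep the EAN, leave the rest blank
        return {"ean": symbol, "name": None, "price": None, "currency": None}
    a = items[0]["attributes"]
    pricing = a["pricing"]
    return {
        "ean": symbol,
        "name": a["name"],
        "price": pricing["currentPrice"]["amountIncTax"],
        "currency": pricing["currencyCode"],
    }


def save_prices(in_path, out_path, store_id):
    """Read EANs from in_path, query each one, write all rows to out_path."""
    import pandas as pd  # imported here so to_record() can be used on its own
    import requests

    eans = pd.read_excel(in_path, dtype="int64")["ean"].tolist()
    records = []
    for ean in eans:
        params = {"searchTerm": f"{ean}_CAPL", "storeId": store_id}
        resp = requests.get(API_URL, params=params, headers=HEADERS, timeout=10)
        resp.raise_for_status()  # fail loudly on HTTP errors
        records.append(to_record(ean, resp.json()))
    # A list of dicts becomes one row per product, one column per key
    pd.DataFrame(records).to_excel(out_path, index=False)
```

With this in place, `save_prices("eanCS.xlsx", "prices.xlsx", 2793)` would write one row per barcode. The JSON itself needs no special handling: `.json()` already returns ordinary Python dicts and lists, so pandas only ever sees plain records.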
