pandas: Help needed scraping a site with Selenium in Python

yc0p9oo0 posted on 2023-04-28 in Python

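I'm trying to use Selenium to scrape NBA player names and their projected fantasy scores (not the single-stat DFS projections). I already use Selenium to click NBA automatically and select the Fantasy Score tab.
From there, I see the players in a grid, and I would like to scrape the points and the name of each player. I tried looping through the grid, but I don't think I'm doing it right.
Can someone look at my code and point me in the right direction?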

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time
import pandas as pd

PATH = "C:\Program Files (x86)\chromedriver.exe"
driver = webdriver.Chrome(PATH)

driver.get("https://app.prizepicks.com/")

popup = driver.find_element_by_class_name("close").click()
NBA = driver.find_element_by_xpath("//div[@class='name'][normalize-space()='NBA']").click()
fantasyScore = driver.find_element_by_xpath("//div[@class='segment-selector-button']").click()

projections = driver.find_element_by_class_name('projections')

nbaPlayers = []

for projection in projections:
    
    names = projection.find_element_by_xpath('.//*[@id="projections"]/div/div/div[1]/div[2]/div[1]/div[3]/div[1]').text
    points= projection.fine_element_by_xpath('.//*[@id="projections"]/div/div/div[1]/div[2]/div[2]/div[1]/text()').text
    print(names, points)
    
    players = {
        'Name': names,
        'FantasyPoints':points,
        }
    
    nbaPlayers.append(players)

df = pd.DataFrame(nbaPlayers)
print(df)

driver.quit()

Edit: 6.12.21 5:22 PM CST. This is the first part of my code, which C. Peck fixed (thank you!). The next part of my code is not working either.

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time
import pandas as pd
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

#sample data
pp = {'Player Name':['Donovan Mitchell', 'Kawhi Leonard', 'Rudy Gobert', 'Paul George','Reggie Jackson', 'Jordan Clarkson'],
      'Fantasy Score': [46.0, 50.0, 40.0, 44.0, 25.0, 26.5]}

#Creating a dataframe from dictionary
dfNBA = pd.DataFrame(pp)

#Scraping ESPN
PATH = "C:\Program Files (x86)\chromedriver.exe"
driver = webdriver.Chrome(PATH)
driver.get("https://www.espn.com/")

#Clicking the search button
driver.find_element_by_xpath("//a[@id='global-search-trigger']").click() 

#sending data to the search button
driver.find_element_by_xpath("//input[@placeholder='Search Sports, Teams or Players...']").send_keys(dfNBA.iloc[0,:].values[0])
WebDriverWait(driver, 20).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".search_results__details")))
playerPage = driver.find_element_by_css_selector(".search_results__details").click()

#Scraping data from last 10 games
points = driver.find_element_by_xpath("//*[@id='fittPageContainer']/div[2]/div[5]/div/div[1]/div[1]/section/div/div[3]/div/div/div[2]/table/tbody/tr[1]/td[13]")
rebs = driver.find_element_by_xpath("//*[@id='fittPageContainer'']/div[2]/div[5]/div/div[1]/div[1]/section/div/div[3]/div/div/div[2]/table/tbody/tr[1]/td[7]")                                    
asts = driver.find_element_by_xpath("//*[@id='fittPageContainer']/div[2]/div[5]/div/div[1]/div[1]/section/div/div[3]/div/div/div[2]/table/tbody/tr[1]/td[8]")
blks = driver.find_element_by_xpath("//*[@id='fittPageContainer']/div[2]/div[5]/div/div[1]/div[1]/section/div/div[3]/div/div/div[2]/table/tbody/tr[1]/td[9]")
stls = driver.find_element_by_xpath("//*[@id='fittPageContainer']/div[2]/div[5]/div/div[1]/div[1]/section/div/div[3]/div/div/div[2]/table/tbody/tr[1]/td[10]")
tnvrs = driver.find_element_by_xpath("//*[@id='fittPageContainer']/div[2]/div[5]/div/div[1]/div[1]/section/div/div[3]/div/div/div[2]/table/tbody/tr[1]/td[12]")

projectedPoints = points+(rebs*1.2)+(asts*1.5)+(blks*3)+(stls*3)-(tnvrs*1)
print(projectedPoints)

#my final table should look like:
#Index   Name           FantasyPoints  ESPN L10 Avg
#0     Donovan Mitchell      46           27.8

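The goal of this project is to first scrape PrizePicks for the NBA player names and fantasy scores, then store the scraped data in a DataFrame. Using the stored DataFrame, I'm trying to loop through each row, take the player's name, and enter it into the ESPN search box. This should open the player's page. On the player page, I'm trying to scrape points, rebounds, assists, steals, and blocks, then combine them using the formula in the projectedPoints variable.
In the end, I'll be able to calculate a projected score for each player and compare it with the fantasy score scraped from PrizePicks. Using this comparison, I'll decide whether the player will go over or under the fantasy score.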
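
One detail worth noting about the second snippet: find_element returns a WebElement, so each stat has to be read as text and converted to a number before the formula can be applied. Below is a minimal sketch of that step, using the weights from the projectedPoints line. The names `to_number` and `projected_fantasy_points` and the sample cell strings are hypothetical; they are not values scraped from ESPN.

```python
def to_number(text):
    """Convert a scraped table cell such as ' 27.8 ' to a float."""
    return float(text.strip())


def projected_fantasy_points(points, rebs, asts, blks, stls, tnvrs):
    """Apply the weights from the question's projectedPoints formula."""
    return points + rebs * 1.2 + asts * 1.5 + blks * 3 + stls * 3 - tnvrs * 1


# Hypothetical cell texts, standing in for element.text values
cells = {"points": "20.0", "rebs": "5.0", "asts": "4.0",
         "blks": "1.0", "stls": "2.0", "tnvrs": "3.0"}

# Convert every cell to a float, then apply the formula
stats = {name: to_number(text) for name, text in cells.items()}
score = projected_fantasy_points(**stats)
print(round(score, 2))
```

Keeping the formula in a plain function means it can be checked without opening a browser, and the same function can then be applied to each row of the DataFrame.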


qmelpv7a #1

You can do this more easily without Selenium, because the data is loaded dynamically from an API:

import pandas as pd
import requests

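# Query-string filters for the projections endpoint
params = (
    ('league_id', '7'),
    ('per_page', '250'),
    ('projection_type_id', '1'),
    ('single_stat', 'true'),
)

session = requests.Session()
# Send the filters as query parameters (params=) rather than as a request body (data=)
response = session.get('https://api.prizepicks.com/projections', params=params)
payload = response.json()

# 'included' holds the related records; keep only the player entries
df1 = pd.json_normalize(payload['included'])
df1 = df1[df1['type'] == 'new_player']

# 'data' holds the projections themselves
df2 = pd.json_normalize(payload['data'])

# Pair names with line scores by position (this relies on both frames being in the same order)
df = pd.DataFrame(zip(df1['attributes.name'], df2['attributes.line_score']), columns=['name', 'points'])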

Output:
| | name | points |
| --- | --- | --- |
| 0 | Donovan Mitchell | 46 |
| 1 | Kawhi Leonard | 50 |
| 2 | Rudy Gobert | 40 |
| 3 | Paul George | 44 |
| 4 | Mike Conley | 29.5 |
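
Pairing the two frames with zip relies on the players and the projections arriving in the same order. If the response follows the usual JSON:API layout, where each projection points to its player through a relationships entry, the two can be joined by ID instead. The sketch below assumes that layout. In particular, the `relationships.new_player.data.id` path is an assumption that the answer does not confirm, and the payload is made up rather than real API output.

```python
def join_projections(payload):
    """Return (name, line_score) rows, matching each projection to its player by ID."""
    # Index the player records by their ID
    players = {
        item["id"]: item["attributes"]["name"]
        for item in payload["included"]
        if item["type"] == "new_player"
    }
    rows = []
    for projection in payload["data"]:
        # Assumed path to the related player's ID
        player_id = projection["relationships"]["new_player"]["data"]["id"]
        if player_id in players:
            rows.append((players[player_id], projection["attributes"]["line_score"]))
    return rows


# Made-up payload in the assumed shape; 'included' is deliberately in a
# different order than 'data' to show that the join does not depend on order
sample = {
    "data": [
        {"id": "p1", "attributes": {"line_score": 46.0},
         "relationships": {"new_player": {"data": {"id": "20"}}}},
        {"id": "p2", "attributes": {"line_score": 50.0},
         "relationships": {"new_player": {"data": {"id": "10"}}}},
    ],
    "included": [
        {"type": "new_player", "id": "10", "attributes": {"name": "Kawhi Leonard"}},
        {"type": "league", "id": "7", "attributes": {"name": "NBA"}},
        {"type": "new_player", "id": "20", "attributes": {"name": "Donovan Mitchell"}},
    ],
}

rows = join_projections(sample)
print(rows)
```

The resulting rows can be passed to pd.DataFrame(rows, columns=['name', 'points']) to get the same two-column frame.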


c8ib6hqw #2

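I made a few changes to your code, and I think it now gives the output you want.
The changes I made:
1. Assigning the result of element.click() to a variable serves no purpose, because .click() returns nothing, so I removed those assignments.
2. You want find_elements rather than find_element, to get a list of WebElements to iterate over.
3. The XPaths for names and points weren't quite right, so I fixed them.
4. I needed to introduce WebDriverWait so that the projection elements exist by the time they are looked up. This requires the following imports: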
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

So your final code could be:

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time
import pandas as pd
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

PATH = "C:\Program Files (x86)\chromedriver.exe"
driver = webdriver.Chrome(PATH)
driver.get("https://app.prizepicks.com/")
driver.find_element_by_class_name("close").click()
driver.find_element_by_xpath("//div[@class='name'][normalize-space()='NBA']").click()
driver.find_element_by_xpath("//div[@class='segment-selector-button']").click()
projections = WebDriverWait(driver, 20).until(
 EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".projection")))

nbaPlayers = []

for projection in projections:

    names = projection.find_element_by_xpath('.//div[@class="name"]').text
    points= projection.find_element_by_xpath('.//div[@class="presale-score"]').get_attribute('innerHTML')
    print(names, points)

    players = {
        'Name': names,
        'FantasyPoints':points,
        }

    nbaPlayers.append(players)

df = pd.DataFrame(nbaPlayers)
print(df)

driver.quit()

The output is:

Donovan Mitchell 46.0
Kawhi Leonard 50.0
Rudy Gobert 40.0
Paul George 44.0
Mike Conley 29.5
Reggie Jackson 25.0
Jordan Clarkson 25.0
Marcus Morris 23.0
Bojan Bogdanovic 25.0
Ivica Zubac 16.0
Royce O'Neale 22.0
Nicolas Batum 19.0
Joe Ingles 22.0
Patrick Beverley 10.0
                Name FantasyPoints
0   Donovan Mitchell          46.0
1      Kawhi Leonard          50.0
2        Rudy Gobert          40.0
3        Paul George          44.0
4        Mike Conley          29.5
5     Reggie Jackson          25.0
6    Jordan Clarkson          25.0
7      Marcus Morris          23.0
8   Bojan Bogdanovic          25.0
9        Ivica Zubac          16.0
10     Royce O'Neale          22.0
11     Nicolas Batum          19.0
12        Joe Ingles          22.0
13  Patrick Beverley          10.0

9jyewag0 #3

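I don't see any elements matching the projections class name, but if they are supposed to be there, you should use find_elements rather than find_element.
I'd guess your code should look like this to get the players' names and scores: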

nbaPlayers = []

players = driver.find_elements_by_xpath("//div[@class='player']")
for player in players:
    # Use double quotes outside so the single-quoted attribute values parse correctly
    name = player.find_element_by_xpath(".//div[@class='name']").text
    points = player.find_element_by_xpath("./../..//div[@class='presale-score']").text
    print(name, points)
    data = {
        'Name': name,
        'FantasyPoints':points,
        }
    
    nbaPlayers.append(data)

nkcskrwz #4

I see you are using

projections = driver.find_element_by_class_name('projections')

and looping over it like this:

for projection in projections:

find_element returns a single web element. But what you need is a list of player names, right?
So use find_elements instead:

names = driver.find_elements_by_css_selector("div.player div.name")
for player_name in names:
    print(player_name.text)

You can do the same for the scores; the CSS selector would be div.score div
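
Collecting names and scores with two separate find_elements calls gives two parallel lists that still have to be combined into rows. Below is a minimal sketch of that step, working on the `.text` values. `pair_names_and_scores` is a hypothetical helper, and the sample strings stand in for scraped text.

```python
def pair_names_and_scores(names, scores):
    """Combine parallel lists of name and score strings into row dicts."""
    if len(names) != len(scores):
        # A mismatch means the two selectors did not match the same set of cards
        raise ValueError(f"got {len(names)} names but {len(scores)} scores")
    return [
        {"Name": name.strip(), "FantasyPoints": float(score)}
        for name, score in zip(names, scores)
    ]


# Stand-ins for the .text values of the elements returned by find_elements
names = ["Donovan Mitchell", "Kawhi Leonard"]
scores = ["46.0", "50.0"]

rows = pair_names_and_scores(names, scores)
print(rows)
```

The length check is there because zip silently truncates to the shorter list, which would hide a selector that matched too few elements.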
