Web scraping a sports table in R

Posted by xu3bshqb on 2023-10-13 in Other

I need your help or advice on scraping the table from the web page linked below, using R or Python: https://euroleaguefantasy.euroleaguebasketball.net/en/stats-fantasy-euroleague
So far I have tried the rvest package, but with no luck.

url <- "https://euroleaguefantasy.euroleaguebasketball.net/en/stats-fantasy-euroleague"

library(rvest)
read_html(url)
#> {html_document}
#> <html lang="en">
#> [1] <head>\n<meta http-equiv="Content-Type" content="text/html; charset=UTF-8 ...
#> [2] <body class="loading">\n<app-root></app-root><button id="ot-sdk-btn" clas ...

Created on 2023-10-06 with reprex v2.0.2
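I can't retrieve anything from this, or at least I don't know how to, because neither element [1] (<head>) nor element [2] (<body>) of read_html(url) contains the table content.
Any ideas or suggestions on how to proceed are much appreciated!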

Answer #1, by hlswsv35

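The page you are trying to scrape is a dynamic web page. This means the table content is not present in the HTML you download with read_html. Instead, the HTML contains JavaScript code that fetches the data in JSON format from an API and uses it to fill in the table. You can see a hint of this in your output: the <body> starts with an empty <app-root></app-root> element where the page content would normally be rendered. The JavaScript runs automatically in your browser, which is why you see the table there, but it does not run in R when you call read_html.
You can get around this in one of two ways: either use browser automation (such as Selenium), or use your browser's developer tools (the Network tab) to find the API request that returns the raw data. I usually find that the second approach gives you more control over how you read and process the data, so that is the one I'll show here.
First, copy the request URL from the browser's developer tools and put it into R (I have split the URL into pieces and used paste0 to join them back together so that it fits on the screen):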

url <- paste0("https://www.dunkest.com/api/stats/table",
"?season_id=15&mode=dunkest&stats_type=avg",
"&weeks[]=1&rounds[]=1&rounds[]=2&teams[]=31",
"&teams[]=32&teams[]=33&teams[]=34&teams[]=35",
"&teams[]=36&teams[]=37&teams[]=38&teams[]=39",
"&teams[]=40&teams[]=41&teams[]=42&teams[]=43",
"&teams[]=44&teams[]=45&teams[]=46&teams[]=47",
"&teams[]=48&positions[]=1&positions[]=2",
"&positions[]=3&player_search=&min_cr=4",
"&max_cr=35&sort_by=pdk&sort_order=desc&iframe=yes")
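Typing every repeated `teams[]=` parameter by hand is error-prone. A minimal sketch, assuming the API keeps accepting exactly the query parameters shown above: `build_stats_url()` is a hypothetical helper (not part of any package) that assembles the same URL from vectors, so selecting different rounds or teams only means changing a vector. What each parameter does (for example, that `rounds` selects game rounds) is inferred from its name, not from any API documentation.

```r
# Hypothetical helper (not from any package): rebuild the dunkest.com query URL
# from vectors instead of typing every repeated "key[]=value" pair by hand.
# Parameter meanings are guessed from their names; defaults mirror the URL above.
build_stats_url <- function(season_id = 15,
                            weeks = 1,
                            rounds = 1:2,
                            teams = 31:48,
                            positions = 1:3,
                            min_cr = 4,
                            max_cr = 35) {
  # Repeat an array-style key once per value, e.g. "&teams[]=31&teams[]=32"
  repeat_key <- function(key, values) {
    paste0("&", key, "[]=", values, collapse = "")
  }
  paste0(
    "https://www.dunkest.com/api/stats/table",
    "?season_id=", season_id, "&mode=dunkest&stats_type=avg",
    repeat_key("weeks", weeks),
    repeat_key("rounds", rounds),
    repeat_key("teams", teams),
    repeat_key("positions", positions),
    "&player_search=&min_cr=", min_cr,
    "&max_cr=", max_cr,
    "&sort_by=pdk&sort_order=desc&iframe=yes"
  )
}

url <- build_stats_url()
```

With the default arguments the helper reproduces the URL above character for character; whether the server accepts other combinations (say, `rounds = 1:3`) is untested.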

Now we can read the JSON straight into R:

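jsonlite::read_json(url) |>                 # a list with one element per player
  lapply(as.data.frame) |>                  # each player record -> one-row data frame
  lapply(\(x) sapply(x, as.character)) |>   # coerce every field to character so rows combine
  dplyr::bind_rows()                        # stack all players into a single tibble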
#> # A tibble: 111 x 42
#>    id    gp    first_name last_n~1 cr    team_id team_~2 team_~3 posit~4 posit~5
#>    <chr> <chr> <chr>      <chr>    <chr> <chr>   <chr>   <chr>   <chr>   <chr>  
#>  1 1185  1     Matt       Thomas   8.3   31      BER     ALBA B~ 1       G      
#>  2 1187  1     Shabazz    Napier   13.3  35      CZV     Crvena~ 1       G      
#>  3 1198  1     Achille    Polonara 9.8   47      VIR     Virtus~ 2       F      
#>  4 1206  1     Timothe    Luwawu-~ 10.3  40      ASV     LDLC A~ 2       F      
#>  5 1213  1     Tornike    Shengel~ 12.3  47      VIR     Virtus~ 2       F      
#>  6 1218  1     Shane      Larkin   14.5  32      EFS     Anadol~ 1       G      
#>  7 1222  1     Milos      Teodosic 12.3  35      CZV     Crvena~ 1       G      
#>  8 1228  1     Jan        Vesely   12.8  37      BAR     FC Bar~ 3       C      
#>  9 1230  1     Balsa      Koprivi~ 6.6   44      PAR     Partiz~ 3       C      
#> 10 1231  1     Lorenzo    Brown    14.8  41      MTA     Maccab~ 1       G      
#> # ... with 101 more rows, 32 more variables: pdk <chr>, plus <chr>, min <chr>,
#> #   starter <chr>, pts <chr>, ast <chr>, reb <chr>, stl <chr>, blk <chr>,
#> #   blka <chr>, fgm <chr>, fgm_tot <chr>, fga <chr>, fga_tot <chr>, tpm <chr>,
#> #   tpm_tot <chr>, tpa <chr>, tpa_tot <chr>, ftm <chr>, ftm_tot <chr>,
#> #   fta <chr>, fta_tot <chr>, oreb <chr>, dreb <chr>, tov <chr>, pf <chr>,
#> #   fouls_received <chr>, plus_minus <chr>, fgp <chr>, tpp <chr>, ftp <chr>,
#> #   slug <chr>, and abbreviated variable names 1: last_name, 2: team_code, ...

Created on 2023-10-06 with reprex v2.0.2
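The pipeline above relies on each player coming back from `read_json()` as a list of fields, and it coerces every field to character before binding, presumably so that rows with differently typed values can be combined. As a rough base-R sketch of the same idea without dplyr, assuming every field holds a single scalar value or `null`: `records_to_df()` is a hypothetical helper that also turns JSON `null`s (which jsonlite reads as `NULL` when `simplifyVector = FALSE`) and missing fields into `NA`.

```r
# Hypothetical helper (not from any package): turn a list of JSON records,
# as returned by jsonlite::read_json() with its default simplifyVector = FALSE,
# into a data frame whose columns are all character.
# Assumes every field holds a single scalar value or null.
records_to_df <- function(records) {
  # Union of all field names across records, in first-seen order
  cols <- unique(unlist(lapply(records, names)))
  rows <- lapply(records, function(rec) {
    # Missing fields and JSON nulls (NULL in R) both become NA
    vals <- vapply(cols, function(col) {
      v <- rec[[col]]
      if (is.null(v)) NA_character_ else as.character(v)
    }, character(1))
    as.data.frame(as.list(vals), stringsAsFactors = FALSE)
  })
  # rbind() matches data frame columns by name
  do.call(rbind, rows)
}

# Usage, assuming `url` holds the API URL from above:
# players <- records_to_df(jsonlite::read_json(url))
```

Unlike `bind_rows()`, this version fails loudly (inside `vapply()`) if a field turns out to be a nested array rather than a scalar.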

Answer #2, by enyaitl3

library(httr2)
library(tidyverse)

"https://www.dunkest.com/api/stats/table?season_id=15&mode=dunkest&stats_type=avg&weeks%5B%5D=1&rounds%5B%5D=1&rounds%5B%5D=2&teams%5B%5D=31&teams%5B%5D=32&teams%5B%5D=33&teams%5B%5D=34&teams%5B%5D=35&teams%5B%5D=36&teams%5B%5D=37&teams%5B%5D=38&teams%5B%5D=39&teams%5B%5D=40&teams%5B%5D=41&teams%5B%5D=42&teams%5B%5D=43&teams%5B%5D=44&teams%5B%5D=45&teams%5B%5D=46&teams%5B%5D=47&teams%5B%5D=48&positions%5B%5D=1&positions%5B%5D=2&positions%5B%5D=3&player_search=&min_cr=4&max_cr=35&sort_by=pdk&sort_order=desc&iframe=yes" %>%
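  request() %>%                              # build an httr2 request from the URL
  req_perform() %>%                          # send it and receive the response
  resp_body_json(simplifyVector = TRUE) %>%  # parse JSON; the array of players becomes a data frame
  as_tibble()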

# A tibble: 111 × 42
   id    gp    first_name last_name  cr    team_id team_code team_name position_id position pdk   plus  min   starter pts   ast   reb   stl   blk  
   <chr> <chr> <chr>      <chr>      <chr> <chr>   <chr>     <chr>     <chr>       <chr>    <chr> <chr> <chr> <chr>   <chr> <chr> <chr> <chr> <chr>
 1 1185  1     Matt       Thomas     8.3   31      BER       ALBA Ber… 1           G        13.0  0.2   23.3  1       11.0  0.0   3.0   0.0   0.0  
 2 1187  1     Shabazz    Napier     13.3  35      CZV       Crvena Z… 1           G        22.0  0.3   20.7  1       21.0  1.0   3.0   2.0   0.0  
 3 1198  1     Achille    Polonara   9.8   47      VIR       Virtus S… 2           F        2.0   -0.4  10.7  0       0.0   1.0   3.0   0.0   1.0  
 4 1206  1     Timothe    Luwawu-ca… 10.3  40      ASV       LDLC ASV… 2           F        9.0   -0.1  30.0  1       13.0  0.0   1.0   2.0   0.0  
 5 1213  1     Tornike    Shengelia  12.3  47      VIR       Virtus S… 2           F        21.0  0.3   31.0  1       17.0  4.0   4.0   0.0   0.0  
 6 1218  1     Shane      Larkin     14.5  32      EFS       Anadolu … 1           G        26.0  0.4   32.5  1       16.0  12.0  0.0   2.0   0.0  
 7 1222  1     Milos      Teodosic   12.3  35      CZV       Crvena Z… 1           G        -0.9  -0.6  14.6  0       3.0   2.0   3.0   1.0   0.0  
 8 1228  1     Jan        Vesely     12.8  37      BAR       FC Barce… 3           C        20.9  0.3   22.0  1       16.0  1.0   7.0   2.0   0.0  
 9 1230  1     Balsa      Koprivica  6.6   44      PAR       Partizan… 3           C        9.0   0.1   11.2  1       6.0   0.0   3.0   0.0   2.0  
10 1231  1     Lorenzo    Brown      14.8  41      MTA       Maccabi … 1           G        27.5  0.4   25.0  1       22.0  8.0   1.0   2.0   0.0  
# ℹ 101 more rows
# ℹ 23 more variables: blka <chr>, fgm <chr>, fgm_tot <chr>, fga <chr>, fga_tot <chr>, tpm <chr>, tpm_tot <chr>, tpa <chr>, tpa_tot <chr>,
#   ftm <chr>, ftm_tot <chr>, fta <chr>, fta_tot <chr>, oreb <chr>, dreb <chr>, tov <chr>, pf <chr>, fouls_received <chr>, plus_minus <chr>,
#   fgp <chr>, tpp <chr>, ftp <chr>, slug <chr>
# ℹ Use `print(n = ...)` to see more rows
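Both answers return every column as `<chr>`, so values such as `cr` or `pdk` are still text. A small sketch for converting them, assuming you want numeric columns for sorting or arithmetic: `convert_numeric_cols()` is a hypothetical helper that only converts columns whose non-missing values all look like plain decimal numbers. Base R's `type.convert(df, as.is = TRUE)` is a shorter alternative, but it would also turn a column containing only values such as `"T"` or `"F"` into logicals, which the pattern check below avoids.

```r
# Hypothetical helper (not from any package): convert character columns whose
# non-missing values all look like plain decimal numbers to numeric.
# Text columns such as first_name or position are left unchanged.
convert_numeric_cols <- function(df) {
  looks_numeric <- function(x) {
    # Optional minus sign, digits, optional decimal part, e.g. "-0.4" or "13.3"
    ok <- is.na(x) | grepl("^-?[0-9]+(\\.[0-9]+)?$", x)
    all(ok) && !all(is.na(x))
  }
  # df[] <- keeps the data frame (or tibble) class while replacing columns
  df[] <- lapply(df, function(x) {
    if (is.character(x) && looks_numeric(x)) as.numeric(x) else x
  })
  df
}

# Usage, assuming `players` holds the tibble returned above:
# players <- convert_numeric_cols(players)
```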
