Scraping a web page into a table with rvest

pprl5pva, posted 2023-03-10 in Other

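I am trying to scrape the card grades from this page and put each grade into its own column: https://www.psacard.com/pop/basketball-cards/1986/fleer/36766
The only way I could get anything back was with the code below. I also tried html_table(), but it returned nothing.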

library(rvest)

read_html("https://www.psacard.com/pop/basketball-cards/1986/fleer/36766") %>% 
  html_text2() %>% 
  .[1]

I would like a data frame with one row per player and one column for each grade from 1 to 10.

5lhxktic #1

The table on that page appears to be filled in by JavaScript after the page loads, which would explain why html_table() finds nothing in the static HTML. You can request the JSON data directly instead:

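library(tidyverse)
library(httr2)

# POST the form fields to the JSON endpoint and read the "data" element
"https://www.psacard.com/Pop/GetSetItems" %>%
  request() %>%
  req_headers(Accept = "application/json") %>%
  req_body_form(
    # DataTables-style paging: fetch up to 300 rows starting at row 0
    draw = 1,
    start = 0,
    length = 300,
    # headingID is the number at the end of the page URL
    headingID = 36766,
    categoryID = 20019,
    isPSADNA = "false"
  ) %>%
  req_perform() %>% 
  # simplifyVector = TRUE turns the JSON array of records into a data frame
  resp_body_json(simplifyVector = TRUE) %>% 
  pluck("data") %>% 
  as_tibble()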

# A tibble: 133 × 39
   SpecID SubjectN…¹ SortO…² Variety CardN…³ CardN…⁴ GradeN0 Grade1Q Grade1 Grade…⁵ Grade…⁶ Grade2Q Grade2 Grade…⁷ Grade3Q Grade3 Grade…⁸ Grade4Q Grade4 Grade…⁹ Grade5Q Grade5
    <int> <chr>        <dbl> <chr>   <chr>     <int>   <int>   <int>  <int>   <int>   <int>   <int>  <int>   <int>   <int>  <int>   <int>   <int>  <int>   <int>   <int>  <int>
 1      0 TOTAL POP…       0 NA      NA           NA     653       8    262       3     116       3    440      36       2    952      59       6   2809     114      10   6273
 2 299514 Kareem Ab…    1146 ""      1             1       5       0      1       0       0       0      3       1       0     22       1       0     80       0       0    209
 3 299516 Alvan Ada…    2570 ""      2             2       0       0      0       0       0       0      0       0       0      4       0       0     13       0       0     27
 4 299517 Mark Agui…    5348 ""      3             3       0       0      0       0       0       0      1       0       0      2       0       0     11       0       0     30
 5 299518 Danny Ain…    6400 ""      4             4       0       0      0       0       0       0      2       0       0      3       0       0     10       0       0     24
 6 299519 John Bagl…    8378 ""      5             5       0       0      0       0       0       0      0       0       0      0       0       0      4       1       0     17
 7 299520 Thurl Bai…   10567 ""      6             6       0       0      0       0       0       0      0       0       0      0       0       0      7       0       0     22
 8 299521 Charles B…   10922 ""      7             7       5       0      6       0       3       0     26       2       0     39       2       0    129       4       0    264
 9 299524 Benoit Be…   12281 ""      8             8       0       0      0       0       0       0      0       0       0      0       0       0      9       0       0     16
10 299525 Larry Bird   14279 ""      9             9       0       0      3       0       2       0      4       1       0     16       0       0     46       0       0    131
# … with 123 more rows, 17 more variables: Grade5_5 <int>, Grade6Q <int>, Grade6 <int>, Grade6_5 <int>, Grade7Q <int>, Grade7 <int>, Grade7_5 <int>, Grade8Q <int>,
#   Grade8 <int>, Grade8_5 <int>, Grade9Q <int>, Grade9 <int>, Grade10 <int>, Total <int>, GradeTotal <int>, HalfGradeTotal <int>, QualifiedGradeTotal <int>, and abbreviated
#   variable names ¹​SubjectName, ²​SortOrder, ³​CardNumber, ⁴​CardNumberSort, ⁵​Grade1_5Q, ⁶​Grade1_5, ⁷​Grade2_5, ⁸​Grade3_5, ⁹​Grade4_5
# ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names
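
To get from this result to the shape asked for in the question (one row per player, one column per whole grade), one option is to drop the totals row and keep only the whole-grade columns. The following is a minimal sketch, not part of the original answer. `pop` is a hypothetical toy stand-in that reproduces a few columns and two rows from the printed output, and it assumes that the totals row is the one with `SpecID` equal to 0, as it is in the output shown.

```r
library(dplyr)

# Hypothetical stand-in for the tibble returned by the request; only a few of
# the 39 columns are reproduced, with values copied from the printed output.
# The first SubjectName is truncated in that output and is kept as printed.
pop <- tibble(
  SpecID      = c(0L, 299525L),
  SubjectName = c("TOTAL POP…", "Larry Bird"),
  CardNumber  = c(NA, "9"),
  Grade1Q     = c(8L, 0L),
  Grade1      = c(262L, 3L),
  Grade1_5    = c(116L, 2L),
  Grade2      = c(440L, 4L)
)

grades <- pop %>%
  # Assumption: the totals row is the one with SpecID == 0
  filter(SpecID != 0) %>%
  # "^Grade[0-9]+$" keeps Grade1, Grade2, ... and skips the qualified
  # (Grade1Q), half-grade (Grade1_5) and total (GradeTotal) columns
  select(SubjectName, CardNumber, matches("^Grade[0-9]+$"))

grades
```

On the full result, the same `filter()` and `select()` steps should keep `Grade1` through `Grade10`, since those column names all match the pattern.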
