css - Rvest: web scraping a Japanese baseball website

Posted by carvr3hs on 2023-04-01 in Other

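I'm trying to scrape two tables from the npb.jp website using the rvest package in R. I tried CSS selectors for both tables, but neither worked. Could the problem be the way the page is formatted?
Code: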

library(rvest)

html <- read_html("https://npb.jp/bis/eng/2022/stats/std_c.html")
# selector for the standings table; this is the step that finds nothing
css <- "#stdivmaintbl > table > tbody > tr > td > div:nth-child(1)"
nodes <- html_nodes(html, css)
table <- html_table(nodes)[[1]]

df <- data.frame(table)

The code reads the HTML, but it doesn't seem to find the tables.
Any help is appreciated.
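One common reason a selector copied from the browser matches nothing (an assumption here, not confirmed for this particular page) is that browser developer tools show a `<tbody>` element the browser inserted itself, while the raw HTML that rvest parses through xml2/libxml2 may have no `<tbody>` at all. The selector above also ends at a `<div>` rather than a `<table>`, so even a match would not be a table node. The sketch below uses a small, hypothetical inline page (only the `stdivmaintbl` id and `stdtblmain` class come from this thread) to show both points.

```r
library(rvest)

# Hypothetical, stripped-down markup: note there is no <tbody> in the source.
page <- read_html('
<div id="stdivmaintbl">
  <table>
    <tr><td><div>
      <table class="stdtblmain">
        <tr><th>Team</th><th>W</th></tr>
        <tr><td>Tokyo Yakult</td><td>80</td></tr>
      </table>
    </div></td></tr>
  </table>
</div>')

# The browser-style selector relies on an implied <tbody> that the parser
# never created, so it matches zero nodes.
with_tbody <- html_elements(
  page, "#stdivmaintbl > table > tbody > tr > td > div:nth-child(1)"
)
length(with_tbody)

# Targeting the <table> element directly (by its class) does work.
tbl <- page %>% html_element("table.stdtblmain") %>% html_table()
tbl
```

Inspecting the page with "view source" rather than the live DOM inspector is one way to check whether `<tbody>` is really present before writing a selector.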

Answer 1 (vyswwuz2)

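For whatever reason, when I tried to read the URL directly I got a certificate error, so instead of reading from the URL I copied the page's source HTML into a file and read that. I assume what I read from the file is the same as what you get from the internet. This worked for me: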

library(rvest)
library(magrittr)

# this is where I saved the page's html
# assuming you don't have the same certificate problem I had, 
# you could use this instead: url <- "https://npb.jp/bis/eng/2022/stats/std_c.html"
url <- "baseball.html"

table <- url %>% read_html() %>% html_nodes(".stdtblmain") %>% html_table()

table[[1]]
> table[[1]]
# A tibble: 27 × 239
   X1        X2    X3    X4    X5    X6    X7    X8    X9    X10   X11   X12   X13   X14   X15   X16   X17   X18   X19   X20  
   <chr>     <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
 1 "TeamGWL… "Tea… G     W     L     T     PCT   "GB"  ""    Home  Road  ""    "vsS" vsDB  vsT   vsG   vsC   vsD   Int   Toky…
 2 "Team"    "G"   W     L     T     PCT   GB    ""    ""    Home  Road  ""    ""    vsS   vsDB  vsT   vsG   vsC   vsD   Int  
 3 "Tokyo Y… ""    Toky… 143   80    59    4     ""    ""    .576  --    ""    ""    37-34 43-2… ***   16-9  13-1… 11-1… 16-8…
 4 ""        "Tok… NA    NA    NA    NA    NA    ""    ""    NA    NA    ""    ""    NA    NA    NA    NA    NA    NA    NA   
 5 "YOKOHAM… ""    YOKO… 143   73    68    2     ""    ""    .518  8.0   ""    ""    41-3… 32-3… 9-16  ***   16-9  13-1… 8-17 
 6 ""        "YOK… NA    NA    NA    NA    NA    ""    ""    NA    NA    ""    ""    NA    NA    NA    NA    NA    NA    NA   
 7 "Hanshin… ""    Hans… 143   68    71    4     ""    ""    .489  12.0  ""    ""    37-3… 31-3… 11-1… 9-16  ***   14-1… 9-14…
 8 ""        "Han… NA    NA    NA    NA    NA    ""     NA   NA    NA     NA   ""    NA    NA    NA    NA    NA    NA    NA   
 9 "Yomiuri… ""    Yomi… 143   68    72    3     ".48… "12.… 35-3… 33-3… "13-… "11-… 10-1… ***   13-12 13-12 8-10  NA    NA   
10 ""        "Yom… NA    NA    NA    NA    NA     NA    NA   NA    NA     NA    NA   NA    NA    NA    NA    NA    NA    NA   
# … with 17 more rows, and 219 more variables: X21 <chr>, X22 <chr>,
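The 239-column result with repeated, shifted headers suggests (but does not prove) that the `.stdtblmain` element wraps further tables inside its cells, so `html_table()` mixes the outer layout rows with the inner data rows. If that is the case, one option is to select only the innermost tables with an XPath expression. This is a minimal sketch on hypothetical inline markup; the real page's structure may differ.

```r
library(rvest)

# Hypothetical markup: an outer layout table whose cell holds the data table.
page <- read_html('
<table class="stdtblmain">
  <tr><td>
    <table>
      <tr><th>Team</th><th>G</th><th>W</th></tr>
      <tr><td>Tokyo Yakult</td><td>143</td><td>80</td></tr>
      <tr><td>YOKOHAMA</td><td>143</td><td>73</td></tr>
    </table>
  </td></tr>
</table>')

# Keep only <table> elements that contain no other <table>,
# i.e. the innermost ones that hold the actual data.
inner <- page %>%
  html_elements(xpath = "//table[not(.//table)]") %>%
  html_table()

standings <- inner[[1]]
standings
```

Selecting innermost tables avoids hard-coding a long chain of child selectors, at the cost of also picking up any unrelated leaf tables on the page, so the resulting list may still need to be indexed.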
