How to scrape a table with degree symbols in R and Python?

Posted by unftdfkk on 2022-12-27 in Python


I am trying to scrape the table on this website.
First, I tried R, following the approach described here, with the code below:

url <- paste0("https://artofproblemsolving.com/wiki/index.php/Polygon")
library(tidyverse)
library(rvest)
h <- read_html(url)
class(h)
tab <- h |> html_nodes("table")
tab[[1]]
tab <- tab[[1]] |> html_table()
class(tab)
tab

The last two columns, the ones containing $\circ$, come back empty. The same problem occurs when I try Python with the code from here:

import pandas as pd
URL = "https://artofproblemsolving.com/wiki/index.php/Polygon"
#tables = pd.read_html(URL,match="Number of Sides")
tables=pd.read_html(URL,attrs = {'class' : 'wikitable'})
print(tables)
print("There are : ",len(tables)," tables")
print("Take look at table 0")
tables[0]

Could you help me solve this problem, or suggest another way to scrape the whole table from the link? Thanks!

Source: https://stackoverflow.com/questions/74920768/how-to-scrape-a-table-with-degree-symbol-in-r-and-python

1 Answer


ryevplcw1#

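Here is a solution. The degree values are rendered as image elements rather than text, which is why html_table() and pd.read_html() return those cells empty. You have to extract the images' "alt" attribute instead.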

suppressPackageStartupMessages({
  library(dplyr)
  library(rvest)
})

link <- "https://artofproblemsolving.com/wiki/index.php/Polygon"
page <- read_html(link)

df1 <- page %>%
  html_element('table.wikitable') %>%
  html_table()

angles <- page %>%
  html_element('table.wikitable') %>%
  html_elements('img.latex') %>%
  html_attr('alt') %>%
  gsub("[^[:digit:]]+", "", .) %>%
  as.integer() %>%
  matrix(ncol = 2, byrow = TRUE)

df1[2:3] <- angles
df1
#> # A tibble: 5 × 3
#>   `Number of Sides` `Sum of Interior angles` Individual angle measure in regul…¹
#>               <int>                    <int>                               <int>
#> 1                 3                      180                                  60
#> 2                 4                      360                                  90
#> 3                 5                      540                                 108
#> 4                 6                      720                                 120
#> 5                 8                     1080                                 135
#> # … with abbreviated variable name
#> #   ¹`Individual angle measure in regular polygon`

Created on 2022-12-26 with reprex v2.0.2
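Since the question also asks about Python, the same idea (read each image's "alt" text in place of the empty cell) can be sketched with the standard library alone. This is a minimal sketch, not a tested scraper for the live page: `SAMPLE_HTML` is a hypothetical fragment that only imitates the table's markup, the alt format `$180^{\circ}$` is an assumption, and `AltTableParser` and `scrape_table` are illustrative names.

```python
from html.parser import HTMLParser
import re

# Hypothetical fragment imitating the wiki table; the real markup may differ.
SAMPLE_HTML = r"""
<table class="wikitable">
<tr><th>Number of Sides</th><th>Sum of Interior angles</th><th>Individual angle measure in regular polygon</th></tr>
<tr><td>3</td><td><img class="latex" alt="$180^{\circ}$" src="a.png"></td><td><img class="latex" alt="$60^{\circ}$" src="b.png"></td></tr>
<tr><td>4</td><td><img class="latex" alt="$360^{\circ}$" src="c.png"></td><td><img class="latex" alt="$90^{\circ}$" src="d.png"></td></tr>
</table>
"""


class AltTableParser(HTMLParser):
    """Collect table cells, using an <img>'s alt text where a cell holds an image."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []
        elif tag == "img" and self._cell is not None:
            # The value lives in the alt attribute, not in a text node.
            self._cell.append(dict(attrs).get("alt") or "")

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def scrape_table(html):
    """Return (header, data) where data rows are lists of integers."""
    parser = AltTableParser()
    parser.feed(html)
    header, *body = parser.rows
    # Keep digits only, mirroring gsub("[^[:digit:]]+", "", .) in the R code.
    data = [[int(re.sub(r"\D+", "", cell)) for cell in row] for row in body]
    return header, data


header, data = scrape_table(SAMPLE_HTML)
print(header)
for row in data:
    print(row)
```

To use this on the real page, you would download the HTML first (for example with `urllib.request` or `requests`) and pass it to `scrape_table`. Note that this sketch collects every table row in the input, so a page with several tables would need extra filtering on the `wikitable` class.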

