rvest - scraping a list and storing the items separately

l3zydbqr  posted on 2023-05-20  in  Other

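I'm struggling to scrape a website with rvest, specifically its list item (li) elements.
This is the page I am practicing on: https://www.edx.org/professional-certificate/harvardx-computer-science-for-artifical-intelligence
CS50's Introduction to Computer Science, 6–18 hours per week, for 12 weeks. An introduction to the intellectual enterprises of computer science and the art of programming.
CS50's Introduction to Artificial Intelligence with Python, 10–30 hours per week, for 7 weeks. Learn to use machine learning in Python in this introductory course on artificial intelligence.
Job Outlook
....
Here is my code: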

> html_nodes(page, xpath='//*[@id="main-content"]/div[3]/div/div[3]/div/div/div/ol') %>% html_text()
[1] "HarvardX's Computer Science for Artificial Intelligence Professional Certificate CS50's Introduction to Computer Science6–18 hours per week, for 12 weeksAn introduction to the intellectual enterprises of computer science and the art of programming.View the course CS50's Introduction to Artificial Intelligence with Python10–30 hours per week, for 7 weeksLearn to use machine learning in Python in this introductory course on artificial intelligence.View the courseJob OutlookEmployment of software developers is projected to grow 24% from 2016 to 2026, much faster than the average for all occupations. (source: Occupational Outlook Handbook)The median pay for software developers in the U.S. in 2018 was $105,590 per year. (source: Occupational Outlook Handbook)"

When I try the XPath '//*[@id="main-content"]/div[3]/div/div[3]/div/div/div/ol/li[2]', I get only one of the items:

html_nodes(page, xpath='//*[@id="main-content"]/div[3]/div/div[3]/div/div/div/ol/li[2]') %>% html_text()
[1] " CS50's Introduction to Computer Science6–18 hours per week, for 12 weeksAn introduction to the intellectual enterprises of computer science and the art of programming.View the course"

Do you know how I can do this without specifying three separate XPaths? I can't find a way.

u3r8eeie #1

I think you are after the text in the 3 collapsible sections. If so:

library(rvest)
library(tidyverse)

url <- "https://www.edx.org/professional-certificate/harvardx-computer-science-for-artifical-intelligence"

page <- read_html(url)

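# Select every div.path-details node, then keep nodes 2 to 4,
# which hold the three collapsible sections (see the output below).
page %>% 
  html_nodes("div.path-details") %>% 
  .[2:4] %>% 
  html_text()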

# [1] "CS50's Introduction to Computer Science6–18 hours per week, for 12 weeksAn introduction to the intellectual enterprises of computer science and the art of programming.View the course"                                                                                                                     
# [2] "CS50's Introduction to Artificial Intelligence with Python10–30 hours per week, for 7 weeksLearn to use machine learning in Python in this introductory course on artificial intelligence.View the course"                                                                                                  
# [3] "Job OutlookEmployment of software developers is projected to grow 24% from 2016 to 2026, much faster than the average for all occupations. (source: Occupational Outlook Handbook)The median pay for software developers in the U.S. in 2018 was $105,590 per year. (source: Occupational Outlook Handbook)"
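
A related approach, closer to the XPath in the question, is to drop the positional predicate ([2]) from li so that every list item is matched at once. html_text() then returns one string per node. The sketch below is only an illustration: it runs on a small made-up HTML snippet rather than the live edX page, and the h3 and p tags inside each item are assumptions, not the page's real markup.

```r
library(rvest)

# Hypothetical stand-in for the live page: an ordered list with three items.
html <- '<div id="main-content"><ol>
  <li><h3>Course A</h3><p>6-18 hours per week</p></li>
  <li><h3>Course B</h3><p>10-30 hours per week</p></li>
  <li><h3>Job Outlook</h3><p>Some text</p></li>
</ol></div>'

page <- read_html(html)

# No index on li, so all list items are selected, not just one.
items <- html_nodes(page, xpath = '//*[@id="main-content"]/ol/li')

# html_text() is vectorised: it returns one string per matched node.
texts <- html_text(items, trim = TRUE)

# Querying inside each item keeps its parts in separate columns.
# html_node() returns exactly one result per input node.
result <- data.frame(
  title = html_text(html_node(items, "h3")),
  details = html_text(html_node(items, "p")),
  stringsAsFactors = FALSE
)
print(result)
```

On the real page the inner selectors would have to be adapted after inspecting the markup. The general pattern stays the same: one call selects all items, and a per-item html_node() call extracts each field.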
