I have the code below and I think I'm nearly there. I can get an array of selectors containing each anchor element whose id contains the string class_id. What I'm trying to do is get the text-node children of all those anchor elements. Can someone tell me how to do that? Thanks.
import scrapy

# with open('../econ.html', 'r') as f:
#     html_string = f.read()

econ_headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Content-Type': 'application/x-www-form-urlencoded',
    'Origin': 'https://pisa.ucsc.edu',
    'Accept-Language': 'en-us',
    'Host': 'pisa.ucsc.edu',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1.1 Safari/605.1.15',
    'Referer': 'https://pisa.ucsc.edu/class_search/',
    'Accept-Encoding': ['gzip', 'deflate', 'br'],
    'Connection': 'keep-alive',
}

class EconSpider(scrapy.Spider):
    name = "econ"

    def start_requests(self):
        urls = [
            'https://pisa.ucsc.edu/class_search/index.php'
        ]
        for url in urls:
            yield scrapy.Request(url=url, method="POST", headers=econ_headers,
                                 body='action=results&binds[:term]=2210&binds[:subject]=ECON&binds[:reg_status]=O&rec_start=0&rec_dur=1000',
                                 callback=self.parse)

    def parse(self, response):
        page = response.url.split("/")[-2]
        # My attempt at getting the text nodes -- this is the part I can't get right
        print(response.xpath('//a[contains(@id, "class_id")] *::text'))
        filename = f'class-{page}.html'
        with open(filename, 'wb') as f:
            f.write(response.body)
        self.log(f'Saved file {filename}')
1 Answer
Note: ::text only works with CSS selectors. In an XPath expression, use text() instead:

    print(response.xpath('//a[contains(@id, "class_id")]/text()').getall())
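For reference, here is a minimal sketch of how the parse method from the question could use that corrected selector; the class_name item key is just an illustrative name (not from the original code), and the CSS line is included only to show the equivalent selector:

    def parse(self, response):
        # XPath: text() selects the text-node children of each matching <a>
        anchor_texts = response.xpath('//a[contains(@id, "class_id")]/text()').getall()

        # Equivalent CSS selector: ::text is Scrapy's pseudo-element for text nodes
        anchor_texts_css = response.css('a[id*="class_id"]::text').getall()

        for text in anchor_texts:
            yield {'class_name': text.strip()}

getall() returns a plain list of strings, so the loop just strips whitespace and yields one item per anchor.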