I am trying to scrape data from the real-estate website https://www.spitogatos.gr/. What I see in its robots.txt is a comment describing it as an "Ultimate robots.txt Bot and User-Agent Blocker". I only want to scrape the site once a day — is there a way to do that with Scrapy? Thanks in advance.
import scrapy


class MainprojectSpider(scrapy.Spider):
    name = 'mainProject'
    allowed_domains = ['www.spitogatos.gr']
    user_agent = ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                  'AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/91.0.4472.124 Safari/537.36')
    # start_urls = ['https://www.spitogatos.gr/']

    def start_requests(self):
        # The header name must be 'User-Agent' (with a hyphen),
        # otherwise the custom agent is never sent.
        yield scrapy.Request(url='https://www.spitogatos.gr',
                             callback=self.parse,
                             headers={'User-Agent': self.user_agent})

    def parse(self, response):
        # just a dummy extraction for testing
        print(response.xpath('//h2[@class="text thin h1"]/text()').extract())

    def set_user_agent(self, request):
        # currently unused helper
        request.headers['User-Agent'] = self.user_agent
        return request
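Scrapy reads its rate-limiting behaviour from the project settings rather than from the spider's callbacks, so one way to keep the crawl polite is a settings sketch like the following. The specific delay values here are assumptions for illustration, not requirements of the site; note also that if the site's robots.txt disallows your crawler, `ROBOTSTXT_OBEY = True` will make Scrapy skip those URLs entirely, and whether to override that is a policy question, not a technical one.

```python
# Hypothetical additions to the project's settings.py (values are examples).

# Respect robots.txt; Scrapy will drop requests the file disallows.
ROBOTSTXT_OBEY = True

# Wait a fixed number of seconds between requests to the same domain.
DOWNLOAD_DELAY = 5

# Never open parallel connections to the target site.
CONCURRENT_REQUESTS_PER_DOMAIN = 1

# Let Scrapy adapt the delay upward if the server responds slowly.
AUTOTHROTTLE_ENABLED = True
```

Running the spider "once a day" is usually handled outside Scrapy, e.g. with a cron entry such as `0 6 * * * cd /path/to/project && scrapy crawl mainProject` (the path is a placeholder for your own project directory).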