How to slow down Scrapy's request rate

Posted by mum43rcc on 2022-11-09 in Other

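Fetching a small amount of data works fine, but when I try to fetch more I get HTTP error 429 (Too Many Requests). I read the Scrapy documentation, but it didn't help. I think the problem is speed: within 6 seconds the response count reaches 210, and I don't know how to slow it down. By the way, I tried DOWNLOAD_DELAY = [1], but it didn't help much.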
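The figures above give a rough sense of how far the rate has to drop. The sketch below uses hypothetical helper names (observed_rate, delay_limited_rate), assumes the 429 is triggered by request rate alone, and models Scrapy as sending about one request per DOWNLOAD_DELAY seconds to a single domain while ignoring server response time. Note that DOWNLOAD_DELAY takes a number of seconds such as 1 or 1.5, not a list like [1].

```python
"""Back-of-the-envelope request-rate check (hypothetical helpers, not part of Scrapy)."""


def observed_rate(responses: int, seconds: float) -> float:
    """Requests per second actually sent, from a response count over a time window."""
    return responses / seconds


def delay_limited_rate(download_delay: float) -> float:
    """Approximate ceiling on requests/second to one domain when DOWNLOAD_DELAY is set.

    Assumes Scrapy waits about DOWNLOAD_DELAY seconds (on average, when
    RANDOMIZE_DOWNLOAD_DELAY is on) between requests to the same domain,
    and ignores how long the server takes to respond.
    """
    if download_delay <= 0:
        raise ValueError("download_delay must be a positive number of seconds")
    return 1.0 / download_delay


if __name__ == "__main__":
    print(f"observed: {observed_rate(210, 6):.1f} req/s")  # observed: 35.0 req/s
    for delay in (1, 3, 10):
        print(f"DOWNLOAD_DELAY={delay}: <= {delay_limited_rate(delay):.2f} req/s")
```

At 35 requests per second, even DOWNLOAD_DELAY = 1 brings the rate down to roughly 1/35 of what was observed.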

Here is the code:

import scrapy


class WanikaniSpider(scrapy.Spider):
    name = 'japandict'
    allowed_domains = ['www.wanikani.com']
    url = 'https://www.wanikani.com/kanji/'
    start_urls = []
    kanjis = ["愛", "暗", "位", "偉", "易", "違", "育", "因", "引", "泳", "越", "園", "演", "煙", "遠", "押", "横", "王", "化", "加", "科", "果", "過", "解", "回", "皆", "絵", "害", "格", "確", "覚", "掛", "割", "活", "寒", "完", "官", "感", "慣", "観", "関", "顔", "願", "危", "喜", "寄", "幾", "期", "機", "規", "記", "疑", "議", "客", "吸", "求", "球", "給", "居", "許", "供", "共", "恐", "局", "曲", "勤", "苦", "具", "偶", "靴", "君", "係", "形", "景", "経", "警", "迎", "欠", "決", "件", "権", "険", "原", "現", "限", "呼", "互", "御", "誤", "交", "候", "光", "向", "好", "幸", "更", "構", "港", "降", "号", "合", "刻", "告", "込", "困", "婚", "差", "座", "最", "妻", "才", "歳", "済", "際", "在", "罪", "財", "昨", "察", "殺", "雑", "参", "散", "産", "賛", "残", "市", "師", "指", "支", "資", "歯", "似", "次", "治", "示", "耳", "辞", "式", "識", "失", "実", "若", "取", "守", "種", "酒"]
    liste = []
    for kanji in kanjis:
        liste.append(kanji)
        nurl = url + kanji
        start_urls.append(nurl)
    # One output file per field, opened once when the class is defined
    file = open("n3kanji.txt", "w", encoding="utf-8")
    file1 = open("n3onyomi.txt", "w", encoding="utf-8")
    file2 = open("n3kunyomi.txt", "w", encoding="utf-8")
    file3 = open("n3meanings.txt", "w", encoding="utf-8")

    def parse(self, response):
        print(response.url)
        kanjiicon = response.xpath('//*[@id="main"]/body/div[1]/div[3]/div/div/header/h1/span/text()').getall()
        meanings = response.xpath('//*[@id="meaning"]/div[1]/p/text()').getall()
        # These XPaths are absolute, so query the response directly; looping over
        # the reading blocks left onyomi/kunyomi undefined when none were found
        onyomi = response.xpath('//*[@id="reading"]/div/div[1]/p/text()').getall()
        kunyomi = response.xpath('//*[@id="reading"]/div/div[2]/p/text()').getall()
        for x in kanjiicon:
            yield {'kanjiicon': x.strip()}
            self.file.write(x + "\n")
        for y in onyomi:
            yield {'onyomi': y.strip()}
            self.file1.write(y + "\n")
        for z in kunyomi:
            yield {'kunyomi': z.strip()}
            self.file2.write(z + "\n")
        for m in meanings:
            yield {'meanings': m.strip()}
            self.file3.write(m + "\n")

    def closed(self, reason):
        # Scrapy calls this once when the spider finishes; "file.close" without
        # parentheses inside the loops never actually closed anything
        for f in (self.file, self.file1, self.file2, self.file3):
            f.close()

Thanks for your help.

Answer #1, by 9fkzdhlc

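You can slow a spider down in several ways by setting custom settings, either on the spider itself or in the project's main settings.py file.
Relevant settings include CONCURRENT_REQUESTS, DOWNLOAD_DELAY, CONCURRENT_REQUESTS_PER_DOMAIN, CONCURRENT_REQUESTS_PER_IP and AUTOTHROTTLE_ENABLED.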
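The same settings can also be set project-wide in settings.py. A sketch using the values from the spider example below; CONCURRENT_REQUESTS_PER_IP is shown at its default of 0, and the RETRY_* values are illustrative assumptions for retrying rate-limited (429) responses, not values tuned for this site.

```python
# settings.py (project-wide). Values are starting points to tune, not limits
# published by the target site.

# Only one request in flight at a time, overall and per domain.
CONCURRENT_REQUESTS = 1
CONCURRENT_REQUESTS_PER_DOMAIN = 1
# Default 0; if non-zero it is used instead of CONCURRENT_REQUESTS_PER_DOMAIN
# (and DOWNLOAD_DELAY is then enforced per IP rather than per domain).
CONCURRENT_REQUESTS_PER_IP = 0

# Seconds to wait between requests to the same domain: a number, not a list.
DOWNLOAD_DELAY = 10
# Default True: each wait is randomised between 0.5x and 1.5x DOWNLOAD_DELAY.
RANDOMIZE_DOWNLOAD_DELAY = True

# AutoThrottle adjusts the delay from observed latencies; DOWNLOAD_DELAY is its floor.
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 3
AUTOTHROTTLE_MAX_DELAY = 60
AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0

# Retry rate-limited responses. 429 is already in the default list in recent
# Scrapy versions; the retry count here is an assumption.
RETRY_ENABLED = True
RETRY_TIMES = 5
RETRY_HTTP_CODES = [500, 502, 503, 504, 522, 524, 408, 429]
```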
For example:

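import scrapy


class WanikaniSpider(scrapy.Spider):
    name = 'japandict'
    allowed_domains = ['www.wanikani.com']
    url = 'https://www.wanikani.com/kanji/'
    start_urls = []
    # Setting names must be strings; these override settings.py for this spider only
    custom_settings = {
        'CONCURRENT_REQUESTS': 1,
        'DOWNLOAD_DELAY': 10,  # seconds between requests to the same domain
        'CONCURRENT_REQUESTS_PER_DOMAIN': 1,
        'AUTOTHROTTLE_ENABLED': True,
        'AUTOTHROTTLE_START_DELAY': 3,
        'AUTOTHROTTLE_TARGET_CONCURRENCY': 1,
    }
    # kanjis, the start_urls loop, the output files and parse() stay as in the question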
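To see how these values interact, here is a simplified model of the throttling rules described in Scrapy's AutoThrottle documentation. The function names are hypothetical and Scrapy's own implementation differs in details: each response sets a target delay of latency / AUTOTHROTTLE_TARGET_CONCURRENCY, the next delay is the average of the previous and target delays, non-200 responses may not lower it, and it is kept between DOWNLOAD_DELAY and AUTOTHROTTLE_MAX_DELAY.

```python
"""Simplified model of Scrapy's documented AutoThrottle rules (not Scrapy's own code)."""


def next_delay(prev_delay: float, latency: float, status: int,
               target_concurrency: float = 1.0,
               min_delay: float = 0.0,
               max_delay: float = 60.0) -> float:
    """Delay after one response.

    min_delay stands for DOWNLOAD_DELAY, max_delay for AUTOTHROTTLE_MAX_DELAY,
    target_concurrency for AUTOTHROTTLE_TARGET_CONCURRENCY.
    """
    target = latency / target_concurrency          # aim for N requests in parallel
    new_delay = (prev_delay + target) / 2.0        # move halfway towards the target
    if status != 200 and new_delay < prev_delay:   # error pages may not lower the delay
        new_delay = prev_delay
    return max(min_delay, min(new_delay, max_delay))


def simulate(start_delay: float, download_delay: float,
             responses: list[tuple[float, int]]) -> list[float]:
    """Delays over a sequence of (latency_seconds, status) responses."""
    delay = max(start_delay, download_delay)       # the floor applies from the start
    history = [delay]
    for latency, status in responses:
        delay = next_delay(delay, latency, status, min_delay=download_delay)
        history.append(delay)
    return history


if __name__ == "__main__":
    # Answer's values: DOWNLOAD_DELAY = 10, AUTOTHROTTLE_START_DELAY = 3
    print(simulate(3, 10, [(0.5, 200), (30.0, 429), (0.5, 200)]))
    # → [10, 10, 20.0, 10.25]
```

With DOWNLOAD_DELAY = 10 the floor is 10 seconds, so AUTOTHROTTLE_START_DELAY = 3 has no practical effect and AutoThrottle can only stretch the delay above 10 seconds, for example after slow or 429 responses. A smaller DOWNLOAD_DELAY leaves AutoThrottle room to speed up again when the server responds quickly.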
