Python Scrapy automation [closed]

Posted by qzlgjiam on 2024-01-05 in Python


Closed. This question needs to be more focused. It is not currently accepting answers.

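I am a junior data scientist working on a project. After I scraped a few websites, I was asked to automate the scraping process.
I am using Scrapy as the framework and MongoDB to store the data.
I did some research and found that Scrapyd and Airflow can both do this.
I started with Airflow, but I found it quite complicated to get Airflow DAGs to detect my Scrapy project. Based on your experience, what is the best way to automate Scrapy as simply as possible?
Thanks for your help.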

python

Source: https://stackoverflow.com/questions/77750638/scrapy-automation

2 Answers


jw5wzhpr #1

Maybe you could use Crawlab. I have used version 5.0 at work, and I think it works well.

Answered 2024-01-05

hkmswyz6 #2

You could try Scrapyd.
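Once a project has been deployed to a running Scrapyd instance, a crawl can be started by posting to its `schedule.json` endpoint, so any scheduler that can run a script (cron, for example) can trigger it. Below is a minimal sketch using only the standard library. It assumes Scrapyd is listening on its default address `http://localhost:6800`; the helper names `build_schedule_request` and `schedule_spider`, and the project name `myproject`, are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: Scrapyd runs locally on its default port.
SCRAPYD_URL = "http://localhost:6800"


def build_schedule_request(project, spider, base_url=SCRAPYD_URL):
    """Build the POST request for Scrapyd's schedule.json endpoint."""
    data = urlencode({"project": project, "spider": spider}).encode("utf-8")
    return Request(f"{base_url}/schedule.json", data=data, method="POST")


def schedule_spider(project, spider, base_url=SCRAPYD_URL):
    """Send the request and return Scrapyd's decoded JSON reply."""
    request = build_schedule_request(project, spider, base_url)
    with urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `schedule_spider("myproject", "myspider")` from a scheduled script would then queue one run of the spider, provided the project name matches the one used at deployment.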
Sample code structure using MongoDB:

import scrapy
from scrapy.crawler import CrawlerProcess
from pymongo import MongoClient


class MySpider(scrapy.Spider):
    name = 'myspider'
    start_urls = ['https://your-url']

    def parse(self, response):
        # Extract data using Scrapy's CSS and XPath selectors
        title = response.css('title::text').get()
        content = response.xpath('//p/text()').getall()

        # Connect to MongoDB, store the document, then release the connection
        client = MongoClient('localhost', 27017)
        try:
            db = client['mydatabase']
            collection = db['mycollection']
            collection.insert_one({'title': title, 'content': content})
        finally:
            client.close()


# Run the spider locally; the guard keeps this from running on import
if __name__ == '__main__':
    process = CrawlerProcess()
    process.crawl(MySpider)
    process.start()

# Scrapyd deployment (run from the Scrapy project directory):
# scrapyd-deploy <target> -p <project>
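The sample above opens a MongoDB connection inside `parse`, which means one connection per response. A common alternative is to move storage into an item pipeline that connects once per crawl. The following is only a sketch of that idea: the class name `MongoPipeline`, its constructor parameters, and the `client_factory` hook (added so the class can be exercised without a database) are assumptions, and the method signatures follow Scrapy's classic item pipeline interface.

```python
class MongoPipeline:
    """Item pipeline that keeps one MongoDB connection open for the whole crawl."""

    def __init__(self, mongo_uri="mongodb://localhost:27017",
                 db_name="mydatabase", collection_name="mycollection",
                 client_factory=None):
        self.mongo_uri = mongo_uri
        self.db_name = db_name
        self.collection_name = collection_name
        # Hypothetical hook: lets callers swap in a stand-in client.
        self.client_factory = client_factory

    def open_spider(self, spider):
        factory = self.client_factory
        if factory is None:
            # Imported lazily so pymongo is only needed for real runs.
            from pymongo import MongoClient
            factory = MongoClient
        self.client = factory(self.mongo_uri)
        self.collection = self.client[self.db_name][self.collection_name]

    def process_item(self, item, spider):
        # Store a plain-dict copy and pass the item on to later pipelines.
        self.collection.insert_one(dict(item))
        return item

    def close_spider(self, spider):
        self.client.close()
```

With this approach the spider's `parse` method would `yield {'title': title, 'content': content}` instead of inserting directly, and the pipeline would be enabled through the `ITEM_PIPELINES` setting, for example `{"myproject.pipelines.MongoPipeline": 300}`, where the module path is a placeholder.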


Answered 2024-01-05
