javascript: How to scrape dynamic web pages quickly [closed]

Posted by cbwuti44 on 2023-01-07 in Java


Closed 19 hours ago.
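I have a small webapp that uses Node.js and Puppeteer to scrape dynamic web pages. The webapp sends a request to the Node service, which scrapes the content and returns it.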

webapp  -- http --> Node.js + Puppeteer -- fetch --> html
 |                      |
 |<- - - response - - - |

But this process takes a bit too long!
Is there a better solution than Puppeteer?

1 Answer

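If you only want to return the "raw" HTML of the remote page, that is, the markup as the server sends it before any JavaScript runs, you can use node-fetch in your scraper.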

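import fetch from 'node-fetch';

// Download the page and print its HTML as served; no scripts are executed.
fetch('https://google.com')
    .then(res => res.text())
    .then(text => console.log(text))
    .catch(err => console.error(err));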

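Puppeteer drives headless Chrome, which makes it slower and more memory-hungry, but on the other hand makes it much easier to work with cookies, scripts, and so on.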
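
One way to act on this trade-off is to try the cheap request first and only start the browser when the returned HTML looks like an empty client-side shell. The following is only a rough sketch under several assumptions: the helper names (`visibleTextLength`, `needsBrowser`, `fetchRendered`, `scrape`) and the 200-character threshold are made up for illustration, the check is a crude heuristic rather than a reliable detector, and it relies on the global `fetch` available in Node.js 18+ instead of node-fetch.

```javascript
// Crude heuristic (an assumption, not a standard): count the text that
// remains once <script>/<style> blocks and all tags are stripped.
function visibleTextLength(html) {
  return html
    .replace(/<script[\s\S]*?<\/script>/gi, '')
    .replace(/<style[\s\S]*?<\/style>/gi, '')
    .replace(/<[^>]+>/g, ' ')
    .replace(/\s+/g, ' ')
    .trim().length;
}

// Hypothetical threshold: very little visible text suggests the page
// is filled in by JavaScript after load.
function needsBrowser(html, minTextLength = 200) {
  return visibleTextLength(html) < minTextLength;
}

// Slow path: render the page in headless Chrome via Puppeteer.
// Puppeteer is loaded lazily so the fast path works without it.
async function fetchRendered(url) {
  const { default: puppeteer } = await import('puppeteer');
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: 'networkidle2' });
    return await page.content();
  } finally {
    await browser.close();
  }
}

// Fast path first; fall back to the browser only when the HTML looks empty.
async function scrape(url) {
  const res = await fetch(url); // global fetch, Node.js 18+
  const html = await res.text();
  return needsBrowser(html) ? fetchRendered(url) : html;
}
```

Calling `scrape('https://example.com').then(html => console.log(html.length))` would then pay the headless-Chrome cost only for pages that appear to need it. Pages that ship some static text but load the interesting data later would slip past this check, so the threshold needs tuning for each target site.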