Golang Colly Scraping - Website Captcha Catches My Scrape

Posted by pw9qyyiw on 2024-01-04 in Go

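I wrote a scraper for Amazon product titles, but Amazon's captcha catches it. I ran go run main.go 10 times: 8 times the captcha caught me, and 2 times I got the product title.
I researched this problem but did not find any solution for Golang (only for Python). Is there any solution for me?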

package main

import (
    "fmt"
    "strings"

    "github.com/gocolly/colly"
)

func main() {

    // Create a Collector restricted to the Amazon domains
    c := colly.NewCollector(
        colly.AllowedDomains("www.amazon.com", "amazon.com"),
    )
    c.OnHTML("div", func(h *colly.HTMLElement) {
        captcha := h.Text
        title := h.ChildText("span#productTitle")
        fmt.Println(strings.TrimSpace(title))
        fmt.Println(strings.TrimSpace(captcha))
    })

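    // Start the collector and report a failed request instead of ignoring it
    if err := c.Visit("https://www.amazon.com/Bluetooth-Over-Ear-Headphones-Foldable-Prolonged/dp/B07K5214NZ"); err != nil {
        fmt.Println("visit failed:", err)
    }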
}

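Output:
Enter the characters you see below. Sorry, we just need to make sure you're not a robot. For best results, please make sure your browser is accepting cookies.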

Source: https://stackoverflow.com/questions/68131475/golang-colly-scraping-website-captcha-catches-my-scrape

1 Answer

9jyewag01#

If you don't mind a different package, I wrote one for searching HTML (essentially a thin wrapper around github.com/tdewolff/parse):

package main

import (
   "github.com/89z/parse/html"
   "net/http"
   "os"
)

func main() {
   req, err := http.NewRequest(
      "GET", "https://www.amazon.com/dp/B07K5214NZ", nil,
   )
   if err != nil {
      panic(err)
   }
   // Set an explicit User-Agent header on the request
   req.Header = http.Header{
      "User-Agent": {"Mozilla"},
   }
   res, err := new(http.Transport).RoundTrip(req)
   if err != nil {
      panic(err)
   }
   defer res.Body.Close()
   lex := html.NewLexer(res.Body)
   lex.NextAttr("id", "productTitle")
   os.Stdout.Write(lex.Bytes())
}

Result:

Bluetooth Headphones Over-Ear, Zihnic Foldable Wireless and Wired Stereo
Headset Micro SD/TF, FM for Cell Phone,PC,Soft Earmuffs &Light Weight for
Prolonged Waring(Rose Gold)

https://github.com/89z/parse
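
The answer above sets an explicit User-Agent header, and the question reports that some runs succeed while others hit the robot check. A minimal sketch of combining the two ideas with only the standard library follows: send the header, detect the check page, and try again a limited number of times. The helper names (isCaptchaPage, fetch, fetchWithRetry), the marker string, and the local test server are all hypothetical and exist only for illustration; the sketch assumes the check page contains the English text quoted in the question, and it makes no claim about how the real site decides when to show that page.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
	"sync/atomic"
)

// captchaMarker is the robot-check text quoted in the question.
// Assumption: this text is a usable signal for detecting the check page.
const captchaMarker = "Enter the characters you see below"

// isCaptchaPage reports whether body looks like the robot-check page.
func isCaptchaPage(body string) bool {
	return strings.Contains(body, captchaMarker)
}

// fetch performs a GET with an explicit User-Agent and returns the body.
func fetch(client *http.Client, url, userAgent string) (string, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return "", err
	}
	req.Header.Set("User-Agent", userAgent)
	res, err := client.Do(req)
	if err != nil {
		return "", err
	}
	defer res.Body.Close()
	data, err := io.ReadAll(res.Body)
	if err != nil {
		return "", err
	}
	return string(data), nil
}

// fetchWithRetry returns the first body that is not a robot-check page,
// giving up after the given number of attempts.
func fetchWithRetry(client *http.Client, url, userAgent string, attempts int) (string, error) {
	for i := 0; i < attempts; i++ {
		body, err := fetch(client, url, userAgent)
		if err != nil {
			return "", err
		}
		if !isCaptchaPage(body) {
			return body, nil
		}
	}
	return "", fmt.Errorf("captcha page returned on all %d attempts", attempts)
}

func main() {
	// Hypothetical stand-in for the real site so the sketch runs offline:
	// the first request gets the robot-check text, later ones a product page.
	var calls int32
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if atomic.AddInt32(&calls, 1) == 1 {
			fmt.Fprint(w, captchaMarker)
			return
		}
		fmt.Fprint(w, `<span id="productTitle">Example title</span>`)
	}))
	defer srv.Close()

	body, err := fetchWithRetry(srv.Client(), srv.URL, "Mozilla", 3)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(body)
}
```

To point this at a real page, replace srv.Client() and srv.URL with http.DefaultClient and the product URL; a pause between attempts would probably be sensible there. If you would rather stay with Colly, the colly.UserAgent collector option sets the same header.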

Answered on 2024-01-04
