python-3.x Web scraping: page exists but returns 404 with requests/urllib

Posted by yqkkidmi on 2023-04-22 in Python


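I am trying to scrape the following page: http://usbcdirectory.com/listing/1-us-black-chambers
I am using Python 3.5.0.
Here is my code:
import urllib.request
urllib.request.urlopen('http://usbcdirectory.com/listing/1-us-black-chambers')
With the code above I get a 404 Not Found error. However, the page exists when I open it in a browser.
I looked for a solution to this problem. Here is what I found and tried:
1) Switching from urllib to requests: I did this and still got a 404 status code.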

>>> requests.get('http://usbcdirectory.com/listing/1-us-black-chambers')
<Response [404]>

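2) I checked that my link is correct.
3) I tried to find out whether the page is generated with JavaScript. I believe it is not.
What is wrong with this page? Are they blocking scraping in some way, or is there a problem with the URL?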

python-3.x

Source: https://stackoverflow.com/questions/46843293/web-scraping-page-exists-but-getting-404-using-requests-urllib

2 Answers

#1 gorkyyrv

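As you guessed, they are probably blocking your request. You can pass custom headers so that your request looks more like one coming from a real browser: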

import requests

url = 'http://usbcdirectory.com/listing/1-us-black-chambers'
headers = {'Accept': 'text/html'}
response = requests.get(url, headers=headers)
print(response.status_code)
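
To see why a server might tell a script apart from a browser, it can help to look at the User-Agent that Python sends when none is set. This is a minimal sketch using only the standard library. It does not contact the site, and `default_user_agent` is a hypothetical helper name. Whether this particular site filters on User-Agent is an assumption, not something the thread confirms.

```python
import urllib.request


def default_user_agent():
    """Return the User-Agent value urllib attaches when the caller sets none."""
    opener = urllib.request.build_opener()
    # opener.addheaders is a list of (name, value) pairs added to every request.
    return dict(opener.addheaders).get("User-agent")


print(default_user_agent())  # a string starting with "Python-urllib/"
```

A value like this is easy for a server to recognize as a script, which is one plausible reason the same URL could behave differently in a browser.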

2023-04-22

#2 jk9hmnmh

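The same thing happened to me. Thanks for sharing the solution. I also tried sending my own browser's User-Agent string, and it worked. I used this code: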

import requests

url = 'http://usbcdirectory.com/listing/1-us-black-chambers'
headers = {'User-Agent': 'your user agent'}
response = requests.get(url, headers=headers)
print(response.status_code)
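
Since the question started with urllib, the same idea can be sketched without requests. The `USER_AGENT` value and the `build_request` and `fetch_status` helpers below are illustrative assumptions, not part of the original answers. Replace the placeholder with the string your own browser sends.

```python
import urllib.error
import urllib.request

URL = 'http://usbcdirectory.com/listing/1-us-black-chambers'
# Placeholder value (assumption): substitute your own browser's User-Agent.
USER_AGENT = 'Mozilla/5.0 (X11; Linux x86_64)'


def build_request(url, user_agent=USER_AGENT):
    """Create a Request object that carries a custom User-Agent header."""
    return urllib.request.Request(url, headers={'User-Agent': user_agent})


def fetch_status(url):
    """Return the HTTP status code, including error codes such as 404."""
    try:
        with urllib.request.urlopen(build_request(url), timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError on 4xx/5xx responses; the code is on the exception.
        return err.code

# Usage (performs a network request):
# print(fetch_status(URL))
```

Catching `HTTPError` keeps the behavior comparable to `requests.get`, which returns a response object for a 404 instead of raising.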

2023-04-22
