Python 3.4 urllib.request error (HTTP 403)

8yoxcaq7 posted on 2023-03-31 in Python

I'm trying to open and parse an HTML page. In Python 2.7.8 I have no problem:

import urllib
url = "https://ipdb.at/ip/66.196.116.112"
html = urllib.urlopen(url).read()

Everything works fine. But I want to move to Python 3.4, and there I get HTTP Error 403 (Forbidden). My code:

import urllib.request
html = urllib.request.urlopen(url) # same URL as before

File "C:\Python34\lib\urllib\request.py", line 153, in urlopen
return opener.open(url, data, timeout)
File "C:\Python34\lib\urllib\request.py", line 461, in open
response = meth(req, response)
File "C:\Python34\lib\urllib\request.py", line 574, in http_response
'http', request, response, code, msg, hdrs)
File "C:\Python34\lib\urllib\request.py", line 499, in error
return self._call_chain(*args)
File "C:\Python34\lib\urllib\request.py", line 433, in _call_chain
result = func(*args)
File "C:\Python34\lib\urllib\request.py", line 582, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden

It works for other URLs that don't use https. For example:

url = 'http://www.stopforumspam.com/ipcheck/212.91.188.166'

works fine.


um6iljoc1#

The site doesn't seem to like the User-Agent that Python 3.x sends by default.
Specifying a User-Agent will fix your problem:

import urllib.request
req = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
html = urllib.request.urlopen(req).read()
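
If you'd rather not build a Request object for every call, another option is to install an opener whose default headers already carry a browser-like User-Agent; a minimal sketch, reusing the URL from the question:

import urllib.request

# install a global opener so that plain urlopen() calls send this User-Agent too
opener = urllib.request.build_opener()
opener.addheaders = [('User-Agent', 'Mozilla/5.0')]
urllib.request.install_opener(opener)

html = urllib.request.urlopen("https://ipdb.at/ip/66.196.116.112").read()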

Note that the Python 2.x urllib version also receives a 403 status, but unlike Python 2.x urllib2 and Python 3.x urllib, it does not raise an exception.

You can confirm that with the following code:

print(urllib.urlopen(url).getcode())  # => 403
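
The Python 3 equivalent of that check means catching the exception; a minimal sketch, assuming the same URL as in the question:

import urllib.request
import urllib.error

url = "https://ipdb.at/ip/66.196.116.112"
try:
    print(urllib.request.urlopen(url).getcode())
except urllib.error.HTTPError as e:
    # urllib in Python 3 raises instead of returning the 403 response
    print(e.code)  # => 403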

lztngnrs2#

Here are some notes on urllib that I collected while learning Python 3.
I'm keeping them in case they come in handy or help someone else.

How to import urllib.request and urllib.parse:

import urllib.request as urlRequest
import urllib.parse as urlParse

How to make a GET request:

url = "http://www.example.net"
# open the url
x = urlRequest.urlopen(url)
# get the source code
sourceCode = x.read()

How to make a POST request:

url = "https://www.example.com"
values = {"q": "python if"}
# encode values for the url
values = urlParse.urlencode(values)
# encode the values in UTF-8 format
values = values.encode("UTF-8")
# create the request object carrying the encoded POST data
targetUrl = urlRequest.Request(url, values)
# open the url
x  = urlRequest.urlopen(targetUrl)
# get the source code
sourceCode = x.read()

How to make a POST request (when you get a 403 Forbidden response):

url = "https://www.example.com"
values = {"q": "python urllib"}
# pretend to be a chrome 47 browser on a windows 10 machine
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36"}
# encode values for the url
values = urlParse.urlencode(values)
# encode the values in UTF-8 format
values = values.encode("UTF-8")
# create the request object with the POST data and headers
targetUrl = urlRequest.Request(url = url, data = values, headers = headers)
# open the url
x  = urlRequest.urlopen(targetUrl)
# get the source code
sourceCode = x.read()

How to make a GET request (when you get a 403 Forbidden response):

url = "https://www.example.com"
# pretend to be a chrome 47 browser on a windows 10 machine
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36"}
req = urlRequest.Request(url, headers = headers)
# open the url
x = urlRequest.urlopen(req)
# get the source code
sourceCode = x.read()
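
One detail worth noting about all of the examples above: read() returns bytes in Python 3, so the source usually needs to be decoded before it can be treated as text. A minimal sketch, assuming the server reports its charset and falling back to UTF-8 when it does not:

import urllib.request as urlRequest

url = "http://www.example.net"
headers = {"User-Agent": "Mozilla/5.0"}
req = urlRequest.Request(url, headers = headers)
x = urlRequest.urlopen(req)
# read() returns bytes; decode them using the charset reported by the server,
# falling back to UTF-8 when no charset header is present
charset = x.headers.get_content_charset() or "UTF-8"
sourceCode = x.read().decode(charset)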

pqwbnv8z3#

The urllib request HTTP 403 error occurs because the server's security features block known bot user agents. Here are possible solutions, ordered by how easy they are to apply (easiest first):

Solution 1:

Add a different user agent, one that isn't treated as a bot.

from urllib.request import Request, urlopen 
web = "https://www.festo.com/de/de" 
headers = {
   "User-Agent": "XYZ/3.0",
   "X-Requested-With": "XMLHttpRequest"
} 
request = Request(web, headers=headers) 
content = urlopen(request).read()

If you run several requests in a row, you can optionally set a short timeout on the request.

content = urlopen(request,timeout=10).read()
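
When the timeout expires, urlopen raises an exception instead of returning, so you may want to catch it; a minimal sketch using the same example site:

import socket
import urllib.error
from urllib.request import Request, urlopen

web = "https://www.festo.com/de/de"
request = Request(web, headers={"User-Agent": "XYZ/3.0"})
try:
    content = urlopen(request, timeout=10).read()
except socket.timeout:
    # the read itself took longer than the timeout
    print("request timed out")
except urllib.error.URLError as e:
    # connection-level failures (including connect timeouts) end up here
    print("request failed:", e.reason)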

Solution 2:

Add cookies from your browser after opening the URL manually and accepting all cookies.

from urllib.request import Request, urlopen 
web = "https://www.festo.com/de/de" 
headers = {
   "User-Agent": "XYZ/3.0",
   "X-Requested-With": "XMLHttpRequest", 
   "cookie": "value stored in your webpage"
} 
request = Request(web, headers=headers) 
content = urlopen(request).read()

If you're using Chrome, you can log in to the web URL, open the inspector (press F12), select the Application tab, and then find Cookies under Storage in the tree on the left.
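
As an alternative to copying the cookie value by hand, urllib can keep track of cookies itself through http.cookiejar; a minimal sketch:

import http.cookiejar
import urllib.request

jar = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
opener.addheaders = [("User-Agent", "XYZ/3.0")]

# cookies set by the first response are stored in the jar and sent back automatically
content = opener.open("https://www.festo.com/de/de").read()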

Solution 3:

If you need to obtain cookies for several websites, it's better to create the request with a Session object, since it handles cookies for you.

import requests
web = "https://www.festo.com/de/de" 
headers = {
   "User-Agent": "XYZ/3.0",
   "X-Requested-With": "XMLHttpRequest"
} 
request = requests.Session()
content = request.get(web,headers=headers).text
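
Any cookies the server sets are kept on the Session, so later requests reuse them automatically; a short usage sketch:

import requests

session = requests.Session()
session.headers.update({"User-Agent": "XYZ/3.0"})

# the first request stores whatever cookies the server sets on the session
first = session.get("https://www.festo.com/de/de")
print(session.cookies.get_dict())

# a second request to the same site sends those cookies back automatically
second = session.get("https://www.festo.com/de/de")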

Extra:

If SSL certificate verification fails when using urllib:

from urllib.request import Request, urlopen 
import ssl
web = "https://www.festo.com/de/de" 
headers = {
   "User-Agent": "XYZ/3.0",
   "X-Requested-With": "XMLHttpRequest"
} 
request = Request(web, headers=headers)

ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE 

content = urlopen(request,context=ctx).read()
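
If the verification failure comes from a missing or outdated CA bundle rather than from the site itself, another option is to keep verification enabled and point the default context at a bundle you trust; a minimal sketch (the cafile path is a placeholder):

import ssl
from urllib.request import Request, urlopen

web = "https://www.festo.com/de/de"
request = Request(web, headers={"User-Agent": "XYZ/3.0"})

# verification stays on; only the location of the trusted CA bundle changes
ctx = ssl.create_default_context(cafile="/path/to/ca-bundle.pem")  # placeholder path
content = urlopen(request, context=ctx).read()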

Thanks to the users in the following posts: Question 1, Question 2, SSL-Certificate
