How to handle IncompleteRead in Python

vu8f3i0k  posted on 2023-11-15  in Python

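I am trying to fetch some data from a website, but the request fails with an incomplete read. The data I am trying to get is a huge set of nested links. I did some research online and found that this might be caused by a server error (a chunked transfer encoding finishing before the expected size is reached). I also found a workaround for this at this link.
However, I am not sure how to use it in my case.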

br = mechanize.Browser()
br.addheaders = [('User-agent', 'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1;Trident/5.0)')]
urls = "http://shop.o2.co.uk/mobile_phones/Pay_Monthly/smartphone/all_brands"
page = urllib2.urlopen(urls).read()
soup = BeautifulSoup(page)
links = soup.findAll('img',url=True)

for tag in links:
    name = tag['alt']
    tag['url'] = urlparse.urljoin(urls, tag['url'])
    r = br.open(tag['url'])
    page_child = br.response().read()
    soup_child = BeautifulSoup(page_child)
    contracts = [tag_c['value']for tag_c in soup_child.findAll('input', {"name": "tariff-duration"})]
    data_usage = [tag_c['value']for tag_c in soup_child.findAll('input', {"name": "allowance"})]
    print contracts
    print data_usage

Any help would be appreciated. Thanks.

tct7dpnv #1

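The link you included in your question is simply a wrapper around urllib's read() function that catches any incomplete read exceptions for you. If you don't want to implement the whole patch, you can just put a try/except block around the place where you read your links. For example: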

import httplib
import urllib2

try:
    page = urllib2.urlopen(urls).read()
except httplib.IncompleteRead, e:
    # e.partial holds the bytes received before the read was cut short
    page = e.partial

For Python 3:

from urllib import request
import http.client

try:
    page = request.urlopen(urls).read()
except http.client.IncompleteRead as e:
    page = e.partial
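
The same try/except pattern can be pulled into a small helper that also reports whether the body was truncated, so the caller can decide whether to retry or accept the partial data. This is only a minimal sketch for Python 3: `read_allowing_partial` and `FakeResponse` are hypothetical names introduced for illustration, and the fake response stands in for a real server so the example runs without a network.

```python
import http.client


def read_allowing_partial(response):
    """Read a response body and return (data, complete).

    `response` is any object with a read() method, for example the
    object returned by urllib.request.urlopen(). Hypothetical helper.
    """
    try:
        return response.read(), True
    except http.client.IncompleteRead as e:
        # e.partial holds the bytes received before the read was cut short
        return e.partial, False


class FakeResponse:
    """Stand-in response that simulates a truncated body (illustration only)."""

    def read(self):
        raise http.client.IncompleteRead(b"<html>partial", 100)


data, complete = read_allowing_partial(FakeResponse())
print(data, complete)
```

Note that accepting `e.partial` means working with a truncated page, so any parsing that follows should tolerate missing content.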

slmsl1lt #2

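Please note that the first part of this answer applies to Python 2 only (it was posted in 2013).
In my case, sending the request as HTTP/1.0 fixed the problem. I added this: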

import httplib
httplib.HTTPConnection._http_vsn = 10
httplib.HTTPConnection._http_vsn_str = 'HTTP/1.0'

After that I make the request:

req = urllib2.Request(url, post, headers)
filedescriptor = urllib2.urlopen(req)
img = filedescriptor.read()


Afterwards I switch back to HTTP/1.1 (for connections that support 1.1):

httplib.HTTPConnection._http_vsn = 11
httplib.HTTPConnection._http_vsn_str = 'HTTP/1.1'


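The trick is to use HTTP/1.0 instead of the default HTTP/1.1. HTTP/1.1 can handle chunks, but for some reason the web server does not handle them correctly; HTTP/1.0 has no chunked transfer encoding, so making the request over HTTP/1.0 avoids the problem.
On Python 3, the code above fails with
ModuleNotFoundError: No module named 'httplib'
In that case use the http.client module instead, which solves the problem: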

import http.client as http
http.HTTPConnection._http_vsn = 10
http.HTTPConnection._http_vsn_str = 'HTTP/1.0'
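
Since the switch to HTTP/1.0 and back is easy to forget, it can be wrapped in a context manager that always restores the previous setting. This is a rough sketch for Python 3, not an official API: `force_http10` is a hypothetical name, and `_http_vsn` and `_http_vsn_str` are private attributes of `http.client.HTTPConnection` that could change between Python versions.

```python
import contextlib
import http.client


@contextlib.contextmanager
def force_http10():
    """Temporarily make http.client send HTTP/1.0 requests (hypothetical helper).

    Relies on the private attributes _http_vsn and _http_vsn_str, which
    are not a documented API.
    """
    conn_cls = http.client.HTTPConnection
    saved = (conn_cls._http_vsn, conn_cls._http_vsn_str)
    conn_cls._http_vsn, conn_cls._http_vsn_str = 10, 'HTTP/1.0'
    try:
        yield
    finally:
        # Restore the previous version even if the request raised
        conn_cls._http_vsn, conn_cls._http_vsn_str = saved


with force_http10():
    inside = http.client.HTTPConnection._http_vsn_str
    # A call such as urllib.request.urlopen(url).read() would go here
after = http.client.HTTPConnection._http_vsn_str
print(inside, after)
```

The setting is a class attribute, so it affects every connection in the process while the block is active; that makes this approach unsuitable for code that issues requests from several threads at once.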

nwnhqdif #3

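What worked for me was catching IncompleteRead as an exception and collecting the data I managed to read in each iteration, by putting the read into a loop like the one below. (Note: I am using Python 3.4.1, and the urllib library changed between 2.7 and 3.4.)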

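import http.client
import json
import urllib.request

def rest_call(url, data=None):  # function name is illustrative; the original showed only the body
    try:
        requestObj = urllib.request.urlopen(url, data)
        responseBytes = b""
        while True:
            try:
                responseJSONpart = requestObj.read()
            except http.client.IncompleteRead as icread:
                # Keep what was received and try to read the rest
                responseBytes += icread.partial
                continue
            else:
                responseBytes += responseJSONpart
                break

        # Decode once at the end so a multi-byte character split across reads is not corrupted
        return json.loads(responseBytes.decode('utf-8'))

    except Exception as RESTex:
        print("Exception occurred making REST call: " + str(RESTex))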


anhgbhbe #4

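You can use requests instead of urllib2. requests is based on urllib3, so it rarely runs into this kind of problem. Put the call in a loop that tries three times and it becomes much more robust. You can use it like this: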

import inspect
import sys
import time

import requests

# Fragment from a method: self.crawling holds the URL to fetch
msg = None
for i in [1, 2, 3]:
    try:
        r = requests.get(self.crawling, timeout=30)
        msg = r.text
        if msg:
            break
    except Exception as e:
        sys.stderr.write('Got error when requesting URL "' + self.crawling + '": ' + str(e) + '\n')
        if i == 3:
            sys.stderr.write('{0.filename}@{0.lineno}: Failed requesting from URL "{1}" ==> {2}\n'.format(
                inspect.getframeinfo(inspect.currentframe()), self.crawling, e))
            raise
        time.sleep(10 * (i - 1))

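The retry idea does not depend on requests itself. Below is a minimal sketch of a generic retry wrapper that could surround any fetch function; `retry`, `attempts`, `delay`, and the `flaky` stand-in are hypothetical names introduced for illustration, and the simulated failure replaces a real network call so the example runs offline.

```python
import time


def retry(fetch, attempts=3, delay=0.0, exceptions=(Exception,)):
    """Call fetch() up to `attempts` times and re-raise the last error."""
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except exceptions:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the error
            time.sleep(delay * attempt)  # wait a little longer each time


calls = []


def flaky():
    """Fails twice, then succeeds (stand-in for a network request)."""
    calls.append(1)
    if len(calls) < 3:
        raise ConnectionError("simulated failure")
    return "ok"


result = retry(flaky, attempts=3, delay=0.0)
print(result, len(calls))
```

In real use, `fetch` might be something like `lambda: requests.get(url, timeout=30).text`, where `url` is whatever address you are crawling, and `exceptions` could be narrowed to the errors you actually expect.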

kx7yvsdv #5

For Python 3, FYI:

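from urllib import request
import http.client

url = 'http://shop.o2.co.uk/mobile_phones/Pay_Monthly/smartphone/all_brand'
file_path = 'page.html'  # placeholder output path; not given in the original

def save_page(url, file_path):  # function name is illustrative
    try:
        response = request.urlopen(url)
        file = response.read()
    except http.client.IncompleteRead as e:
        # Keep whatever was received before the read was cut short
        file = e.partial
    except Exception as result:
        print("Unknown error: " + str(result))
        return

    # Save the file
    with open(file_path, 'wb') as f:
        print("save -> %s " % file_path)
        f.write(file)

save_page(url, file_path)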


nwwlzxa7 #6

I found that in my case the problem was caused by my virus scanner/firewall, specifically the "Online Shield" component of AVG.

mgdq6dx1 #7

The trick is to resume the download with request.add_header('Range', 'bytes=%d-' % len(return_raw)), provided the server supports it.

import urllib.request
from http.client import IncompleteRead
import time

def download_file(request, unsafe=False, max_retries=15):
    bytes_ranges_supported = False
    return_raw = b''

    # Check whether the server supports byte ranges
    try:
        with urllib.request.urlopen(request) as response:
            if response.headers.get('Accept-Ranges') == 'bytes':
                bytes_ranges_supported = True
    except:
        pass

    i = max_retries
    while (i > 0):
        i -= 1
        try:
            if bytes_ranges_supported:
                request.add_header('Range', 'bytes=%d-' % len(return_raw))
            with urllib.request.urlopen(request) as response:
                return_raw += response.read()
                break  # If the read was successful, break the loop
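        except IncompleteRead as e:
            if bytes_ranges_supported:
                return_raw += e.partial  # the next attempt resumes from this offset
            elif unsafe or i == 0:
                return_raw = e.partial  # give up and return the truncated data
                break
            # Otherwise retry from scratch: without Range support the partial data cannot be extended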
        except:
            raise
        
        finally:
            try:
                time.sleep(0.10)
            except OSError:
                break
            except KeyboardInterrupt:
                raise

    return return_raw

url = 'https://google.com/'
req = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36'})

with open('file.html', 'wb') as f:
    f.write(download_file(req))

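One caveat with the Range approach: a server may ignore the Range header and answer 200 with the whole body instead of 206 Partial Content, in which case appending would duplicate the beginning of the file. Here is a minimal sketch of a guard against that; `merge_body` is a hypothetical helper, and `status` is assumed to be the HTTP status code of the retry response (for example `response.status` on the object returned by `urllib.request.urlopen`).

```python
def merge_body(already_read, status, new_bytes):
    """Combine previously read bytes with the body of a retry response."""
    if status == 206:
        # Partial Content: the server resumed at the requested offset
        return already_read + new_bytes
    # Any other success status (typically 200): the server sent the whole
    # body from byte 0, so the earlier partial data must be discarded
    return new_bytes


resumed = merge_body(b"hello ", 206, b"world")
restarted = merge_body(b"hello ", 200, b"hello world")
print(resumed, restarted)
```

Inside a download loop, this would replace a plain `+=` on the accumulated bytes whenever a Range header was sent.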

but5z9lq #8

I tried all of these solutions and none of them worked for me. What did work was using http.client (Python 3) directly instead of urllib:

import http.client

conn = http.client.HTTPConnection('www.google.com')
conn.request('GET', '/')
r1 = conn.getresponse()
page = r1.read().decode('utf-8')

This works fine every time, whereas with urllib I got an IncompleteRead exception every time.

ldfqzlk8 #9

I just caught one more exception to get around this problem, like so:

import logging
import requests

# url and timeout are defined elsewhere in the calling code
try:
    r = requests.get(url, timeout=timeout)
except (requests.exceptions.ChunkedEncodingError, requests.ConnectionError) as e:
    logging.error("There was an error: %s", e)
