In AWS Lambda with the python3.9 runtime and boto3 1.20.32, I run the following code:
import boto3
from multiprocessing import Process

s3_client = boto3.client(service_name="s3")
s3_bucket = "bucket"
s3_other_bucket = "other_bucket"

def multiprocess_s3upload(tar_index: dict):
    def _upload(filename, bytes_range):
        src_key = ...
        # get a single raw file from the tar by byte range
        s3_obj = s3_client.get_object(
            Bucket=s3_bucket,
            Key=src_key,
            Range=f"bytes={bytes_range}"
        )
        # upload the raw file
        # the error occurs here
        s3_client.upload_fileobj(
            s3_obj["Body"],
            s3_other_bucket,
            filename
        )

    def _wait(procs):
        for p in procs:
            p.join()

    processes = []
    proc_limit = 256  # limit concurrent processes to avoid the "too many open files" error
    for filename, bytes_range in tar_index.items():
        # filename = "hello.txt"
        # bytes_range = "1024-2048"
        proc = Process(
            target=_upload,
            args=(filename, bytes_range)
        )
        proc.start()
        processes.append(proc)
        if len(processes) == proc_limit:
            _wait(processes)
            processes = []
    _wait(processes)
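This program extracts some of the raw files from a tar file in one S3 bucket and uploads each raw file to another S3 bucket. A single tar file may contain thousands of raw files, so I use multiple processes to speed up the S3 uploads.
However, I randomly get an SSLError exception in a subprocess, even when processing the same tar file. I tried different tar files and got the same result. Only the last subprocess throws the exception; the others work fine.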
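How `tar_index` is built is not shown in the question. As a rough sketch only, assuming the tar is uncompressed and its bytes are available locally, an index of inclusive byte ranges could be derived with the standard `tarfile` module. The helper name `build_tar_index` is hypothetical:

```python
import io
import tarfile


def build_tar_index(tar_bytes: bytes) -> dict:
    """Map each regular file in an uncompressed tar to an inclusive byte range."""
    index = {}
    # mode "r:" accepts only uncompressed archives; byte ranges into a
    # compressed tar would not correspond to the file contents.
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:") as tar:
        for member in tar:
            # Skip directories, links, and empty files (no valid range).
            if not member.isfile() or member.size == 0:
                continue
            start = member.offset_data
            end = start + member.size - 1  # HTTP Range end is inclusive
            index[member.name] = f"{start}-{end}"
    return index
```

Each value has the same `"start-end"` shape as the `bytes_range` strings passed to `get_object` in the code above.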
Process Process-2:
Traceback (most recent call last):
  File "/var/runtime/urllib3/response.py", line 441, in _error_catcher
    yield
  File "/var/runtime/urllib3/response.py", line 522, in read
    data = self._fp.read(amt) if not fp_closed else b""
  File "/var/lang/lib/python3.9/http/client.py", line 463, in read
    n = self.readinto(b)
  File "/var/lang/lib/python3.9/http/client.py", line 507, in readinto
    n = self.fp.readinto(b)
  File "/var/lang/lib/python3.9/socket.py", line 704, in readinto
    return self._sock.recv_into(b)
  File "/var/lang/lib/python3.9/ssl.py", line 1242, in recv_into
    return self.read(nbytes, buffer)
  File "/var/lang/lib/python3.9/ssl.py", line 1100, in read
    return self._sslobj.read(len, buffer)
ssl.SSLError: [SSL: WRONG_VERSION_NUMBER] wrong version number (_ssl.c:2633)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/var/lang/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self._target(*self._args, **self._kwargs)
  File "/var/task/main.py", line 144, in _upload
    s3_client.upload_fileobj(
  File "/var/runtime/boto3/s3/inject.py", line 540, in upload_fileobj
    return future.result()
  File "/var/runtime/s3transfer/futures.py", line 103, in result
    return self._coordinator.result()
  File "/var/runtime/s3transfer/futures.py", line 266, in result
    raise self._exception
  File "/var/runtime/s3transfer/tasks.py", line 269, in _main
    self._submit(transfer_future=transfer_future, **kwargs)
  File "/var/runtime/s3transfer/upload.py", line 588, in _submit
    if not upload_input_manager.requires_multipart_upload(
  File "/var/runtime/s3transfer/upload.py", line 404, in requires_multipart_upload
    self._initial_data = self._read(fileobj, threshold, False)
  File "/var/runtime/s3transfer/upload.py", line 463, in _read
    return fileobj.read(amount)
  File "/var/runtime/botocore/response.py", line 82, in read
    chunk = self._raw_stream.read(amt)
  File "/var/runtime/urllib3/response.py", line 544, in read
    raise IncompleteRead(self._fp_bytes_read, self.length_remaining)
  File "/var/lang/lib/python3.9/contextlib.py", line 137, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/var/runtime/urllib3/response.py", line 452, in _error_catcher
    raise SSLError(e)
urllib3.exceptions.SSLError: [SSL: WRONG_VERSION_NUMBER] wrong version number (_ssl.c:2633)
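According to a similar question from 10 years ago, "Multi-threaded S3 download doesn't terminate", the root cause may be that the boto3 S3 upload uses a non-thread-safe library to send HTTP requests. However, the solution given there did not work for me.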
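I also found a boto3 issue that matches my problem. There, the problem went away without the author changing anything:
"Actually, the issue has recently disappeared on its own, without (!) any changes on my side. I guess the issue was created and fixed by Amazon; I am just afraid it will occur again..."
Does anyone know how to fix this?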
1 Answer
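According to the boto3 documentation on multiprocessing (doc):
"Resource instances are not thread safe and should not be shared across threads or processes. These special classes contain additional meta data that cannot be shared. It's recommended to create a new Resource for each thread or process."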
With my modified code, the SSL error exception no longer seems to occur.
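The modified code itself is not shown here. As a minimal sketch of what the quoted recommendation suggests (not necessarily the answerer's exact change), the client can be created inside each child process instead of once at module level, so that no connection or TLS session is inherited from the parent. The names `make_s3_client`, `upload_one`, `batches`, and the `client_factory` parameter are assumptions made for illustration:

```python
from multiprocessing import Process


def make_s3_client():
    # Imported lazily so this module also loads where boto3 is absent.
    import boto3
    return boto3.client("s3")


def upload_one(src_bucket, src_key, dst_bucket, filename, bytes_range,
               client_factory=make_s3_client):
    # Build the client inside the worker rather than sharing the parent's.
    s3_client = client_factory()
    s3_obj = s3_client.get_object(
        Bucket=src_bucket, Key=src_key, Range=f"bytes={bytes_range}"
    )
    s3_client.upload_fileobj(s3_obj["Body"], dst_bucket, filename)


def batches(items, size):
    # Yield consecutive slices of at most `size` items.
    for start in range(0, len(items), size):
        yield items[start:start + size]


def multiprocess_s3upload(tar_index, src_bucket, src_key, dst_bucket,
                          proc_limit=256):
    # Run at most `proc_limit` child processes at a time.
    for batch in batches(list(tar_index.items()), proc_limit):
        procs = [
            Process(target=upload_one,
                    args=(src_bucket, src_key, dst_bucket, name, rng))
            for name, rng in batch
        ]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
```

Creating one client per process costs some extra setup time for each child, which may matter with thousands of small files. Whether this fully explains the intermittent `WRONG_VERSION_NUMBER` error is not confirmed by the source.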