I've been trying to transfer a large file with LWP (or with web-service APIs that rely on LWP), but no matter how I approach it, the process falls apart at some point. On a whim I watched top while the script was running and noticed that memory usage ballooned to over 40 GB before the failure.
I assumed the problem was the S3 API I was using at first, so I decided to talk to the server with LWP::UserAgent directly. Unfortunately, with plain LWP the memory usage still balloons; it takes longer to fail, but about halfway through the transfer it segfaults.
Simply reading the file in the chunks I want to transfer works fine, and memory usage never goes above 1.4 GB:
use POSIX;

my $filename = "/backup/2022-12-13/accounts/backup.tar.gz";
my $size = -s $filename;
my $chunkSize = (1024*1024*100);
my $parts = ceil($size / $chunkSize);
# open 9.6 GB file
open(my $file, '<', $filename) or die("Error reading file, stopped");
binmode($file);
for (my $i = 0; $i <= $parts; $i++) {
    my $chunk;
    my $offset = $i * $chunkSize + 1;
    read($file, $chunk, $chunkSize, $offset);
    # Code to do what I need to do with the chunk goes here.
    sleep(5);
    print STDOUT "Uploaded $i of $parts.\n";
}
However, as soon as I add the LWP code, memory usage climbs dramatically and, as I said, it eventually segfaults (at about 55% of the transfer).
use POSIX;
use HTTP::Request::Common;
use LWP::UserAgent;
use Amazon::S3;
use Net::Amazon::Signature::V4;

my $awsSignature = Net::Amazon::Signature::V4->new( $config{'access_key_id'}, $config{'access_key'}, 'us-east-1', 's3' );

# Get Upload ID from Amazon.
our $simpleS3 = Amazon::S3->new({
    aws_access_key_id     => $config{'access_key_id'},
    aws_secret_access_key => $config{'access_key'},
    retry                 => 1
});
my $bucket = $simpleS3->bucket($bucketName);
my $uploadId = $bucket->initiate_multipart_upload('somebigobject');

my $filename = "/backup/2022-12-13/accounts/backup.tar.gz";
my $size = -s $filename;
my $chunkSize = (1024*1024*100);
my $parts = ceil($size / $chunkSize);
my %partList;

# open 9.6 GB file
open(my $file, '<', $filename) or die("Error reading file, stopped");
binmode($file);
for (my $i = 0; $i <= $parts; $i++) {
    my $chunk;
    my $offset = $i * $chunkSize + 1;
    read($file, $chunk, $chunkSize, $offset);
    # Code to do what I need to do with the chunk goes here.
    my $request = HTTP::Request::Common::PUT("https://bucket.s3.us-east-1.amazonaws.com/somebigobject?partNumber=" . ($i + 1) . "&uploadId=" . $uploadId);
    $request->header('Content-Length' => length($chunk));
    $request->content($chunk);
    my $signed_request = $awsSignature->sign( $request );
    my $ua = LWP::UserAgent->new();
    my $response = $ua->request($signed_request);
    my $etag = $response->header('Etag');
    # Try to make sure nothing lingers after this loop ends.
    $signed_request = '';
    $request = '';
    $response = '';
    $ua = '';
    ($partList{$i + 1}) = $etag =~ m#^"(.*?)"$#;
    print STDOUT "Uploaded $i of $parts.\n";
}
The same problem happens if I use Paws::S3, Net::Amazon::S3::Client, or Amazon::S3 instead; it just happens sooner in the process. It seems as if every chunk somehow stays in memory. As the code runs I can watch memory usage climb gradually but significantly until it reaches around 40 GB. Here is what replaces sleep(5) in the real code:
$partList{$i + 1} = $bucket->upload_part_of_multipart_upload('some-big-object', $uploadId, $i + 1, $chunk);
This last block of code fails because it consumes too much memory:
use POSIX;
use Amazon::S3;

our $simpleS3 = Amazon::S3->new({
    aws_access_key_id     => $config{'access_key_id'},
    aws_secret_access_key => $config{'access_key'},
    retry                 => 1
});
my $bucket = $simpleS3->bucket($bucketName);

my $filename = "/backup/2022-12-13/accounts/backup.tar.gz";
my $size = -s $filename;
my $chunkSize = (1024*1024*100);
my $parts = ceil($size / $chunkSize);
my %partList;
my $uploadId = $bucket->initiate_multipart_upload('some-big-object');
# open 9.6 GB file
open(my $file, '<', $filename) or die("Error reading file, stopped");
binmode($file);
for (my $i = 0; $i <= $parts; $i++) {
    my $chunk;
    my $offset = $i * $chunkSize + 1;
    read($file, $chunk, $chunkSize, $offset);
    # Code to do what I need to do with the chunk goes here.
    $partList{$i + 1} = $bucket->upload_part_of_multipart_upload('some-big-object', $uploadId, $i + 1, $chunk);
    print STDOUT "Uploaded $i of $parts.\n";
}
1 Answer
The problem actually wasn't LWP or the APIs at all, but a silly mistake in how I was reading the file. I was using:

read($file, $chunk, $chunkSize, $offset);

The $offset argument doesn't seek that far into the file, as I had assumed; it is an offset into $chunk, so the scalar gets padded with filler up to that position before the data is placed. That created chunks that kept growing until the process eventually crashed. Instead, the code needs:

read($file, $chunk, $chunkSize);

which produces chunks of the expected size, since read() advances the file position on its own.
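For completeness, here is a minimal sketch of what the corrected loop can look like, reusing the file path and 100 MB part size from the question; the actual upload call is left as a placeholder comment:

use POSIX;

my $filename = "/backup/2022-12-13/accounts/backup.tar.gz";
my $size = -s $filename;
my $chunkSize = (1024*1024*100);
my $parts = ceil($size / $chunkSize);

open(my $file, '<', $filename) or die("Error reading file, stopped");
binmode($file);
for (my $i = 0; $i < $parts; $i++) {
    my $chunk;
    # No OFFSET argument: read() fills $chunk from the start and advances
    # the filehandle itself, so each call returns the next $chunkSize bytes
    # (the final part may be shorter).
    my $bytesRead = read($file, $chunk, $chunkSize);
    last unless $bytesRead;
    # Upload $chunk here (e.g. upload_part_of_multipart_upload).
    print STDOUT "Uploaded " . ($i + 1) . " of $parts.\n";
}
close($file);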