Perl LWP::UserAgent memory usage during a large file PUT

8yparm6h asked on 2022-12-19 in Perl

I have been trying to transfer a large file with LWP (or with web-service APIs that rely on LWP), but no matter how I go about it, the process falls apart at a certain point. On a whim I watched top while the script was running and noticed that, just before the failure, memory usage had ballooned to over 40 GB.
I assumed the problem was the S3 API I had been using, so I decided to talk to the server with LWP::UserAgent directly. Unfortunately, with plain LWP the memory usage still balloons; it took longer to fail, but it got about halfway through the transfer and then segfaulted.
Simply reading the file I want to transfer in chunks works fine, and memory usage never goes above 1.4 GB:

use POSIX;

my $filename = "/backup/2022-12-13/accounts/backup.tar.gz";
my $size = -s $filename; 
my $chunkSize = (1024*1024*100);
my $parts = ceil($size / $chunkSize);

# open 9.6 GB file
open(my $file, '<', $filename) or die("Error reading file, stopped");
binmode($file); 

for (my $i = 0; $i <= $parts; $i++) {
    my $chunk;
    my $offset = $i * $chunkSize + 1;

    read($file, $chunk, $chunkSize, $offset);

    # Code to do what I need to do with the chunk goes here.
    sleep(5);

    print STDOUT "Uploaded $i of $parts.\n";
}

However, adding the LWP code dramatically increases memory usage and, as I said, eventually produces a segmentation fault (at about 55% of the transfer).

use POSIX;
use HTTP::Request::Common;
use LWP::UserAgent;
use Amazon::S3;
use Net::Amazon::Signature::V4;
my $awsSignature = Net::Amazon::Signature::V4->new( $config{'access_key_id'}, $config{'access_key'}, 'us-east-1', 's3' );

# Get Upload ID from Amazon.
our $simpleS3 = Amazon::S3->new({
    aws_access_key_id  => $config{'access_key_id'},
    aws_secret_access_key => $config{'access_key'},
    retry => 1
}); 
my $bucket = $simpleS3->bucket($bucketName); 
my $uploadId = $bucket->initiate_multipart_upload('somebigobject');

my $filename = "/backup/2022-12-13/accounts/backup.tar.gz";
my $size = -s $filename; 
my $chunkSize = (1024*1024*100);
my $parts = ceil($size / $chunkSize);
my %partList;

# open 9.6 GB file
open(my $file, '<', $filename) or die("Error reading file, stopped");
binmode($file); 

for (my $i = 0; $i <= $parts; $i++) {
    my $chunk;
    my $offset = $i * $chunkSize + 1;

    read($file, $chunk, $chunkSize, $offset);

    # Code to do what I need to do with the chunk goes here.
    my $request = HTTP::Request::Common::PUT("https://bucket.s3.us-east-1.amazonaws.com/somebigobject?partNumber=" . ($i + 1) . "&uploadId=" . $uploadId);
    $request->header('Content-Length' => length($chunk));
    $request->content($chunk);
    my $signed_request = $awsSignature->sign( $request );
    
    my $ua = LWP::UserAgent->new();
    my $response = $ua->request($signed_request);
    
    my $etag = $response->header('Etag');
    
    # Try to make sure nothing lingers after this loop ends.
    $signed_request = '';
    $request = '';
    $response = '';
    $ua = '';           
        
    ($partList{$i + 1}) = $etag =~ m#^"(.*?)"$#;

    print STDOUT "Uploaded $i of $parts.\n";
}

The same problem occurs if I use Paws::S3, Net::Amazon::S3::Client, or Amazon::S3; it just happens sooner in the process. Each chunk seems to linger in memory somehow. As the code progresses I can watch memory usage climb gradually but significantly until it reaches around 40 GB. This is the code that replaces sleep(5) in the real code:

$partList{$i + 1} = $bucket->upload_part_of_multipart_upload('some-big-object', $uploadId, $i + 1, $chunk);

The final version of the code, which fails because it uses too much memory:

use POSIX;
use Amazon::S3;
our $simpleS3 = Amazon::S3->new({
    aws_access_key_id  => $config{'access_key_id'},
    aws_secret_access_key => $config{'access_key'},
    retry => 1
});
my $bucket = $simpleS3->bucket($bucketName);

my $filename = "/backup/2022-12-13/accounts/backup.tar.gz";
my $size = -s $filename; 
my $chunkSize = (1024*1024*100);
my $parts = ceil($size / $chunkSize);
my %partList;

my $uploadId = $bucket->initiate_multipart_upload('some-big-object');

# open 9.6 GB file
open(my $file, '<', $filename) or die("Error reading file, stopped");
binmode($file); 

for (my $i = 0; $i <= $parts; $i++) {
    my $chunk;
    my $offset = $i * $chunkSize + 1;

    read($file, $chunk, $chunkSize, $offset);

    # Code to do what I need to do with the chunk goes here.
    $partList{$i + 1} = $bucket->upload_part_of_multipart_upload('some-big-object', $uploadId, $i + 1, $chunk);

    print STDOUT "Uploaded $i of $parts.\n";
}

gxwragnw1#

The problem was actually not LWP or the APIs, but a silly mistake in how I was reading the file. I was using read($file, $chunk, $chunkSize, $offset);
The fourth argument to read is an offset into the scalar $chunk, not into the file as I had assumed, so Perl pads $chunk with $offset bytes of filler before appending the data it reads. That produced chunks that kept growing in size until the script finally crashed. Instead, the code needs to be:

seek ($file, $offset, 0);
read ($file, $chunk, $chunkSize);

which produces chunks of the expected size.
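
For completeness, here is a minimal sketch of the corrected read loop on its own (the file name and 100 MB chunk size are taken from the question; I have also started the offset at $i * $chunkSize rather than the + 1 used above, and bounded the loop with < $parts, so each part begins exactly where the previous one ended and no empty extra part is read):

use POSIX;

my $filename = "/backup/2022-12-13/accounts/backup.tar.gz";
my $size = -s $filename;
my $chunkSize = (1024*1024*100);
my $parts = ceil($size / $chunkSize);

open(my $file, '<', $filename) or die("Error reading file, stopped");
binmode($file);

for (my $i = 0; $i < $parts; $i++) {
    my $chunk;

    # seek() moves the file pointer; read()'s fourth argument only
    # positions data inside $chunk, so it cannot do this job.
    seek($file, $i * $chunkSize, 0);

    # The last part may be shorter than $chunkSize; read() simply
    # returns however many bytes were left.
    my $bytes = read($file, $chunk, $chunkSize);
    die("Error reading file, stopped") unless defined $bytes;

    # $chunk is now at most $chunkSize bytes, so memory use stays flat.
    # Upload or otherwise process $chunk here.

    print STDOUT "Read part " . ($i + 1) . " of $parts ($bytes bytes).\n";
}

close($file);

With each part capped at $chunkSize, the per-part memory footprint no longer depends on how far into the file the loop has progressed.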
