I have an Azure Function written in Java that listens for queue messages on Azure; each message contains the path to a zip file in an Azure blob container. When a message arrives, the function fetches the zip file from that path and extracts it into another container on Azure. This works for small files, but for files larger than 80 MB it fails with `FailureException: OutOfMemoryError: Java heap space`. My code is as follows:
```java
@FunctionName("queueprocessor")
public void run(@QueueTrigger(name = "msg",
                              queueName = "queuetest",
                              dataType = "",
                              connection = "AzureWebJobsStorage") Details message,
                final ExecutionContext executionContext,
                @BlobInput(name = "file",
                           dataType = "binary",
                           connection = "AzureWebJobsStorage",
                           path = "{Path}") byte[] content) {
    executionContext.getLogger().info("PATH: " + message.getPath());
    CloudStorageAccount storageAccount = null;
    CloudBlobClient blobClient = null;
    CloudBlobContainer container = null;
    try {
        String connectStr = "DefaultEndpointsProtocol=https;AccountName=name;AccountKey=mykey;EndpointSuffix=core.windows.net";
        // Unique name of the container
        String containerName = "output";
        // Config to upload file size > 1MB in chunks
        int deltaBackoff = 2;
        int maxAttempts = 2;
        BlobRequestOptions blobReqOption = new BlobRequestOptions();
        blobReqOption.setSingleBlobPutThresholdInBytes(1024 * 1024); // 1MB
        blobReqOption.setRetryPolicyFactory(new RetryExponentialRetry(deltaBackoff, maxAttempts));
        // Parse the connection string and create a blob client to interact with Blob storage
        storageAccount = CloudStorageAccount.parse(connectStr);
        blobClient = storageAccount.createCloudBlobClient();
        blobClient.setDefaultRequestOptions(blobReqOption);
        container = blobClient.getContainerReference(containerName);
        container.createIfNotExists(BlobContainerPublicAccessType.CONTAINER, new BlobRequestOptions(), new OperationContext());
        ZipInputStream zipIn = new ZipInputStream(new ByteArrayInputStream(content));
        ZipEntry zipEntry = zipIn.getNextEntry();
        while (zipEntry != null) {
            executionContext.getLogger().info("ZipEntry name: " + zipEntry.getName());
            // Getting a blob reference
            CloudBlockBlob blob = container.getBlockBlobReference(zipEntry.getName());
            ByteArrayOutputStream outputB = new ByteArrayOutputStream();
            byte[] buf = new byte[1024];
            int n;
            while ((n = zipIn.read(buf, 0, 1024)) != -1) {
                outputB.write(buf, 0, n);
            }
            // Upload to container
            ByteArrayInputStream inputS = new ByteArrayInputStream(outputB.toByteArray());
            blob.setStreamWriteSizeInBytes(256 * 1024); // 256K
            blob.upload(inputS, inputS.available());
            executionContext.getLogger().info("ZipEntry name: " + zipEntry.getName() + " extracted");
            zipIn.closeEntry();
            zipEntry = zipIn.getNextEntry();
        }
        zipIn.close();
        executionContext.getLogger().info("FILE EXTRACTION FINISHED");
    } catch (Exception e) {
        e.printStackTrace();
    }
}
```
`Details message` carries an id and the file path, and the path is fed into `@BlobInput(..., path = "{Path}", ...)`. From my analysis I believe `@BlobInput` loads the complete file into memory, which is why I get the `OutOfMemoryError`. If I am right, is there some other way to avoid it? File sizes may reach 2 GB in the future. Also, if there is any mistake in my unzip code, please let me know. Thanks.
1 Answer
Summarizing @joachimsauer's suggestion: when we use the Azure Functions Blob storage binding to process blob content in a Java function app, the binding keeps the entire blob content in memory. Using it to process large files can therefore hit an `OutOfMemoryError`. So if we want to process large blobs, we should open an input stream with the Blob SDK and process the content from that stream. For example:
SDK: the `com.azure:azure-storage-blob` v12 client library, added to the project as a Maven dependency.

Code:
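A minimal sketch of that approach, assuming the question's `Details` POJO and queue trigger, that `message.getPath()` is the blob name inside a source container (named `upload` here, which is hypothetical), and that the destination is the `output` container from the question. The zip is read through `BlobClient.openInputStream()` and each entry is written through a blob output stream, so only a small buffer lives on the heap at any time:

```java
import com.azure.storage.blob.BlobContainerClient;
import com.azure.storage.blob.BlobServiceClient;
import com.azure.storage.blob.BlobServiceClientBuilder;
import com.azure.storage.blob.specialized.BlobInputStream;
import com.microsoft.azure.functions.ExecutionContext;
import com.microsoft.azure.functions.annotation.FunctionName;
import com.microsoft.azure.functions.annotation.QueueTrigger;

import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

@FunctionName("queueprocessor")
public void run(@QueueTrigger(name = "msg",
                              queueName = "queuetest",
                              connection = "AzureWebJobsStorage") Details message,
                final ExecutionContext executionContext) {
    // No @BlobInput binding here, so the function never receives the whole
    // zip as a byte[]. The SDK client is built from the same connection
    // string the bindings use.
    BlobServiceClient serviceClient = new BlobServiceClientBuilder()
            .connectionString(System.getenv("AzureWebJobsStorage"))
            .buildClient();
    // "upload" is a hypothetical source container; "output" is from the question.
    BlobContainerClient source = serviceClient.getBlobContainerClient("upload");
    BlobContainerClient dest = serviceClient.getBlobContainerClient("output");

    // openInputStream() downloads the blob lazily in chunks rather than all at once.
    try (BlobInputStream blobIn = source.getBlobClient(message.getPath()).openInputStream();
         ZipInputStream zipIn = new ZipInputStream(blobIn)) {
        ZipEntry entry;
        byte[] buf = new byte[8192];
        while ((entry = zipIn.getNextEntry()) != null) {
            if (entry.isDirectory()) {
                zipIn.closeEntry();
                continue;
            }
            executionContext.getLogger().info("Extracting: " + entry.getName());
            // getBlobOutputStream(true) uploads block by block and overwrites
            // any existing blob with the same name.
            try (OutputStream blobOut = dest.getBlobClient(entry.getName())
                    .getBlockBlobClient()
                    .getBlobOutputStream(true)) {
                int n;
                while ((n = zipIn.read(buf)) != -1) {
                    blobOut.write(buf, 0, n);
                }
            }
            zipIn.closeEntry();
        }
        executionContext.getLogger().info("FILE EXTRACTION FINISHED");
    } catch (IOException e) {
        executionContext.getLogger().severe("Extraction failed: " + e);
    }
}
```

Compared with the original code, this removes both full-file buffers: the `byte[] content` from `@BlobInput` and the per-entry `ByteArrayOutputStream`. Heap usage is bounded by the copy buffer plus the SDK's internal block buffers, so even a 2 GB archive should stream through without heap pressure.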
For more details, see the documentation for the Azure Storage Blob client library for Java.