Getting the most recently modified file from Azure Blob storage

Posted by tcbh2hod on 2022-11-17 in Other

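Suppose I generate a few JSON files in blob storage every day. What I want is to get the most recently modified file across all of my directories. So the blob container would contain something like this: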

2016/01/02/test.json
2016/01/02/test2.json
2016/02/03/test.json

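I want to get 2016/02/03/test.json. One way would be to take each file's full path and run a regex check to find the most recently created directory, but that doesn't work if I have multiple JSON files in each directory. Is there something like File.GetLastWriteTime to get the most recently modified file? By the way, this is the code I'm using to get all the files: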

public static CloudBlobContainer GetBlobContainer(string accountName, string accountKey, string containerName)
{
    CloudStorageAccount storageAccount = new CloudStorageAccount(new StorageCredentials(accountName, accountKey), true);
    // blob client
    CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();
    // container
    CloudBlobContainer blobContainer = blobClient.GetContainerReference(containerName);
    return blobContainer;
}

public static IEnumerable<IListBlobItem> GetBlobItems(CloudBlobContainer container)
{
    IEnumerable<IListBlobItem> items = container.ListBlobs(useFlatBlobListing: true);
    return items;
}

public static List<string> GetAllBlobFiles(IEnumerable<IListBlobItem> blobs)
{
    var listOfFileNames = new List<string>();

    foreach (var blob in blobs)
    {
        var blobFileName = blob.Uri.Segments.Last();
        listOfFileNames.Add(blobFileName);
    }
    return listOfFileNames;
}

nle07wnf1#

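Each IListBlobItem is going to be a CloudBlockBlob, a CloudPageBlob, or a CloudBlobDirectory.
After casting to a block or page blob, or to their shared base class CloudBlob (preferably with the as keyword and a null check), you can access the modified date via blockBlob.Properties.LastModified.
Note that your implementation does an O(n) scan over all blobs in the container, which can take a while if there are hundreds of thousands of files. There is currently no more efficient way to query blob storage (unless you abuse the file naming and encode the date so that newer dates come first alphabetically). If you need better query performance, I'd recommend keeping a database table that represents each file as a row, with an indexed DateModified column to search by and a column holding the blob path for easy access to the file.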
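
The naming trick mentioned above could look roughly like the sketch below. It assumes blob listings come back in lexicographic order by name, so a name prefixed with an inverted, zero-padded tick count puts the newest blob first. NewestFirstNaming, BuildName and Newest are hypothetical names made up for this illustration; they are not part of any Azure SDK, and Newest only simulates taking the first entry of a listing.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class NewestFirstNaming
{
    // Hypothetical helper: builds a blob name whose prefix gets smaller
    // as the timestamp gets newer, so newer names sort first.
    public static string BuildName(DateTime modifiedUtc, string fileName)
    {
        long inverted = DateTime.MaxValue.Ticks - modifiedUtc.Ticks;
        // D19 zero-pads to 19 digits so ordinal string order matches numeric order.
        return inverted.ToString("D19") + "/" + fileName;
    }

    // Simulates reading the first entry of an alphabetically ordered listing.
    public static string Newest(IEnumerable<string> names)
    {
        return names.OrderBy(n => n, StringComparer.Ordinal).First();
    }
}
```

With names built this way, only the first entry of a listing would need to be read instead of scanning every blob. The trade-off is that the timestamp is frozen into the name when the blob is written, so overwriting a blob in place does not move it to the front.
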
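UPDATE (2022): It appears that Microsoft now offers customizable Blob Index Tags. This should allow adding a custom DateModified tag (or similar) to each blob and running efficient "greater than" / "less than" queries against your blobs *without* a separate database. (Note: it apparently only supports string values, so date values need to be saved in a lexicographically sortable format such as "yyyy-MM-dd".)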
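
To illustrate why the date format matters for string-only tag values, here is a minimal sketch. SortableDateTag, Format and IsNewer are hypothetical names for this example; the code only formats and compares strings and does not call the Blob Index Tags API.

```csharp
using System;
using System.Globalization;

public static class SortableDateTag
{
    // Hypothetical tag value format: fixed-width, most significant field first.
    public static string Format(DateTimeOffset value)
    {
        return value.UtcDateTime.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture);
    }

    // True when tag value a is ordinally greater than tag value b.
    public static bool IsNewer(string a, string b)
    {
        return string.CompareOrdinal(a, b) > 0;
    }
}
```

With "yyyy-MM-dd", string order and chronological order agree. With a format like "d/M/yyyy" they do not: 10 January 2016 becomes "10/1/2016", which sorts before "9/1/2016" even though it is the later date.
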


7tofc5zh2#

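As Yar said, you can use the LastModified property of an individual blob object. The following snippet shows how to do that once you have a reference to the correct container: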

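// Requires: using System.Linq;
// useFlatBlobListing: true also returns blobs inside virtual directories
// such as 2016/02/03/; without it only top-level items are listed.
var latestBlob = container.ListBlobs(useFlatBlobListing: true)
    .OfType<CloudBlockBlob>()
    .OrderByDescending(m => m.Properties.LastModified)
    .FirstOrDefault(); // null if the container has no block blobs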

Note: the blob type may not be CloudBlockBlob. Be sure to change it if necessary.
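
Sorting the whole listing is not strictly needed just to find the newest entry; a single pass that keeps the running maximum does the same job. The sketch below shows that idea on a stand-in type so it runs without the SDK. BlobInfo and LatestBlobFinder are hypothetical names, with BlobInfo standing in for the name and Properties.LastModified value of a listed blob.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a listed blob (name plus last-modified time).
public sealed class BlobInfo
{
    public string Name { get; set; }
    public DateTimeOffset? LastModified { get; set; }
}

public static class LatestBlobFinder
{
    // Returns the entry with the greatest LastModified, or null if none has one.
    public static BlobInfo FindLatest(IEnumerable<BlobInfo> blobs)
    {
        BlobInfo latest = null;
        foreach (var blob in blobs)
        {
            if (blob.LastModified == null) continue; // skip entries without a timestamp
            if (latest == null || blob.LastModified > latest.LastModified)
            {
                latest = blob;
            }
        }
        return latest;
    }
}
```

This still visits every blob once, so it does not avoid the O(n) scan described in the first answer; it only avoids the extra cost of sorting.
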


t2a7ltrp3#

// Connection string
string storageAccount_connectionString = "**NOTE: CONNECTION STRING**";

// Retrieve storage account from connection string.
CloudStorageAccount storageAccount = CloudStorageAccount.Parse(storageAccount_connectionString);

// Create the blob client.
CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();

// Retrieve reference to a previously created container.
CloudBlobContainer container = blobClient.GetContainerReference("**NOTE:NAME OF CONTAINER**");

try
{
    // Root directory
    CloudBlobDirectory dira = container.GetDirectoryReference(string.Empty);
    // true lists blobs in all subdirectories (flat listing); false lists only the top level.
    // Note: this fetches only the first result segment.
    var rootDirFolders = dira.ListBlobsSegmentedAsync(true, BlobListingDetails.Metadata, null, null, null, null).Result;

    foreach (var blob in rootDirFolders.Results)
    {
        if (blob is CloudBlockBlob blockBlob)
        {
            var time = blockBlob.Properties.LastModified;
            // A format string with placeholders is needed; "Data" alone would not print the value.
            Console.WriteLine("{0}: {1}", blockBlob.Name, time);
        }
    }
}
catch (Exception e)
{
    // Thrown, for example, when the specified container does not exist.
    Console.WriteLine("Error: {0}", e);
}

rhfm7lfc4#

Use the Azure WebJobs SDK. The SDK has options for monitoring new/updated blobs.


rnmwe5a25#

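With the new V12 NuGet package, the previous answers are outdated.
The new NuGet package is Azure.Storage.Blobs; I am using version 12.8.4.
The following code gets the last-modified date of a blob. You could also write an async version of this code.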

using System;
using Azure.Storage.Blobs;

DateTimeOffset? GetLastModified()
{
    BlobServiceClient blobServiceClient = new BlobServiceClient("connectionstring");
    // The argument here is the container name.
    BlobContainerClient blobContainerClient = blobServiceClient.GetBlobContainerClient("containername");
    BlobClient blobClient = blobContainerClient.GetBlobClient("file.txt");
    // GetBlobClient never returns null, so only existence needs checking.
    if (!blobClient.Exists().Value) return null;
    DateTimeOffset lastModified = blobClient.GetProperties().Value.LastModified;
    return lastModified;
}
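
The async version mentioned above might look like the following sketch. It assumes the Azure.Storage.Blobs V12 package is referenced; BlobTimestamps and GetLastModifiedAsync are hypothetical names, and the connection string, container name and blob name are supplied by the caller.

```csharp
using System;
using System.Threading.Tasks;
using Azure;
using Azure.Storage.Blobs;
using Azure.Storage.Blobs.Models;

public static class BlobTimestamps
{
    // Hypothetical helper: returns the blob's last-modified time, or null if it does not exist.
    public static async Task<DateTimeOffset?> GetLastModifiedAsync(
        string connectionString, string containerName, string blobName)
    {
        var containerClient = new BlobContainerClient(connectionString, containerName);
        BlobClient blobClient = containerClient.GetBlobClient(blobName);

        // ExistsAsync issues a request to the service; the client object itself is never null.
        Response<bool> exists = await blobClient.ExistsAsync();
        if (!exists.Value) return null;

        Response<BlobProperties> properties = await blobClient.GetPropertiesAsync();
        return properties.Value.LastModified;
    }
}
```

Like the synchronous version, this reads the timestamp of one known blob; finding the newest blob in a container still requires listing the blobs and comparing their LastModified values.
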

vawmfj5a6#

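If you are having issues, try blockBlob.Container.Properties.LastModified (note that this is the last-modified time of the container itself, not of an individual blob).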


yduiuuwa7#

With Microsoft.Azure.Storage.Blob, you can get it as follows:

using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;
using Microsoft.Azure.Storage;
using Microsoft.Azure.Storage.Blob;

namespace ListLastModificationOnBlob
{
    class Program
    {
        static void Main(string[] args)
        {
            MainAsync().Wait();
        }

        static async Task MainAsync()
        {
            string storageAccount_connectionString = @"Your connection string";

            // Retrieve storage account from connection string.
            CloudStorageAccount storageAccount = CloudStorageAccount.Parse(storageAccount_connectionString);

            // Create the blob client.
            CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();

            var containers = await ListContainersAsync(blobClient);

            foreach (var container in containers)
            {
                Console.WriteLine(container.Name);

                try
                {
                    //root directory
                    CloudBlobDirectory dira = container.GetDirectoryReference(string.Empty);
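                    // true lists blobs in all subdirectories; false lists only the top level.
                    // Awaited instead of blocking on .Result; only the first result segment is fetched.
                    var rootDirFolders = await dira.ListBlobsSegmentedAsync(true, BlobListingDetails.Metadata, null, null, null, null);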

                    using (var w = new StreamWriter($"{container.Name}.csv"))
                    {
                        foreach (var blob in rootDirFolders.Results)
                        {
                            if (blob is CloudBlob blockBlob)
                            {
                                var time = blockBlob.Properties.LastModified;
                                var created = blockBlob.Properties.Created;

                                var line = $"{blockBlob.Name},{created},{time}";
                                await w.WriteLineAsync(line);
                                await w.FlushAsync();
                            }
                        }
                    }
                }
                catch (Exception e)
                {
                    // Report the failure for this container and continue with the next one.
                    Console.WriteLine("Error: {0}", e);
                }
            }
        }

        private static async Task<IEnumerable<CloudBlobContainer>> ListContainersAsync(CloudBlobClient cloudBlobClient)
        {
            BlobContinuationToken continuationToken = null;
            var containers = new List<CloudBlobContainer>();

            do
            {
                ContainerResultSegment response = await cloudBlobClient.ListContainersSegmentedAsync(continuationToken);
                continuationToken = response.ContinuationToken;
                containers.AddRange(response.Results);

            } while (continuationToken != null);

            return containers;
        }
    }
}

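For the given storage account, the code above:

  • gets all containers in the account
  • gets all blobs in each container
  • saves Created and LastModified together with the blob name to a CSV file (named after the container)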

ngynwnxp8#

Using rollsch's and hbd's approaches, I was able to get the latest image as shown below:

public string File;

public async Task OnGetAsync()
{
    var gettingLastModified = _blobServiceClient
        .GetBlobContainerClient("images")
        .GetBlobs()
        .OrderByDescending(m => m.Properties.LastModified)
        .First();

    LatestImageFromAzure = gettingLastModified.Name;

    File = await _blobService.GetBlob(LatestImageFromAzure, "images");
}

I also used the approach from this video: https://www.youtube.com/watch?v=B_yDG35lb5I&t=1864s
