删除所有Azure表记录

9q78igpj  于 2023-03-31  发布在  其他
关注(0)|答案(8)|浏览(119)

我有一个Azure存储表,它有3k+记录。
删除表中所有行的最有效方法是什么?

kkbh8khc

kkbh8khc1#

对于3000条记录,最简单的方法是删除表。但是请注意,当您删除表时,它并不是在当时被删除,而是被放入某种待删除队列中,并在一段时间后被实际删除。这段时间取决于系统的负载+表中的实体数。在此期间,您将无法重新创建此表或使用此表。
如果继续使用表对您很重要,那么唯一的其他选择就是删除实体。为了更快地删除,您可以考虑使用Entity Batch Transactions删除实体。但是对于删除实体,你需要首先获取实体。通过只提取实体的PartitionKeyRowKey属性,而不是只提取这两个属性,可以加快提取过程。删除实体需要属性。

ig9co6j1

ig9co6j12#

对于后来发现这个问题的人来说,接受答案“刚刚删除了表”的问题是,虽然它在存储模拟器中工作得很好,但在生产中会随机失败。如果您的应用/服务需要定期重新生成表,那么您会发现由于冲突或删除仍在进行中而导致失败。
相反,我发现最快和最防错的EF友好方法是删除分段查询中的所有行。下面是我使用的一个简单的下拉式示例。传入客户端,表名和实现ITableEntity的类型。

private async Task DeleteAllRows<T>(string table, CloudTableClient client) where T: ITableEntity, new()
    {
        // query all rows
        CloudTable tableref = client.GetTableReference(table);           
        var query = new TableQuery<T>();
        TableContinuationToken token = null;
                                         
        do
        {
            var result = await tableref.ExecuteQuerySegmentedAsync(query, token);  
            foreach (var row in result)
            {
                var op = TableOperation.Delete(row);
                tableref.ExecuteAsync(op);
            }
            token = result.ContinuationToken;
        } while (token != null);  
        
    }

示例用法:

table = client.GetTableReference("TodayPerformanceSnapshot");
created = await table.CreateIfNotExistsAsync();

if(!created)
{ 
    // not created, table already existed, delete all content
   await DeleteAllRows<TodayPerformanceContainer>("TodayPerformanceSnapshot", client);
   log.Information("Azure Table:{Table} Purged", table);
}

批处理方法需要花费更多的精力,因为你必须处理“一个批处理中只有相同的分区键”和“只有100行”的限制。

private async Task DeleteAllRows<T>(string table, CloudTableClient client) where T: ITableEntity, new()
    {
        // query all rows
        CloudTable tableref = client.GetTableReference(table);           
        var query = new TableQuery<T>();
        TableContinuationToken token = null;            
        TableBatchOperation batchops = new TableBatchOperation();
        Dictionary<string, Stack<TableOperation>> pendingOperations = new Dictionary<string, Stack<TableOperation>>();
        
        do
        {
            var result = await tableref.ExecuteQuerySegmentedAsync(query, token);
            foreach (var row in result)
            {
               var op = TableOperation.Delete(row);
                if (pendingOperations.ContainsKey(row.PartitionKey))
                {
                    pendingOperations[row.PartitionKey].Push(op);
                }
                else
                {
                    pendingOperations.Add(row.PartitionKey, new Stack<TableOperation>() );
                    pendingOperations[row.PartitionKey].Push(op);
                }                                    
            }
            token = result.ContinuationToken;
        } while (token != null);

        // order by partition key            
        foreach (var key in pendingOperations.Keys)
        {                
            log.Information($"Deleting:{key}");                
            var rowStack = pendingOperations[key];
            int max = 100;
            int current = 0;

            while (rowStack.Count != 0)
            {
                // dequeue in groups of 100
                while (current < max && rowStack.Count > 0)
                {
                    var op = rowStack.Pop();
                    batchops.Add(op);
                    current++;
                }

                //execute and reset
                _ = await tableref.ExecuteBatchAsync(batchops);
                log.Information($"Deleted batch of size:{batchops.Count}");
                current = 0;
                batchops.Clear();
            }
        }                       
    }
shyt4zoc

shyt4zoc3#

我使用类似这样的东西。我们按日期划分密钥,您的情况可能有所不同:

async Task Main()
{
    var startDate = new DateTime(2011, 1, 1);
    var endDate = new DateTime(2012, 1, 1);

    var account = CloudStorageAccount.Parse("connString");
    var client = account.CreateCloudTableClient();
    var table = client.GetTableReference("TableName");

    var dates = Enumerable.Range(0, Math.Abs((startDate.Month - endDate.Month) + 12 * (startDate.Year - endDate.Year)))
        .Select(offset => startDate.AddMonths(offset))
        .ToList();

    foreach (var date in dates)
    {
        var key = $"{date.ToShortDateString()}";

        var query = $"(PartitionKey eq '{key}')";
        var rangeQuery = new TableQuery<TableEntity>().Where(query);

        var result = table.ExecuteQuery<TableEntity>(rangeQuery);
        $"Deleting data from {date.ToShortDateString()}, key {key}, has {result.Count()} records.".Dump();

        var allTasks = result.Select(async r =>
        {
            try
            {
                await table.ExecuteAsync(TableOperation.Delete(r));
            }
            catch (Exception e) { $"{r.RowKey} - {e.ToString()}".Dump(); }
        });
        await Task.WhenAll(allTasks);
    }
}
ac1kyiln

ac1kyiln4#

这取决于数据的结构,但如果可以为所有记录编写查询,则可以将每个记录添加到TableBatchOperation中并一次性执行它们。
下面是一个示例,它只获取同一分区键中的所有结果,改编自如何开始使用Azure表存储和Visual Studio连接服务。

// query all rows
CloudTable peopleTable = tableClient.GetTableReference("myTableName");
var query = new TableQuery<MyTableEntity>();
var result = await remindersTable.ExecuteQuerySegmentedAsync(query, null);

// Create the batch operation.
TableBatchOperation batchDeleteOperation = new TableBatchOperation();

foreach (var row in result)
{
    batchDeleteOperation.Delete(row);
}

// Execute the batch operation.
await remindersTable.ExecuteBatchAsync(batchDeleteOperation);
mklgxw1f

mklgxw1f5#

我使用下面的函数首先将所有分区键放入一个队列中,然后循环遍历该键,以100为一批删除所有行。

Queue queue = new Queue();
            queue.Enqueue("PartitionKeyTodelete1");
            queue.Enqueue("PartitionKeyTodelete2");
            queue.Enqueue("PartitionKeyTodelete3");

            while (queue.Count > 0)
            {
                string partitionToDelete = (string)queue.Dequeue();

                TableQuery<TableEntity> deleteQuery = new TableQuery<TableEntity>()
                  .Where(TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, partitionToDelete))
                  .Select(new string[] { "PartitionKey", "RowKey" });

                TableContinuationToken continuationToken = null;

                do
                {
                    var tableQueryResult = await myTable.ExecuteQuerySegmentedAsync(deleteQuery, continuationToken);

                    continuationToken = tableQueryResult.ContinuationToken;

                    // Split into chunks of 100 for batching
                    List<List<TableEntity>> rowsChunked = tableQueryResult.Select((x, index) => new { Index = index, Value = x })
                        .Where(x => x.Value != null)
                        .GroupBy(x => x.Index / 100)
                        .Select(x => x.Select(v => v.Value).ToList())
                        .ToList();

                    // Delete each chunk of 100 in a batch
                    foreach (List<TableEntity> rows in rowsChunked)
                    {
                        TableBatchOperation tableBatchOperation = new TableBatchOperation();
                        rows.ForEach(x => tableBatchOperation.Add(TableOperation.Delete(x)));

                        await myTable.ExecuteBatchAsync(tableBatchOperation);
                    }
                }
                while (continuationToken != null);
            }
qv7cva1a

qv7cva1a6#

我最近写了一个库,可以做到这一点。
来源/文档:https://github.com/pflajszer/AzureTablesLifecycleManager
对于您的用例,代码看起来像这样:

// inject ITableManager in the constructor:

private readonly ITableManager _api;

public MyClass(ITableManager api)
{
    _api = api;
}
/// <summary>
/// Delete all data from a single table
/// </summary>
/// <typeparam name="T"></typeparam>
/// <param name="tableName"></param>
/// <returns></returns>
public Task<DataTransferResponse<T>> DeleteTableDataAsync<T>(string tableName) where T : class, ITableEntity, new()
{
    // this query will return a single table with a given name:
    Expression<Func<TableItem, bool>> tableQuery = x => x.Name == tableName;

    // this query will return all the data from the table:
    Expression<Func<T, bool>> dataQuery = x => true;
             
    // ... but you can use LINQ to filter results too, like:
    // Expression<Func<T, bool>> anotherExampleOfdataQuery = x => x.Timestamp < DateTime.Now.AddYears(-1);

    return _api.DeleteDataFromTablesAsync<T>(tableQuery, dataQuery);
}

...或者,正如Gaurav Mantri所建议的,您可以直接删除表本身:

/// <summary>
/// Delete a single table
/// </summary>
/// <param name="tableName"></param>
/// <returns></returns>
public Task<DataTransferResponse<TableItem>> DeleteTableAsync(string tableName)
{
    // this query will return a single table with a given name:
    Expression<Func<TableItem, bool>> tableQuery = x => x.Name == tableName;

    return _api.DeleteTablesAsync(tableQuery);
}
kjthegm6

kjthegm67#

以下是我的解决方案,使用了新的Azure.Data.Tables SDK,具有以下增强功能:

  • 获取每页1000行
  • 仅获取每行的PartitonKey和RowKey
  • 通过PartitionKey将要删除的行分组为最多100个批
  • 作为TableClient的扩展方法编写,因此易于重用

注意:我使用System.Linq.Asyncnuget包使代码更具可读性。

/// <summary>
/// Deletes all rows from the table
/// </summary>
/// <param name="tableClient">The authenticated TableClient</param>
/// <returns></returns>
public static async Task DeleteAllEntitiesAsync(this TableClient tableClient)
{
    // Only the PartitionKey & RowKey fields are required for deletion
    AsyncPageable<TableEntity> entities = tableClient
        .QueryAsync<TableEntity>(select: new List<string>() { "PartitionKey", "RowKey" }, maxPerPage: 1000);

    await entities.AsPages().ForEachAwaitAsync(async page => {
        // Since we don't know how many rows the table has and the results are ordered by PartitonKey+RowKey
        // we'll delete each page immediately and not cache the whole table in memory
        await BatchManipulateEntities(tableClient, page.Values, TableTransactionActionType.Delete).ConfigureAwait(false);
    });
}

/// <summary>
/// Groups entities by PartitionKey into batches of max 100 for valid transactions
/// </summary>
/// <returns>List of Azure Responses for Transactions</returns>
public static async Task<List<Response<IReadOnlyList<Response>>>> BatchManipulateEntities<T>(TableClient tableClient, IEnumerable<T> entities, TableTransactionActionType tableTransactionActionType) where T : class, ITableEntity, new()
{
    var groups = entities.GroupBy(x => x.PartitionKey);
    var responses = new List<Response<IReadOnlyList<Response>>>();
    foreach (var group in groups)
    {
        List<TableTransactionAction> actions;
        var items = group.AsEnumerable();
        while (items.Any())
        {
            var batch = items.Take(100);
            items = items.Skip(100);

            actions = new List<TableTransactionAction>();
            actions.AddRange(batch.Select(e => new TableTransactionAction(tableTransactionActionType, e)));
            var response = await tableClient.SubmitTransactionAsync(actions).ConfigureAwait(false);
            responses.Add(response);
        }
    }
    return responses;
}
omhiaaxx

omhiaaxx8#

这是一个比较晚的展示,但这可能是有用的人(像我)谁遇到这一点。有一种方法可以在Azure存储资源管理器做到这一点。
首先打开要清除的表。
如果需要,运行查询,以便只返回要删除的记录。
单击数据下导航器上的双右箭头

这将缓存数据。
现在,您可以使用“全选”按钮上的下拉菜单来使用“全选缓存”选项

最后,单击“删除”按钮。

相关问题