.net 如何有效地按索引从一个非常大的列表中删除元素?

368yc8dk  于 2023-01-10  发布在  .NET
关注(0)|答案(6)|浏览(187)

我有一个非常大的整数列表(大约20亿个元素)和一个索引列表(几千个元素),我需要从第一个列表中删除元素,我目前的方法是循环第二个列表中的所有索引,将每个索引传递给第一个列表的RemoveAt()方法:

indices.Sort();
indices.Reverse();
for (i = 0; i < indices.Count; i++)
{
    largeList.RemoveAt(indices[i]);
}

但是,大约需要2分钟才能完成,我确实需要执行这个操作快得多,有没有办法优化这个?
我有一个10核的英特尔i9X CPU,所以也许有一些并行处理的方式?

0wi1tuuw

0wi1tuuw1#

每次调用RemoveAt()时,它必须将指定索引之后的每个元素向上移动一个元素,如果在一个非常大的列表中有数千个元素要删除,这将导致许多许多(不必要的)移动。
我的想法是,如果你能计算出每个移动的起始索引和长度,你就可以在一个没有重叠移动的单遍中处理列表。这是通过比较索引列表中的相邻元素来完成的。虽然这确实意味着要构建第三个List<>移动操作来执行,但我希望有最高效的,事先计划好的最小移动策略最终会得到回报(或者可能有一种方法可以做到这一点,而不分配任何目前没有出现在我身上的对象)。
在下面的基准测试代码中,您可以看到我的实现RemoveUsingMoveStrategy()。在test模式下运行下面的启动程序,我们可以看到,当给定索引01510111515时,它和其他答案都产生了正确的结果(重复)、181920-元素List<int>中删除...

PS> dotnet run --configuration release --framework netcoreapp3.1 -- test
[snip]
RemoveUsingMoveStrategy():
        Elements: { 2, 3, 4, 6, 7, 8, 9, 12, 13, 14, 16, 17 }
           Count: 12
        Sequence: Equal to control list
        Duration: 3.95 milliseconds
        Returned: Input list
[snip]

基准测试说明

  • 我的意图是针对十亿个元素的List<int>进行基准测试,但是RemoveUsingRemoveAt()--受问题中的代码启发--效率 * 太 * 低,花费的时间太长,所以我只使用了1000万个元素。
  • 我仍然希望运行一个不包括RemoveUsingRemoveAt()的十亿元素的基准测试,为此,我引入了一个不太幼稚的实现RemoveUsingListCopy()作为比较所有列表大小的基线,顾名思义,它不修改输入列表,而是创建一个应用了删除操作的新列表。
  • 因为不是所有的实现都要求要删除的索引列表是唯一的和/或排序的,所以我认为最公平的基准测试方法是在方法需要的基准测试时间 * 包括该开销 *。
  • 我对其他答案中的代码所做的唯一其他更改是利用基准测试的列表(DataListRemovalList),并将var更改为显式类型,以使读者更清楚。
  • RemovalListLocation指示从DataList中的何处移除索引。
  • 对于BeginningMiddleEnd,它是从该位置移除的连续RemovalListSize大小的块。
  • 对于Random,它是从一个常量种子生成的RemovalListSize随机、有效、未排序、不保证唯一的索引。

为了保持结果简短(呃),我选择只对MiddleRandom的值进行基准测试--认为这将是一个很好的折中方案。

基准测试结果

  • RemoveUsingRemoveAt()可怕的。不要这样做。
  • 对于较小的列表,RemoveUsingListCopy()始终是最快的,代价是内存使用量加倍。
  • 对于更大的列表,Vernou's answer至少比其他答案快5倍,代价是依赖于实现细节。

这正好说明了你不应该总是偏爱List<>而不是一个阵列--除非你需要它的额外功能--因为它并不总是上级的。它隔离了底层阵列,阻止你(没有反射)在它上面使用更高性能的访问方法,比如Array.Copy()unsafe代码。

  • Theodor Zoulias's answer在较小的列表中表现得很好,但我认为必须“接触”所有1000万个索引--每个索引都调用HashSet<>.Contains()并递增一个索引变量--在较大的列表中确实会伤害它。
  • 其余的实现(包括我的)具有大致相同的性能:不是最好的,但还是相当不错的。

基准数据

benchmark模式下运行本答案后面定义的启动程序,我从BenchmarkDotNet得到这些结果...

PS> dotnet run --configuration release --framework netcoreapp3.1 -- benchmark
[snip]
// * Summary *

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19041.450 (2004/?/20H1)
Intel Core i7 CPU 860 2.80GHz (Nehalem), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=3.1.401
  [Host]    : .NET Core 3.1.7 (CoreCLR 4.700.20.36602, CoreFX 4.700.20.37001), X64 RyuJIT
  MediumRun : .NET Framework 4.8 (4.8.4200.0), X64 RyuJIT

Job=MediumRun  InvocationCount=1  IterationCount=15  
LaunchCount=2  UnrollFactor=1  WarmupCount=10  

|                        Method |       Runtime | DataListSize | RemovalListSize | RemovalListLocation |           Mean |         Error |        StdDev |         Median | Ratio | RatioSD |
|------------------------------ |-------------- |------------- |---------------- |-------------------- |---------------:|--------------:|--------------:|---------------:|------:|--------:|
|           RemoveUsingListCopy |      .NET 4.8 |        10000 |            2000 |              Random |       379.9 μs |      15.93 μs |      23.36 μs |       380.4 μs |  1.00 |    0.00 |
|           RemoveUsingRemoveAt |      .NET 4.8 |        10000 |            2000 |              Random |     1,463.7 μs |      93.79 μs |     137.47 μs |     1,434.7 μs |  3.87 |    0.41 |
|       RemoveUsingMoveStrategy |      .NET 4.8 |        10000 |            2000 |              Random |       635.9 μs |      19.77 μs |      29.60 μs |       624.0 μs |  1.67 |    0.14 |
| Answer63496225_TheodorZoulias |      .NET 4.8 |        10000 |            2000 |              Random |       372.1 μs |      11.52 μs |      16.15 μs |       373.8 μs |  0.99 |    0.07 |
|       Answer63496768_CoolBots |      .NET 4.8 |        10000 |            2000 |              Random |       594.5 μs |      25.13 μs |      36.03 μs |       593.1 μs |  1.57 |    0.12 |
|        Answer63495657_NetMage |      .NET 4.8 |        10000 |            2000 |              Random |       618.8 μs |      17.53 μs |      23.99 μs |       622.4 μs |  1.65 |    0.13 |
|         Answer63496256_Vernou |      .NET 4.8 |        10000 |            2000 |              Random |       645.6 μs |      27.28 μs |      39.99 μs |       632.2 μs |  1.71 |    0.16 |
|                               |               |              |                 |                     |                |               |               |                |       |         |
|           RemoveUsingListCopy | .NET Core 3.1 |        10000 |            2000 |              Random |       391.5 μs |      10.39 μs |      15.55 μs |       391.6 μs |  1.00 |    0.00 |
|           RemoveUsingRemoveAt | .NET Core 3.1 |        10000 |            2000 |              Random |     1,402.2 μs |      44.20 μs |      64.80 μs |     1,407.6 μs |  3.59 |    0.21 |
|       RemoveUsingMoveStrategy | .NET Core 3.1 |        10000 |            2000 |              Random |       557.9 μs |      19.73 μs |      27.00 μs |       557.2 μs |  1.43 |    0.10 |
| Answer63496225_TheodorZoulias | .NET Core 3.1 |        10000 |            2000 |              Random |       424.3 μs |      20.90 μs |      29.30 μs |       424.2 μs |  1.09 |    0.09 |
|       Answer63496768_CoolBots | .NET Core 3.1 |        10000 |            2000 |              Random |       535.0 μs |      19.37 μs |      27.16 μs |       537.1 μs |  1.37 |    0.08 |
|        Answer63495657_NetMage | .NET Core 3.1 |        10000 |            2000 |              Random |       557.7 μs |      18.73 μs |      25.63 μs |       550.0 μs |  1.43 |    0.09 |
|         Answer63496256_Vernou | .NET Core 3.1 |        10000 |            2000 |              Random |       554.2 μs |      13.82 μs |      18.45 μs |       554.0 μs |  1.42 |    0.07 |
|                               |               |              |                 |                     |                |               |               |                |       |         |
|           RemoveUsingListCopy |      .NET 4.8 |        10000 |            2000 |              Middle |       221.6 μs |       7.25 μs |      10.63 μs |       222.5 μs |  1.00 |    0.00 |
|           RemoveUsingRemoveAt |      .NET 4.8 |        10000 |            2000 |              Middle |     1,195.3 μs |      20.01 μs |      28.69 μs |     1,187.7 μs |  5.42 |    0.30 |
|       RemoveUsingMoveStrategy |      .NET 4.8 |        10000 |            2000 |              Middle |       405.0 μs |      13.65 μs |      19.14 μs |       410.7 μs |  1.83 |    0.10 |
| Answer63496225_TheodorZoulias |      .NET 4.8 |        10000 |            2000 |              Middle |       206.3 μs |       8.62 μs |      12.09 μs |       204.9 μs |  0.94 |    0.08 |
|       Answer63496768_CoolBots |      .NET 4.8 |        10000 |            2000 |              Middle |       427.5 μs |      15.56 μs |      22.81 μs |       435.4 μs |  1.93 |    0.13 |
|        Answer63495657_NetMage |      .NET 4.8 |        10000 |            2000 |              Middle |       405.4 μs |      13.80 μs |      19.35 μs |       403.8 μs |  1.84 |    0.11 |
|         Answer63496256_Vernou |      .NET 4.8 |        10000 |            2000 |              Middle |       413.9 μs |      15.26 μs |      20.89 μs |       419.8 μs |  1.87 |    0.12 |
|                               |               |              |                 |                     |                |               |               |                |       |         |
|           RemoveUsingListCopy | .NET Core 3.1 |        10000 |            2000 |              Middle |       235.2 μs |      10.73 μs |      15.73 μs |       236.2 μs |  1.00 |    0.00 |
|           RemoveUsingRemoveAt | .NET Core 3.1 |        10000 |            2000 |              Middle |     1,345.6 μs |      32.07 μs |      43.90 μs |     1,352.7 μs |  5.77 |    0.41 |
|       RemoveUsingMoveStrategy | .NET Core 3.1 |        10000 |            2000 |              Middle |       324.0 μs |       4.92 μs |       7.05 μs |       326.6 μs |  1.39 |    0.09 |
| Answer63496225_TheodorZoulias | .NET Core 3.1 |        10000 |            2000 |              Middle |       262.9 μs |       6.18 μs |       9.06 μs |       265.4 μs |  1.12 |    0.08 |
|       Answer63496768_CoolBots | .NET Core 3.1 |        10000 |            2000 |              Middle |       333.6 μs |      10.14 μs |      13.87 μs |       340.9 μs |  1.43 |    0.11 |
|        Answer63495657_NetMage | .NET Core 3.1 |        10000 |            2000 |              Middle |       313.5 μs |       9.05 μs |      12.69 μs |       310.5 μs |  1.34 |    0.11 |
|         Answer63496256_Vernou | .NET Core 3.1 |        10000 |            2000 |              Middle |       332.3 μs |       6.70 μs |       8.95 μs |       331.9 μs |  1.43 |    0.09 |
|                               |               |              |                 |                     |                |               |               |                |       |         |
|           RemoveUsingListCopy |      .NET 4.8 |     10000000 |            2000 |              Random |   253,977.1 μs |   2,721.70 μs |   3,989.43 μs |   253,809.0 μs |  1.00 |    0.00 |
|           RemoveUsingRemoveAt |      .NET 4.8 |     10000000 |            2000 |              Random | 5,191,083.4 μs |  13,200.66 μs |  18,931.99 μs | 5,187,162.3 μs | 20.43 |    0.34 |
|       RemoveUsingMoveStrategy |      .NET 4.8 |     10000000 |            2000 |              Random |    65,365.4 μs |     422.41 μs |     592.16 μs |    65,307.3 μs |  0.26 |    0.00 |
| Answer63496225_TheodorZoulias |      .NET 4.8 |     10000000 |            2000 |              Random |   240,584.4 μs |   3,687.89 μs |   5,048.03 μs |   244,336.1 μs |  0.95 |    0.02 |
|       Answer63496768_CoolBots |      .NET 4.8 |     10000000 |            2000 |              Random |    54,168.4 μs |   1,001.37 μs |   1,436.14 μs |    53,390.3 μs |  0.21 |    0.01 |
|        Answer63495657_NetMage |      .NET 4.8 |     10000000 |            2000 |              Random |    72,501.4 μs |     452.46 μs |     634.29 μs |    72,161.2 μs |  0.29 |    0.00 |
|         Answer63496256_Vernou |      .NET 4.8 |     10000000 |            2000 |              Random |     5,814.0 μs |      89.71 μs |     128.67 μs |     5,825.3 μs |  0.02 |    0.00 |
|                               |               |              |                 |                     |                |               |               |                |       |         |
|           RemoveUsingListCopy | .NET Core 3.1 |     10000000 |            2000 |              Random |   239,784.0 μs |   2,721.35 μs |   3,902.88 μs |   241,125.5 μs |  1.00 |    0.00 |
|           RemoveUsingRemoveAt | .NET Core 3.1 |     10000000 |            2000 |              Random | 5,538,337.5 μs | 353,505.30 μs | 495,565.06 μs | 5,208,226.1 μs | 23.12 |    2.15 |
|       RemoveUsingMoveStrategy | .NET Core 3.1 |     10000000 |            2000 |              Random |    33,071.8 μs |     103.80 μs |     138.57 μs |    33,030.5 μs |  0.14 |    0.00 |
| Answer63496225_TheodorZoulias | .NET Core 3.1 |     10000000 |            2000 |              Random |   240,825.5 μs |     851.49 μs |   1,248.11 μs |   240,520.9 μs |  1.00 |    0.02 |
|       Answer63496768_CoolBots | .NET Core 3.1 |     10000000 |            2000 |              Random |    26,265.0 μs |      90.76 μs |     124.23 μs |    26,253.0 μs |  0.11 |    0.00 |
|        Answer63495657_NetMage | .NET Core 3.1 |     10000000 |            2000 |              Random |    48,670.6 μs |     581.51 μs |     833.99 μs |    48,303.0 μs |  0.20 |    0.00 |
|         Answer63496256_Vernou | .NET Core 3.1 |     10000000 |            2000 |              Random |     5,905.5 μs |      96.27 μs |     131.78 μs |     5,915.1 μs |  0.02 |    0.00 |
|                               |               |              |                 |                     |                |               |               |                |       |         |
|           RemoveUsingListCopy |      .NET 4.8 |     10000000 |            2000 |              Middle |   153,776.2 μs |   2,454.90 μs |   3,674.38 μs |   152,872.0 μs |  1.00 |    0.00 |
|           RemoveUsingRemoveAt |      .NET 4.8 |     10000000 |            2000 |              Middle | 5,245,952.0 μs |  13,845.58 μs |  20,294.67 μs | 5,252,922.4 μs | 34.10 |    0.81 |
|       RemoveUsingMoveStrategy |      .NET 4.8 |     10000000 |            2000 |              Middle |    33,233.6 μs |     110.33 μs |     158.24 μs |    33,217.3 μs |  0.22 |    0.01 |
| Answer63496225_TheodorZoulias |      .NET 4.8 |     10000000 |            2000 |              Middle |   128,949.8 μs |     560.72 μs |     804.17 μs |   128,724.9 μs |  0.84 |    0.02 |
|       Answer63496768_CoolBots |      .NET 4.8 |     10000000 |            2000 |              Middle |    48,965.1 μs |      70.75 μs |      94.45 μs |    48,957.3 μs |  0.32 |    0.01 |
|        Answer63495657_NetMage |      .NET 4.8 |     10000000 |            2000 |              Middle |    32,641.5 μs |      66.85 μs |      91.51 μs |    32,610.0 μs |  0.21 |    0.01 |
|         Answer63496256_Vernou |      .NET 4.8 |     10000000 |            2000 |              Middle |     2,982.2 μs |      29.47 μs |      41.31 μs |     2,961.9 μs |  0.02 |    0.00 |
|                               |               |              |                 |                     |                |               |               |                |       |         |
|           RemoveUsingListCopy | .NET Core 3.1 |     10000000 |            2000 |              Middle |   144,208.7 μs |   2,035.88 μs |   2,984.16 μs |   142,693.2 μs |  1.00 |    0.00 |
|           RemoveUsingRemoveAt | .NET Core 3.1 |     10000000 |            2000 |              Middle | 5,235,957.7 μs |  13,674.19 μs |  20,043.46 μs | 5,241,536.1 μs | 36.32 |    0.78 |
|       RemoveUsingMoveStrategy | .NET Core 3.1 |     10000000 |            2000 |              Middle |    16,547.3 μs |      72.72 μs |     101.95 μs |    16,520.7 μs |  0.11 |    0.00 |
| Answer63496225_TheodorZoulias | .NET Core 3.1 |     10000000 |            2000 |              Middle |   137,218.2 μs |     716.45 μs |     980.69 μs |   137,027.0 μs |  0.95 |    0.02 |
|       Answer63496768_CoolBots | .NET Core 3.1 |     10000000 |            2000 |              Middle |    23,728.5 μs |      79.84 μs |     111.93 μs |    23,689.9 μs |  0.16 |    0.00 |
|        Answer63495657_NetMage | .NET Core 3.1 |     10000000 |            2000 |              Middle |    17,298.3 μs |     216.46 μs |     310.44 μs |    17,165.5 μs |  0.12 |    0.00 |
|         Answer63496256_Vernou | .NET Core 3.1 |     10000000 |            2000 |              Middle |     2,999.7 μs |      85.78 μs |     123.03 μs |     2,957.1 μs |  0.02 |    0.00 |
[snip]

基准代码

要查看各种实现,请向下滚动三分之一,查找用[Benchmark()]修饰的方法。
这需要BenchmarkDotNet package

using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;
using BenchmarkDotNet.Attributes;

namespace SO63495264
{
    public class Benchmarks
    {
        public enum RemovalLocation
        {
            Random,
            Beginning,
            Middle,
            End
        }

        [Params(10_000, 10_000_000)]
        public int DataListSize
        {
            get; set;
        }

        [Params(2_000)]
        public int RemovalListSize
        {
            get; set;
        }

        //[ParamsAllValues()]
        [Params(RemovalLocation.Random, RemovalLocation.Middle)]
        public RemovalLocation RemovalListLocation
        {
            get; set;
        }

        public List<int> DataList
        {
            get;
            set;
        }

        public IReadOnlyList<int> RemovalList
        {
            get;
            set;
        }

        [GlobalSetup()]
        public void GlobalSetup()
        {
            IEnumerable<int> removalIndices;

            if (RemovalListLocation == RemovalLocation.Random)
            {
                // Pass a constant seed so the same indices get generated for every run
                Random random = new Random(12345);

                removalIndices = Enumerable.Range(0, RemovalListSize)
                    .Select(i => random.Next(DataListSize));
            }
            else
            {
                int removalSegmentOffset = RemovalListLocation switch {
                    RemovalLocation.Beginning => 0,
                    RemovalLocation.Middle => (DataListSize - RemovalListSize) / 2,
                    RemovalLocation.End => DataListSize - RemovalListSize,
                    _ => throw new Exception($"Unexpected {nameof(RemovalLocation)} enumeration value {RemovalListLocation}.")
                };

                removalIndices = Enumerable.Range(removalSegmentOffset, RemovalListSize);
            }

            // For efficiency, create a single List<int> to be reused by all iterations for a given set of parameters
            DataList = new List<int>(DataListSize);
            RemovalList = removalIndices.ToList().AsReadOnly();
        }

        [IterationSetup()]
        public void IterationSetup()
        {
            // Each iteration could modify DataList, so repopulate its elements for the
            // next iteration.  DataList is either new or has been Clear()ed by IterationCleanup().
            for (int i = 0; i < DataListSize; i++)
                DataList.Add(i);
        }

        [IterationCleanup()]
        public void IterationCleanup()
        {
            DataList.Clear();
        }

        [GlobalCleanup()]
        public void GlobalCleanup()
        {
            // Force collection of the List<> for the current set of parameters
            int generation = GC.GetGeneration(DataList);

            DataList = null;
            GC.Collect(generation, GCCollectionMode.Forced, true);
        }

        [Benchmark(Baseline = true)]
        public List<int> RemoveUsingListCopy()
        {
            HashSet<int> removalSet = RemovalList.ToHashSet();
            List<int> newList = new List<int>(DataList.Count - removalSet.Count);

            for (int index = 0; index < DataList.Count; index++)
                if (!removalSet.Contains(index))
                    newList.Add(DataList[index]);

            return newList;
        }

        // Based on Stack Overflow question 63495264
        // https://stackoverflow.com/q/63495264/150605
        [Benchmark()]
        public List<int> RemoveUsingRemoveAt()
        {
            // The collection of indices to remove must be in decreasing order
            // with no duplicates; all are assumed to be in-range for DataList
            foreach (int index in RemovalList.Distinct().OrderByDescending(i => i))
                DataList.RemoveAt(index);

            return DataList;
        }

        // From Stack Overflow answer 63498191
        // https://stackoverflow.com/a/63498191/150605
        [Benchmark()]
        public List<int> RemoveUsingMoveStrategy()
        {
            // The collection of indices to remove must be in increasing order
            // with no duplicates; all are assumed to be in-range for DataList
            int[] coreIndices = RemovalList.Distinct().OrderBy(i => i).ToArray();
            List<(int startIndex, int count, int distance)> moveOperations
                = new List<(int, int, int)>();

            int candidateIndex;
            for (int i = 0; i < coreIndices.Length; i = candidateIndex)
            {
                int currentIndex = coreIndices[i];
                int elementCount = 1;

                // Merge removal of consecutive indices into one operation
                while ((candidateIndex = i + elementCount) < coreIndices.Length
                    && coreIndices[candidateIndex] == currentIndex + elementCount
                )
                {
                    elementCount++;
                }

                int nextIndex = candidateIndex < coreIndices.Length
                    ? coreIndices[candidateIndex]
                    : DataList.Count;
                int moveCount = nextIndex - currentIndex - elementCount;

                // Removing one or more elements from the end of the
                // list will result in a no-op, 0-length move; omit it
                if (moveCount > 0)
                {
                    moveOperations.Add(
                        (
                            startIndex: currentIndex + elementCount,
                            count: moveCount,
                            distance: i + elementCount
                        )
                    );
                }
            }

            // Perform the element moves
            foreach ((int startIndex, int count, int distance) in moveOperations)
            {
                for (int offset = 0; offset < count; offset++)
                {
                    int sourceIndex = startIndex + offset;
                    int targetIndex = sourceIndex - distance;

                    DataList[targetIndex] = DataList[sourceIndex];
                }
            }

            // "Trim" the number of removed elements from the end of the list
            DataList.RemoveRange(DataList.Count - coreIndices.Length, coreIndices.Length);

            return DataList;
        }

        // Adapted from Stack Overflow answer 63496225 revision 4
        // https://stackoverflow.com/revisions/63496225/4
        [Benchmark()]
        public List<int> Answer63496225_TheodorZoulias()
        {
            HashSet<int> indicesSet = new HashSet<int>(RemovalList);
            int index = 0;

            DataList.RemoveAll(_ => indicesSet.Contains(index++));

            return DataList;
        }

        // Adapted from Stack Overflow answer 63496768 revision 2
        // https://stackoverflow.com/revisions/63496768/2
        [Benchmark()]
        public List<int> Answer63496768_CoolBots()
        {
            List<int> sortedIndicies = RemovalList.Distinct().OrderBy(i => i).ToList();

            int sourceStartIndex = 0;
            int destStartIndex = 0;
            int spanLength = 0;

            int skipCount = 0;

            // Copy items up to last index to be skipped
            foreach (int skipIndex in sortedIndicies)
            {
                spanLength = skipIndex - sourceStartIndex;
                destStartIndex = sourceStartIndex - skipCount;

                for (int i = sourceStartIndex; i < sourceStartIndex + spanLength; i++)
                {
                    DataList[destStartIndex] = DataList[i];
                    destStartIndex++;
                }

                sourceStartIndex = skipIndex + 1;
                skipCount++;
            }

            // Copy remaining items (between last index to be skipped and end of list)
            spanLength = DataList.Count - sourceStartIndex;
            destStartIndex = sourceStartIndex - skipCount;

            for (int i = sourceStartIndex; i < sourceStartIndex + spanLength; i++)
            {
                DataList[destStartIndex] = DataList[i];
                destStartIndex++;
            }

            DataList.RemoveRange(destStartIndex, sortedIndicies.Count);

            return DataList;
        }

        // Adapted from Stack Overflow answer 63495657 revision 6
        // https://stackoverflow.com/revisions/63495657/6
        [Benchmark()]
        public List<int> Answer63495657_NetMage()
        {
            List<int> removeAtList = RemovalList.Distinct().OrderBy(i => i).ToList();

            int srcCount = DataList.Count;
            int ralCount = removeAtList.Count;
            int removeAtIndice = 1;
            int freeIndex = removeAtList[0];
            int current = freeIndex + 1;
            while (current < srcCount)
            {
                while (removeAtIndice < ralCount && current == removeAtList[removeAtIndice])
                {
                    ++current;
                    ++removeAtIndice;
                }

                if (current < srcCount)
                    DataList[freeIndex++] = DataList[current++];
            }

            DataList.RemoveRange(freeIndex, srcCount - freeIndex);

            return DataList;
        }

        // Adapted from Stack Overflow answer 63496256 revision 3
        // https://stackoverflow.com/revisions/63496256/3
        [Benchmark()]
        public List<int> Answer63496256_Vernou()
        {
            List<int> indices = RemovalList.Distinct().OrderBy(i => i).ToList();

            //Get the internal array
            int[] largeArray = (int[]) typeof(List<int>)
                .GetField("_items", BindingFlags.Instance | BindingFlags.NonPublic)
                .GetValue(DataList);

            int current = 0;
            int copyFrom = 0;

            for (int i = 0; i < indices.Count; i++)
            {
                int copyTo = indices[i];
                if (copyTo < copyFrom)
                {
                    //In case the indice is duplicate,
                    //The item is already passed
                    continue;
                }
                int copyLength = copyTo - copyFrom;
                Array.Copy(largeArray, copyFrom, largeArray, current, copyLength);
                current += copyLength;
                copyFrom = copyTo + 1;
            }
            //Resize the internal array
            DataList.RemoveRange(DataList.Count - indices.Count, indices.Count);

            return DataList;
        }
    }
}

启动器代码

RunBenchmark()定义要运行的基准作业的类型以及运行时间。

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Reflection;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Environments;
using BenchmarkDotNet.Jobs;

namespace SO63495264
{
    class Program
    {
        static void Main(string[] args)
        {
            if (args?.Length == 0)
            {
                string assemblyFilePath = Assembly.GetExecutingAssembly().Location;
                string assemblyFileName = System.IO.Path.GetFileName(assemblyFilePath);

                Console.WriteLine($"{assemblyFileName} {{ benchmark | test }}");
            }
            else if (string.Equals(args[0], "benchmark", StringComparison.OrdinalIgnoreCase))
                RunBenchmark();
            else if (string.Equals(args[0], "test", StringComparison.OrdinalIgnoreCase))
                RunTest();
            else
                Console.WriteLine($"Unexpected parameter \"{args[0]}\"");
        }

        static void RunBenchmark()
        {
            Job baseJob = Job.MediumRun;
            IConfig config = DefaultConfig.Instance;

            foreach (Runtime runtime in new Runtime[] { CoreRuntime.Core31, ClrRuntime.Net48 })
            {
                config = config.AddJob(
                    baseJob.WithRuntime(runtime)
                );
            }

            BenchmarkDotNet.Running.BenchmarkRunner.Run<Benchmarks>(config);
        }

        static void RunTest()
        {
            const int ListSize = 20;
            const int MaxDisplayElements = 20;

            IEnumerable<int> data = Enumerable.Range(0, ListSize);
            IReadOnlyList<int> indices = new List<int>(
                new int[] {
                    0,                               1, // First two indices
                    ListSize / 4,
                    ListSize / 2,     ListSize / 2 + 1, // Consecutive indices
                    ListSize / 4 * 3, ListSize / 4 * 3, // Duplicate indices
                    ListSize - 2,     ListSize - 1      // Last two indices
                }
            ).AsReadOnly();

            // Discover and invoke the benchmark methods the same way BenchmarkDotNet would
            Benchmarks benchmarks = new Benchmarks() {
                RemovalList = indices
            };
            IEnumerable<MethodInfo> benchmarkMethods = benchmarks.GetType()
                .GetMethods(BindingFlags.Instance | BindingFlags.Public)
                .Where(
                    method => method.CustomAttributes.Any(
                        attribute => attribute.AttributeType == typeof(BenchmarkAttribute)
                    )
                );

            // Call a known-good method to get the correct results for comparison
            benchmarks.DataList = data.ToList();
            List<int> controlList = benchmarks.RemoveUsingListCopy();

            foreach (MethodInfo benchmarkMethod in benchmarkMethods)
            {
                List<int> inputList = data.ToList();

                benchmarks.DataList = inputList;
                Stopwatch watch = Stopwatch.StartNew();
                List<int> outputList = (List<int>) benchmarkMethod.Invoke(benchmarks, Array.Empty<object>());
                watch.Stop();

                Console.WriteLine($"{benchmarkMethod.Name}():");
                Console.WriteLine($"\tElements: {{ {string.Join(", ", outputList.Take(MaxDisplayElements))}{(outputList.Count > MaxDisplayElements ? ", ..." : "") } }}");
                Console.WriteLine($"\t   Count: {outputList.Count:N0}");
                Console.WriteLine($"\tSequence: {(outputList.SequenceEqual(controlList) ? "E" : "*Not* e")}qual to control list");
                Console.WriteLine($"\tDuration: {watch.Elapsed.TotalMilliseconds:N2} milliseconds");
                Console.WriteLine($"\tReturned: {(object.ReferenceEquals(outputList, inputList) ? "Input list" : "New list")}");
            }
        }
    }
}

为了在.NET Framework(4.8)上进行基准测试,我必须将以下属性添加到我的.NET Core .csproj项目文件:

<TargetFrameworks>netcoreapp3.1;net48</TargetFrameworks>
<LangVersion>8.0</LangVersion>
  • 41个字符 * 备用!
laawzig2

laawzig22#

由于源列表的顺序很重要,您可以在列表中向下移动每个项,跳过要删除的索引,然后删除列表的末尾。
更新:获取RemoveAll的.Net Core源代码,并将其修改为索引列表而不是 predicate 。
更新2:优化为尽可能不重复检测。
更新3:简化为在基准测试中使用额外代码的优化。
src作为大列表,将removeAtList作为要以某种随机顺序删除的索引,您可以执行以下操作:

removeAtList.Sort();

var srcCount = src.Count;
var ralCount = removeAtList.Count;
var removeAtIndice = 1;
var freeIndex = removeAtList[0];
var current = freeIndex+1;
while (current < srcCount) {
    while (removeAtIndice < ralCount && current == removeAtList[removeAtIndice]) {
        ++current;
        ++removeAtIndice;
    }

    if (current < srcCount)
        src[freeIndex++] = src[current++];
}

src.RemoveRange(freeIndex, srcCount-freeIndex);

对于一个10亿个随机整数元素的列表和一个1000 - 3000个随机索引元素的列表,使用这个算法,每次删除只需1.1毫秒,而使用RemoveAt,每次删除需要232.77毫秒,所以速度快了大约200倍。

cigdeys3

cigdeys33#

允许这被并行化的一种方式是将列表分成多个片段;也许最初(任意地)分离100万个元素的块。只要每个块保持其自己的计数,您就可以按索引将工作拆分为从不同块的移除(纯粹基于计数),然后并发地进行实际的移除工作。如果你在每个中留下一些备用容量,你也可以更便宜地向中间添加元素,因为您通常只接触一个厚片。随机访问会稍慢,因为您可能需要查看多个厚片计数来确定正确的厚片,但如果厚片计数保持在连续矢量中(而不是针对每个slab),那么在执行此操作时,您应该具有出色的内存缓存命中率。

xxhby3vn

xxhby3vn4#

这个答案基于这里的其他答案-主要是,我在列表中向上移动元素,正如@Vernou建议的那样(在他们的回答中)和@BACON(在评论中)。这一个终于有表现了(不像我的前几种方法),而且比目前发布的其他解决方案更快,至少在我的测试中-我尝试了OP的2_000_000_000条目和2_000索引设置-在我的笔记本电脑(i7- 8550U@1.8GHz,16 GB RAM)上运行时间不到10秒:

static void FilterOutIndicies(List<int> values, List<int> sortedIndicies)
{
    int sourceStartIndex = 0;
    int destStartIndex = 0;
    int spanLength = 0;

    int skipCount = 0;

    // Copy items up to last index to be skipped
    foreach (var skipIndex in sortedIndicies)
    {
        spanLength = skipIndex - sourceStartIndex;
        destStartIndex = sourceStartIndex - skipCount;

        for (int i = sourceStartIndex; i < sourceStartIndex + spanLength; i++)
        {
            values[destStartIndex] = values[i];
            destStartIndex++;
        }

        sourceStartIndex = skipIndex + 1;
        skipCount++;
    }

    // Copy remaining items (between last index to be skipped and end of list)
    spanLength = values.Count - sourceStartIndex;
    destStartIndex = sourceStartIndex - skipCount;

    for (int i = sourceStartIndex; i < sourceStartIndex + spanLength; i++)
    {
        values[destStartIndex] = values[i];
        destStartIndex++;
    }

    values.RemoveRange(destStartIndex, sortedIndicies.Count);
}
csga3l58

csga3l585#

方法List.RemoveAt从删除的项复制所有后续项,在您的例子中,每个项复制2,000 * 2,000,000,000次(不是真的,但真的很接近)。
解决方案是在已删除项和下一个已删除项之间手动复制项:

static void Main(string[] args)
{
    var largeList = Enumerable.Range(0, 2_000_000_000).ToList();

    var indices = new List<int>();
    var rand = new Random();
    for (var i = 0; i < 20000; i++)
    {
        indices.Add(rand.Next(0, largeList.Count - 1));
    }
    indices.Sort();

    var watch = new Stopwatch();
    watch.Start();

    // You can convert the list to array with ToArray,
    // but this duplicate the memory use.
    // Or get the internal array by reflection,
    // but reflection on external library isn't recommended
    var largeArray = (int[])typeof(List<int>)
        .GetField("_items", BindingFlags.Instance | BindingFlags.NonPublic)
        .GetValue(largeList);

    var current = 0;
    var copyFrom = 0;

    for (var i = 0; i < indices.Count; i++)
    {
        var copyTo = indices[i];
        if (copyTo < copyFrom)
        {
            //In case the indice is duplicate,
            //The item is already passed
            continue;
        }
        var copyLength = copyTo - copyFrom;
        Array.Copy(largeArray, copyFrom, largeArray, current, copyLength);
        current += copyLength;
        copyFrom = copyTo + 1;
    }
    //Resize the internal array
    largeList.RemoveRange(largeList.Count - indices.Count, indices.Count);

    watch.Stop();
    Console.WriteLine(watch.Elapsed);
    Console.WriteLine(largeList.Count);
}
lskq00tm

lskq00tm6#

如果要从List中删除多个项,并且无法使用新的List替换List,则最有效的方法是使用RemoveAll方法而不是RemoveAtRemoveAll只重新排列List的内部状态一次,而不是在每个删除项时重新排列一次。
RemoveAll接受为列表中的每个项调用一次的Predicate<T>(大列表)。不幸的是,这个委托没有接收到当前测试项的索引。然而,你 * 可以 * 依赖于了解RemoveAll是如何实现的。源代码显示,这些项是按升序顺序测试的。因此,基于这些知识,你 * 可以 * 从列表中删除选定的索引,非常高效,使用以下三行代码:

HashSet<int> indicesSet = new(indices);
int index = 0;
largeList.RemoveAll(_ => indicesSet.Contains(index++));

虽然这种行为没有明确的记录。理论上微软可以在未来的.NET版本中改变RemoveAll方法的行为,破坏上面的代码。我个人认为这种情况几乎是不可能的,但是如果你想安全起见,你可以使用一个自定义的RemoveAll实现,它有固定的行为,就像这个答案中找到的那样。它也有一个带索引的委托。你可以像这样使用它:

HashSet<int> indicesSet = new(indices);
largeList.RemoveAll((_, index) => indicesSet.Contains(index));

相关问题