Most efficient way to get the top K items from an ArrayList in Java

neekobn8, posted on 2022-12-21 in Java

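I'm trying to find the fastest and most efficient way to get the top K items from an ArrayList of objects, based on a custom Comparable implementation.
During my research, it was suggested that I use a max/min heap, which Java abstracts as a PriorityQueue. The problem, however, is that I don't know how to apply it to an ArrayList of objects.
Here is an example of my object: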

import java.math.BigDecimal;
import java.time.LocalDate;

public class PropertyRecord {

    private long id;
    private String address, firstName, lastName, email, ownerAddress;
    private LocalDate dateSold;
    private BigDecimal price;

    public PropertyRecord(long id, String address, String firstName, String lastName, String email, String ownerAddress, LocalDate dateSold, BigDecimal price) {

        this.id = id;
        this.address = address;
        this.firstName = firstName;
        this.lastName = lastName;
        this.email = email;
        this.ownerAddress = ownerAddress;
        this.dateSold = dateSold;
        this.price = price;

    }
 //getters and setters...
}

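I want to get the top K items by price. I wrote a method (see below) that takes the ArrayList and K (the number of top items to return) and uses the Stream API, but I know this isn't the most efficient approach, because it sorts the entire list even though I only want the top K items. So I'd like to avoid the O(n log n) cost of a full sort and pay only for the K items I need (something like O(n log k) with a heap of size K).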

//return the top n properties based on sale price.
    public List<PropertyRecord> getTopProperties(List<PropertyRecord> properties, int n){

       //using StreamAPI
       return properties.stream()
               .sorted((p1, p2) -> p2.getPrice().compareTo(p1.getPrice()))
               .limit(n)
               .collect(Collectors.toList());

    }

Can anyone help?

5cg8jx4n  #1

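Guava contains a TopKSelector class that does exactly this.
In recent Guava versions, this functionality is exposed as Comparators.greatest().
However, if you aren't constrained to an ArrayList for storage, a PriorityQueue may serve you better, since it naturally keeps its elements organized by priority.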
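
The PriorityQueue idea can be sketched with a bounded min-heap: keep at most k candidates and evict the smallest one whenever a larger element arrives, which costs roughly O(n log k) instead of sorting everything. This is only a minimal sketch using the JDK; the class name TopKSketch and the method topK are made up for illustration and are not part of Guava or of the question's code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical helper class for illustration only.
final class TopKSketch {

    // Returns the k largest elements according to cmp, largest first.
    static <T> List<T> topK(List<T> items, int k, Comparator<? super T> cmp) {
        if (k <= 0 || items.isEmpty()) {
            return new ArrayList<>();
        }
        // Min-heap under cmp: the head is the smallest of the current candidates.
        PriorityQueue<T> heap = new PriorityQueue<>(Math.min(k, items.size()), cmp);
        for (T item : items) {
            if (heap.size() < k) {
                heap.offer(item);
            } else if (cmp.compare(item, heap.peek()) > 0) {
                heap.poll();       // drop the smallest candidate
                heap.offer(item);  // keep the larger item instead
            }
        }
        // The heap's internal order is not sorted, so sort the k survivors.
        List<T> result = new ArrayList<>(heap);
        result.sort(cmp.reversed());
        return result;
    }
}
```

With the question's class, this could be called as TopKSketch.topK(properties, n, Comparator.comparing(PropertyRecord::getPrice)), assuming PropertyRecord has the getPrice() getter that the question's own stream code relies on.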

cbjzeqam  #2

There are several possible ways to compute the top K in Java, so which one is the most efficient? Let's try:

package com.example;

import com.google.common.collect.Ordering;

import java.util.*;
import java.util.stream.Collectors;

public class TopKBenchmark {
    public static void main(String[] args) {
        int inputListSize = 500000;
        int topK = 1000;
        int runCount = 100;
        List<Integer> inputList = new ArrayList<>(inputListSize);
        Random rand = new Random();
        rand.setSeed(System.currentTimeMillis());
        for (int i = 0; i < inputListSize; i++) {
            inputList.add(rand.nextInt(100000));
        }

        List<Integer> result1 = null, result2 = null, result3 = null, result4 = null;

        // method 1: stream and limit
        for (int i = 0; i < runCount; i++) {
            result1 = inputList.stream().sorted().limit(topK).collect(Collectors.toList());
        }

        // method 2: sort all
        for (int i = 0; i < runCount; i++) {
            Collections.sort(inputList);
            result2 = inputList.subList(0, topK);
        }

        // method3: guava: TopKSelector
        Ordering<Integer> ordering = Ordering.natural();
        for (int i = 0; i < runCount; i++) {
            result3 = ordering.leastOf(inputList, topK);
        }

        // method4: PQ
        for (int i = 0; i < runCount; i++) {
            PriorityQueue<Integer> priorityQueue = new PriorityQueue<>();
            priorityQueue.addAll(inputList);
            result4 = new ArrayList<>(topK);
            for (int j = 0; j < topK; j++) {
                result4.add(priorityQueue.poll());
            }
        }

        if (result1.size() != result2.size() || result2.size() != result3.size() || result3.size() != result4.size()) {
            throw new RuntimeException();
        }
        for (int i = 0; i < result1.size(); i++) {
            if (!result1.get(i).equals(result2.get(i)) || !result2.get(i).equals(result3.get(i)) || !result3.get(i).equals(result4.get(i))) {
                throw new RuntimeException();
            }
        }
    }
}

I tried the following (inputListSize, topK) combinations:

  • 500000, 1000
  • 50000, 1000
  • 5000, 1000
  • 1000, 1000
  • 100000, 5000

Here are the benchmark results, measured with Spot Profiler for Java and Kotlin:

Collections.sort wins. :)
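
One caveat when reading this result: Collections.sort(inputList) sorts the shared list in place, so every run after the first one, and all runs of methods 3 and 4, see input that is already sorted, which can flatter the in-place sort. A hedged sketch of a fairer loop, timing each run on a fresh copy of the unsorted data, could look like the following. FreshCopyTiming, averageNanos and sortAll are hypothetical names, there is no JIT warm-up here, and a dedicated harness such as JMH would still be more trustworthy than hand-rolled timing.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

// Hypothetical timing helper for illustration only.
final class FreshCopyTiming {

    // Runs selector runCount times, each time on a fresh copy of input,
    // and returns the mean elapsed time in nanoseconds.
    static long averageNanos(List<Integer> input, int runCount,
                             Function<List<Integer>, List<Integer>> selector) {
        long total = 0;
        for (int i = 0; i < runCount; i++) {
            // Copy outside the timed region so the original order is preserved.
            List<Integer> copy = new ArrayList<>(input);
            long start = System.nanoTime();
            selector.apply(copy);
            total += System.nanoTime() - start;
        }
        return total / runCount;
    }

    // "Sort all" variant: sorts the list it is given and returns the first k elements.
    static List<Integer> sortAll(List<Integer> list, int k) {
        Collections.sort(list);
        return new ArrayList<>(list.subList(0, k));
    }
}
```

For example, FreshCopyTiming.averageNanos(inputList, runCount, l -> FreshCopyTiming.sortAll(l, topK)) would time the sort-based method without leaving inputList sorted for the methods that follow.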
