How to find the common sequences between two lists in Java

Posted by hgb9j2n6 on 2021-06-29 in Java

I am trying to find the common sequences between two lists. I can do it when every value in the lists is unique. For example:

list one: [1, 8, 3, 13, 14, 6, 11]
list two: [8, 9, 10, 11, 12, 13, 14, 15]

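As we can see, the sequence [13, 14] is common to both lists. My algorithm works like this: retainAll yields the common values, which in this example are [8, 11, 13, 14]. Because retainAll modifies list one, I keep a copy of the original list one. I then look up the positions of these common values in the original lists (list one and list two) and compute the difference between the positions of consecutive values. For example: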

value   pos in list1   pos in list2   difList1    difList2
[8]     1              0              -1 (0-1)    -1 (0-1)
[11]    6              3              -5 (1-6)    -3 (0-3)
[13]    3              5               3 (6-3)    -2 (3-5)
[14]    4              6              -1 (3-4)    -1 (5-6)

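If both difList1 and difList2 are -1 for a value, that value and the one before it are adjacent in both lists and form a sequence. Since [14] satisfies this condition in the example, the sequence is [13, 14].
My code for this case is: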

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CommonSequence {
    public static void main(String[] args) {
        List<Integer> list1 = new ArrayList<>(Arrays.asList(1, 8, 3, 13, 14, 6, 11));
        List<Integer> list2 = new ArrayList<>(Arrays.asList(8, 9, 10, 11, 12, 13, 14, 15));
        List<Integer> ori_list1 = new ArrayList<>(list1); // copy of list1, taken before retainAll modifies it
        list1.retainAll(list2);
        List<Integer> difList1 = new ArrayList<>();
        List<Integer> difList2 = new ArrayList<>();
        difList1.add(-1); // The first common element has no previous common element, so put -1 at the first index.
        difList2.add(-1); // Same placeholder for list2.
        System.out.println(list1); // common elements are [8, 13, 14, 11]

        for (int k = 1; k < list1.size(); k++) { // Let's say k = 2 ..
            int index1_1 = ori_list1.indexOf(list1.get(k));     // actual index of the value 14 -> 4
            int index1_2 = ori_list1.indexOf(list1.get(k - 1)); // actual index of the value 13 -> 3
            int diff_list1 = index1_2 - index1_1; // 3-4 = -1, which means they are adjacent in list1
            difList1.add(diff_list1);
            int index2_1 = list2.indexOf(list1.get(k));     // same for list2 -> 6
            int index2_2 = list2.indexOf(list1.get(k - 1)); // same for list2 -> 5
            int diff_list2 = index2_2 - index2_1; // 5-6 = -1
            difList2.add(diff_list2);
        }
        for (int y = 1; y < difList1.size(); y++) {
            // both differences are -1 for the value 14, so 13 and 14 form a sequence
            if (difList1.get(y) == -1 && difList2.get(y) == -1) {
                System.out.println("The common sequence is:" + list1.get(y - 1) + " " + list1.get(y));
            }
        }
    }
}

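But I need to handle duplicate elements. Suppose we have lists like these:
list one: [1, 8, 3, 10, 13, 8, 10, 14, 6, 11]
list two: [8, 9, 10, 11, 12, 8, 10, 13, 14, 15]
Now there is another common sequence, [8, 10], and I want to see both [13, 14] and [8, 10] in the output. But I only see [13, 14], because when the program computes the indexes of 8 and 10 it takes their first occurrences: in list1, indexOf returns index 1 for 8 and index 3 for 10. I need to skip those occurrences because they have already been used, and get indexes 5 and 6 instead of 1 and 3.
I also don't know how to find sequences of more than two values, for example not only [13, 14] but [13, 14, 15] or longer, if the values are adjacent. I know this question is a bit hard, but I need your help.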

zaqlnxep #1

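Since this is homework (or something like it), I'm not going to give code.
My approach, though, would be to scan the longer list and record the positions of each number in a Map<Integer, Set<Integer>>.
Then walk the shorter list and, starting from the indexes stored in the map's sets, look for matching sequences.
Doing it this way gives good time complexity.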
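
One way to read the approach described above is sketched below. It is only an illustration, not the answerer's code: the names CommonRuns, commonRuns and minLength are made up here, positions are kept in a List rather than a Set so they stay ordered, and the sketch assumes that only maximal runs should be reported (a match that merely continues an earlier run is skipped).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class CommonRuns {
    // Hypothetical helper: returns every maximal contiguous run of at least
    // minLength values that occurs in both lists, without duplicates.
    static List<List<Integer>> commonRuns(List<Integer> a, List<Integer> b, int minLength) {
        List<Integer> longer = a.size() >= b.size() ? a : b;
        List<Integer> shorter = longer == a ? b : a;

        // value -> every position of that value in the longer list
        Map<Integer, List<Integer>> positions = new HashMap<>();
        for (int j = 0; j < longer.size(); j++) {
            positions.computeIfAbsent(longer.get(j), k -> new ArrayList<>()).add(j);
        }

        Set<List<Integer>> runs = new LinkedHashSet<>();
        for (int i = 0; i < shorter.size(); i++) {
            List<Integer> starts = positions.getOrDefault(shorter.get(i), Collections.emptyList());
            for (int j : starts) {
                // Skip pairs whose predecessors also match: they belong to a
                // run that has already been extended from an earlier start.
                if (i > 0 && j > 0 && shorter.get(i - 1).equals(longer.get(j - 1))) {
                    continue;
                }
                // Extend the match for as long as both lists agree.
                int len = 0;
                while (i + len < shorter.size() && j + len < longer.size()
                        && shorter.get(i + len).equals(longer.get(j + len))) {
                    len++;
                }
                if (len >= minLength) {
                    runs.add(new ArrayList<>(shorter.subList(i, i + len)));
                }
            }
        }
        return new ArrayList<>(runs);
    }
}
```

Because every occurrence of a value is stored, duplicates such as a second 8 or 10 are tried as start positions too, which is the case indexOf cannot handle. Calling commonRuns with the two lists from the first example in the question and minLength 2 returns the single run [13, 14].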

emeijp43 #2

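Treat the int values as code points:
[1] convert list 2 to str2
[2] convert list1 to str1, dropping all leading int values (those that are not in list2)
[3] slide str1 over str2 and remember the longest sequences where str1 lies on str2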
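import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.IntStream;

public class CodePointSequences {
    public static void main(String[] args) {
        List<int[]> results = new ArrayList<>();

        // Note: the values must be valid code points below 0x10000 (and not
        // surrogates), so that one value maps to exactly one char.
        int[] first = { 1, 8, 3, 10, 13, 14, 8, 10, 14, 6, 11 };
        String str2 = new String( first, 0, first.length );           //[1]
        int[] tmp = { 8, 9, 10, 11, 12, 8, 10, 13, 14, 15 };
        int[] arr1 = IntStream.of( tmp ).dropWhile(
            c -> str2.indexOf( c ) < 0 ).toArray();                   //[2]
        String str1 = new String( arr1, 0, arr1.length );
        for( int i = str1.length() - 2; i >= 0; i-- ) {               //[3]
          // try every start position in str2 that leaves room for two values
          for( int j = 0; j <= str2.length() - 2; j++ ) {
            int[] idx2 = { j };
            // take values from str1 while they match str2; the bounds check
            // stops at the end of str2 instead of throwing
            int[] rslt = str1.substring( i ).codePoints().takeWhile(
                c -> idx2[0] < str2.length()
                  && c == str2.charAt( idx2[0]++ ) ).toArray();
            if( rslt.length >= 2 ) {
              results.add( rslt );
            }
          }
        }

        results.forEach( a -> System.out.println( Arrays.toString( a ) ) );
    }
}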

This prints: [13, 14], [10, 13, 14], [8, 10]

avwztpqn #3

I'm not entirely sure what you are trying to do, but if I wanted common sequences, I would create sublists and compare them:

import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class CommonSequences {

    public static Set<List<Integer>> findCommonSequence(List<Integer> source, List<Integer> target, int startLength) {
        Set<List<Integer>> sequences = new LinkedHashSet<>();

        // algorithm works in this way:
        // we prepare all possible sublists of source list that are at least startLength long
        // and then we check each of those sublists against the target list to see if it contains any

        // length goes from startLength to source.size() inclusive, to check all sublists with that length
        // ie if startLength is 2 and source has 10 elements, it will be 2 - 10 and thus check all sublist sizes
        for (int length = startLength; length <= source.size(); length++) {
            // startIndex moves from 0 to source.size() - length, so if length is 2, it generates sublists
            // with indexes 0,1; 1,2; 2,3 ... 8,9
            for (int startIndex = 0; startIndex + length <= source.size(); startIndex++) {
                // creates lightweight sublist that shares the data
                List<Integer> sublist = source.subList(startIndex, startIndex + length);
                // add all found subsequences into the set
                sequences.addAll(findSequenceIn(target, sublist));
            }
        }

        return sequences;
    }

    // Returns all subsequences that are inside the target list
    private static Set<List<Integer>> findSequenceIn(List<Integer> target, List<Integer> sublist) {
        Set<List<Integer>> subsequences = new LinkedHashSet<>();

        // simply do the same process as in first method but with the length fixed to the length of sublist
        // (<= so that the last window of target is checked as well)
        for (int i = 0; i <= target.size() - sublist.size(); i++) {
            // create another sublist, this time from target (again, share data)
            List<Integer> testSublist = target.subList(i, i + sublist.size());

            // compare two sublists, if they are equal, that means target list contains sublist from original list
            if (testSublist.equals(sublist)) {
                // add a copy to the set
                subsequences.add(new ArrayList<>(sublist));
            }
        }

        return subsequences;
    }
}

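You can then optimize the algorithm by passing indexes instead of sublists and doing the comparison by hand. The complexity of this algorithm should be somewhere between O(n^3) and O(n^4). O(n^4) is the worst case, because we build n^2 sublists and compare each of them against up to n sublists of list 2, at up to n element comparisons each. In practice it may be closer to O(n^3), because equals stops at the first mismatch and most comparisons are short; I don't know how close it is mathematically to n^3 or n^4.
Of course there is another factor of n for copying the sublist, but you can optimize that one away too.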
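
A minimal sketch of that index-based variant might look like the following. The names IndexBasedSearch, regionMatches and findCommonSequences are invented for this illustration and are not part of the answer; the sketch assumes startLength is at least 1 and copies a sublist only when a match is actually found.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class IndexBasedSearch {
    // Hypothetical helper: true if source[s .. s+len) equals target[t .. t+len),
    // compared element by element without creating any sublist.
    static boolean regionMatches(List<Integer> source, int s, List<Integer> target, int t, int len) {
        for (int k = 0; k < len; k++) {
            if (!source.get(s + k).equals(target.get(t + k))) {
                return false; // stop at the first mismatch
            }
        }
        return true;
    }

    // Collects every contiguous run of source, of length startLength or more,
    // that also occurs somewhere in target.
    static Set<List<Integer>> findCommonSequences(List<Integer> source, List<Integer> target, int startLength) {
        Set<List<Integer>> sequences = new LinkedHashSet<>();
        // A common run cannot be longer than the shorter of the two lists.
        int maxLength = Math.min(source.size(), target.size());
        for (int length = startLength; length <= maxLength; length++) {
            for (int s = 0; s + length <= source.size(); s++) {
                for (int t = 0; t + length <= target.size(); t++) {
                    if (regionMatches(source, s, target, t, length)) {
                        // Copy only on a hit, so misses allocate nothing.
                        sequences.add(new ArrayList<>(source.subList(s, s + length)));
                        break; // one occurrence in target is enough
                    }
                }
            }
        }
        return sequences;
    }
}
```

Like the sublist version, this reports every common run, including shorter runs contained in longer ones, so with startLength 2 a common [10, 13, 14] also yields [10, 13] and [13, 14].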
