Using regex to remove repeated words (consecutive or non-consecutive) from a string

cl25kdpy  posted on 2021-07-06  in Java

How can I use a regex in Java to remove repeated words, both consecutive and non-consecutive?

Hello to everyone hello in this world world \\ how do I convert this into

Hello to everyone in this world \\into this

I did find a regex that finds non-consecutive repeated words:

regex: (?s)(\b\w+\b)(?=.*\b\1\b)

So how can I use this regex to remove the repeated words, keeping only the first occurrence of each?
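
For reference, here is a small sketch of what happens if the pattern above is passed straight to replaceAll with an empty replacement. The lookahead matches any word that has a later copy, so the copy that survives is the last one, not the first, and without (?i) the comparison is case-sensitive. The class and method names are made up for illustration, and the whitespace cleanup at the end is an added assumption, not part of the original pattern.

```java
class LookaheadRemoval {
    // The pattern from the question, written as a Java string literal.
    static final String HAS_LATER_COPY = "(?s)(\\b\\w+\\b)(?=.*\\b\\1\\b)";

    // Deletes every word that has a later copy, then tidies the leftover spaces.
    static String removeWordsWithLaterCopy(String text) {
        return text.replaceAll(HAS_LATER_COPY, "")
                   .replaceAll("\\s+", " ")
                   .trim();
    }

    public static void main(String[] args) {
        // The LAST copy of each word survives, so the order differs from "keep first".
        System.out.println(removeWordsWithLaterCopy("one two one three two"));
        // prints: one three two

        // Without (?i), "Hello" and "hello" count as different words.
        System.out.println(removeWordsWithLaterCopy("Hello to everyone hello in this world world"));
        // prints: Hello to everyone hello in this world
    }
}
```

So the lookahead form is handy for detecting repeats, but keeping the first occurrence needs a pattern that captures the first copy and removes the later one.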


yduiuuwa1#

Here is another option: apply replaceAll twice, with two different patterns. I may be missing some details, but this works for the supplied string.

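String str =
        "how do do I remove how repeated words from this words sentence.";

// Non-consecutive repeats: a token, any text, then the same token followed by whitespace or end of input.
String nonc = "(?i)(\\S+)(.*)(\\1(\\s|$))";
// Consecutive repeats: a token plus its trailing whitespace, immediately repeated.
String conc = "(?i)(\\S+\\s)(\\1)";
// Keep the first token (and any text in between) and drop the repeat.
str = str.replaceAll(nonc,"$1$2").replaceAll(conc, "$1");
System.out.println(str);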

Prints

how do I remove repeated words from this sentence.
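
The details the answer says it may be missing can be probed by running the same two patterns on another sentence. The sketch below wraps them in a hypothetical helper (the class and method names are illustrative only). The second input contains no repeated words, yet it is altered: \S+ is not tied to word boundaries, so the first pattern can backtrack to the single letter "t" and pair it with the final "t" of "test", and the second pattern then pairs the "is " inside "this " with the following "is ".

```java
class TwoPatternCheck {
    // The two patterns from the answer, unchanged.
    static final String NONC = "(?i)(\\S+)(.*)(\\1(\\s|$))";
    static final String CONC = "(?i)(\\S+\\s)(\\1)";

    // Applies both replacements in the same order as the answer.
    static String apply(String s) {
        return s.replaceAll(NONC, "$1$2").replaceAll(CONC, "$1");
    }

    public static void main(String[] args) {
        // The sentence from the answer.
        System.out.println(apply("how do do I remove how repeated words from this words sentence."));
        // prints: how do I remove repeated words from this sentence.

        // A sentence with no repeated words still loses text.
        System.out.println(apply("this is a test"));
        // prints: this a tes
    }
}
```

Anchoring the captured word with \b is one way to avoid matching fragments of words.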

0kjbasz62#

Try:

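import java.util.regex.Matcher;
import java.util.regex.Pattern;

String text = "Hello to everyone hello in this world world \\ how do I convert this into";
// A whole word, a lazy gap, then a space and the same word again (case-insensitive).
Pattern p = Pattern.compile("(?i)(\\b\\w+\\b)(.*?) \\b\\1\\b");
Matcher m = p.matcher(text);
// Each pass removes one repeat per match, so loop until no repeat is left.
while (m.find()) {
    text = m.replaceAll("$1$2");
    m = p.matcher(text);
}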

See the Java demo.
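
As a sketch of how the loop above might be packaged for reuse, the pattern can be compiled once and the loop moved into a method. The class name RepeatStripper and the method name stripRepeats are hypothetical. The loop matters because a single replaceAll pass removes only one later copy per non-overlapping match, so a word that appears three times needs more than one pass.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatStripper {
    // Same pattern as in the answer: a word, a lazy gap, then a space and the same word again.
    private static final Pattern REPEAT =
            Pattern.compile("(?i)(\\b\\w+\\b)(.*?) \\b\\1\\b");

    // Repeats the replacement until the pattern no longer finds a repeat.
    static String stripRepeats(String text) {
        Matcher m = REPEAT.matcher(text);
        while (m.find()) {
            text = m.replaceAll("$1$2"); // replaceAll resets the matcher before scanning
            m = REPEAT.matcher(text);
        }
        return text;
    }

    public static void main(String[] args) {
        System.out.println(stripRepeats("Hello to everyone hello in this world world"));
        // prints: Hello to everyone in this world

        // "a" occurs three times, so several passes are needed.
        System.out.println(stripRepeats("a b a b a"));
        // prints: a b
    }
}
```

Two limits to keep in mind: the pattern expects a single literal space before the repeated word, and because DOTALL is not set, repeats separated by a line break are left alone.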


fykwrbwg3#

Here is a non-regex approach using streams, assuming the words are separated by spaces:

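// Needs java.util.Arrays, java.util.HashSet, java.util.Set and java.util.stream.Collectors.
String original = "Hello to everyone hello in this world world";
Set<String> set = new HashSet<>(); // lowercased words seen so far
String modified = Arrays.stream(original.split(" "))
        .filter(s -> set.add(s.toLowerCase())) // add() returns false for a word already seen
        .collect(Collectors.joining(" "));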
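
The stream version relies on a filter lambda with a side effect (set.add), which works on a sequential stream but would not be safe on a parallel one with a plain HashSet. A minimal sketch of the same idea as an ordinary loop follows. It assumes words may be separated by any run of whitespace, so it splits on \s+ instead of a single space, and it lowercases with Locale.ROOT so the comparison does not depend on the default locale. The class and method names are hypothetical.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.StringJoiner;

class FirstOccurrenceFilter {
    // Keeps the first occurrence of each word (case-insensitive) in its original spelling.
    static String keepFirstOccurrences(String input) {
        Set<String> seen = new HashSet<>();
        StringJoiner out = new StringJoiner(" ");
        for (String word : input.trim().split("\\s+")) {
            // add() returns true only the first time a lowercased word is seen.
            if (seen.add(word.toLowerCase(Locale.ROOT))) {
                out.add(word);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(keepFirstOccurrences("Hello to  everyone hello in this world   world"));
        // prints: Hello to everyone in this world
    }
}
```

Note that this collapses each run of whitespace into a single space, which the regex answers do not do.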
