regex: Compare one large string with an array of large strings to find runs of 3 or more consecutive matching words in JavaScript

Posted by lnlaulya on 2023-03-24 in Java

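I have a large string (about 1000 words) that I want to compare against every element of an array whose elements are also large strings, and find every run of 3 or more consecutive words that the two have in common. I implemented this with a regular expression, but I get an empty array of matches.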

Small text example:

let textToCompare = "Hello there how are you doing with your life";

let textsToCompareWith= [
  { id:1, text:"Hope you are doing good with your life" },
  { id:2, text:"what are you doing with your life. hello there how are you" },
  { id:3, text:"hello there mate" }
];

Expected output:

[
  {id:1, matchedText:["with your life"]}, 
  {id:2, matchedText:["are you doing with your life","hello there how are you"]},
  {id:3, matchedText:[]}
];

Current output:

[
  {id:1, matchedText:[]}, 
  {id:2, matchedText:[]},
  {id:3, matchedText:[]}
];

My code:

let regex = new RegExp("\\b" + textToCompare.split(" ").join("\\b.*\\b") + "\\b", "gi");

let output = textsToCompareWith.map(textObj => {
  // Match against each element in the array
  let matchedText = textObj?.text.match(regex);
  console.log(matchedText);
  return {
    id: textObj.id,
    matchedText: matchedText ? matchedText : [] // Return an empty array if no match is found
  };
});
                                  
console.log(output);
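Printing the pattern this code builds shows why every result is empty: the words are glued together with `.*`, so the regex can only match a text that contains all nine words of `textToCompare`, in that order. None of the three sample texts does. A minimal check, reusing the question's own strings:

```javascript
// Rebuild the pattern exactly as the question does, then inspect it.
const textToCompare = "Hello there how are you doing with your life";
const regex = new RegExp("\\b" + textToCompare.split(" ").join("\\b.*\\b") + "\\b", "gi");

// Every word is joined by ".*", so the pattern requires all nine words,
// in this order, somewhere in the tested string.
console.log(regex.source);
// \bHello\b.*\bthere\b.*\bhow\b.*\bare\b.*\byou\b.*\bdoing\b.*\bwith\b.*\byour\b.*\blife\b

const sample = "what are you doing with your life. hello there how are you";
// null: after "hello there how are you" the text ends, so "doing ... life" never follows
console.log(sample.match(regex));
```

A regex can still be part of the solution, but it has to be built per phrase (one pattern for each run of 3+ words) rather than one pattern for the whole sentence.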

Answer #1

I put together an answer as a way to learn JavaScript myself. Piecing things together, I came up with:

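let textToCompare = "Hello there how are you doing with your life";
let words = textToCompare.split(/\s+/);
let textsToCompareWith= [
  { id:1, text:"Hope you are doing good with your life" },
  { id:2, text:"what are you doing with your life. hello there how are you" },
  { id:3, text:"hello there mate" }
];

// Escape regex metacharacters so each phrase is matched literally
const escapeRegExp = s => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// Every run of 3+ consecutive words, longest (by character count) first,
// so a long phrase is found before the shorter phrases inside it
let combos = [...chunks(words)];
combos.sort((a, b) => b.length - a.length);

console.log(textsToCompareWith.map(({ id, text }) => ({ id, matchedText: FindMatches(text) })));

function* chunks(arr) {
    for (let i = 0; i < arr.length - 2; i++) {
        for (let j = i + 3; j <= arr.length; j++) {
            yield arr.slice(i, j).join(" ");
        }
    }
}

function FindMatches(s) {
    const r = [];
    for (let i = 0; i < combos.length; i++) {
        // Declared locally (the original assigned an undeclared global `re`)
        const re = new RegExp(`\\b${escapeRegExp(combos[i])}\\b`, 'i');
        if (re.test(s)) {
            r.push(combos[i]);
            // Blank out the match so its sub-phrases are not reported again
            s = s.replace(re, ' ');
        }
    }
    return r;
}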

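I'm fairly sure this code has plenty of flaws and looks clunky, but the idea is to split your input into every run of 3 or more consecutive words, assuming the input can be split on whitespace. The resulting array is then sorted by length, longest first, so that shorter substrings are not matched before the longer phrases that contain them.
Who knows, maybe something in here is actually useful.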
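It helps to know how many phrases (and therefore regexes) this approach builds. A text of n words has (n - 2)(n - 1) / 2 runs of three or more consecutive words: 28 for the nine-word example, but 498,501 for a 1000-word string, each of which becomes its own `RegExp` tested against every comparison text. The sketch below uses hypothetical helper names (`phrasesOf`, `runCount`); `phrasesOf` is the same enumeration as the answer's `chunks` generator, written against `words.length`:

```javascript
// Same enumeration as chunks(): every run of minLen or more consecutive words.
function* phrasesOf(words, minLen = 3) {
    for (let i = 0; i + minLen <= words.length; i++) {
        for (let j = i + minLen; j <= words.length; j++) {
            yield words.slice(i, j).join(" ");
        }
    }
}

const small = [...phrasesOf("a b c d e".split(" "))];
console.log(small.join(" | "));
// a b c | a b c d | a b c d e | b c d | b c d e | c d e

// Closed form: sum of (n - L + 1) for L = 3..n, which is (n - 2)(n - 1) / 2
const runCount = n => ((n - 2) * (n - 1)) / 2;
console.log(runCount(9));    // 28
console.log(runCount(1000)); // 498501
```

For inputs of the size in the question, that quadratic growth is the main cost of the phrase-enumeration approach.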


Answer #2

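You can check every word against every other word, extend each shared run as far as it goes, and keep track of the position of the last word of each run you report.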

const
    // Collect every run of 3+ consecutive words shared by w1 and w2
    compare = (w1, w2) => {
        const
            result = [],
            ends = {}; // end indices in w2 already reported; a later run ending there is skipped
        
        for (let i = 0; i < w1.length; i++) {
            for (let j = 0; j < w2.length; j++) {
                if (w1[i] !== w2[j]) continue;
                // Extend the run for as long as both word lists agree
                let k = 0;
                while (i + k < w1.length && j + k < w2.length) {
                    if (w1[i + k] !== w2[j + k]) break;
                    k++;
                }
                if (k > 2 && !ends[j + k]) {
                    result.push(w2.slice(j, j + k).join(' '));
                    ends[j + k] = true;
                }
            }
        }
        return result;
    },
    lower = s => s.toLowerCase(),
    // Split into \w+ tokens; fall back to [] when a text has no word characters
    toWords = s => (s.match(/\w+/g) || []).map(lower),
    textToCompare = "Hello there how are you doing with your life",
    textsToCompareWith = [{ id: 1, text: "Hope you are doing good with your life" }, { id: 2, text: "what are you doing with your life. hello there how are you" }, { id: 3, text: "hello there mate" }],
    words = toWords(textToCompare),
    result = textsToCompareWith.map(({ id, text }) => ({
        id,
        matchedText: compare(words, toWords(text))
    }));

console.log(result);
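With 1000-word strings, the nested loops above compare every word with every word. One possible refinement, sketched here as an assumption rather than something from the answer, is to index each 3-word window of the comparison text in a `Map`, so that only positions already sharing three words get extended. `tokens` and `compareIndexed` are hypothetical names. Like the answer, the sketch lowercases `\w+` tokens and skips any later run that ends at a word position already reported, so it produces the same matches in the same order:

```javascript
// Tokenize like the answer: \w+ runs, lowercased; [] when there is no match.
const tokens = s => (s.match(/\w+/g) || []).map(w => w.toLowerCase());

function compareIndexed(w1, w2, minLen = 3) {
    // Map each minLen-word window of w2 to the positions where it starts.
    const index = new Map();
    for (let j = 0; j + minLen <= w2.length; j++) {
        const key = w2.slice(j, j + minLen).join(" ");
        if (!index.has(key)) index.set(key, []);
        index.get(key).push(j);
    }

    const result = [];
    const ends = new Set(); // end positions in w2 already reported
    for (let i = 0; i + minLen <= w1.length; i++) {
        const starts = index.get(w1.slice(i, i + minLen).join(" ")) || [];
        for (const j of starts) {
            // The first minLen words already agree; extend while both lists match.
            let k = minLen;
            while (i + k < w1.length && j + k < w2.length && w1[i + k] === w2[j + k]) k++;
            if (!ends.has(j + k)) {
                ends.add(j + k);
                result.push(w2.slice(j, j + k).join(" "));
            }
        }
    }
    return result;
}

const textToCompare = "Hello there how are you doing with your life";
const textsToCompareWith = [
    { id: 1, text: "Hope you are doing good with your life" },
    { id: 2, text: "what are you doing with your life. hello there how are you" },
    { id: 3, text: "hello there mate" }
];
const words = tokens(textToCompare);
const output = textsToCompareWith.map(({ id, text }) => ({
    id,
    matchedText: compareIndexed(words, tokens(text))
}));
console.log(JSON.stringify(output));
// [{"id":1,"matchedText":["with your life"]},{"id":2,"matchedText":["hello there how are you","are you doing with your life"]},{"id":3,"matchedText":[]}]
```

Matches come out in the word order of `textToCompare`, so for id 2 they appear in the reverse order of the question's expected output; sort or reorder them if that matters.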

Another, slightly different approach avoids working with individual words.
