LINQ: remove entries when all of their words are in the stop-word list

igsr9ssn  posted on 2022-12-06  in Other

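I have an array of strings, each of which can contain one or more words. Removing an entry that is a single word is easy, but I'm struggling to work out how to remove a multi-word entry only when all of its words are in the stop-word list. I'd prefer to solve it with LINQ.
Suppose I have this string array: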

then use 
then he
the image
and the
should be in
should be written

I only want to get:

then use 
the image
should be written

So rows in which all of the words are stop words should be removed, and rows that mix stop words with other words should be kept.
My stop-word array: string[] stopWords = {"a", "an", "x", "y", "z", "this", "the", "me", "you", "our", "we", "I", "them", "then", "ours", "more", "will", "he", "she", "should", "be", "at", "on", "in", "has", "have", "and"};
Thanks in advance.

eqqqjvef1#

One way to solve this is the following:

string[] stopWords = { "a", "an", "x", "y", "z", "this", "the", "me", "you", "our", "we", "I", "them", "ours", "more", "will", "he", "she", "should", "be", "at", "on", "in", "has", "have", "and" };

string input = """"
            then use 
            then he
            the image
            and the
            should be in
            should be written
            """";

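// Split the input into lines, dropping empty ones.
var array = input.Split(Environment.NewLine.ToCharArray(), StringSplitOptions.RemoveEmptyEntries);

// Keep a line if at least one of its words is not a stop word.
// RemoveEmptyEntries stops a trailing space (as in "then use ") from producing an empty "word".
var filteredArray = array.Where(x => x.Split(' ', StringSplitOptions.RemoveEmptyEntries).Any(y => !stopWords.Contains(y))).ToList();
var result = string.Join(Environment.NewLine, filteredArray);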

Console.WriteLine(result);

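The first two statements just set up the data.
The third statement converts the string into an array of lines by splitting on newline characters. (Using Environment.NewLine means the platform's own line separator is used, so the code also works on Linux.)
The fourth statement processes each line by splitting it on spaces (which yields the individual words) and then checks whether any word is absent from the stopWords list. If any word is absent, the Where condition is satisfied and the whole line is returned in filteredArray.
The fifth statement simply joins the individual lines to form the final result string.
The result should look like this: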

then use
then he
the image
should be written

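Note that the stopWords list used in this answer contains the word them but not then, so the second result line (then he) is not removed. With then in the list, as in the question's array, that line would be removed as well.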
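The same filter can also be phrased with All instead of Any: a row is dropped only when every one of its words is a stop word. The following is a minimal sketch, not the answerer's code. It uses the stop-word array from the question (which includes then), and the names stopWordSet, rows and kept, the HashSet, and the case-insensitive comparison are assumptions added for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stop words from the question (this list includes "then").
// The case-insensitive comparer is an assumption, not part of the original post.
var stopWordSet = new HashSet<string>(
    new[] { "a", "an", "x", "y", "z", "this", "the", "me", "you", "our", "we", "I",
            "them", "then", "ours", "more", "will", "he", "she", "should", "be",
            "at", "on", "in", "has", "have", "and" },
    StringComparer.OrdinalIgnoreCase);

string[] rows = { "then use ", "then he", "the image", "and the", "should be in", "should be written" };

// Drop a row only when every word in it is a stop word.
// Empty tokens (for example from a trailing space) are discarded before the check.
// A row with no words at all is also dropped, because All is true for an empty sequence.
List<string> kept = rows
    .Where(row => !row
        .Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries)
        .All(word => stopWordSet.Contains(word)))
    .ToList();

Console.WriteLine(string.Join(Environment.NewLine, kept));
```

With the question's list this keeps then use, the image and should be written, which matches the output the question asks for. A HashSet makes each lookup constant time, which only matters once the stop-word list or the input grows large.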

gajydyqb2#

Use the Intersect method, like this:

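// WordsList is the input collection of strings from the question.
var result = new List<string>();
foreach (string word in WordsList)
{
    List<string> splitData = word.Split(new string[] { " " }, StringSplitOptions.RemoveEmptyEntries).ToList();
    // Intersect yields distinct matches, so compare against the distinct word count.
    bool allOfWordsIsInStopWords = splitData.Intersect(stopWords).Count() == splitData.Distinct().Count();
    if (!allOfWordsIsInStopWords) result.Add(word); // keep rows that contain at least one non-stop word
}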
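
A count comparison based on Intersect has to account for repeated words, because Intersect returns distinct elements. A set-difference check with Except avoids counting altogether: a row consists only of stop words exactly when removing the stop words leaves nothing. This is a minimal sketch with a shortened, hypothetical stop-word list; the names rows and keptRows are illustrative and do not come from the answer.

```csharp
using System;
using System.Linq;

// Shortened, hypothetical stop-word list for illustration only.
string[] stopWords = { "the", "and", "then", "he" };
string[] rows = { "the the", "and the", "then use" };

// Keep a row when at least one word survives after the stop words are removed.
// Repeated words such as "the the" need no special handling here.
string[] keptRows = rows
    .Where(row => row
        .Split(' ', StringSplitOptions.RemoveEmptyEntries)
        .Except(stopWords)
        .Any())
    .ToArray();

Console.WriteLine(string.Join(", ", keptRows)); // prints: then use
```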

qvk1mo1f3#

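Based on the original problem description:
I have an array of strings, each of which can contain one or more words. Removing an entry that is a single word is easy, but I'm struggling to work out how to remove a multi-word entry only when all of its words are in the stop-word list. I'd prefer to solve it with LINQ.
The following code addresses the part of that description that was highlighted in bold in the original answer.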

using System.Text.RegularExpressions;

string[] stopWords = { "a", "an", "x", "y", "z", "this", "the", "me", "you", "our", "we", "I", "them", "ours", "more", "will", "he", "she", "should", "be", "at", "on", "in", "has", "have", "and" };

string[] inputStrings = { "then use", "then he", "the image", "and the", "should be in", "should be written" };

var wordSeparatorPattern = new Regex(@"\s+");

var outputStrings = inputStrings.Where((words) => 
{
    return wordSeparatorPattern.Split(words).Any((word) =>
    {
        return !stopWords.Contains(word);
    });
});

foreach (var item in outputStrings)
{
    Console.WriteLine(item);
}
