Python: return a DataFrame listing which files contain each word

Posted by p8h8hvxi on 2022-12-10 in Python

I have a DataFrame:

business049.txt  [bmw, cash, fuel, mini, product, less, mini]
business470.txt  [saudi, investor, pick, savoy, london, famou]
business075.txt  [eu, minist, mull, jet, fuel, tax, european]
business101.txt  [australia, rate, australia, rais, benchmark]
business060.txt  [insur, boss, plead, guilti, anoth, us, insur]

I would like the output to have one column of words and one column listing the file names that contain each word.

bmw          [business049.txt,business055.txt]
australia    [business101.txt,business141.txt]

Thank you.

eeq64g8w  #1

This is most likely not the most efficient or optimal approach, but you can do it like this:

import pandas as pd

# Create DataFrame from question
df = pd.DataFrame({
    'txt_file': [
        'business049.txt',
        'business470.txt',
        'business075.txt',
        'business101.txt',
        'business060.txt',
    ],
    'words': [
        ['bmw', 'cash', 'fuel', 'mini', 'product', 'less', 'mini'],
        ['saudi', 'investor', 'pick', 'savoy', 'london', 'famou'],
        ['eu', 'minist', 'mull', 'jet', 'fuel', 'tax', 'european'],
        ['australia', 'rate', 'australia', 'rais', 'benchmark'],
        ['insur', 'boss', 'plead', 'guilti', 'anoth', 'us', 'insur'],
    ]
})

# Get all unique words in a list
word_list = list(set(df['words'].explode()))

# Link txt files to unique words
# Note: the file names are joined into one comma-separated string so that
# the resulting DataFrame has a single column
word_dict = {
    unique_word: [', '.join(
        df.loc[df['words'].apply(lambda words: unique_word in words), 'txt_file']
    )]
    for unique_word in word_list
}

# Create DataFrame from dictionary (transpose to have words as row index).
words_in_files = pd.DataFrame(word_dict).transpose()

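The dictionary word_dict may be exactly what you need, rather than keeping a DataFrame just for the sake of using one. If that is the case, remove the ', '.join() part from the dictionary creation, because it does not matter that the values of a dict differ in length.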
