How can I use FuzzyWuzzy to extract the full text from a list?

yduiuuwa  posted on 2021-08-20 in Java

Here is my code:

    from fuzzywuzzy import fuzz

    # Open the file for reading ("a" would open it write-only, in append mode).
    check = open("text.txt", "r")
    MIN_MATCH_SCORE = 30
    heard_word = 'i5-1135G7 '
    possible_words = check
    guessed_word = [word for word in possible_words
                    if fuzz.ratio(heard_word, word) >= MIN_MATCH_SCORE]
    print('this one - ', guessed_word)

Expected output:

    11th Generation Intel® Core i5-1135G7 Processor

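Is it possible to get the whole sentence shown in the expected output by supplying only "i5-1135G7"? Is there another way to achieve what I want? Thanks in advance.
Here is the link to text.txt: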
https://drive.google.com/file/d/1mo3qfmeoaqa3wppyg8spefvsjdx7aqbj/view

l7wslrjt1#

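To compensate for the longer sentences and make sure the overlap is measured at the word level, you should use token_set_ratio. Also, if you want whole words to overlap, raise MIN_MATCH_SCORE to close to 100.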

    from fuzzywuzzy import fuzz

    MIN_MATCH_SCORE = 90
    heard_word = 'i5-1135G7'
    possible_words = ['11th Generation Intel® Core™ i5-1135G7 Processor (2.40 GHz,up to 4.20 GHz with Turbo Boost, 4 Cores, 8 Threads, 8 MB Cache)',
                      'windows 10 64 bit', 'intel i7']
    print([word for word in possible_words
           if fuzz.token_set_ratio(heard_word, word) >= MIN_MATCH_SCORE])

Output:

    ['11th Generation Intel® Core™ i5-1135G7 Processor (2.40 GHz,up to 4.20 GHz with Turbo Boost, 4 Cores, 8 Threads, 8 MB Cache)']
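
To see why a short query can reach a score of 100 against a much longer sentence, here is a rough standard-library sketch of the token-set idea. It is not FuzzyWuzzy's implementation: `tokens`, `similarity`, and `token_set_score` are hypothetical helper names, and `difflib.SequenceMatcher` stands in for the library's own ratio function.

```python
import re
from difflib import SequenceMatcher


def tokens(text):
    """Lowercase the text and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def similarity(a, b):
    """Return a 0-100 similarity score between two strings."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def token_set_score(query, candidate):
    """Approximate a token-set comparison (illustrative, not the library code)."""
    q, c = tokens(query), tokens(candidate)
    if not q or not c:
        return 0
    # Tokens shared by both strings, in a stable order.
    common = " ".join(sorted(q & c))
    # The shared tokens followed by whatever is unique to each side.
    with_q = (common + " " + " ".join(sorted(q - c))).strip()
    with_c = (common + " " + " ".join(sorted(c - q))).strip()
    # If every query token is in the candidate, common == with_q, so the
    # first comparison is a perfect match regardless of the candidate length.
    return max(similarity(common, with_q),
               similarity(common, with_c),
               similarity(with_q, with_c))


query = "i5-1135G7"
long_text = ("11th Generation Intel® Core™ i5-1135G7 Processor (2.40 GHz,up to "
             "4.20 GHz with Turbo Boost, 4 Cores, 8 Threads, 8 MB Cache)")

# Character-level comparison is dragged down by the length difference.
print(similarity(query.lower(), long_text.lower()))
# Token-level comparison ignores the extra words in the candidate.
print(token_set_score(query, long_text))  # 100
print(token_set_score(query, "windows 10 64 bit"))
```

The character-level score stays low because most of the long sentence has no counterpart in the query, while the token-set score depends only on whether the query's words all appear in the candidate.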

nhhxz33t2#

token_set_ratio works fine!

    from fuzzywuzzy import fuzz

    # df1 is assumed to be a pandas DataFrame whose cells are all strings.
    s = []
    for l in df1.values:
        l = ', '.join(l)
        s.append(l)
    s = ', '.join(s)
    # g is not defined in the snippet as posted; it holds the candidate strings.
    main = [x for x in g if x]
    MIN_MATCH_SCORE = 60
    heard_word = 'i5-11th gen'
    guessed_word = [word for word in main
                    if fuzz.token_set_ratio(heard_word, word) >= MIN_MATCH_SCORE]
    print('this one - ', guessed_word)
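
Both answers work on an in-memory list, so applying them to text.txt needs a step that turns the file into clean candidate strings. The following is a minimal sketch under the assumption that the file holds one candidate per line (the actual contents of text.txt are not shown in the thread). `load_candidates`, `matching_lines`, and `token_overlap` are hypothetical helpers; `token_overlap` is only a stand-in scorer so that the example runs without FuzzyWuzzy installed.

```python
import os
import re
import tempfile


def load_candidates(path):
    """Read one candidate per line, dropping newlines and blank lines."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def matching_lines(query, candidates, scorer, min_score):
    """Return the candidates whose score against query reaches min_score."""
    return [text for text in candidates if scorer(query, text) >= min_score]


def token_overlap(query, text):
    """Stand-in scorer: percentage of query tokens that occur in text."""
    def split(value):
        return set(re.findall(r"[a-z0-9]+", value.lower()))
    wanted = split(query)
    if not wanted:
        return 0
    return round(100 * len(wanted & split(text)) / len(wanted))


# Demo with a temporary file standing in for text.txt (assumed layout).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                 encoding="utf-8") as tmp:
    tmp.write("11th Generation Intel® Core i5-1135G7 Processor\n"
              "\n"
              "windows 10 64 bit\n")
    demo_path = tmp.name

candidates = load_candidates(demo_path)
os.remove(demo_path)

matches = matching_lines("i5-1135G7", candidates, token_overlap, 90)
print(matches)
```

With FuzzyWuzzy available, `fuzz.token_set_ratio` can be passed as the `scorer` argument in place of `token_overlap`, since it also takes two strings and returns a score from 0 to 100.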