regex: Iterating over a file with re.search() and incrementing a dictionary value for each match

Posted by ltqd579y on 2022-11-18 in Other

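I am trying to iterate over a file written in several languages. Every time a sentence matches, I want to increment the value of the corresponding dictionary key. Each sentence starts with a language tag (something like lang="de").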

import re
import sys

lang_freq = {'de':0, 'fr':0, 'it':0, 'rm': 0, 'en': 0, 'ch-de': 0}
word_freq_de = {}

filename = sys.argv[1]
infile = open(filename, "r")

for line in infile:
    matches = re.search(r'\blang=\".*\"\B', line)
    if matches == 'lang="de"':
        lang_freq['de'] +=1
    if matches == 'lang="fr"':
        lang_freq['fr'] +=1
    if matches == 'lang="it"':
        lang_freq['it'] +=1
    if matches == 'lang="rm"':
        lang_freq['rm'] +=1
    if matches == 'lang="en"':
        lang_freq['en'] +=1
    if matches == 'lang="ch-de"':
        lang_freq['ch-de'] +=1

print(lang_freq)

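Right now the dictionary values stay unchanged, and I don't know where my logic goes wrong. Can I use == in this case, or do I have to solve it in a completely different way? I could solve it without the re.search method, but I would still like to solve it this way :)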

regex

Source: https://stackoverflow.com/questions/74435357/iterating-over-a-file-with-re-search-and-incrementing-dictionary-value-with-ea

1 answer

Answer by f4t66c6m1

The re.search function returns a match object (or None when nothing matches), so when you check for equality with if matches == 'lang="de"':, you always get False.
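A quick way to see the problem is to inspect what re.search actually returns. The sketch below uses a made-up sample line (an assumption, not taken from the question's file) and a pattern that stops at the closing quote:

```python
import re

line = 'lang="de" Guten Tag'  # hypothetical sample line

m = re.search(r'\blang="[^"]*"', line)
print(isinstance(m, re.Match))   # re.Match exists on Python 3.7+
print(m == 'lang="de"')          # a Match object never compares equal to a str
print(m.group() == 'lang="de"')  # group() returns the matched text as a str
```

The second comparison is what the question's if-chain was effectively doing, which is why no counter was ever incremented.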
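You need to compare the result of the match object's group() method or, better, *capture* the data inside the double quotes, check whether it exists in the lang_freq dictionary, and act accordingly.
It looks like it is enough to use: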

matches = re.search(r'\blang="(.*?)"', line)
if matches:
    lang_freq[matches.group(1)] +=1
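The capturing group uses a non-greedy (.*?) or a negated class ([^"]*) because a greedy .* runs to the last double quote on the line. A small comparison on a hypothetical line with a second attribute (an assumption; the question's lines may contain only one):

```python
import re

line = '<s lang="de" id="1">Guten Tag</s>'  # hypothetical input

# Greedy: .* grabs as much as possible, then backtracks to the LAST quote
greedy = re.search(r'\blang="(.*)"', line).group(1)
# Non-greedy: .*? stops at the FIRST closing quote
lazy = re.search(r'\blang="(.*?)"', line).group(1)
# Negated class: [^"]* can never cross a quote at all
negated = re.search(r'\blang="([^"]*)"', line).group(1)

print(greedy)   # de" id="1
print(lazy)     # de
print(negated)  # de
```

Note that this short snippet raises a KeyError for any language that is not already a key in lang_freq; the fuller example handles that case.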

See also this example Python code:

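import re
import sys

lang_freq = {'de': 0, 'fr': 0, 'it': 0, 'rm': 0, 'en': 0, 'ch-de': 0}

# To read a real file instead of the demo input:
# filename = sys.argv[1]
# infile = open(filename, "r")
# Demo input that simulates the lines of a file; the last line has no
# closing quote, so it does not match and is not counted
infile = iter('lang="de"\nlang="it"\nlang="de"\nlang="de"\nlang="it"\nlang="pl'.splitlines())

for line in infile:
    # Capture the text between the double quotes after lang=
    matches = re.search(r'\blang="([^"]*)"', line)
    if matches:
        lang = matches.group(1)
        if lang in lang_freq:
            lang_freq[lang] += 1
        else:
            # Languages not listed in lang_freq get a new key
            lang_freq[lang] = 1

print(lang_freq)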

Output:

{'de': 3, 'fr': 0, 'it': 2, 'rm': 0, 'en': 0, 'ch-de': 0}
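To run the same logic against a real file instead of simulated lines, the loop can be wrapped in a function that accepts any iterable of lines. This is a minimal sketch: count_languages and LANG_RE are hypothetical names, io.StringIO stands in for an open file, and dict.get replaces the if/else above with the same effect.

```python
import io
import re

LANG_RE = re.compile(r'\blang="([^"]*)"')

def count_languages(lines, known=('de', 'fr', 'it', 'rm', 'en', 'ch-de')):
    """Count lang="..." tags, one search per line (hypothetical helper)."""
    counts = dict.fromkeys(known, 0)  # pre-seed the expected languages with 0
    for line in lines:
        m = LANG_RE.search(line)
        if m:
            lang = m.group(1)
            # Same effect as the if/else above: unseen languages start at 1
            counts[lang] = counts.get(lang, 0) + 1
    return counts

# Stand-in for a real file. With a file, something like:
#   with open(filename, encoding="utf-8") as infile:
#       print(count_languages(infile))
sample = io.StringIO('lang="de" Hallo\nlang="fr" Bonjour\nlang="pl" Cześć\nno tag here\n')
print(count_languages(sample))
```

Unlike the demo input above, the pl line here has a closing quote, so it is counted under a new key; the untagged line is simply skipped.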

Answered 2022-11-18