Pattern matching in Python with regex problem

Posted by qf9go6mv on 2022-12-05 in Python


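I'm trying to learn pattern matching with regex. The course is on Coursera and hasn't been updated since Python 3 came out, so the instructor's code doesn't work correctly.
Here is what I have so far: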

# example Wiki data
wiki= """There are several Buddhist universities in the United States. Some of these have existed for decades and are accredited. Others are relatively new and are either in the process of being accredited or else have no formal accreditation. The list includes: 
• Dhammakaya Open University – located in Azusa, California, 
• Dharmakirti College – located in Tucson, Arizona 
• Dharma Realm Buddhist University – located in Ukiah, California 
• Ewam Buddhist Institute – located in Arlee, Montana
• Naropa University - located in Boulder, Colorado 
• Institute of Buddhist Studies – located in Berkeley, California
• Maitripa College – located in Portland, Oregon
• Soka University of America – located in Aliso Viejo, California
• University of the West – located in Rosemead, California 
• Won Institute of Graduate Studies – located in Glenside, Pennsylvania"""



pattern=re.compile(
    r'(?P<title>.*)' # the university title
    r'(-\ located\ in\ )' #an indicator of the location
    r'(?P<city>\w*)' # city the university is in
    r'(,\ )' #seperator for the state
    r'(?P<state>\w.*)') #the state the city is in)

for item in re.finditer(pattern, wiki, re.VERBOSE):
    print(item.groupdict())

Output:

Traceback (most recent call last):
  File "/Users/r..., line 194, in <module>
    for item in re.finditer(pattern, wiki, re.VERBOSE):
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/re/__init__.py", line 223, in finditer
    return _compile(pattern, flags).finditer(string)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/re/__init__.py", line 282, in _compile
    raise ValueError(
ValueError: cannot process flags argument with a compiled pattern

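I just want a dictionary with the university name, city, and state. If I run it without re.VERBOSE, only one school shows up and none of the others do. I'm fairly new to Python and don't know how to deal with these errors.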


Source: https://stackoverflow.com/questions/74670737/pattern-matching-in-python-with-regex-problem

3 Answers


kyxcudwk1#

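Actually, you don't need re.VERBOSE here at all. The pattern is already compiled, and re.finditer does not accept a flags argument together with a compiled pattern, which is what the ValueError says. The pattern is also built from adjacent string literals with ordinary Python comments, so verbose mode has nothing left to do.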

for item in re.finditer(pattern, wiki):
    print(item.groupdict())

The program will print

{'title': '• Naropa University ', 'city': 'Boulder', 'state': 'Colorado '}

with Python 3.10.
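If verbose mode is wanted anyway, the flag belongs in re.compile rather than in re.finditer. The following is only a sketch under that assumption: the single-line sample and the rewritten pattern (with literal spaces written as [ ]) are illustrative and not taken from the course material.

```python
import re

line = "Naropa University - located in Boulder, Colorado"

# Flags are given once, at compile time. In verbose mode, unescaped whitespace
# in the pattern is ignored and "#" starts a comment, so literal spaces are
# written as [ ] here.
pattern = re.compile(
    r"""
    (?P<title>.*?)              # the university title
    [ ]-[ ]located[ ]in[ ]      # indicator of the location
    (?P<city>\w+)               # city the university is in
    ,[ ]                        # separator before the state
    (?P<state>\w+)              # state the city is in
    """,
    re.VERBOSE,
)

# A compiled pattern can be passed to re.finditer as long as no flags are given.
match = next(re.finditer(pattern, line))
result = match.groupdict()
print(result)

# Passing flags together with a compiled pattern raises ValueError.
try:
    re.finditer(pattern, line, re.VERBOSE)
    raised = False
except ValueError:
    raised = True
print(raised)
```

Calling pattern.finditer(line) directly is equivalent and avoids the question of where flags go.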
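By the way, the program outputs only one school because the other lines use a long dash (–) while the pattern expects a short hyphen (-). Making all lines use the same dash, and changing pattern accordingly, should give you the complete list.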
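One way to make the dashes consistent without editing the data by hand is to normalize them before matching. This is a minimal sketch: normalize_dashes is a hypothetical helper, the two sample lines are taken from the question's data, and the pattern is the question's pattern written on one line without the extra capture groups.

```python
import re

lines = [
    "• Dharmakirti College – located in Tucson, Arizona",  # en dash (U+2013)
    "• Naropa University - located in Boulder, Colorado",  # ASCII hyphen
]

pattern = re.compile(r"(?P<title>.*) - located in (?P<city>\w*), (?P<state>\w.*)")


def normalize_dashes(text):
    """Replace en dashes (U+2013) with ASCII hyphens (hypothetical helper)."""
    return text.replace("\u2013", "-")


# Without normalization only the line with the ASCII hyphen matches.
raw_hits = [pattern.search(line) is not None for line in lines]
# After normalization both lines match.
normalized_hits = [pattern.search(normalize_dashes(line)) is not None for line in lines]

print(raw_hits)
print(normalized_hits)
```

Note that \w* for the city still cannot match a two-word city such as Aliso Viejo, so that line needs a looser city group even after the dashes are unified.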

Answered 2022-12-05

atmip9wb2#

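Thanks to JustLearning, my problem is solved. This is the code I ended up using. I can't believe it was a long dash instead of a short hyphen. Now I know I don't need to use re.VERBOSE. Thanks again.
pattern = re.compile(r'(?P<title>.*)' r'(-\ located\ in\ )' r'(?P<city>.*)' r'(,\ )' r'(?P<state>.*)')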

Answered 2022-12-05

eni9jsuy3#

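In the example data, you are using two types of dashes.

If you want to match both, you can use a character class [–-]
Apart from that, .* repeats any character 0 or more times (so it can also match an empty string). It first matches to the end of the line and then backtracks to fit the rest of the pattern.
What you can do is make the pattern a bit more precise by starting each group with at least one word character.
If you are only interested in the groups title, city and state, you don't need the other 2 capture groups.
Note that you don't have to escape a space to match it.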

^\W*(?P<title>\w.*?) [–-] located in (?P<city>\w.*?), (?P<state>\w.*)

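^ : start of the string (start of each line when re.M is used)
\W* : match optional non-word characters
(?P<title>\w.*?) : match a word character, then as few characters as possible
[–-] : match either dash, with a space on the left and on the right
located in : match literally
(?P<city>\w.*?) : match a word character, then as few characters as possible
, : match a comma and a space literally
(?P<state>\w.*) : match a word character followed by the rest of the line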
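The difference between the greedy .* and the lazy .*? described above only shows up when the separator occurs more than once. The line below is made up for illustration (it is not in the question's data) and contains two ", " separators; the last check confirms that the character class [–-] accepts both dashes.

```python
import re

# Made-up line with two ", " separators, to make the difference visible.
line = "located in Aliso Viejo, Orange County, California"

# Greedy: .* runs to the end of the line, then backtracks to the LAST ", ".
greedy = re.search(r"located in (?P<city>\w.*), (?P<state>\w.*)", line)
# Lazy: .*? grows one character at a time and stops at the FIRST ", ".
lazy = re.search(r"located in (?P<city>\w.*?), (?P<state>\w.*)", line)

greedy_city = greedy.group("city")
lazy_city = lazy.group("city")
print(greedy_city)  # Aliso Viejo, Orange County
print(lazy_city)    # Aliso Viejo

# The character class [–-] matches an en dash or an ASCII hyphen.
dash_hits = [bool(re.search(r" [–-] ", s)) for s in ("A – B", "A - B")]
print(dash_hits)
```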

Example

import re

pattern = r"^\W*(?P<title>\w.*?) [–-] located in (?P<city>\w.*?), (?P<state>\w.*)"

wiki = """There are several Buddhist universities in the United States. Some of these have existed for decades and are accredited. Others are relatively new and are either in the process of being accredited or else have no formal accreditation. The list includes:
• Dhammakaya Open University – located in Azusa, California,
• Dharmakirti College – located in Tucson, Arizona
• Dharma Realm Buddhist University – located in Ukiah, California
• Ewam Buddhist Institute – located in Arlee, Montana
• Naropa University - located in Boulder, Colorado
• Institute of Buddhist Studies – located in Berkeley, California
• Maitripa College – located in Portland, Oregon
• Soka University of America – located in Aliso Viejo, California
• University of the West – located in Rosemead, California
• Won Institute of Graduate Studies – located in Glenside, Pennsylvania"""

for item in re.finditer(pattern, wiki, re.M):
    print(item.groupdict())

Output

{'title': 'Dhammakaya Open University', 'city': 'Azusa', 'state': 'California,'}
{'title': 'Dharmakirti College', 'city': 'Tucson', 'state': 'Arizona'}
{'title': 'Dharma Realm Buddhist University', 'city': 'Ukiah', 'state': 'California'}
{'title': 'Ewam Buddhist Institute', 'city': 'Arlee', 'state': 'Montana'}
{'title': 'Naropa University', 'city': 'Boulder', 'state': 'Colorado'}
{'title': 'Institute of Buddhist Studies', 'city': 'Berkeley', 'state': 'California'}
{'title': 'Maitripa College', 'city': 'Portland', 'state': 'Oregon'}
{'title': 'Soka University of America', 'city': 'Aliso Viejo', 'state': 'California'}
{'title': 'University of the West', 'city': 'Rosemead', 'state': 'California'}
{'title': 'Won Institute of Graduate Studies', 'city': 'Glenside', 'state': 'Pennsylvania'}
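The first record keeps the trailing comma from the source line ('California,'). If that is unwanted, the values can be tidied after matching. This is a sketch only: clean is a hypothetical helper, and the two-line sample is a shortened copy of the data, with a trailing space on the second line as in the question.

```python
import re

pattern = re.compile(
    r"^\W*(?P<title>\w.*?) [–-] located in (?P<city>\w.*?), (?P<state>\w.*)",
    re.M,
)

sample = """• Dhammakaya Open University – located in Azusa, California,
• Naropa University - located in Boulder, Colorado """


def clean(record):
    """Strip whitespace and a trailing comma from each value (hypothetical helper)."""
    return {key: value.strip().rstrip(",").strip() for key, value in record.items()}


records = [clean(m.groupdict()) for m in pattern.finditer(sample)]
for record in records:
    print(record)
```

An alternative would be to tighten the state group itself, but post-processing keeps the pattern from the answer unchanged.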

Answered 2022-12-05
