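I want to match all strings that start with exactly two # characters (##) and do some replacements on them. This means that a string starting with more than two, such as ###, should not match, and a string starting with only a single # should not match either.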
import re
text = '''
# some one string
Describe your writing briefly here, what ihow many people are you looking for?
## some section two string
Describe your writing briefly here, what ihow many people are you looking for?Describe your writing briefly here, what ihow many people are you looking for?
Describe your writing briefly here, what ihow many people are you looking for?
## some other section two string with question sign?
Describe your writing briefly here, what ihow many people are you looking for? containing all keyword arguments except for those corresponding to a formal parameter. This may be combined with a formal parameter of the form *name (described in the next subsection) which receives a tuple containing the positional arguments beyond the formal parameter list. (*name must occur before **name.) For example, if we define a function like this
## some other section with . and : colon
Describe your writing briefly here, what ihow many people are you looking for?Describe your writing briefly here, what ihow many people are you looking for?
'''
pattern = r"##(.+?.*)"
list_with_sections_ = list(dict.fromkeys(re.findall(pattern, text)))
print(list_with_sections_)
if list_with_sections_:
    for item in list_with_sections_:
        text = re.sub(item, f'<a href="#" class="section-header title" id="{item.replace(" ", "-").strip()}_">{item}</a>', text)
print(text)
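This seems to work, but when a string ends with a question mark or contains other special characters, re.sub returns inconsistent results. For example, when the matched item ends with a question mark (?), re.sub adds an extra ? after the a tag.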
Output when I run the code above:
1 answer
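This problem is caused by how the "?" character is handled in a regex.
Here:
text = re.sub(item, f'<a href="#" class="section-header title..."
you are treating item (which is essentially a piece of the input text and may contain a "?" character) as a regular expression pattern. But "?" has a special meaning in a regular expression: it makes the preceding character optional. As a result, what you actually match is the relevant text segment without its trailing ?, and that literal ? is left behind after the replacement. You can fix this by escaping the special characters in item, like this:
text = re.sub(re.escape(item), f'<a href="#" class="section-header title..."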
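To make the difference concrete, here is a minimal sketch that first reproduces the stray ? on a single line and then shows the escaped version. The second half is an alternative that is not part of the answer above: it assumes the goal is to replace each whole heading line (dropping the ## marker) in a single pass, uses a negative lookahead so that ### headings are skipped, and passes a callable to sub so that no escaping is needed at all. The sample text and the names heading and to_anchor are made up for illustration.

```python
import re

# Part 1: the effect of re.escape on a single heading.
item = " section with question sign?"
line = "##" + item

# Unescaped: "n?" means "an optional n", so the literal "?" in the text
# is not consumed and stays behind after the replacement.
buggy = re.sub(item, "<a>X</a>", line)

# Escaped: every character of item is matched literally.
fixed = re.sub(re.escape(item), "<a>X</a>", line)

print(buggy)
print(fixed)

# Part 2 (assumption, not from the answer): match and replace in one pass.
text = """# title
## plain section
### too deep
## section with question sign?
body text
"""

# "^##" anchors at the start of a line (re.MULTILINE); "(?!#)" rejects "###".
heading = re.compile(r"^##(?!#)[ \t]*(.+)$", re.MULTILINE)


def to_anchor(match):
    # Hypothetical helper: builds the anchor tag from the captured title.
    title = match.group(1).strip()
    slug = title.replace(" ", "-")
    return f'<a href="#" class="section-header title" id="{slug}_">{title}</a>'


# The return value of a callable replacement is inserted as-is,
# so special characters in the title cause no trouble.
result = heading.sub(to_anchor, text)
print(result)
```

With the single-pass version the heading text is never reused as a pattern, so characters such as ?, . or ( in a title cannot change what gets matched.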