regex: extract words from a string that follows a specific pattern

k5ifujac  posted on 2023-08-08  in Other

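From a given string, I want to extract only the alphanumeric words, excluding any word enclosed between two colons (':...:'). This should work whether or not there are spaces between the colons and the surrounding alphanumeric text. Here is a code sample: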

import re

message = "ass :gifs_e4VLc8f2_galabingo: ass dof:stickers_t3B0l2J7_galabingo:dor"
message1 = ":gifs_e4VLc8f2_galabingo::stickers_t3B0l2J7_galabingo:"
# Regex pattern to extract words that do not start and end with colons
pattern = r'(?<!:)(?::[^:]+:)*([^:]+)(?::[^:]+:)*(?!:)'

# Find all occurrences of words in the message that do not start and end with colons
words_without_colons = re.findall(pattern, message)
words_without_colons1 = re.findall(pattern, message1)
print(words_without_colons)
print(words_without_colons1 )

Actual output:
['ass','ass dof',' or ']
['ifs_e4VLc8f2_galabing','tickers_t3B0l2J7_galabing']

Expected output:
op1: ['ass ','ass dof','dor']
op2: []  # empty list

klh5stk1  #1

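It may be easier to use re.split, with a separator made up of an unbroken run of characters (no spaces, no colons) between two colons, plus an optional leading/trailing space: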

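import re

# Separator: two colons enclosing a (possibly empty) run of non-space,
# non-colon characters, with at most one space on either side
pattern  = r" ?:[^ :]*?: ?"

message  = "ass :gifs_e4VLc8f2_galabingo: ass dof:stickers_t3B0l2J7_galabingo:dor"
message1 = ":gifs_e4VLc8f2_galabingo::stickers_t3B0l2J7_galabingo:"

# filter(None, ...) drops the empty strings re.split produces around separators
words  = list(filter(None, re.split(pattern, message)))
words1 = list(filter(None, re.split(pattern, message1)))

print(words)  # ['ass', 'ass dof', 'dor']
print(words1) # []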
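
If the amount of whitespace around the colon-delimited tokens can vary, the same split-based idea can be wrapped in a small helper. The following is only a sketch under assumptions: the name extract_words and the constant TOKEN are hypothetical, and the pattern is a variant (using \s* and requiring at least one character between the colons), not the pattern given in the answer.

```python
import re

# Hypothetical variant of the separator pattern (an assumption, not the answer's own):
# \s* absorbs any amount of whitespace around a token, and [^\s:]+ requires at
# least one non-whitespace, non-colon character between the two colons.
TOKEN = re.compile(r"\s*:[^\s:]+:\s*")


def extract_words(message: str) -> list:
    """Return the text segments that remain after splitting on :token: markers."""
    # re.split leaves empty strings where separators touch each other or the
    # ends of the string, so those are filtered out.
    return [part for part in TOKEN.split(message) if part]


message = "ass :gifs_e4VLc8f2_galabingo: ass dof:stickers_t3B0l2J7_galabingo:dor"
message1 = ":gifs_e4VLc8f2_galabingo::stickers_t3B0l2J7_galabingo:"

print(extract_words(message))   # ['ass', 'ass dof', 'dor']
print(extract_words(message1))  # []
```

Note that, like the answer's pattern, this strips the whitespace next to each token, so the first segment comes back as 'ass' rather than 'ass ' as written in the question's expected output.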
