Question about a multi-line regex in Python

Posted by ioekq8ef on 2022-10-30 in Python

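I want to select groups of lines in a text file in order to get all the jobs associated with a given ipref. The test file looks like this: job numbers: (1, 2, 3), IP refs: (10, 12, 10)
Text file: 1 ... (several lines of text) xxx 10 2 ... (several lines of text) xxx 12 3 ... (several lines of text) xxx 10
I want to select the job numbers for IPref=10.
Code: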


#!/usr/bin/python

import re
import sys

fic=open('test2.xml','r')
texte=fic.read()
fic.close()

# pattern='\n?\d(?!(?:\n?xxx \d{2}\n)*)xxx 10'

pattern='\n?\d.*?xxx 10'

result= re.findall(pattern,texte, re.DOTALL)

i=1
for match in result:
    print("\nmatch:",i)
    i=i+1
    print(match)

Result:

match: 1
1
a
b
xxx 10

match: 2

1
a
b
xxx 12
1
a
b
xxx 10

I tried replacing .* with a negative lookahead assertion, so that a block is only selected when no expression like "\n?xxx \d{2}\n" occurs before "xxx 10":

pattern='\n?\d(?!(?:\n?xxx \d{2}\n)*)xxx 10'

But it doesn't work...
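
The following is a minimal sketch of why both attempts behave this way, assuming a sample text reconstructed from the output above (the names `sample`, `lazy` and `lookahead` are only illustrative). With re.DOTALL, the lazy `.*?` still runs across the "xxx 12" line until it reaches the next "xxx 10". In the second attempt, the group inside the negative lookahead is quantified with `*`, so it can always match the empty string, which means the lookahead can never succeed.

```python
import re

# Sample text assumed from the output shown in the question.
sample = "1\na\nb\nxxx 10\n1\na\nb\nxxx 12\n1\na\nb\nxxx 10\n"

# First attempt: lazy match with DOTALL. The second match starts at the
# second "1" and keeps going past "xxx 12" until the next "xxx 10".
lazy = re.findall(r'\n?\d.*?xxx 10', sample, re.DOTALL)
print(len(lazy))              # 2
print('xxx 12' in lazy[1])    # True

# Second attempt: (?:...)* can match nothing, so (?!(?:...)*) always fails.
lookahead = re.findall(r'\n?\d(?!(?:\n?xxx \d{2}\n)*)xxx 10', sample, re.DOTALL)
print(lookahead)              # []
```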

Source: https://stackoverflow.com/questions/74245242/question-about-a-multi-line-regex-in-python-language

4 answers

yvt65v4c #1

You can write the pattern like this, repeating a newline and asserting that the next line is not xxx followed by one or more digits:

^\d(?:\n(?!xxx \d+$).*)*\nxxx 10$

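The pattern matches:

^ Start of the string (with re.MULTILINE, the start of each line)
\d Match a single digit (or use \d+ to match 1 or more digits)
(?: Non-capturing group
\n Match a newline
(?!xxx \d+$) Negative lookahead: assert that the line is not xxx followed by 1+ digits
.* If the assertion is true, match the whole line
)* Close the group and optionally repeat it
\nxxx 10$ Match a newline, xxx and 10 at the end of the line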
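
As a rough sketch of the breakdown above, the pattern can be applied with re.MULTILINE only. The sample text is an assumption based on the question's description (jobs 1, 2, 3 with IP refs 10, 12, 10), and the capture group around the job number is an addition for illustration, not part of the original pattern.

```python
import re

# Assumed sample: jobs 1, 2, 3 with IP refs 10, 12, 10.
sample = "1\na\nb\nxxx 10\n2\na\nb\nxxx 12\n3\na\nb\nxxx 10\n"

# Same structure as the pattern above, with \d+ captured as the job number.
# re.MULTILINE makes ^ and $ match at line boundaries; re.DOTALL is not used,
# so .* stays within a single line.
pattern = re.compile(r'^(\d+)(?:\n(?!xxx \d+$).*)*\nxxx 10$', re.MULTILINE)

blocks = [m.group(0) for m in pattern.finditer(sample)]
jobs = [m.group(1) for m in pattern.finditer(sample)]

print(blocks)  # ['1\na\nb\nxxx 10', '3\na\nb\nxxx 10']
print(jobs)    # ['1', '3']
```

The block for job 2 is skipped because its own "xxx 12" line stops the repeated group, and the final `\nxxx 10$` cannot match there.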

Posted on 2022-10-30

qacovj5a #2

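Hello :) and thank you very much for your quick reaction!! Here are the results. Note: I changed re.DOTALL to re.DOTALL | re.MULTILINE (without it there was no result at all... sorry about the earlier demonstration, it was not very clear).
Text file: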

1
a
b
xxx 10
1
a
b
xxx 12
1
a
b
xxx 10

Code with your pattern:


#!/usr/bin/python

import re
import sys

fic=open('test2.xml','r')
texte=fic.read()
fic.close()
print(texte)

# pattern='<\/?(?!(?:span|br|b)(?: [^>]*)?>)[^>\/]*>'

# pattern='\n?\d(?!(?:\n?xxx \d{2}\n?)*?)xxx 10'

# pattern='\n?\d.*?xxx 10'

pattern='^\d(?:\n(?!xxx \d+$).*)*\nxxx 10$'

result= re.findall(pattern,texte, re.DOTALL|re.MULTILINE)

i=1
for match in result:
    print("\nmatch:",i)
    i=i+1
    print(match)

Result:

match: 1
1
a
b
xxx 10
1
a
b
xxx 12
1
a
b
xxx 10

But I am trying to get:

match: 1
1
a
b
xxx 10

match 2 : 
1
a
b
xxx 10
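
The difference between the result obtained and the result wanted appears to come from re.DOTALL: with that flag, `.*` also consumes newlines, so a single greedy match can swallow everything up to the last "xxx 10". A small comparison, assuming the text file shown above (the variable names are illustrative):

```python
import re

# Text assumed to be identical to the file shown above.
text = "1\na\nb\nxxx 10\n1\na\nb\nxxx 12\n1\na\nb\nxxx 10\n"
pattern = r'^\d(?:\n(?!xxx \d+$).*)*\nxxx 10$'

# With DOTALL, .* crosses line boundaries and one match spans all blocks.
with_dotall = re.findall(pattern, text, re.DOTALL | re.MULTILINE)

# Without DOTALL, .* stops at each line end and the lookahead is checked
# line by line.
without_dotall = re.findall(pattern, text, re.MULTILINE)

print(len(with_dotall))     # 1
print(len(without_dotall))  # 2
```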

Posted on 2022-10-30

r1zk6ea1 #3

Thank you very much (you saved my day!!). Just as you said:

pattern='^\d(?:\n(?!xxx \d+$).*)*\nxxx 10$'
result= re.findall(pattern,texte, re.MULTILINE)

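Result: OK, the group of lines (1..xxx 12) is ignored. Note: I can adapt this to the case where line 1 is the line that gives the job information and "xxx 12" is the line that gives the printer IP information.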

match: 1
1
a
b
xxx 10

match: 2
1
a
b
xxx 10

Posted on 2022-10-30

f4t66c6m #4

File:

job_number job_id
1 10202
bla bla
bla bla bla
xxx 100.10.10.100
2 10203
bla bla
bla bla bla
bla bla bla
xxx 100.10.10.102
3 10204
bla bla bla
bla bla bla
xxx 100.10.10.100

Bash script with an embedded Python script:


#!/bin/bash

# function , $1 : ip of a printer

get_jobs_ip ()
{
cat <<EOF | python
import re

fic=open('test3.xml','r')
texte=fic.read()
fic.close()

"""
The pattern matches, for example with ip="100\.10\.10\.100":
(thank you to Fourth bird for the pattern!)

# pattern='^\d\s+\d+(?:\n(?!xxx \d+\.\d+\.\d+\.\d+$).*)*\nxxx 100\.10\.10\.100$'

^ Start of a line (re.MULTILINE is used below)
\d\s+\d+ Match the job line: a single digit, whitespace, then the job id
(?: Non-capturing group
\n Match a newline
(?!xxx \d+\.\d+\.\d+\.\d+$) Negative lookahead: assert that the line is not xxx followed by a dotted IP address
.* If the assertion is true, match the whole line
)* Close the group and optionally repeat it
\nxxx 100\.10\.10\.100$ Match a newline, xxx and the IP 100.10.10.100 at the end of the line
"""

# Raw strings keep the regex backslashes as-is and avoid invalid-escape warnings.
ip=r"$1"
pattern_template=r'^\d\s+\d+(?:\n(?!xxx \d+\.\d+\.\d+\.\d+$).*)*\nxxx @ip@$'
pattern=pattern_template.replace('@ip@',ip)

result= re.findall(pattern,texte, re.MULTILINE)

i=1
for match in result:
    print("\nmatch:",i)
    i=i+1
    print(match)
EOF
}

get_jobs_ip "100\.10\.10\.100"
get_jobs_ip "100\.10\.10\.102"

Result:

match: 1
1 10202
bla bla
bla bla bla
xxx 100.10.10.100

match: 2
3 10204
bla bla bla
bla bla bla
xxx 100.10.10.100

match: 1
2 10203
bla bla
bla bla bla
bla bla bla
xxx 100.10.10.102
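
As a possible variation on the script above, the same lookup can be sketched in plain Python, using re.escape so the caller passes the IP unescaped. The function name `find_jobs_for_ip`, the capture groups, and the use of \d+ for the job number are assumptions made for illustration; the sample text is assumed to mirror the file shown above.

```python
import re

# Contents assumed to mirror the sample file shown above.
sample = """job_number job_id
1 10202
bla bla
bla bla bla
xxx 100.10.10.100
2 10203
bla bla
bla bla bla
bla bla bla
xxx 100.10.10.102
3 10204
bla bla bla
bla bla bla
xxx 100.10.10.100
"""


def find_jobs_for_ip(text, ip):
    """Return (job_number, job_id) pairs whose block ends with 'xxx <ip>'.

    Hypothetical helper: `ip` is a plain dotted address; re.escape turns
    the dots into literal dots inside the pattern.
    """
    pattern = (
        r'^(\d+)\s+(\d+)'                        # job line: number and id
        r'(?:\n(?!xxx \d+\.\d+\.\d+\.\d+$).*)*'  # lines that are not an IP line
        r'\nxxx ' + re.escape(ip) + r'$'         # the IP line we are looking for
    )
    return re.findall(pattern, text, re.MULTILINE)


print(find_jobs_for_ip(sample, "100.10.10.100"))  # [('1', '10202'), ('3', '10204')]
print(find_jobs_for_ip(sample, "100.10.10.102"))  # [('2', '10203')]
```

Because the pattern contains two capture groups, re.findall returns tuples rather than the full matched block; use re.finditer and m.group(0) if the whole block is needed.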

Posted on 2022-10-30
