I have a large file containing text in patterns like the following.
{ device_id: 'do0142', message: '[0,"xyz","something",{}]}
{
device_id: 'Ampn05',
message: '[0,"23","something",{"connect":1,"error":"something","info":"xyz","valid":"Unavailable","timestamp":"2020-03-15T04:33:32Z","vendorId":"cycle","country":"anywhere"}]}
{ device_id: 'do0142', message: '[0,"xyz","something",{}]}
{
device_id: 'do0142',
message: '[0,"23","something",{"connect":1,"error":"something","info":"xyz","valid":"Unavailable","timestamp":"2020-03-15T04:33:32Z","vendorId":"cycle","country":"anywhere"}]}
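I want to search for a device_id inside the curly braces and, if a match is found, return the entire contents of that pair of braces.
For example, if I search for device_id = 'do0142', the output should look like this: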
{ device_id: 'do0142', message: '[0,"xyz","something",{}]}
{ device_id: 'do0142', message: '[0,"xyz","something",{}]}
{
device_id: 'do0142',
message: '[0,"23","something",{"connect":1,"error":"something","info":"xyz","valid":"Unavailable","timestamp":"2020-03-15T04:33:32Z","vendorId":"cycle","country":"anywhere"}]}
I tried using a regular expression in Python, but I only get partial output:
import re

file_name = "log.txt"
word = "do0142"
regex = r"(\[.*\])"

with open("log.txt", 'r', encoding="utf8") as input:
    line = input.read()

matches = re.finditer(regex, line, re.MULTILINE)
for matchNum, match in enumerate(matches, start=1):
    for groupNum in range(0, len(match.groups())):
        groupNum = groupNum + 1
        print("{group}".format(group=match.group(groupNum)))
Please help me write the Python code.
2 Answers
8gsdolmq1#
A *naive* approach is to turn on dot-matches-all and capture the relevant blocks:
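The answer's own pattern is not preserved here, so the following is only a minimal sketch of that idea, under a few assumptions: every block starts with `{` at the beginning of a line, `device_id` is the first key, and a block ends with `]}` (or `]'}`) at the end of a line. The function name `find_blocks` and the abbreviated `SAMPLE` text are hypothetical, introduced for illustration.

```python
import re

# Abbreviated copy of the sample log from the question
# (the unterminated quotes are kept as posted).
SAMPLE = """\
{ device_id: 'do0142', message: '[0,"xyz","something",{}]}
{
device_id: 'Ampn05',
message: '[0,"23","something",{"connect":1,"error":"something"}]}
{ device_id: 'do0142', message: '[0,"xyz","something",{}]}
{
device_id: 'do0142',
message: '[0,"23","something",{"connect":1,"error":"something"}]}
"""


def find_blocks(text, device_id):
    """Return every top-level {...} block whose device_id equals device_id."""
    pattern = re.compile(
        # Block start, then the device_id we are looking for.
        r"^\{\s*device_id:\s*'" + re.escape(device_id) + r"',"
        # Lazily consume the rest; DOTALL lets "." cross newlines.
        r".*?"
        # Assumed block terminator: "]}" or "]'}" at the end of a line.
        r"\]'?\}$",
        re.DOTALL | re.MULTILINE,
    )
    return [m.group(0) for m in pattern.finditer(text)]


blocks = find_blocks(SAMPLE, "do0142")
print("\n".join(blocks))
```

Anchoring the `device_id` test directly after the opening brace keeps the lazy `.*?` from running across a non-matching block into the next one, but the pattern still depends on the layout assumptions above and will break if the format varies.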
Demo: [regex101]
Output:
bhmjp9jg2#
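Structured data cannot be parsed reliably with regular expressions. Instead, since the input file apparently consists of multiple YAML documents separated by one or more blank lines, you can use a YAML parser (such as pyyaml) to parse each group of non-empty lines into a dict. You can then test whether that dict has the device_id value you are looking for, in which case the document goes to the output: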
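The code from this answer is not preserved here, so what follows is a hedged sketch of the approach as described, not the answerer's exact script. It assumes PyYAML is installed, that documents are separated by blank lines, and that the closing quote of each `message` has been restored. The function name `filter_documents` and the abbreviated `SAMPLE` text are hypothetical.

```python
import re

import yaml  # PyYAML, a third-party package (pip install pyyaml)

# Sample input modeled on the question, with the closing quote of each
# message restored and documents separated by blank lines (assumptions).
SAMPLE = """\
{ device_id: 'do0142', message: '[0,"xyz","something",{}]'}

{
device_id: 'Ampn05',
message: '[0,"23","something",{"connect":1,"error":"something"}]'}

{ device_id: 'do0142', message: '[0,"xyz","something",{}]'}

{
device_id: 'do0142',
message: '[0,"23","something",{"connect":1,"error":"something"}]'}
"""


def filter_documents(text, device_id):
    """Return the raw text of each document whose device_id matches."""
    matches = []
    # Split on one or more blank lines to get one chunk per document.
    for chunk in re.split(r"\n\s*\n", text.strip()):
        doc = yaml.safe_load(chunk)  # each chunk is a YAML flow mapping
        if isinstance(doc, dict) and doc.get("device_id") == device_id:
            matches.append(chunk)
    return matches


selected = filter_documents(SAMPLE, "do0142")
print("\n\n".join(selected))
```

Returning the original chunk text rather than the parsed dict is a design choice of this sketch: it keeps the output in the same form as the input. For a real file, `text` would come from reading the log, for example `open("log.txt", encoding="utf8").read()`.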
So, given the following as input:
the code will output:
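Note that I have fixed the unterminated quoted strings in all the message values in your sample input, which were presumably formatting errors introduced while you were trimming the input down for the question. Demo: https://replit.com/@blhsing1/ImpracticalGreatInsurance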