Regular expressions in JavaScript: finding multi-line text after a tag

ryevplcw posted on 2023-03-11 in Java

I have the following text stored in the variable description:

This is a code update

Official Name: None

Pub: https://content.upcodes.co/viewer/washington/wa-mechanical-code-2021

Agency:  

Reference: https://web.archive.org/web/20230226234118/https://lawfilesext.leg.wa.gov/law/wsr/agency/BuildingCodeCouncil.htm

Citation: WAC 51-52 / WSR 23-02-055

Draft Doc Title: WSR 23-02-055 (#1)

Draft Source Doc: https://web.archive.org/web/20230303022030/https://lawfilesext.leg.wa.gov/law/wsr/2023/02/23-02-055.htm (#1)

Draft Drive: https://drive.google.com/file/d/1pYmwQS3t-ZX-Vyg9yBabtIpXZ7By2G6f/view?usp=share_link ( #1)

Final Doc Title: 

IECC Com Update(#1)

IECC Res Update (#2)

Final Source Doc: https://web.archive.org/web/20230303022130/https://apps.leg.wa.gov/wac/default.aspx?cite=51-52&full=true&pdf=true (#1)

https://web.archive.org/web/20230303022030/https://lawfilesext.leg.wa.gov/law/wsr/2023/02/23-02-055.htm (#2)

Final Drive: https://web.archive.org/web/20230303022130/https://apps.leg.wa.gov/wac/default.aspx?cite=51-52&full=true&pdf=true (#1)

https://web.archive.org/web/2023030302fdfdfg2130/https://apps.legfdg.gov/wac/default.aspx?cite=51-52&fdsfullfdsf=true&pfdsfdf=true  (#2)

Effective Date:  January 4, 2023

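I want to extract the information after the "Final Doc Title:" tag. It should give me two values: IECC Com Update(#1) and IECC Res Update (#2). I have the code below, which extracts the text after the tag until a newline character is found.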

//8. Extract Final Doc Title
var final_doc_title = description.search("Final Doc Title:");
if(final_doc_title != -1){
    final_doc_title = description.match(/(?<=^Final Doc Title:)[^\n\r]+/m);
    final_doc_title = final_doc_title?.[0].trim();
}else{
    final_doc_title = '';
}
console.log('Final Doc Title: ' + final_doc_title);

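The problem with this code is that it returns an empty string, because "Final Doc Title:" is followed by a newline before the actual values: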

Final Doc Title:\n
IECC Com Update(#1)\n
IECC Res Update (#2)\n

How can I modify my code so that it returns both lines? Thanks!
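The behavior described in the question can be reproduced with a reduced example. This is only a sketch: `sample` is a shortened, hypothetical stand-in for `description`, and the pattern is the one from the question. It shows that `[^\n\r]+` cannot cross a line break, so the only thing matched is the space after the colon, which `trim()` then reduces to an empty string.

```javascript
// Shortened stand-in for `description` (an assumption for illustration only).
const sample = "Final Doc Title: \n\nIECC Com Update(#1)\n\nIECC Res Update (#2)\n\nFinal Source Doc: x (#1)";

// Same pattern as in the question: [^\n\r]+ stops at the first line break.
const m = sample.match(/(?<=^Final Doc Title:)[^\n\r]+/m);

const raw = m ? m[0] : null;            // only the space that follows the colon
const trimmed = raw ? raw.trim() : '';  // trimming that space leaves nothing

console.log(JSON.stringify(raw));     // " "
console.log(JSON.stringify(trimmed)); // ""
```

If there were no space after the colon at all, the match would be `null` instead, so the optional chaining in the original code would yield `undefined` rather than an empty string.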

r6l8ljro #1

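Assuming you are not interested in the whitespace that precedes the text you are looking for, you can match those line breaks with \s*.
If the text you are looking for ends right before a line that contains a colon (such as Final Source Doc: https:....), you can do the following: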

const description = "This is a code update\n\nOfficial Name: None\n\nPub: https://content.upcodes.co/viewer/washington/wa-mechanical-code-2021\n\nAgency:  \n\nReference: https://web.archive.org/web/20230226234118/https://lawfilesext.leg.wa.gov/law/wsr/agency/BuildingCodeCouncil.htm\n\nCitation: WAC 51-52 / WSR 23-02-055\n\nDraft Doc Title: WSR 23-02-055 (#1)\n\nDraft Source Doc: https://web.archive.org/web/20230303022030/https://lawfilesext.leg.wa.gov/law/wsr/2023/02/23-02-055.htm (#1)\n\nDraft Drive: https://drive.google.com/file/d/1pYmwQS3t-ZX-Vyg9yBabtIpXZ7By2G6f/view?usp=share_link ( #1)\n\nFinal Doc Title: \n\nIECC Com Update(#1)\n\nIECC Res Update (#2)\n\nFinal Source Doc: https://web.archive.org/web/20230303022130/https://apps.leg.wa.gov/wac/default.aspx?cite=51-52&full=true&pdf=true (#1)\n\nhttps://web.archive.org/web/20230303022030/https://lawfilesext.leg.wa.gov/law/wsr/2023/02/23-02-055.htm (#2)\n\nFinal Drive: https://web.archive.org/web/20230303022130/https://apps.leg.wa.gov/wac/default.aspx?cite=51-52&full=true&pdf=true (#1)\n\nhttps://web.archive.org/web/2023030302fdfdfg2130/https://apps.legfdg.gov/wac/default.aspx?cite=51-52&fdsfullfdsf=true&pfdsfdf=true  (#2)\n\nEffective Date:  January 4, 2023\nI want to extract the information after 'Final Doc Title:' tag. It should give me two values. The first value is IECC Com Update(#1) and IECC Res Update (#2). I have a code below that extracts the text after the tag until a new line character is found.\n\n//8. Extract Final Doc Title";

var result = description.match(/^Final Doc Title:\s*((?:\s*^(?:[^:\r\n]*)$)*)/m)?.[1];
var parts = result?.match?.(/.+/gm);
console.log(parts);
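The same idea can be wrapped in a small reusable function. The following is a sketch under stated assumptions, not part of the original answer: the helper name `extractAfterTag` and the shortened `sample` string are hypothetical. It builds the same pattern for an arbitrary tag name and returns the non-empty lines that follow it.

```javascript
// Hypothetical helper: returns the non-empty lines that follow `tag:`
// up to (not including) the next line that contains a colon.
function extractAfterTag(text, tag) {
  // Escape regex metacharacters so the tag is matched literally.
  const escaped = tag.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  // Same structure as the pattern above, with the tag name substituted in.
  const re = new RegExp('^' + escaped + ':\\s*((?:\\s*^(?:[^:\\r\\n]*)$)*)', 'm');
  const block = text.match(re)?.[1] ?? '';
  // .+ never matches an empty line, so blank lines are skipped.
  return (block.match(/.+/g) ?? []).map(line => line.trim());
}

// Shortened stand-in for `description` (an assumption for illustration only).
const sample = "Draft Doc Title: WSR 23-02-055 (#1)\n\nFinal Doc Title: \n\nIECC Com Update(#1)\n\nIECC Res Update (#2)\n\nFinal Source Doc: https://example.invalid/doc (#1)\n";

const titles = extractAfterTag(sample, 'Final Doc Title');
console.log(titles); // the two title lines, in order
```

Two limitations follow from the pattern itself. Capturing stops at any line that contains a colon, so it would not work for values that are URLs (as under Final Source Doc). It also only picks up values that start on a line of their own, so a value on the same line as the tag is not returned.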

mbjcgjjk #2

A simple regex that uses only the multiline flag, like the following...

/^Final Doc Title:\s+(.+)\s+(.+)/m

...which features two capturing groups (they do not necessarily have to be named), already does the job.
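As a rough usage sketch, the two groups can be read by destructuring the match result. The `sample` string here is a shortened, hypothetical stand-in for `description`, and the fallback to empty strings is an added assumption rather than something the answer specifies.

```javascript
// Shortened stand-in for `description` (an assumption for illustration only).
const sample = "Final Doc Title: \n\nIECC Com Update(#1)\n\nIECC Res Update (#2)\n\nFinal Source Doc: x (#1)";

const match = sample.match(/^Final Doc Title:\s+(.+)\s+(.+)/m);

// Index 0 is the whole match; groups 1 and 2 hold the two lines.
// Fall back to empty strings when the tag is not found.
const [, first = '', second = ''] = match ?? [];

console.log(first.trim());  // IECC Com Update(#1)
console.log(second.trim()); // IECC Res Update (#2)
```

This pattern assumes exactly two values follow the tag. With only one value, the second group would capture whatever line comes next, for example the Final Source Doc line, so it fits best when the number of values is known in advance.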
