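I need to capture the title between the words TITLE and JOURNAL, and to skip the
cases where the captured string is Direct Submission.
For example, in the following text,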
TITLE The Identification of Novel Diagnostic Marker Genes for the
Detection of Beer Spoiling Pediococcus damnosus Strains Using the
BlAst Diagnostic Gene findEr
JOURNAL PLoS One 11 (3), e0152747 (2016)
PUBMED 27028007
REMARK Publication Status: Online-Only
REFERENCE 2 (bases 1 to 462)
AUTHORS Behr,J., Geissler,A.J. and Vogel,R.F.
TITLE Direct Submission
JOURNAL Submitted (04-AUG-2015) Technische Mikrobiologie, Technische
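the captured string should be only 'The Identification of Novel Diagnostic Marker Genes for the Detection of Beer Spoiling Pediococcus damnosus Strains Using the BlAst Diagnostic Gene findEr',
with or without line breaks (preferably without).
I tried applying the regular expressions provided here and here, among others, but could not get them to do what I need.
Thanks.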
1 Answer
(?<=TITLE)[\S\s]*?(?=JOURNAL)
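should work. (?<=TITLE) is a lookbehind that ensures the match is preceded by TITLE, and (?=JOURNAL) is a lookahead that ensures the match is followed by JOURNAL. In between, [\S\s]*? lazily matches any characters, including line breaks, so the match stops at the first JOURNAL.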
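As a rough illustration of how the pattern above behaves, here is a minimal Python sketch. The choice of Python and the variable names (text, pattern, titles) are assumptions made for this example, since the question does not say which regex engine is in use. The whitespace collapsing step is also an addition, meant to address the "preferably without line breaks" wish.

```python
import re

# Sample record taken from the question.
text = """TITLE The Identification of Novel Diagnostic Marker Genes for the
Detection of Beer Spoiling Pediococcus damnosus Strains Using the
BlAst Diagnostic Gene findEr
JOURNAL PLoS One 11 (3), e0152747 (2016)
PUBMED 27028007
REMARK Publication Status: Online-Only
REFERENCE 2 (bases 1 to 462)
AUTHORS Behr,J., Geissler,A.J. and Vogel,R.F.
TITLE Direct Submission
JOURNAL Submitted (04-AUG-2015) Technische Mikrobiologie, Technische
"""

# Lookbehind for TITLE, lazy match across newlines, lookahead for JOURNAL.
pattern = re.compile(r"(?<=TITLE)[\S\s]*?(?=JOURNAL)")

# Each raw match still has leading spaces and embedded line breaks,
# so split on any whitespace and re-join with single spaces.
titles = [" ".join(raw.split()) for raw in pattern.findall(text)]

for title in titles:
    print(title)
```

Without any exclusion, this version returns both titles, including Direct Submission.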
To exclude Direct Submission, use
(?<=TITLE)(?!\s*Direct Submission)[\S\s]*?(?=JOURNAL)
Note, however, that this approach also excludes any title that merely begins with Direct Submission.
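The exclusion and its caveat can be sketched in Python as follows. Several things here are assumptions and not part of the original answer: the use of Python, the helper name extract_titles, the third record (a made-up title that only starts with Direct Submission), and the stricter variant, which is one possible way to skip a title only when Direct Submission is the whole title.

```python
import re

# First two records are from the question; the third is hypothetical,
# added only to show a title that merely starts with "Direct Submission".
text = """TITLE The Identification of Novel Diagnostic Marker Genes for the
Detection of Beer Spoiling Pediococcus damnosus Strains Using the
BlAst Diagnostic Gene findEr
JOURNAL PLoS One 11 (3), e0152747 (2016)
TITLE Direct Submission
JOURNAL Submitted (04-AUG-2015) Technische Mikrobiologie, Technische
TITLE Direct Submission Pipelines for Hypothetical Data
JOURNAL Hypothetical Journal 1 (1), 1 (2016)
"""

# Pattern from the answer: rejects any title starting with "Direct Submission".
loose = re.compile(r"(?<=TITLE)(?!\s*Direct Submission)[\S\s]*?(?=JOURNAL)")

# Assumed stricter variant: rejects only when "Direct Submission" is
# followed directly (apart from whitespace) by JOURNAL.
strict = re.compile(
    r"(?<=TITLE)(?!\s*Direct Submission\s*JOURNAL)[\S\s]*?(?=JOURNAL)"
)


def extract_titles(pattern: re.Pattern, source: str) -> list[str]:
    """Return all matches with runs of whitespace collapsed to one space."""
    return [" ".join(raw.split()) for raw in pattern.findall(source)]


loose_titles = extract_titles(loose, text)
strict_titles = extract_titles(strict, text)

print(loose_titles)
print(strict_titles)
```

With the answer's pattern only the first title survives, while the stricter variant also keeps the hypothetical third title. Both patterns are case-sensitive, so a lowercase "Journal" inside a title or journal name does not end the match early.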