Python regex: get the text starting on the line after a match

Posted by 0lvr5msh on 2022-11-26 in Python

I have a question about using regular expressions in Python. This is part of the text I am analyzing.

Amit Jawaharlaz Daryanani,  Evercore ISI Institutional Equities, Research Division - Senior MD & Fundamental Research Analyst   [19]\n I have 2 as well. I guess, first off, on the channel inventory, I was hoping if you could talk about how did channel inventory look like in the March quarter because it sounds like it may be below the historical ranges. And then the discussion you had for June quarter performance of iPhones, what are you embedding from a channel building back inventory levels in that expectation?\n

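My goal is to extract this part of the text by matching the analyst's name, Amit Jawaharlaz Daryanani: \n I have 2 as well. I guess, first off, on the channel inventory, I was hoping if you could talk about how did channel inventory look like in the March quarter because it sounds like it may be below the historical ranges. And then the discussion you had for June quarter performance of iPhones, what are you embedding from a channel building back inventory levels in that expectation?\n
I can't simply match from \n to \n, because the text is very long and I specifically need the line that follows his name.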
I tried: re.findall(r'(?<=Amit Jawaharlaz Daryanani).*?(?=\n)', text)
But the output here is

[',  Evercore ISI Institutional Equities, Research Division - Senior MD & Fundamental Research Analyst   [19]']

So how do I get the text that starts at the first \n after his name and ends at the second \n after his name?


pkbketx9 #1

You can use a capture group:

\bAmit Jawaharlaz Daryanani\b.*\n\s*(.*)\n

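Explanation

  • \bAmit Jawaharlaz Daryanani\b matches the name between word boundaries
  • .*\n matches the rest of the line and a newline
  • \s*(.*)\n matches optional whitespace characters, captures the whole next line in group 1, then matches a newline

See the regex demo and the Python demo.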

import re

pattern = r"\bAmit Jawaharlaz Daryanani\b.*\n\s*(.*)\n"

s = ("Amit Jawaharlaz Daryanani,  Evercore ISI Institutional Equities, Research Division - Senior MD & Fundamental Research Analyst   [19]\n"
     " I have 2 as well. I guess, first off, on the channel inventory, I was hoping if you could talk about how did channel inventory look like in the March quarter because it sounds like it may be below the historical ranges. And then the discussion you had for June quarter performance of iPhones, what are you embedding from a channel building back inventory levels in that expectation?\n"
     " \n")

m = re.search(pattern, s)
if m:
    print(m.group(1))

Output

I have 2 as well. I guess, first off, on the channel inventory, I was hoping if you could talk about how did channel inventory look like in the March quarter because it sounds like it may be below the historical ranges. And then the discussion you had for June quarter performance of iPhones, what are you embedding from a channel building back inventory levels in that expectation?
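
If the same lookup is needed for other speakers, the pattern above can be wrapped in a small helper. This is only a sketch: the function name lines_after_speaker and the shortened sample transcript are hypothetical, not part of the original answer. It passes the name through re.escape so that any regex metacharacters in it are treated literally, and it drops the trailing \n from the pattern so a final line without a newline is still captured.

```python
import re

def lines_after_speaker(name, text):
    """Return the line that follows each line containing `name`.

    Hypothetical helper; `name` is escaped so it is matched literally.
    """
    pattern = re.compile(r"\b" + re.escape(name) + r"\b.*\n\s*(.*)")
    # finditer yields one match per occurrence of the speaker line.
    return [m.group(1) for m in pattern.finditer(text)]

# Shortened stand-in for the transcript in the question.
sample = (
    "Amit Jawaharlaz Daryanani,  Evercore ISI [19]\n"
    " I have 2 as well.\n"
    "Some Other Analyst,  Another Firm [20]\n"
    " Thanks for taking my question.\n"
)

print(lines_after_speaker("Amit Jawaharlaz Daryanani", sample))
```

Because \s* also matches newlines, a blank line directly after the speaker line is skipped and the next non-blank line is returned instead.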

uqzxnwby #2

Try this:

  • A non-capturing group for the name
  • Match up to the first \n
  • A capture group that runs up to the second \n
re.findall(r'(?:Amit Jawaharlaz Daryanani).*?\n(.*?)\n', text)

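This works because . does not match a newline unless re.DOTALL is set, so each .*? stays on a single line and stops at the first \n it reaches; being non-greedy, it also keeps each match as short as possible.
Output: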

[' I have 2 as well. I guess, first off, on the channel inventory, I was hoping if you could talk about how did channel inventory look like in the March quarter because it sounds like it may be below the historical ranges. And then the discussion you had for June quarter performance of iPhones, what are you embedding from a channel building back inventory levels in that expectation?']
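
To see how much of this depends on laziness and how much on . not matching newlines, here is a small comparison. It is a sketch on a shortened, made-up transcript, and the variable names are illustrative. Without re.DOTALL the greedy and lazy versions return the same line; with re.DOTALL the greedy version runs on to a later line.

```python
import re

# Shortened stand-in for the transcript, with one extra line added.
text = (
    "Amit Jawaharlaz Daryanani,  Evercore ISI [19]\n"
    " I have 2 as well.\n"
    " And a follow-up.\n"
)

lazy_pat = r"(?:Amit Jawaharlaz Daryanani).*?\n(.*?)\n"
greedy_pat = r"(?:Amit Jawaharlaz Daryanani).*\n(.*)\n"

# By default "." cannot cross a newline, so both patterns agree.
lazy = re.findall(lazy_pat, text)
greedy = re.findall(greedy_pat, text)
print(lazy)    # [' I have 2 as well.']
print(greedy)  # [' I have 2 as well.']

# With re.DOTALL "." also matches "\n", and only the lazy pattern
# still stops at the first line after the name.
lazy_dotall = re.findall(lazy_pat, text, re.DOTALL)
greedy_dotall = re.findall(greedy_pat, text, re.DOTALL)
print(lazy_dotall)    # [' I have 2 as well.']
print(greedy_dotall)  # [' And a follow-up.']

# Unlike the \s* in the first answer, this pattern keeps the
# leading space; strip() removes it if it is not wanted.
stripped = [s.strip() for s in lazy]
print(stripped)       # ['I have 2 as well.']
```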
