I have the following string:
JOHN SMITH, YOUTUBE:
I'm having a great day today
JANE DOE, GOOGLE:
I'm going to the gym later
STEVE SMITH, FACEBOOK:
Time for people to speak
SCHMEFF SCHMEZOS, JUNGLE:
Buy something from my online shop. You might like it
You can create the string like this:
string = """JOHN SMITH, YOUTUBE:
I'm having a great day today
JANE DOE, GOOGLE:
I'm going to the gym later
STEVE SMITH, FACEBOOK:
Time for people to speak
SCHMEFF SCHMEZOS, JUNGLE:
Buy something from my online shop. You might like it"""
Load the Python package: import re
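I'm trying to find a regular expression that splits the text by speaker and what each speaker says. For example, I'm trying to get the following excerpts: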
string1: JOHN SMITH, YOUTUBE>>
I'm having a great day today
string2: JANE DOE, GOOGLE>>
I'm going to the gym later
string3: STEVE SMITH, FACEBOOK>>
Time for people to speak
string4: SCHMEFF SCHMEZOS, JUNGLE>>
Buy something from my online shop. You might like it
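The text can span multiple lines, so I'm trying to capture two groups: the speaker, whose name is always followed by a colon (sometimes with a space before it, hence \s), and their speech, which can run over several lines.
I'm trying to capture everything up to the next speaker. My current regex looks like this: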
(^[A-Z].*):\s*\n*(?=(?:[A-Z]|$))
Names are always in uppercase letters and appear whenever a new speaker starts talking. Any help is appreciated.
I'm using Python 3.9.
New sample string:
JOHN SMITH, GLOBAL HEAD OF YOUTUBE : Good morning, good
afternoon, everyone . Before I hand over to facebook, I want to give a quick reminder of the reporting
changes that have taken effect this filming of a tv show.
BOBBY DUDE, GROUP FROM FACEBOOK: Thanks, john smith lets talk about movies and films we watch when we are bored parents.
2 Answers
bbuxkriu1#
We can use re.findall here as one regex option.
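A minimal sketch of the re.findall approach, under the assumption that a speaker header always starts a line and contains only uppercase letters, spaces, and commas before the colon. The pattern and the names text, SPEAKER_TURN, and turns are illustrative and are not taken from the answer itself. A speech line that itself begins with an all-caps label followed by a colon (for example "NOTE:") would be mistaken for a new speaker.

```python
import re

text = """JOHN SMITH, YOUTUBE:
I'm having a great day today
JANE DOE, GOOGLE:
I'm going to the gym later
STEVE SMITH, FACEBOOK:
Time for people to speak
SCHMEFF SCHMEZOS, JUNGLE:
Buy something from my online shop. You might like it"""

# Hypothetical pattern: one match per speaker turn.
SPEAKER_TURN = re.compile(
    r"^([A-Z][A-Z ,]*?)[ \t]*:\s*"  # group 1: header at line start, optional space before ':'
    r"(.*?)\s*"                     # group 2: the speech, matched lazily across lines
    r"(?=^[A-Z][A-Z ,]*:|\Z)",      # stop right before the next header or at end of input
    flags=re.MULTILINE | re.DOTALL,
)

# findall returns a list of (speaker, speech) tuples because the pattern has two groups.
turns = SPEAKER_TURN.findall(text)

for speaker, speech in turns:
    print(f"{speaker}>>\n{speech}")
```

Because group 2 is matched with re.DOTALL, a speech that wraps over several lines, as in the new sample string, stays in a single tuple.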
cnwbcb6i2#
Use re.split and split on a run of whitespace that ends with a newline and is followed by an uppercase letter with no lowercase letters before the next colon.
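The description above could be sketched roughly as follows. The pattern is reconstructed from the wording of the answer, so treat it as an assumption rather than the answerer's exact code. The names text, TURN_BOUNDARY, chunks, and pairs are hypothetical, and the lookahead is restricted to a single line here as an extra safeguard.

```python
import re

text = """JOHN SMITH, YOUTUBE:
I'm having a great day today
JANE DOE, GOOGLE:
I'm going to the gym later
STEVE SMITH, FACEBOOK:
Time for people to speak
SCHMEFF SCHMEZOS, JUNGLE:
Buy something from my online shop. You might like it"""

# Split point: optional whitespace ending in a newline, but only when the next
# line starts with an uppercase letter and has no lowercase letter before ':'.
TURN_BOUNDARY = re.compile(r"\s*\n(?=[A-Z][^a-z\n]*:)")

# Each chunk holds one speaker header together with its speech.
chunks = TURN_BOUNDARY.split(text)

# Optionally separate the header from the speech at the first colon.
pairs = []
for chunk in chunks:
    speaker, _, speech = chunk.partition(":")
    pairs.append((speaker.strip(), speech.strip()))

for speaker, speech in pairs:
    print(f"{speaker}>>\n{speech}")
```

Since the lookahead consumes nothing, each speaker header stays at the start of its chunk, and newlines inside a multi-line speech are left untouched.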