How do I extract 2- or 4-digit numbers with a Python regex? \d{2,4} also matches substrings of 3 consecutive digits

Posted by ix0qys7i on 2023-04-13 in Python

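I want to extract years from a text. In this text, a year can appear as either 2 or 4 digits. Which regular expression should I use?
\d{2,4} also extracts substrings of 3 consecutive digits, which is not what I want.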

j8yoct9x #1

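Indeed, the quantifier {2,4} allows 3. Instead, specify that you want two digits, optionally followed by 2 more. You also need to make sure that no *more* digits are adjacent to the match. For that you can use \b, a word boundary. This gives: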

\b\d\d(\d\d)?\b
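
One thing to watch when using this pattern with re.findall: if a pattern contains a capturing group, findall returns the group's contents rather than the whole match. The sketch below switches to a non-capturing group (?:...) to avoid that; the sample string is made up for illustration.

```python
import re

# Non-capturing group so re.findall returns the whole match.
YEAR = re.compile(r"\b\d\d(?:\d\d)?\b")

sample = "born 85, moved 1990, code 123, id 12345"  # made-up sample text

print(YEAR.findall(sample))                       # ['85', '1990']

# With the capturing group, findall returns only the group's contents
# (an empty string when the optional group did not take part in the match).
print(re.findall(r"\b\d\d(\d\d)?\b", sample))     # ['', '90']
```

re.finditer with match.group(0) would be another way to get the full match while keeping the original capturing group.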

8hhllhi2 #2

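You can search specifically for the two lengths you want, \d{2} or \d{4}, and use \b to make sure no other digits (or other word characters) are adjacent to the match: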

\b(\d{2}|\d{4})\b
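
Because \b is a word boundary, digits glued to letters or underscores (for example "fy2021") are not matched. If such cases should count as years, one possible variant is to replace \b with negative lookarounds that only rule out neighbouring digits. This is a sketch under that assumption; the names and the sample string are made up for illustration.

```python
import re

# Word-boundary version from the answer (non-capturing group for findall).
WORD_BOUNDED = re.compile(r"\b(?:\d{2}|\d{4})\b")

# Assumed variant: only forbid adjacent digits, allow adjacent letters or "_".
DIGIT_BOUNDED = re.compile(r"(?<!\d)(?:\d{4}|\d{2})(?!\d)")

sample = "fy2021 rev_99 in 07, not 123 or 12345"  # made-up sample text

print(WORD_BOUNDED.findall(sample))   # ['07']
print(DIGIT_BOUNDED.findall(sample))  # ['2021', '99', '07']
```

Both versions reject the 3-digit and 5-digit runs; they differ only in how they treat digits attached to other word characters.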

ibrsph3r #3

Use findall with \d{2,4} and a lookbehind for either the word "year" or a "/":

import re

def extract_year(text):
    """Extract years that follow a digit and '/' or the word 'year'."""
    # Raw string so the backslashes reach the regex engine unchanged.
    # Note: matches keep the whitespace that the pattern consumes.
    pattern = r"(?<=\d/)\d{2,4}\s|(?<=\syear)\s+\d{2,4}"
    return re.findall(pattern, text)

# test cases
text1 = "The 10/24/1990  1/2/1992 Academy Awards, also called the Oscars, are a set of awards given annually for excellence of cinematic achievements. The awards, organized by the Academy of Motion Picture Arts and Sciences (AMPAS), were first handed out in 1929. At first, the Academy Awards ceremony was held in various locations such as a banquet hall of a hotel, until it finally settled at the Hollywood Roosevelt Hotel in the year 1929. The first Academy Awards ceremony consisted of a private dinner for 250 people in the Blossom Room of the Roosevelt Hotel, with an audience of five hundred people."
text2 = "The 10/24/91 Academy Awards, also called the Oscars, are a set of awards given annually for excellence of cinematic achievements. The awards, organized by the Academy of Motion Picture Arts and Sciences (AMPAS), were first handed out in 1929. At first, the Academy Awards ceremony was held in various locations such as a banquet hall of a hotel, until it finally settled at the Hollywood Roosevelt Hotel in the year 1929. The first Academy Awards ceremony consisted of a private dinner for 250 people in the Blossom Room of the Roosevelt Hotel, with an audience of five hundred people. Since then, the Academy Awards ceremonies take place every year except in the year 1943. That year, due to the World War II, the ceremony was not held. The Academy Awards ceremony is one of the oldest award ceremonies in the United States."

print(extract_year(text1))
print(extract_year(text2))

Output:

['1990 ', '1992 ', ' 1929']
['91 ', ' 1929', ' 1943']
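
The strings returned above include the whitespace the pattern consumes, and \d{2,4} would still accept a 3-digit run. As a hedged sketch of one way to address both, the variant below captures only the digits and restricts them to exactly 2 or 4. The function name extract_year_clean and the sample string are hypothetical, not part of the original answer.

```python
import re

# Assumed variant: capture only the digits, allow exactly 2 or 4 of them.
CLEAN_YEAR = re.compile(
    r"(?<=\d/)(\d{4}|\d{2})(?=\s)"         # year after "<digit>/", then whitespace
    r"|(?<=\syear)\s+(\d{4}|\d{2})(?!\d)"  # year after the word "year"
)

def extract_year_clean(text):
    """Return the matched years as bare digit strings (hypothetical helper)."""
    # Exactly one of the two groups takes part in each match.
    return [m.group(1) or m.group(2) for m in CLEAN_YEAR.finditer(text)]

sample = "Filed 10/24/1990 and 1/2/92 in the year 1929, not 3/4/199 here."
print(extract_year_clean(sample))  # ['1990', '92', '1929']
```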
