import re

def extract_year(text):
    """Extract year from text using regular expression"""
    # Raw string so the backslashes reach the regex engine unchanged
    pattern = r"(?<=\d\/)\d{2,4}\s|(?<=\syear)\s+\d{2,4}"  # regular expression pattern to look for
    result = re.findall(pattern, text)
    return result
# test cases
text1 = "The 10/24/1990 1/2/1992 Academy Awards, also called the Oscars, are a set of awards given annually for excellence of cinematic achievements. The awards, organized by the Academy of Motion Picture Arts and Sciences (AMPAS), were first handed out in 1929. At first, the Academy Awards ceremony was held in various locations such as a banquet hall of a hotel, until it finally settled at the Hollywood Roosevelt Hotel in the year 1929. The first Academy Awards ceremony consisted of a private dinner for 250 people in the Blossom Room of the Roosevelt Hotel, with an audience of five hundred people."
text2 = "The 10/24/91 Academy Awards, also called the Oscars, are a set of awards given annually for excellence of cinematic achievements. The awards, organized by the Academy of Motion Picture Arts and Sciences (AMPAS), were first handed out in 1929. At first, the Academy Awards ceremony was held in various locations such as a banquet hall of a hotel, until it finally settled at the Hollywood Roosevelt Hotel in the year 1929. The first Academy Awards ceremony consisted of a private dinner for 250 people in the Blossom Room of the Roosevelt Hotel, with an audience of five hundred people. Since then, the Academy Awards ceremonies take place every year except in the year 1943. That year, due to the World War II, the ceremony was not held. The Academy Awards ceremony is one of the oldest award ceremonies in the United States."
print(extract_year(text1))
print(extract_year(text2))
3 Answers

j8yoct9x #1
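Actually, the quantifier {2,4} also allows three digits. Instead, specify that you want two digits, optionally followed by two more. You also need to make sure that no further digits are adjacent to the match; for that you can use \b. So, given:

8hhllhi2 #2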
You can use \b to search specifically for the two acceptable lengths, \d{2}|\d{4}, with no other digits around them:

ibrsph3r #3
Use findall with \d{2,4} and a lookbehind for either the year phrase or a /.
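
The code from the three answers above is not shown, so here is a minimal sketch of how their suggestions might compare side by side. The lookbehind prefix is modeled on the question's pattern, and the `(?!/)` guard, the `PATTERNS` dictionary, the `extract_years` helper, and the sample string are all assumptions made for illustration, not the answerers' actual code.

```python
import re

# Shared context, modeled on the question's pattern: a year follows "<digit>/"
# or " year ". How the answers combined this with the digit part is an assumption.
PREFIX = r"(?:(?<=\d/)|(?<=\syear\s))"
# \b rejects a match that has more digits right after it; (?!/) is an added
# guard (also an assumption) so the day in "10/24/1990" is not taken as a year.
SUFFIX = r"\b(?!/)"

PATTERNS = {
    "two digits, optionally two more": PREFIX + r"\d{2}(?:\d{2})?" + SUFFIX,
    "exactly two or exactly four": PREFIX + r"(?:\d{2}|\d{4})" + SUFFIX,
    "two to four (also accepts three)": PREFIX + r"\d{2,4}" + SUFFIX,
}

def extract_years(text, pattern):
    """Return every substring of text that the given year pattern matches."""
    return re.findall(pattern, text)

sample = "On 10/24/1990 and 1/2/92, then 3/4/199, in the year 1929."
for label, pattern in PATTERNS.items():
    print(label, extract_years(sample, pattern))
```

On this sample the first two patterns return ['1990', '92', '1929'], while the \d{2,4} version also returns '199', which is the three-digit case the first answer warns about.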