How to match entire strings with regex in Python?

Posted by ubof19bj on 2023-11-20 in Python

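I'm trying to build a regular expression pattern in Python that will match strings like these:
"THEFT FROM MOTOR VEHICLE - GRAND ($950.01 AND OVER)", "VEHICLE - STOLEN", "TRANSPORTATION FACILITY (AIRPORT)", "5600 N FIGUEROA ST" and "400 WORLD WY".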

import re

hello = {"meta": 1, "reza": [[ "row-f696.af3d.c3v9", "00000000-0000-0000-2D2F-EA38F9F11DB9", 0, 1642111191, 1642111191, "{ }", "201412343", "2020-06-15T00:00:00", "2020-06-15T00:00:00", "0700", "14", "Pacific", "1494", "1", "331", "THEFT FROM MOTOR VEHICLE - GRAND ($950.01 AND OVER)", "1606 0344 1300 1402", "60", "F", "W", "212", "TRANSPORTATION FACILITY (AIRPORT)", "IC", "Invest Cont", "331", "998", "400    WORLD                        WY", "33.9433", "-118.4072" ] ,
        [ "row-f2wh.yte2-zhv8", "00000000-0000-0000-0BF4-2A6281C66DEF", 0, 1636553859, 1636553859, "{ }", "201107194", "2020-03-11T00:00:00", "2020-03-11T00:00:00", "1100", "11", "Northeast", "1118", "1", "510", "VEHICLE - STOLEN", "0", "108", "PARKING LOT", "IC", "Invest Cont", "510", "5600 N  FIGUEROA                     ST", "34.114", "-118.1949" ]]}
crime = []
for items in hello["reza"]:
    for item in items:
        pattern = re.compile(r'[A-Z].*')
        crime = re.findall(pattern,str(item))

print(crime)


Answer 1, by eh57zj3b

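The most obvious problem in your code is that you reassign crime on every iteration of the nested loop, so you end up printing only the result of the last findall call. Since findall returns a list (of all matches in str(item)), you get an empty list, because the last item ("-118.1949") contains no match.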
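To illustrate the fix, here is a minimal sketch of the same loop that accumulates results with list.extend instead of reassigning crime. It keeps the question's [A-Z].* pattern unchanged and compiles it once outside the loop; the shortened rows list is a hypothetical subset of the question's data, used only to keep the example small.

```python
import re

# Hypothetical, trimmed-down rows taken from the question's "reza" data.
rows = [
    ["row-f696.af3d.c3v9", 0, "VEHICLE - STOLEN", "33.9433"],
    ["PARKING LOT", "-118.1949"],
]

pattern = re.compile(r'[A-Z].*')  # compiled once, outside the loop
crime = []
for items in rows:
    for item in items:
        # extend() adds this item's matches instead of replacing crime
        crime.extend(pattern.findall(str(item)))

print(crime)  # ['VEHICLE - STOLEN', 'PARKING LOT']
```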
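Also, you haven't described how you want to filter the results. Your pattern [A-Z].* matches from the first uppercase letter to the end of the string, so it can never capture 5600 N FIGUEROA ST whole (findall returns only the part starting at N), and it also picks up fragments of the IDs and timestamps.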
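A quick way to see what [A-Z].* actually captures is to run findall on a few of the raw values directly. The sample strings below are copied from the question's data; the results dict exists only for inspection.

```python
import re

pattern = re.compile(r'[A-Z].*')

# A few raw values copied from the question's data.
samples = [
    "5600 N  FIGUEROA                     ST",
    "2020-06-15T00:00:00",
    "00000000-0000-0000-2D2F-EA38F9F11DB9",
    "-118.1949",
]

# findall scans the whole string, so each match begins at the first
# uppercase letter and the greedy .* runs to the end of the string.
results = {s: pattern.findall(s) for s in samples}
for s, found in results.items():
    print(repr(s), "->", found)
```

The address loses its leading 5600, the timestamp yields 'T00:00:00', the ID yields 'D2F-EA38F9F11DB9', and the coordinate yields nothing.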
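Here is one suggestion: keep strings that contain at least three consecutive uppercase letters and do not start with digits immediately followed by - (which rules out the IDs and dates), and replace runs of whitespace with a single space: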

import re

hello = {"meta": 1, "reza": [[ "row-f696.af3d.c3v9", "00000000-0000-0000-2D2F-EA38F9F11DB9", 0, 1642111191, 1642111191, "{ }", "201412343", "2020-06-15T00:00:00", "2020-06-15T00:00:00", "0700", "14", "Pacific", "1494", "1", "331", "THEFT FROM MOTOR VEHICLE - GRAND ($950.01 AND OVER)", "1606 0344 1300 1402", "60", "F", "W", "212", "TRANSPORTATION FACILITY (AIRPORT)", "IC", "Invest Cont", "331", "998", "400    WORLD                        WY", "33.9433", "-118.4072" ] ,
        [ "row-f2wh.yte2-zhv8", "00000000-0000-0000-0BF4-2A6281C66DEF", 0, 1636553859, 1636553859, "{ }", "201107194", "2020-03-11T00:00:00", "2020-03-11T00:00:00", "1100", "11", "Northeast", "1118", "1", "510", "VEHICLE - STOLEN", "0", "108", "PARKING LOT", "IC", "Invest Cont", "510", "5600 N  FIGUEROA                     ST", "34.114", "-118.1949" ]]}
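crime = []
# Reject strings that start with digits followed by "-" (IDs, dates),
# then require a run of at least three consecutive uppercase letters.
pattern = re.compile(r'(?!\d+-).*[A-Z]{3,}')
for items in hello["reza"]:
    for item in items:
        # Skip the integer columns; only strings can be matched.
        if isinstance(item, str) and pattern.match(item):
            # Collapse runs of whitespace into a single space.
            crime.append(re.sub(r'\s+', ' ', item))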

print(crime)

Output:

['THEFT FROM MOTOR VEHICLE - GRAND ($950.01 AND OVER)', 'TRANSPORTATION FACILITY (AIRPORT)', '400 WORLD WY', 'VEHICLE - STOLEN', 'PARKING LOT', '5600 N FIGUEROA ST']
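Since the title asks about matching the whole string: re.match only anchors at the start, so (?!\d+-).*[A-Z]{3,} accepts a value as soon as some prefix contains three capitals, whatever follows. If you want the pattern to describe the entire value, re.fullmatch (Python 3.4+) anchors at both ends. The sketch below assumes a whitelist of characters chosen to cover the sample values; that character class is an illustration, not a rule stated in the question.

```python
import re

# Assumed whitelist: uppercase letters, digits, whitespace and the punctuation
# seen in the sample descriptions/addresses ($ ( ) . -).
# (?!\d+-) rejects IDs/dates; (?=.*[A-Z]{3}) requires three capitals in a row.
pattern = re.compile(r'(?!\d+-)(?=.*[A-Z]{3})[A-Z0-9\s$().-]+')

values = [
    "THEFT FROM MOTOR VEHICLE - GRAND ($950.01 AND OVER)",
    "400    WORLD                        WY",
    "Invest Cont",
    "2020-06-15T00:00:00",
    "IC",
]

# fullmatch succeeds only if the pattern consumes the entire string.
crime = [re.sub(r'\s+', ' ', v) for v in values if pattern.fullmatch(v)]
print(crime)  # ['THEFT FROM MOTOR VEHICLE - GRAND ($950.01 AND OVER)', '400 WORLD WY']
```

The trade-off is that every allowed character must be listed up front: a value containing, say, a slash or a comma would be rejected until the character class is extended.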
