Python - Check for exact string in file name

Posted by envsm3lx on 2022-12-10 in Python


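I have a folder in which every file name contains a number (for example img 1, img 2, img-3, 4-img, and so on). I want to select files by an exact number: if I enter "4", it should return only the files numbered 4, not those that contain "14" or "40". My problem is that the program returns every file whose name contains the string at all. Note that the number is not always in the same place (for some files it is at the end, for others it is in the middle).
For example, if my folder contains the files ['ep 4', 'xxx 3 ', 'img4', '4xxx', 'ep-40', 'file.mp4', 'file 4.mp4', 'ep.4.', 'ep.4 ', 'ep. 4. ', 'ep4xxx', 'ep 4 ', '404ep'] and I only want the files whose number is exactly 4, then only ['ep 4', 'img4', '4xxx', 'file 4.mp4', 'ep.4.', 'ep.4 ', 'ep. 4. ', 'ep4xxx', 'ep 4 ', '404ep'] should be returned.
Here is what I have (in this example I only want to return the mp4 files):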

for (root, dirs, file) in os.walk(source_folder):
    for f in file:
        if '.mp4' and ('4') in f:
            print(f)

I have also tried == instead of in.
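A note on why the condition in the snippet above accepts so many files: Python parses `'.mp4' and ('4') in f` as `'.mp4' and ('4' in f)`, and a non-empty string is always truthy, so the test reduces to `'4' in f`. The short sketch below illustrates this with two hypothetical file names (`notes.txt` and `ep-40.txt` are made up for the demonstration).

```python
# Hypothetical file names, chosen only to illustrate the parsing.
f = "notes.txt"   # contains neither ".mp4" nor "4"
g = "ep-40.txt"   # contains "4" but is not an mp4 file

# '.mp4' is a non-empty string (truthy), so `and` returns the right operand:
# the whole expression is equivalent to `'4' in name`.
original_f = bool('.mp4' and ('4') in f)
original_g = bool('.mp4' and ('4') in g)

# Writing both tests out checks the extension and the substring separately.
explicit_g = g.endswith('.mp4') and '4' in g

print(original_f, original_g, explicit_g)
```

Even with both tests written out, `'4' in g` is still a plain substring test, so it cannot tell 4 apart from 40 or 14.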

python

Source: https://stackoverflow.com/questions/74737938/python-check-for-exact-string-in-file-name

3 Answers


58wvjzkj1#

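Judging by your input, the regular expression you need has to satisfy the following conditions:
1. Match the supplied number exactly
2. Ignore matches of the number inside the file extension (if there is one)
3. Handle file names that contain spaces
I think this will meet all of these requirements: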

import re

def generate(n):
    return re.compile(r'^[^.\d]*' + str(n) + r'[^.\d]*(\..*)?$')

def check_files(n, files):
    regex = generate(n)
    return [f for f in files if regex.fullmatch(f)]

Usage:

>>> check_files(4, ['ep 4', 'xxx 3 ', 'img4', '4xxx', 'ep-40', 'file.mp4', 'file 4.mp4'])
['ep 4', 'img4', '4xxx', 'file 4.mp4']

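Note that this solution creates a Pattern object once and uses that object to check every file. Compared with calling re.fullmatch directly with the pattern and the file name, this strategy offers a performance benefit because the pattern does not have to be compiled for every call.
This solution has one drawback: it assumes that file names have the form name.extension and that the value you are searching for is in the name part. Because of the greedy nature of regular expressions, if you allow file names to contain ., you cannot exclude the extension from the search. Modifying this expression to match ep.4 would therefore also make it match file.mp4. That said, there is a workaround: strip the extension from the file name before matching: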

import re

def generate(n):
    return re.compile(r'^[^\d]*' + str(n) + r'[^\d]*$')

def strip_extension(f):
    # str.removesuffix requires Python 3.9 or later
    return f.removesuffix('.mp4')

def check_files(n, files):
    regex = generate(n)
    return [f for f in files if regex.fullmatch(strip_extension(f))]

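Note that this solution now allows . in the match condition and no longer excludes the extension in the pattern itself. Instead, it relies on preprocessing (the strip_extension function) to remove any file extension from the file name before matching.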
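As a rough illustration of the strip-then-match approach, the sketch below runs it against the full list from the question. It is a variation rather than the code above: `KNOWN_EXTENSIONS` is a hypothetical tuple of suffixes assumed to be known in advance, and `str.endswith` with slicing stands in for `removesuffix` so that it also runs on Python versions before 3.9.

```python
import re

# Assumption: the extensions to ignore are known up front (hypothetical list).
KNOWN_EXTENSIONS = ('.mp4', '.mkv', '.avi')

def strip_extension(name):
    # Drop the first known extension the name ends with, if any.
    for ext in KNOWN_EXTENSIONS:
        if name.endswith(ext):
            return name[:-len(ext)]
    return name

def check_files(n, files):
    # Non-digits, the number, non-digits; fullmatch anchors both ends.
    regex = re.compile(r'[^\d]*' + str(n) + r'[^\d]*')
    return [f for f in files if regex.fullmatch(strip_extension(f))]

files = ['ep 4', 'xxx 3 ', 'img4', '4xxx', 'ep-40', 'file.mp4', 'file 4.mp4',
         'ep.4.', 'ep.4 ', 'ep. 4. ', 'ep4xxx', 'ep 4 ', '404ep']
result = check_files(4, files)
print(result)
# ['ep 4', 'img4', '4xxx', 'file 4.mp4', 'ep.4.', 'ep.4 ', 'ep. 4. ', 'ep4xxx', 'ep 4 ']
```

With this pattern '404ep' is rejected, because the name may contain no digit other than the number itself.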
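As an addendum, sometimes you will get files whose numbers are padded with leading zeros (for example 004, 0001, and so on). You can modify the regular expression to handle this case: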

def generate(n):
    return re.compile(r'^[^\d]*0*' + str(n) + r'[^\d]*$')
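To see what the zero-padded pattern accepts, here is a small check using the same regular expression. The sample file names are hypothetical and chosen only to cover padded, unpadded, and non-matching cases.

```python
import re

def generate(n):
    # Same pattern as above: optional leading zeros directly before the number.
    return re.compile(r'^[^\d]*0*' + str(n) + r'[^\d]*$')

# Hypothetical sample names (extensions assumed to be stripped already).
samples = ['ep 004', 'ep 040', 'ep 104', 'ep 4', '0004 final']
regex = generate(4)
padded_matches = [s for s in samples if regex.fullmatch(s)]
print(padded_matches)  # ['ep 004', 'ep 4', '0004 final']
```

'ep 040' and 'ep 104' are rejected because the digits next to the 4 are not all leading zeros.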

Answered 2022-12-10

nxowjjhe2#

We can use re.search together with a list comprehension for a regex option:

import re

files = ['ep 4', 'xxx 3 ', 'img4', '4xxx', 'ep-40', 'file.mp4', 'file 4.mp4']
num = 4
# (?<!\d) and (?!\d) require that no digit sits directly before or after the number
regex = r'(?<!\d)' + str(num) + r'(?!\d)'
output = [f for f in files if re.search(regex, f)]
print(output)  # ['ep 4', 'img4', '4xxx', 'file.mp4', 'file 4.mp4']
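The output above still contains 'file.mp4', because the 4 in the extension is not adjacent to another digit. One possible way to ignore it, sketched here as an assumption rather than as part of the answer, is to search only the part of the name that `os.path.splitext` treats as the stem.

```python
import os
import re

files = ['ep 4', 'xxx 3 ', 'img4', '4xxx', 'ep-40', 'file.mp4', 'file 4.mp4']
num = 4

# Same lookarounds as above; re.escape is a precaution for non-numeric input.
pattern = re.compile(r'(?<!\d)' + re.escape(str(num)) + r'(?!\d)')

# Search only the stem, i.e. everything before the last dot-suffix.
stem_matches = [f for f in files if pattern.search(os.path.splitext(f)[0])]
print(stem_matches)  # ['ep 4', 'img4', '4xxx', 'file 4.mp4']
```

A caveat: `os.path.splitext` treats whatever follows the last dot as the extension, so a name such as 'ep.4 ' would lose its number to the "extension" and be skipped. Stripping a fixed list of known extensions avoids that.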

Answered 2022-12-10

xpszyzbs3#

This can be achieved with the following function:

import os

files = ["ep 4", "xxx 3 ", "img4", "4xxx", "ep-40", "file.mp4", "file 4.mp4"]
desired_output = ["ep 4", "img4", "4xxx", "file 4.mp4"]

def number_filter(files, number):
    filtered_files = []
    for file_name in files:

        # if the number is not present, we can skip this file
        if file_name.count(str(number)) == 0:
            continue

        # if the number is present in the extension, but not in the file name, we can skip this file
        name, ext = os.path.splitext(file_name)

        if (
            isinstance(ext, str)
            and ext.count(str(number)) > 0
            and isinstance(name, str)
            and name.count(str(number)) == 0
        ):
            continue

        # if the number is present in the file name, we must determine if it's part of a different number
        num_index = file_name.index(str(number))

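        # if the number is at the beginning of the file name
        if num_index == 0:
            # check if the next character is a digit; slicing yields '' (not an
            # IndexError) when the file name is exactly the number
            next_char = file_name[len(str(number)):len(str(number)) + 1]
            if next_char.isdigit():
                continue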

        # if the number is at the end of the file name
        elif num_index == len(file_name) - len(str(number)):
            # check if the previous character is a digit
            if file_name[num_index - 1].isdigit():
                continue

        # if it's somewhere in the middle
        else:
            # check if the previous and next characters are digits
            if (
                file_name[num_index - 1].isdigit()
                or file_name[num_index + len(str(number))].isdigit()
            ):
                continue

        print(file_name)
        filtered_files.append(file_name)

    return filtered_files

output = number_filter(files, 4)

for file in output:
    assert file in desired_output

for file in desired_output:
    assert file in output
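The function above inspects only the first occurrence of the number, so a name such as '14 ep 4' (a hypothetical example) would be skipped even though it contains a standalone 4. The sketch below is one way to check every occurrence. It also differs from the function above in that it strips the extension with `os.path.splitext` before scanning, and the helper name `has_exact_number` is made up for this illustration.

```python
import os

def has_exact_number(file_name, number):
    """Return True if `number` appears in the stem with no digit on either side."""
    target = str(number)
    name, _ext = os.path.splitext(file_name)
    start = name.find(target)
    while start != -1:
        end = start + len(target)
        # A boundary is fine if it is the edge of the name or a non-digit.
        before_ok = start == 0 or not name[start - 1].isdigit()
        after_ok = end == len(name) or not name[end].isdigit()
        if before_ok and after_ok:
            return True
        # Otherwise keep looking for a later occurrence.
        start = name.find(target, start + 1)
    return False

files = ["ep 4", "xxx 3 ", "img4", "4xxx", "ep-40", "file.mp4", "file 4.mp4", "14 ep 4"]
matches = [f for f in files if has_exact_number(f, 4)]
print(matches)  # ['ep 4', 'img4', '4xxx', 'file 4.mp4', '14 ep 4']
```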

Answered 2022-12-10
