CSV wildcard within string

Posted by ctrmrzij on 2022-12-06 in Other

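I am using a CSV file to read in the terms that I search for in text files.
I would like to use wildcards or "like" matching for the terms. I am not looking for a wildcard in the text document names, but in the terms listed in the CSV file.
For example: terms in the CSV file to search for in the text files, each term on its own line.
test* #search for test, tests, testing, etc.
list
table
chair
Is it possible to use a wildcard inside the CSV file so that all variations of the word are returned? I want to put the wildcard in the CSV file.
Below is my code; the file it reads the search terms from is contract_search_terms.csv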

import csv
import glob
import os
import time

import pandas as pd


def main():
    txt_filepaths = glob.glob("**/*.txt", recursive=True)
    start = time.time()
    results = {}  # term -> list of [filename, sentence, sentence_number]
    new_results = []  # one formatted row per match, written to results.csv

    # Read the search terms from the CSV; more columns could be appended to the same list.
    with open('contract_search_terms.csv', 'r', newline='') as term_filename:
        term_file = csv.DictReader(term_filename)
        search_terms = [row['Contract Terms'] for row in term_file]

    print(search_terms)  # this is just a check to show what terms are in the list

    for filepath in txt_filepaths:
        print(f"Searching document {filepath}")  # print what file the code is reading
        filename = os.path.basename(filepath)

        # Read the contract once and split it on '.' into rough sentences.
        with open(filepath, "r", encoding="utf-8") as fp:
            lines = fp.read().split('.')

        for term in search_terms:
            for line_number, line in enumerate(lines):
                if line.find(term) != -1:  # plain substring match, no wildcard support
                    new_results.append(f"'{term}' New_Column '{filename}' New_Column '{line}' New_Column '{line_number}'")
                    print(f"Found '{term}' in document '{filename}' in line '{line_number}'")
                    results.setdefault(term, []).append([filename, line, line_number])

    # Place results into a dataframe and write a csv file once, after all documents are searched.
    d2 = pd.DataFrame(new_results, columns=['Results'])
    d2.to_csv('results.csv', index=True)

Source: https://stackoverflow.com/questions/74279330/csv-wildcard-within-string

2 Answers

xqkwcwgp1#

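To keep the code flow simple, rather than testing each term for a wildcard and then branching to a different kind of search, I suggest using the regular expression module (part of the standard Python distribution) for every term: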

import re

and treating each search term as a regular expression pattern:

lst_found_terms = re.findall(term, line)
if lst_found_terms != []: 
    for found_term in lst_found_terms:

instead of:

if line.find(term) != -1:
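
As a rough illustration of that substitution, the sketch below runs re.findall over a few made-up lines. The helper name find_matches and the sample lines are assumptions for the example and do not come from the question's code.

```python
import re

def find_matches(term, lines):
    """Return (line_number, matched_text) pairs for every regex match of term."""
    hits = []
    for line_number, line in enumerate(lines):
        # re.findall returns a list of all non-overlapping matches (empty if none)
        for found_term in re.findall(term, line):
            hits.append((line_number, found_term))
    return hits

# Hypothetical sample text standing in for the lines of a contract file
sample_lines = ["The test was run twice", "No match here", "All tests passed"]
print(find_matches("test", sample_lines))
```

Using enumerate also gives the real position of each hit, whereas lines.index(line) would always report the first identical line.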

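If you are looking for 'test', the regular expression pattern is the same as the string you would pass to find() (i.e. 'test'); if you are looking for all words that start with 'test', the pattern is r'\btest\w*'.
In other words, a "wildcard" at the end of a word is expressed by wrapping the term with the prefix r'\b' and the suffix r'\w*' (stored in the CSV as: \bTERM\w*).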
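
If you would rather keep the friendlier test* notation in the CSV, one option is to translate it to that pattern when the terms are loaded. This is only a sketch under that assumption: wildcard_to_regex is a hypothetical helper, and it handles a single trailing * and nothing else.

```python
import re

def wildcard_to_regex(term):
    """Translate a CSV term with an optional trailing '*' into a regex pattern."""
    if term.endswith("*"):
        # Escape the literal part so characters like '.' or '(' are not read as regex syntax
        return r"\b" + re.escape(term[:-1]) + r"\w*"
    # No wildcard: match the term literally
    return re.escape(term)

pattern = wildcard_to_regex("test*")
print(pattern)
print(re.findall(pattern, "test, tests and testing, but not contest"))
```

The re.escape call matters for terms that contain punctuation, which would otherwise be interpreted as regex operators.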
If you pass flags=re.I to re.findall(), the regular expression search becomes case-insensitive.
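
A small comparison of the two modes, using an invented sample line:

```python
import re

# Hypothetical line with mixed capitalisation
line = "Test results: TESTING complete, tests archived"

case_sensitive = re.findall(r"\btest\w*", line)
case_insensitive = re.findall(r"\btest\w*", line, flags=re.I)

print(case_sensitive)    # only the lowercase word matches
print(case_insensitive)  # all three capitalisations match
```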
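The simple condition if 'test' in line: proposed in the other answer also evaluates to True for 'attestation' or 'contest'. To avoid that, the regular expression "wildcard" puts a word boundary (r'\b') at the start of the term.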
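
The difference is easy to see on a short word list (the words are just examples):

```python
import re

words = ["test", "tests", "testing", "attestation", "contest"]

# Plain substring check: true for every word that contains 'test' anywhere
substring_hits = [w for w in words if "test" in w]

# Word-boundary pattern: only words that start with 'test'
boundary_hits = [w for w in words if re.search(r"\btest\w*", w)]

print(substring_hits)
print(boundary_hits)
```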
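Note that if you do not limit the number of extension characters after 'test', the wildcard will also find 'testosterone'. You can cap the extension by replacing \w* with \w{0,3}\b, which allows at most 3 extra characters (this covers tests and testing, but not testosterone). The closing \b is needed: without it, \w{0,3} still matches the leading 'testost' of 'testosterone'.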
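
A quick check of the three variants on one sample string (the string is made up for the example). Note what happens when the quantifier is capped but no closing word boundary is added:

```python
import re

text = "test tests testing testosterone"

unbounded = re.findall(r"\btest\w*", text)             # any number of extra characters
capped_no_end = re.findall(r"\btest\w{0,3}", text)     # capped, but may stop mid-word
capped = re.findall(r"\btest\w{0,3}\b", text)          # capped and must end at a word boundary

print(unbounded)
print(capped_no_end)
print(capped)
```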

2022-12-06

qlvxas9a2#

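In your specific case (test, tests, testing), a simple in operator may be enough, for example:
"test" in line evaluates to True for all three words, so if "test" in line: would do the job.
In more complex cases you may need regular expressions.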

2022-12-06
