Regex: how to extract the sentences containing a keyword from a block of text

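My goal is to write a script that searches a folder of log files for a specific keyword and writes the following to a results.txt file: the file name, the number of each line containing the keyword, the index where the keyword starts, and the full line of text containing the keyword.
I have written some code that does this, but it has a problem with text like the following example:
Hi everyone, we've planned some overnight maintenance this weekend so that means you will not be able to use any device on the network between 7pm and 10am on this coming Friday evening/ Saturday morning (23/06/23 to 24/06/23). We apologise for the inconvenience this will cause but it is unavoidable. Please ensure you have logged out of the network by 6.30pm on Friday evening (23/06/23).
It correctly identifies the keyword "device" as being on line 1, starting at character 114, and it quite correctly shows the whole block of text containing "device". However, I would like it to show only the sentence in which "device" appears.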
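I was thinking of either:

for each "device", finding the text after the previous full stop and before the next full stop, or
taking the n characters before and after "device".

Here is the code I have written so far: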
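Both ideas can be sketched with plain string methods, starting from the index that `line.find(search_str)` already returns. `sentence_around` and `window_around` are hypothetical helper names, and `n=40` is an arbitrary default. Note that the full-stop idea treats every "." as a sentence boundary, so a sentence containing "6.30pm" is cut off at "6.".

```python
def sentence_around(line: str, index: int) -> str:
    """Return the text between the full stop before `index` and the full stop after it."""
    # rfind returns -1 when there is no earlier full stop, so start becomes 0
    start = line.rfind(".", 0, index) + 1
    end = line.find(".", index)
    # Keep the closing full stop; run to the end of the line if there is none
    end = len(line) if end == -1 else end + 1
    return line[start:end].strip()


def window_around(line: str, index: int, keyword: str, n: int = 40) -> str:
    """Return up to n characters on either side of the keyword found at `index`."""
    start = max(0, index - n)
    end = min(len(line), index + len(keyword) + n)
    return line[start:end]
```

For example, `window_around(line, line.find("device"), "device", 10)` on the sample text gives `"o use any device on the ne"`.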

```python
# Import the os module
import os

# Output file (raw string, so the backslashes in the Windows path are not read as escape sequences)
fname2 = r"D:\X250\Python_Scripts\Search_File_for_Keyword_and_Print_Line\Results.txt"

# String to search
search_path = input("Enter directory path to search : ")
file_type = input("File Type : ")
search_str = input("Enter the search string : ")

# If path does not exist, set search path to current directory
if not os.path.exists(search_path):
    search_path = "."

# Create the output file; "with" makes sure it is closed (and flushed) at the end
with open(fname2, "w") as fw:
    # Repeat for each file in the directory
    for fname in os.listdir(search_path):
        # Apply file type filter
        if not fname.endswith(file_type):
            continue

        # Open file for reading (os.path.join inserts the directory separator if needed)
        with open(os.path.join(search_path, fname)) as fo:
            # Loop until EOF, counting line numbers from 1
            for line_no, line in enumerate(fo, start=1):
                # Search for string in line
                index = line.find(search_str)
                if index != -1:
                    print(fname, "[", line_no, ",", index, "] ", line, sep="")
                    # Write output file
                    fw.write(fname + " " + str(line_no) + " " + str(index) + "  ")
                    fw.write(line)
```
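One way the loop could report sentences instead of whole lines is sketched below. It assumes a sentence ends at ".", "!" or "?" followed by whitespace or the end of the line, so "6.30pm" stays intact; this is a heuristic, and abbreviations such as "e.g." would still split. `keyword_sentences`, `search_folder` and `SENTENCE` are hypothetical names, the output format mirrors the original `fw.write` calls (with the sentence in place of the line), and matching is case-sensitive, as with `line.find`.

```python
import os
import re
from typing import Iterator, Tuple

# Hypothetical heuristic sentence pattern: start at a non-space character and stop at
# the first ".", "!" or "?" followed by whitespace or end of line (or at end of line).
SENTENCE = re.compile(r"\S.*?(?:[.!?](?=\s|$)|$)")


def keyword_sentences(line: str, keyword: str) -> Iterator[Tuple[int, str]]:
    """Yield (index of keyword within line, sentence) for each sentence containing keyword."""
    for m in SENTENCE.finditer(line):
        sentence = m.group()
        pos = sentence.find(keyword)
        if pos != -1:
            # Convert the position inside the sentence to a position inside the line
            yield m.start() + pos, sentence


def search_folder(search_path: str, file_type: str, keyword: str, results_path: str) -> None:
    """Write 'file line index  sentence' for each match in files ending with file_type."""
    with open(results_path, "w", encoding="utf-8") as fw:
        for fname in sorted(os.listdir(search_path)):
            full_path = os.path.join(search_path, fname)
            if not (fname.endswith(file_type) and os.path.isfile(full_path)):
                continue
            with open(full_path, encoding="utf-8", errors="replace") as fo:
                for line_no, line in enumerate(fo, start=1):
                    for index, sentence in keyword_sentences(line, keyword):
                        print(f"{fname}[{line_no},{index}] {sentence}")
                        fw.write(f"{fname} {line_no} {index}  {sentence}\n")
```

The index written is still the keyword's position in the whole line, as in the original script; only the text printed after it is narrowed to the sentence.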


Something like this should work:

```python
import re

text = "Hi everyone, we've planned some overnight maintenance this weekend so that means you will not be able to use any device on the network between 7pm and 10am on this coming Friday evening/ Saturday morning (23/06/23 to 24/06/23). We apologise for the inconvenience this will cause but it is unavoidable. Please ensure you have logged out of the network by 6.30pm on Friday evening (23/06/23)."

# Option 1: split text into sentences on every full stop
sentences = text.split(".")

# Filter to only sentences with "device" in them (strip removes the space left after each ".")
sentences_with_device = [sentence.strip() for sentence in sentences if "device" in sentence]
print(sentences_with_device)

# Option 2: using regex
# This looks for, in order, all of the following:
# 1. anything that is not a period (.) 0 or more times
# 2. the word "device"
# 3. anything that is not a period (.) 0 or more times
# 4. a period (.)
sentences_with_device = [s.strip() for s in re.findall(r"[^.]*?device[^.]*\.", text)]
print(sentences_with_device)
```
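Both options treat every "." as a sentence boundary, so the last sentence of the example is cut at "6.30pm". A sketch of a slightly more careful split, assuming a sentence ends with ".", "!" or "?" followed by whitespace (still a heuristic: abbreviations such as "e.g." would split). The keyword "network" is used here because it appears in the sentence that contains "6.30pm".

```python
import re

text = (
    "Hi everyone, we've planned some overnight maintenance this weekend so that "
    "means you will not be able to use any device on the network between 7pm and "
    "10am on this coming Friday evening/ Saturday morning (23/06/23 to 24/06/23). "
    "We apologise for the inconvenience this will cause but it is unavoidable. "
    "Please ensure you have logged out of the network by 6.30pm on Friday evening "
    "(23/06/23)."
)

# Naive split: every "." is a boundary, so "6.30pm" is cut in two
naive = text.split(".")

# Split only at whitespace that follows ".", "!" or "?"; the lookbehind keeps
# the punctuation attached to the sentence it ends
sentences = re.split(r"(?<=[.!?])\s+", text.strip())

keyword = "network"
matches = [s for s in sentences if keyword in s]
for s in matches:
    print(s)
```

With "network", this prints the first and last sentences whole, including "6.30pm", whereas `naive` has five pieces for three sentences.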
