python-3.x 从文本文件中删除特殊字符

xa9qqrwz  于 2022-12-05  发布在  Python
关注(0)|答案(3)|浏览(158)

我的input.txt如下

\n\n            \n        \n\n    \n\n\r\n\r\n    \r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n        \r\n        \r\n    \r\n\r\n\r\n    \r\n\r\n\r\n\r\n        \r\n\r\n\r\n\r\n\r\n\r\n\r\n        hello boys\r\n        boys\r\n    boys\r\n\r\n         Download PDF\r\n        PDF\r\n    X\r\n    Close Window\r\n\r\n\r\n\r\n    
This boys text has undergone conversion so that it is mobile and web-friendly. This may have created formatting or alignment issues. Please refer to the PDF copy for a print-friendly version.
\r\n\r\n\r\n\r\n\r\n\r\n\r\n    \r\n        
BOYS CLUB AUSTRALIA
\r\n
26 July 2019
\r\n
hello boys
\r\n
\r\nhello boys
\r\n
--------------------------------------------------------------------------------------------------------------------------------------
\r\n
Introduction
\r\n\r\n    
1. \r\n    
This letter to let you know that your application has been successful with our school
\r\n

我正在尝试删除不必要的模式,如“\n\n”、“\r\r”、“\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n”。分析时,我希望删除所有特殊模式,并只使用文本和数字。
我已经试过了。

with open (data, "r", encoding='utf-8') as myfile:
    for line in myfile:
        line.rstrip()
        line = re.sub(r'\r\n', '', line)
            with open("out.txt", "a",  encoding='utf-8') as output:
                output.write(line)

但即使是'\r\n'也不会从输出档中移除。谢谢。

k4ymrczo

k4ymrczo1#

您可以使用replace()方法,以空字串取代\r\n子字串。

with open(data, 'r', encoding='utf-8') as myfile:
    with open('out.txt', 'a',  encoding='utf-8') as output:
        for line in myfile:
            output.write(line.replace('\r\n', '').rstrip())
o7jaxewo

o7jaxewo2#

在Python中,要删除文本文件中的特殊字符,如换行符(\n),可以使用str类的replace()方法,该方法允许用其他字符或字符串替换特定的字符或字符串。

x7yiwoj4

x7yiwoj43#

您可以使用replace,也可以使用str.isprintable来过滤掉不可打印的字符:

input_file = 'input.txt'
output_file = 'output.txt'

with open (input_file, "r", encoding='utf-8') as infile:
    with open(output_file, "w", encoding='utf-8') as outfile:
        for line in infile:
            outfile.write(
                ''.join(filter(str.isprintable, line.replace('\\n', '').replace('\\r', '')))
                + '\n'
            )

因此,输出为:

hello boys        boys    boys         Download PDF        PDF    X    Close Window    
This boys text has undergone conversion so that it is mobile and web-friendly. This may have created formatting or alignment issues. Please refer to the PDF copy for a print-friendly version.
            
BOYS CLUB AUSTRALIA

26 July 2019

hello boys

hello boys

--------------------------------------------------------------------------------------------------------------------------------------

Introduction
    
1.     
This letter to let you know that your application has been successful with our school

相关问题