Isn't Python supposed to work with Latin characters?

Posted by 3vpjnl9f on 2023-11-20 in Python


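My end goal is to reduce a text file to plain words and write the result to a new text file. However, the text is in French and uses accented Latin characters such as é. My code converts each of them to a space instead of keeping it.
For example, it turns "Messieurs les Présidents" into "messieurs les pr sidents".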

def convert():
    for i in files_names:
        f1 = open(f"speeches/{i}","r")
        L = f1.readlines()
        cleaned_text=" "
        for j in L:
            for k in j :
                if ord(k)>=65 and ord(k)<=90: #Changing to lower case
                    f=chr(ord(k)+32)
                    cleaned_text+=f
                elif (ord(k)>=97 and ord(k)<=122): #keeping lower case letters
                    cleaned_text+=k
                elif (ord(k)>=224 and ord(k)<=254): #keeping lower case latins
                    cleaned_text+=k
                    print(k)
                else:
                    if cleaned_text[-1]!=" ":
                        cleaned_text+=" "
        f1.close()
        f2 = open(f"./cleaned/{i}","w")
        for i in cleaned_text[1:]:
            f2.write(i)
        f2.close()

This is what my code looks like. I added a separate branch that prints any accented Latin character it encounters, but nothing is ever printed.

python-3.x

Source: https://stackoverflow.com/questions/77506552/isnt-python-supposed-to-work-with-latin-characters

1 Answer


a6b3iqyw #1

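In the end, the problem was not with Python itself, whose strings are Unicode. It was with the built-in open(): when no encoding argument is given, it decodes the file with the platform's default locale encoding (often cp1252 on Windows) rather than UTF-8, so it has to be told explicitly. Here is the fixed and cleaned-up code, for anyone who wants a bit of fun.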

def convert():
    # files_names is extracted outside of this function
    for i in files_names:
        # enter the name of your directory
        with open(f"............../{i}", "r", encoding="utf-8") as f1:
            start_text = f1.readlines()
        cleaned_text = " "  # leading sentinel space so cleaned_text[-1] is always valid
        for lines in start_text:
            for letters in lines.lower():
                if letters in "abcdefghijklmnopqrstuvwxyzüéâäåçêëèïîìôöòûùÿáíóúñà":
                    cleaned_text += letters
                elif cleaned_text[-1] != " ":  # only add a space if there isn't one already
                    cleaned_text += " "
        # enter the name of your new directory
        with open(f"./................./{i}", "w", encoding="utf-8") as f2:
            f2.write(cleaned_text[1:])  # drop the leading sentinel space
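
To see why the explicit encoding matters, the following minimal sketch reproduces the symptom from the question without touching any files. It assumes the speeches were saved as UTF-8 and were being decoded as cp1252, a common Windows default. The question does not state the asker's actual default, so treat that codec as an assumption.

```python
# Sketch: what the question's loop sees when UTF-8 bytes are decoded as cp1252.
# Assumption: cp1252 stands in for the platform default; the real one is unknown.
text = "Présidents"
raw = text.encode("utf-8")          # "é" becomes the two bytes 0xC3 0xA9
misread = raw.decode("cp1252")      # each byte is decoded as its own character

print(misread)                      # PrÃ©sidents
codes = [ord(c) for c in misread]
print(codes[2:4])                   # [195, 169]

# Neither code point falls in the 224..254 range tested in the question,
# so both hit the else branch and collapse into a single space.
in_range = [224 <= c <= 254 for c in codes[2:4]]
print(in_range)                     # [False, False]

# Decoding with the right codec restores the original character.
print(raw.decode("utf-8") == text)  # True
```

This would also explain why the extra print in the question never fired: under this assumption the loop never saw "é" (code point 233) at all, only the two stand-in characters.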

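The hard-coded alphabet in the code above covers the accented letters it lists but misses others that occur in French, such as œ. A possible variation is sketched here, assuming a "word character" can be taken to mean anything that `str.isalpha()` accepts. The function name `clean_text` is hypothetical and not part of the original answer.

```python
def clean_text(text: str) -> str:
    """Lowercase text and replace every run of non-letters with one space.

    Hypothetical helper: str.isalpha() decides what counts as a letter, so
    letters from any script are kept, not only the ones listed in the answer.
    """
    pieces = []
    for ch in text.lower():
        if ch.isalpha():
            pieces.append(ch)
        elif pieces and pieces[-1] != " ":
            pieces.append(" ")  # collapse runs of non-letters into one space
    return "".join(pieces).strip()


print(clean_text("Messieurs les Présidents,\nmes chers cœurs !"))
# messieurs les présidents mes chers cœurs
```

The trade-off is that this keeps letters from every alphabet, which may be more than wanted. Text stored in decomposed form (a base letter followed by a combining accent) would also need `unicodedata.normalize("NFC", text)` first, since combining marks are not letters.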

Answered on 2023-11-20
