python-3.x Unicode decode error with a .csv file

wdebmtf2 posted on 2022-12-14 in Python

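I have a very basic Python question.
I'm trying to write a script that removes a bunch of blank rows from some .csv files. The script I wrote works on about 90% of the files, but a few of them throw the following error at me: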

Traceback (most recent call last):
  File "/Users/stephensmith/Documents/Permits/deleterows.py", line 17, in <module>
    deleteRow(file, "output/" + file)
  File "/Users/stephensmith/Documents/Permits/deleterows.py", line 8, in deleteRow
    for row in csv.reader(input):
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/codecs.py", line 319, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/encodings/utf_8_sig.py", line 69, in _buffer_decode
    return codecs.utf_8_decode(input, errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa2 in position 6540: invalid start byte

Here is my code:

import csv
import os

def deleteRow(in_fnam, out_fnam):
    input = open(in_fnam, 'r')
    output = open(out_fnam, 'w')
    writer = csv.writer(output)
    for row in csv.reader(input):
        if any(row):
            writer.writerow(row)
    input.close()
    output.close()

for file in os.listdir("/Users/stephensmith/Documents/Permits/"):
    print(file)
    if file.endswith(".csv"):
        deleteRow(file, "output/" + file)

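I've tried adding encoding='utf-8', encoding='ascii' and encoding='latin1' to both open() statements, with no luck. :-( Any idea what I'm doing wrong? The .csv files were created with Excel for Mac 2011, in case that helps.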
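One way to narrow this down, offered as a hedged sketch: read each file as raw bytes and check which of a few candidate encodings can decode it without error. The helper name `encodings_that_decode` and the candidate list are assumptions for illustration; `mac_roman` is included only as a guess because the files came from Excel for Mac 2011. A codec that succeeds is not thereby proven correct (latin1 accepts any byte sequence), so this only rules encodings out.

```python
# Hypothetical helper (not from the question): which candidate codecs can decode
# the whole file? Success only means "not ruled out"; latin1 never fails.
CANDIDATES = ('utf-8', 'cp1252', 'mac_roman', 'latin1')  # assumed shortlist


def encodings_that_decode(path, candidates=CANDIDATES):
    with open(path, 'rb') as f:  # raw bytes: no decoding happens while reading
        data = f.read()
    ok = []
    for enc in candidates:
        try:
            data.decode(enc)
        except UnicodeDecodeError:
            continue  # this codec rejects at least one byte sequence in the file
        ok.append(enc)
    return ok
```

It could be called for each file in the question's loop, e.g. `encodings_that_decode(os.path.join(folder, file))`; since `os.listdir()` returns bare file names, joining them with the folder avoids depending on the current working directory.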

python-3.x

Source: https://stackoverflow.com/questions/30871741/unicode-decode-error-with-csv-file

3 answers

m528fe3b1#

Maybe you could try looping over one of the CSV files that crash, with something like this:

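with open(file, 'rb') as f:  # binary mode: lines are raw bytes, so nothing is decoded
    for line in f:
        print(repr(line))    # Python 3 print(); non-ASCII bytes show up as e.g. \xa2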

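See whether any suspicious characters show up.
If this lets you identify a suspicious character, for example if something like \0Xý1 pops up, you can clean the file by rewriting it with that character removed: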

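import os

root, ext = os.path.splitext(file)  # rstrip(".csv") strips characters, not the suffix
# latin1 maps each byte to one character, so reading cannot fail and writing back
# with latin1 reproduces the bytes; newline='' keeps the original line endings
with open(file, encoding='latin1', newline='') as f, \
     open(root + "_fixed.csv", 'w', encoding='latin1', newline='') as g:  # 'w' to write
    for line in f:
        g.write(line.replace('\0Xý1', ''))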

Then try again with the cleaned file.
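A variation on the inspection loop in this answer, sketched under the assumption that you only want to see the lines that could cause trouble: it reads raw bytes and reports the line number and the hex value of every byte above 0x7F (anything outside ASCII). The function name `non_ascii_lines` is hypothetical. Legitimate UTF-8 text such as accented letters will also be listed, so treat the output as candidates, not culprits.

```python
def non_ascii_lines(path):
    """Yield (line_number, raw_line, hex_bytes) for lines containing bytes > 0x7F."""
    with open(path, 'rb') as f:  # binary mode, so reading itself cannot raise
        for number, line in enumerate(f, start=1):
            high = [hex(b) for b in line if b > 0x7F]  # iterating bytes yields ints
            if high:
                yield number, line, high
```

For example, `for number, line, high in non_ascii_lines(file): print(number, high, repr(line))` prints each suspicious line with the bytes that make it suspicious.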

2022-12-14

e37o9pze2#

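This is an encoding problem: the input CSV file is not in the UTF-8 encoding your Python platform expects. Without knowing its encoding, and without an example of an offending line, I can't really guess what it is.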
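It is expected that both encoding='utf8' and encoding='ascii' fail, because the offending byte is 0xa2: it is outside the ASCII range (<= 0x7f), and in UTF-8 it is a continuation byte, so it cannot start a character (hence "invalid start byte"). What is very strange is that encoding='latin1' gives the same error at the same place, since latin1 can decode any byte and 0xa2 in latin1 is simply ¢.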
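The byte-level reasoning can be checked directly in an interpreter. This small sketch (the helper name `try_decode` is made up for illustration) decodes the single byte 0xa2 with each codec mentioned here and records either the decoded text or the codec's error reason:

```python
def try_decode(raw, encodings=('ascii', 'utf-8', 'latin1', 'cp1252')):
    """Map each encoding name to the decoded text, or to 'error: <reason>'."""
    results = {}
    for enc in encodings:
        try:
            results[enc] = raw.decode(enc)
        except UnicodeDecodeError as e:
            results[enc] = 'error: ' + e.reason
    return results


print(format(0xA2, '08b'))  # 10100010: the 10xxxxxx shape of a UTF-8 continuation byte
print(try_decode(b'\xa2'))  # ascii and utf-8 fail; latin1 and cp1252 both give '¢'
```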
IMHO, based on this other SO post, you could try encoding='windows-1252' if your platform supports it.
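If windows-1252 does turn out to be right, the question's deleteRow could pass it explicitly when reading and write the result back out as UTF-8. This is a sketch under that assumption: the name `delete_empty_rows` and the default `in_encoding='cp1252'` (Python's name for windows-1252) are illustrative, and `newline=''` follows the csv module's recommendation for files it reads and writes.

```python
import csv


def delete_empty_rows(in_fnam, out_fnam, in_encoding='cp1252'):
    """Copy in_fnam to out_fnam, dropping rows whose cells are all empty.

    in_encoding is an assumption about the source file; the output is UTF-8.
    """
    with open(in_fnam, encoding=in_encoding, newline='') as src, \
         open(out_fnam, 'w', encoding='utf-8', newline='') as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            if any(row):  # a row like ['', '', ''] has no truthy cell, so it is skipped
                writer.writerow(row)
```

Writing UTF-8 on the way out means later tools that assume UTF-8 read the cleaned files without a second encoding guess.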
If that still doesn't work, you should try to identify the offending line with latin1:

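import csv
import sys

class special_opener:
    """Yield a file's lines decoded one at a time; on failure, print the raw line."""
    def __init__(self, filename, encoding):
        self.fd = open(filename, 'rb')  # binary: each line is decoded in __next__
        self.encoding = encoding
    def __enter__(self):
        return self
    def __exit__(self, exc_type, exc_value, traceback):
        self.close()
        return False  # do not suppress the exception
    def close(self):
        self.fd.close()
    def __next__(self):
        line = next(self.fd)  # raw bytes, line ending included
        try:
            # normalize the line ending so csv.reader sees a plain '\n'
            return line.decode(self.encoding).strip('\r\n') + '\n'
        except UnicodeDecodeError:
            print("Offending line : ", line, file=sys.stderr)
            raise
    def __iter__(self):
        return self

def deleteRow(in_fnam, out_fnam):
    input = special_opener(in_fnam, 'latin1')
    output = open(out_fnam, 'w', newline='')  # newline='' as the csv docs recommend
    writer = csv.writer(output)
    for row in csv.reader(input):
        if any(row):
            writer.writerow(row)
    input.close()  # closes the underlying binary file
    output.close()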

special_opener should output something like this:

Offending line :  b'a,\xe9,\xe8,d\r\n'
Traceback (most recent call last):
    ...

(this line is valid latin1; I got that output with special_opener(file, 'utf8'))
Then you can post the offending line here.
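If several lines are affected, stopping at the first one can be slow going. A hedged alternative to special_opener (the function name `offending_lines` is hypothetical) scans the whole file and collects every line that fails to decode, together with its line number and the codec's error message. Decoding line by line is safe for UTF-8 and single-byte encodings, because the newline byte 0x0a never occurs inside a multi-byte UTF-8 sequence.

```python
def offending_lines(path, encoding):
    """Return (line_number, raw_line, message) for each line that fails to decode."""
    bad = []
    with open(path, 'rb') as f:  # raw bytes; each line is decoded separately below
        for number, line in enumerate(f, start=1):
            try:
                line.decode(encoding)
            except UnicodeDecodeError as e:
                bad.append((number, line, str(e)))
    return bad
```

For example, `offending_lines(file, 'utf8')` lists every line that UTF-8 rejects; an empty list for a given encoding means no line failed with it.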

2022-12-14

toiithl63#

I ran into a similar problem. In my case the CSV was not very large, and opening it in LibreOffice Calc and saving it again fixed it.

2022-12-14
