csv: Is there a way to log, from code, which files raise a ParserError?

uurity8g  posted on 2023-06-19  in Other

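I have a batch of CSV files to process. Each file should contain a fixed number of tables that hold information. I use code to extract the tables, load each one into a DataFrame, and join them on export.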
[Image: file format with multiple tables]

import pandas as pd

# Each table is one header row followed by one data row
df1 = pd.read_csv("file.csv", sep='\t', encoding='utf-16', header=0, nrows=1, usecols=[1, 2, 3])
df1.columns = 'TABLE1_' + df1.columns
df2 = pd.read_csv("file.csv", sep='\t', encoding='utf-16', header=2, nrows=1, usecols=[1, 2, 3])
df2.columns = 'TABLE2_' + df2.columns
df3 = pd.read_csv("file.csv", sep='\t', encoding='utf-16', header=4, nrows=1, usecols=[1, 2, 3])
df3.columns = 'TABLE3_' + df3.columns
df4 = pd.read_csv("file.csv", sep='\t', encoding='utf-16', header=6, nrows=1, usecols=[1, 2, 3])
df4.columns = 'TABLE4_' + df4.columns

# Join the four single-row tables side by side and export
df5 = df1.join([df2, df3, df4])
df5.to_csv("file_output.csv", sep=',', encoding='utf-8', header=True, index=False)

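However, some files were not exported correctly from the main database and are missing some of these tables. This causes an error when the code runs:
ParserError: Passed header=n but only n lines in file
I am currently brainstorming how to handle this. One approach I have considered is a condition so that, when the error is raised, the code does not stop but records the file in a text document and then carries on. However, I do not know how to do that.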

qvtsj1bj1#

import pandas as pd

def process_file(file_path):
    try:
        # Read each table from the file passed in (header row plus one data row)
        df1 = pd.read_csv(file_path, sep='\t', encoding='utf-16', header=0, nrows=1, usecols=[1, 2, 3])
        df1.columns = 'TABLE1_' + df1.columns
        df2 = pd.read_csv(file_path, sep='\t', encoding='utf-16', header=2, nrows=1, usecols=[1, 2, 3])
        df2.columns = 'TABLE2_' + df2.columns
        df3 = pd.read_csv(file_path, sep='\t', encoding='utf-16', header=4, nrows=1, usecols=[1, 2, 3])
        df3.columns = 'TABLE3_' + df3.columns
        df4 = pd.read_csv(file_path, sep='\t', encoding='utf-16', header=6, nrows=1, usecols=[1, 2, 3])
        df4.columns = 'TABLE4_' + df4.columns

        df5 = df1.join([df2, df3, df4])
        # Note: every file writes to the same output name; derive it from
        # file_path if each input needs its own output file
        df5.to_csv("file_output.csv", sep=',', encoding='utf-8', header=True, index=False)
    except pd.errors.ParserError:
        # A table is missing: record the file name and carry on
        with open('error_log.txt', 'a') as f:
            f.write(f'Error processing file: {file_path}\n')

file_list = ['file1.csv', 'file2.csv', 'file3.csv']

for file in file_list:
    process_file(file)
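
If the log should also say what went wrong, not just which file failed, the same loop could be built on the standard `logging` module. The following is only a sketch: `run_batch`, `process_one`, `catch` and the logger name `csv_batch` are hypothetical names, and it assumes the per-file function lets the exception propagate instead of catching it itself.

```python
import logging

def run_batch(paths, process_one, log_path="error_log.txt", catch=(Exception,)):
    """Call process_one(path) for every path; log failures and keep going.

    Returns the list of paths that failed.
    """
    logger = logging.getLogger("csv_batch")  # hypothetical logger name
    logger.setLevel(logging.ERROR)
    handler = logging.FileHandler(log_path, mode="a", encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))
    logger.addHandler(handler)

    failed = []
    try:
        for path in paths:
            try:
                process_one(path)
            except catch as exc:
                # Record the file name, the exception type and its message
                logger.error("%s: %s: %s", path, type(exc).__name__, exc)
                failed.append(path)
    finally:
        # Detach and close the handler so the log file is flushed
        logger.removeHandler(handler)
        handler.close()
    return failed
```

With pandas this might be called as `run_batch(file_list, process_one, catch=(pd.errors.ParserError,))`, where `process_one` is a version of the function above without its own try/except. Restricting `catch` to `ParserError` means unrelated failures, such as a missing file, still stop the run.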

axr492tv2#

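You should use a try/except block to catch the error. The code below catches all errors and records each error, together with the file name involved, in a list that you can print or write to a file. If it is useful, you can put the process in a loop to handle all the files in a list.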

import pandas as pd

bad_files = []

def load_file(df, filename, _sep = None, _encoding = None, _header = None, _nrows = None, _usecols = None):
    try:
        print(_header, _nrows, _usecols)
        df = pd.read_csv(filename, sep = _sep, encoding = _encoding, header = _header, nrows = _nrows, usecols = _usecols)
    except Exception as e:
        # Record the file name and the error message, then carry on
        bad_files.append(f"{filename}  {e}")
    # Rebinding df inside the function does not change the caller's variable,
    # so return it; on failure the DataFrame passed in comes back unchanged
    return df

df1 = pd.DataFrame()   # empty DataFrame used as the fallback
df1 = load_file(df1, "file1.csv", _sep='\t', _encoding = 'utf-16', _header = 0, _nrows = 1, _usecols = [1, 2, 3])
# and so on

print(bad_files)
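
The loop and the "write it to a file" step mentioned above could look roughly like this. It is a sketch under assumptions rather than a tested solution for the real exports: the header rows 0, 2, 4 and 6 and the `TABLEn_` prefixes come from the question, while `HEADER_ROWS`, `load_tables`, `write_report` and the report file name are hypothetical. Each table is tried separately, so a file that is missing only some tables still yields the ones that are present.

```python
import pandas as pd

# Header row of each table, as used in the question
HEADER_ROWS = {"TABLE1_": 0, "TABLE2_": 2, "TABLE3_": 4, "TABLE4_": 6}

def load_tables(filename, bad_files):
    """Return the tables that could be read; record those that could not."""
    tables = []
    for prefix, header_row in HEADER_ROWS.items():
        try:
            df = pd.read_csv(filename, sep="\t", encoding="utf-16",
                             header=header_row, nrows=1, usecols=[1, 2, 3])
        except Exception as e:
            # Note which table of which file failed, and why
            bad_files.append(f"{filename}  {prefix.rstrip('_')}  {e}")
            continue
        tables.append(df.add_prefix(prefix))
    return tables

def write_report(bad_files, report_path="bad_files.txt"):
    """Write one line per failed table to a text file."""
    with open(report_path, "w", encoding="utf-8") as f:
        f.write("\n".join(bad_files))
```

A caller would loop over its file list, pass the same `bad_files` list to every `load_tables` call, and finish with `write_report(bad_files)`.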
