Get the unique values from a CSV file and write them to a new file

qoefvg9y  posted on 2022-12-06 in Other

I am trying to get the unique values from each column of a CSV file. Here is a sample of the file:

12,life,car,good,exellent
10,gift,truck,great,great
11,time,car,great,perfect

The desired output in the new file is:

12,10,11
life,gift,time
car,truck
good,great
exellent,great,perfect

Here is my code:

def attribute_values(in_file, out_file):
    fname = open(in_file)
    fout = open(out_file, 'w')

    # get the header line
    header = fname.readline()
    # get the attribute names
    attrs = header.strip().split(',')

    # get the distinct values for each attribute
    values = []
    
    for i in range(len(attrs)):
        values.append(set())

    # read the data
    for line in fname:
        cols = line.strip().split(',')
        
        for i in range(len(attrs)):
            values[i].add(cols[i])

        # write the distinct values to the file
        for i in range(len(attrs)):
            fout.write(attrs[i] + ',' + ','.join(list(values[i])) + '\n')

    fout.close()
    fname.close()

The code currently outputs:

12,10
life,gift
car,truck
good,great
exellent,great
12,10,11
life,gift,time
car,car,truck
good,great
exellent,great,perfect

How can I fix this?

2vuwiymt1#

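You can use zip to iterate over the columns of the input file, then drop the duplicates from each column while keeping the order of first appearance: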

import csv

def attribute_values(in_file, out_file):
    with open(in_file, "r") as fin, open(out_file, "w") as fout:
        for column in zip(*csv.reader(fin)):
            items, row = set(), []
            for item in column:
                if item not in items:
                    items.add(item)
                    row.append(item)
            fout.write(",".join(row) + "\n")

Result for the sample file:

12,10,11
life,gift,time
car,truck
good,great
exellent,great,perfect
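
For context, the code in the question misbehaves for two reasons: it consumes the first data row as a header (the sample file has none), and its write loop is indented inside the read loop, so a partial result is written after every row. It also collects values in a set, which does not keep insertion order. A more compact variant of the same zip idea might look like the sketch below. The function name unique_column_values is made up for illustration, and the sketch assumes Python 3.7+ (where dict preserves insertion order) and that every row has the same number of fields, since zip stops at the shortest row.

```python
import csv


def unique_column_values(in_file, out_file):
    """Write one line per input column, holding that column's distinct values.

    Hypothetical helper for illustration; assumes the file has no header row.
    """
    # newline="" is the setting the csv module documentation recommends
    with open(in_file, newline="") as fin, open(out_file, "w", newline="") as fout:
        # lineterminator="\n" overrides the csv module default of "\r\n"
        writer = csv.writer(fout, lineterminator="\n")
        # zip(*rows) transposes the rows into columns
        for column in zip(*csv.reader(fin)):
            # dict.fromkeys keeps only the first occurrence of each value,
            # in the order the values appear in the column
            writer.writerow(list(dict.fromkeys(column)))
```

Using csv.writer instead of ",".join means values that themselves contain commas or quotes are quoted on output. Like the answer above, this reads the whole file into memory through zip, which should be fine for small files.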
