I'm trying to get the unique values from a CSV file. Here is a sample of the file:
12,life,car,good,exellent
10,gift,truck,great,great
11,time,car,great,perfect
The desired output in the new file is:
12,10,11
life,gift,time
car,truck
good,great
excellent,great,perfect
Here is my code:
def attribute_values(in_file, out_file):
    fname = open(in_file)
    fout = open(out_file, 'w')
    # get the header line
    header = fname.readline()
    # get the attribute names
    attrs = header.strip().split(',')
    # get the distinct values for each attribute
    values = []
    for i in range(len(attrs)):
        values.append(set())
    # read the data
    for line in fname:
        cols = line.strip().split(',')
        for i in range(len(attrs)):
            values[i].add(cols[i])
    # write the distinct values to the file
    for i in range(len(attrs)):
        fout.write(attrs[i] + ',' + ','.join(list(values[i])) + '\n')
    fout.close()
    fname.close()
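Because the sample has no header row, the first readline() consumes the data row 12,life,car,good,exellent. Those values land in attrs and are written out verbatim, but they are never added to the sets. The sketch below replays the function's reading logic on an in-memory copy of the sample (io.StringIO stands in for the real file, purely for illustration) to show where the extra car comes from:

```python
import io

# In-memory copy of the sample from the question (note: no header row).
sample = io.StringIO(
    "12,life,car,good,exellent\n"
    "10,gift,truck,great,great\n"
    "11,time,car,great,perfect\n"
)

header = sample.readline()            # consumes the first DATA row
attrs = header.strip().split(',')     # ['12', 'life', 'car', 'good', 'exellent']

values = [set() for _ in attrs]
for line in sample:                   # only the remaining two rows are seen here
    cols = line.strip().split(',')
    for i in range(len(attrs)):
        values[i].add(cols[i])

# Column 2: 'car' is in attrs AND in the set, so the output line repeats it.
print(attrs[2], sorted(values[2]))    # prints: car ['car', 'truck']
```

Sets also do not preserve insertion order, so the order of values after the first item on each output line is not guaranteed.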
My code currently produces this output:
12,10
life,gift
car,truck
good,great
exellent,great
12,10,11
life,gift,time
car,car,truck
good,great
exellent,great,perfect
How can I fix this?
Answer:
You can try using zip to iterate over the columns of the input file and then remove the duplicates.
Result for the sample file:
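A minimal sketch of the zip approach, assuming the input has no header row (as in the sample) and that each column's values should keep their first-seen order. It reuses the question's function name. The choice of the csv module and dict.fromkeys is illustrative and not necessarily the code the answer originally showed:

```python
import csv


def attribute_values(in_file, out_file):
    """Write the distinct values of each CSV column as one output line.

    Assumes the input has no header row and keeps values in first-seen order.
    """
    with open(in_file, newline='') as fin:
        # Skip blank lines: csv.reader yields [] for them, and an empty row
        # would make zip(*rows) produce no columns at all.
        rows = [row for row in csv.reader(fin) if row]

    with open(out_file, 'w', newline='') as fout:
        writer = csv.writer(fout, lineterminator='\n')
        # zip(*rows) transposes the list of rows into a sequence of columns.
        for column in zip(*rows):
            # dict.fromkeys drops duplicates but, unlike set, keeps order.
            writer.writerow(dict.fromkeys(column))
```

With the sample above as input, the output file contains 12,10,11 / life,gift,time / car,truck / good,great / exellent,great,perfect, one per line. If the real file does have a header row, calling next() on the reader before collecting the rows skips it. Note that zip stops at the shortest row, so rows with fewer fields would truncate every column.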