Calculating the percentage amino acid composition of each column in a CSV

egdjgwm8, posted on 2022-12-15 in Other

Sample file:

Column header 95: A|T|E|A|A|Y|E|A|E|A
Column header 96: W|I|Q|Q|A|L|P|K|E|A
Column header 97: S|D|F|Q|G|Y|E|A|E|A

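I want to calculate the percentage amino acid composition of each column in a CSV file. I can only compute it for the first column; I cannot work out how to iterate over the remaining columns and print the percentages for all of them.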

import csv
with open ('test.csv', 'r') as f:
    reader = csv.reader(f)
    column = [row[0] for row in reader]
    amino_acids = {}
    for aa in column:
        if aa in amino_acids:
            amino_acids[aa] += 1
        else:
            amino_acids[aa] = 1
    for aa, count in amino_acids.items():
        #print(f'{aa}: {count}')
        percentage = count / len (column) *100
        print (f"{aa}: {percentage: .2f}%")

Expected output:

column header 95:
A=50%
E=30% and so on
similarly for the remaining columns.

Please advise.

vxf3dgd4 #1

The input format is unclear, but you can apply the following code to each row.
Code:

ls = 'A|T|E|A|A|Y|E|A|E|A'.split('|')   # list of residues; the name must match the one used below
['{}={}%'.format(i, ls.count(i)/len(ls)*100) for i in set(ls)]

Output (the order may vary, because sets are unordered):

['T=10.0%', 'A=50.0%', 'E=30.0%', 'Y=10.0%']
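
To apply this to every line of the sample file, the one-liner could be wrapped in a function. The following is a minimal sketch, assuming each line looks like `Column header 95: A|T|...` as in the question. The function name `composition` and the inline `sample` text are illustrative and not part of the original answer, and it swaps `list.count` for `collections.Counter` so each sequence is scanned only once.

```python
from collections import Counter

def composition(sequence, sep="|"):
    """Return {residue: percentage} for a separator-delimited sequence string."""
    residues = [r.strip() for r in sequence.split(sep) if r.strip()]
    counts = Counter(residues)
    total = len(residues)
    # Sorted by residue letter so the output order is stable
    return {aa: 100 * n / total for aa, n in sorted(counts.items())}

# Hypothetical sample mirroring the file shown in the question
sample = """Column header 95: A|T|E|A|A|Y|E|A|E|A
Column header 96: W|I|Q|Q|A|L|P|K|E|A
Column header 97: S|D|F|Q|G|Y|E|A|E|A"""

for line in sample.splitlines():
    # partition splits on the first ':' only
    header, _, sequence = line.partition(":")
    print(header.strip() + ":")
    for aa, pct in composition(sequence).items():
        print(f"{aa}={pct:.0f}%")
```

With a real file, the loop would iterate over the open file object instead of `sample.splitlines()`.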

mklgxw1f #2

This approach uses plain Python file reading, because the sample shown is not really in CSV format.

Code

with open('test.csv', 'r') as f:
    for line in f:
        line = line.rstrip().split(':')         # remove trailing newline and split on ':'
        column_info, sequence = line            # separate into column info and amino acid sequence
        sequence = sequence.strip().split('|')  # remove leading & trailing whitespace and split on '|'
        amino_acids = {}                        # get count of each amino acid
        for aa in sequence:
            amino_acids[aa] = amino_acids.get(aa, 0) + 1

        total = sum(amino_acids.values())       # total of all counts

        # sort counts by amino acid (not necessary, but better for displaying)
        amino_acids = dict(sorted(amino_acids.items(), key=lambda kv: kv[0]))

        print(column_info)

        # Output percentages
        for aa, count in amino_acids.items():
            percentage = count / total * 100
            print(f"{aa}={percentage: .2f}%")

Output

Column header 95
A= 50.00%
E= 30.00%
T= 10.00%
Y= 10.00%
Column header 96
A= 20.00%
E= 10.00%
I= 10.00%
K= 10.00%
L= 10.00%
P= 10.00%
Q= 20.00%
W= 10.00%
Column header 97
A= 20.00%
D= 10.00%
E= 20.00%
F= 10.00%
G= 10.00%
Q= 10.00%
S= 10.00%
Y= 10.00%
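
If the data is instead a true CSV with one column per header and one residue per row (the wording of the question suggests this might be the case, but the post does not confirm it), the rows could be transposed into columns with `zip`. Below is a minimal sketch under that assumption; the file layout, the helper name `column_composition`, and the inline sample are hypothetical and not taken from the original post.

```python
import csv
import io
from collections import Counter

def column_composition(csv_file):
    """Map each header to {residue: percentage}, assuming the first row holds headers."""
    rows = list(csv.reader(csv_file))
    header, data = rows[0], rows[1:]
    result = {}
    # zip(*data) transposes the list of rows into tuples of column values
    for name, column in zip(header, zip(*data)):
        residues = [cell.strip() for cell in column if cell.strip()]
        counts = Counter(residues)
        result[name] = {aa: 100 * n / len(residues) for aa, n in sorted(counts.items())}
    return result

# Hypothetical in-memory CSV; for a real file, pass open('test.csv', newline='') instead
sample = io.StringIO("95,96\nA,W\nT,I\nA,Q\nA,Q\n")
for name, percentages in column_composition(sample).items():
    print(f"Column header {name}:")
    for aa, pct in percentages.items():
        print(f"{aa}={pct:.2f}%")
```

Note that `zip` stops at the shortest row, so this sketch assumes every data row has the same number of cells as the header.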
