scipy: How do I import a precomputed distance matrix (CSV format) into Python?

eimct9ow  posted on 2022-11-10  in Python

I have been trying to import a precomputed distance matrix with pandas, which I want to use to make a heatmap with seaborn. I used the following code:

import pandas as pd
msa = pd.read_csv("Multiple_alignment_distance_matrix.csv")

The output below does not look like a distance matrix.

sp|Q9BYW2|SETD2_HUMAN Histone-lysine N-methyltransferase SETD2 OS=Homo sapiens OX=9606 GN=SETD2 PE=1 SV=3   sp|P42858|HD_HUMAN Huntingtin OS=Homo sapiens OX=9606 GN=HTT PE=1 SV=2  sp|Q8IUH5|ZDH17_HUMAN Palmitoyltransferase ZDHHC17 OS=Homo sapiens OX=9606 GN=ZDHHC17 PE=1 SV=2 sp|O75400|PR40A_HUMAN Pre-mRNA-processing factor 40 homolog A OS=Homo sapiens OX=9606 GN=PRPF40A PE=1 SV=2  tr|F8VU11|F8VU11_HUMAN PRP40 pre-mRNA processing factor 40 homolog B (Yeast), isoform CRA_a OS=Homo sapiens OX=9606 GN=PRPF40B PE=1 SV=2    sp|Q6NWY9|PR40B_HUMAN Pre-mRNA-processing factor 40 homolog B OS=Homo sapiens OX=9606 GN=PRPF40B PE=1 SV=1  sp|P43357|MAGA3_HUMAN Melanoma-associated antigen 3 OS=Homo sapiens OX=9606 GN=MAGEA3 PE=1 SV=1 tr|A0A024RBM8|A0A024RBM8_HUMAN AMPylator FICD OS=Homo sapiens OX=9606 GN=HYPE PE=3 SV=1 sp|Q9BVA6|FICD_HUMAN Protein adenylyltransferase FICD OS=Homo sapiens OX=9606 GN=FICD PE=1 SV=2 tr|B3KSH4|B3KSH4_HUMAN Huntingtin interacting protein 2, isoform CRA_a OS=Homo sapiens OX=9606 GN=HIP2 PE=2 SV=1    tr|B4DIZ2|B4DIZ2_HUMAN cDNA FLJ57995, moderately similar to Ubiquitin-conjugating enzyme E2-25 kDa OS=Homo sapiens OX=9606 PE=2 SV=1    sp|P61086|UBE2K_HUMAN Ubiquitin-conjugating enzyme E2 K OS=Homo sapiens OX=9606 GN=UBE2K PE=1 SV=3
0   sp|Q9BYW2|SETD2_HUMAN Histone-lysine N-methylt...   2564    409 69  114 109 107 41  89  89  9   13  19
1   sp|P42858|HD_HUMAN Huntingtin OS=Homo sapiens ...   409 3142    90  126 143 143 59  58  58  15  14  18
2   sp|Q8IUH5|ZDH17_HUMAN Palmitoyltransferase ZDH...   69  90  632 5   10  10  1   16  16  0   2   2
3   sp|O75400|PR40A_HUMAN Pre-mRNA-processing fact...   114 126 5   957 502 498 15  5   5   0   0   0
4   tr|F8VU11|F8VU11_HUMAN PRP40 pre-mRNA processi...   109 143 10  502 892 870 17  3   3   0   0   0
5   sp|Q6NWY9|PR40B_HUMAN Pre-mRNA-processing fact...   107 143 10  498 870 871 16  3   3   0   0   0
6   sp|P43357|MAGA3_HUMAN Melanoma-associated anti...   41  59  1   15  17  16  314 1   1   0   0   0
7   tr|A0A024RBM8|A0A024RBM8_HUMAN AMPylator FICD ...   89  58  16  5   3   3   1   458 458 19  29  42
8   sp|Q9BVA6|FICD_HUMAN Protein adenylyltransfera...   89  58  16  5   3   3   1   458 458 19  29  42
9   tr|B3KSH4|B3KSH4_HUMAN Huntingtin interacting ...   9   15  0   0   0   0   0   19  19  97  67  97
10  tr|B4DIZ2|B4DIZ2_HUMAN cDNA FLJ57995, moderate...   13  14  2   0   0   0   0   29  29  67  139 139
11  sp|P61086|UBE2K_HUMAN Ubiquitin-conjugating en...   19  18  2   0   0   0   0   42  42  97  139 200

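The columns look fine, but the rows are indexed (0, 1, 2, ...) and the protein names ended up in an ordinary column. I then tried to use this to create a heatmap: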

import seaborn as sns
sns.heatmap(msa)

But I got a TypeError. I tried reading the pandas and scipy documentation, but I am having a hard time understanding it.

txu3uszq  #1

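As I suspected, you can add the index_col=0 parameter to read_csv. The first column of the file (the protein names) then becomes the row index instead of a text column, so only numeric values are passed to the heatmap: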

import pandas as pd
import seaborn as sns
df = pd.read_csv('Multiple_alignment_distance_matrix.csv', index_col=0)
sns.heatmap(df)
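
To see what index_col=0 changes, here is a minimal sketch on a made-up 3x3 matrix. It assumes the real file has the same layout (row labels in the first column, the same labels in the header row); the labels A, B, C and the values are invented for illustration only.

```python
import io
import pandas as pd

# Made-up matrix, assumed to mimic the layout of the CSV in the question:
# the first column holds the row labels, the header row holds the same labels.
csv_text = """,A,B,C
A,10,2,3
B,2,20,5
C,3,5,30
"""

# Default read: the labels land in an ordinary text column
raw = pd.read_csv(io.StringIO(csv_text))

# With index_col=0 the labels become the row index and all data is numeric
indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(raw.shape)      # one extra (text) column compared to `indexed`
print(indexed.shape)
print(indexed.dtypes)

# A distance or similarity matrix should normally be symmetric; quick check
print((indexed.values == indexed.values.T).all())
```

The non-numeric label column in the default read is most likely what the heatmap call was choking on in the question.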

Bonus: nicer names

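import re

def prot_name(s):
    # Keep only the protein name: the text between the accession and " OS="
    match = re.search(r'^[^ ]+ (.*) OS=', s)
    # Fall back to the original label so no axis label becomes None
    return match.group(1) if match else s

sns.heatmap(df.rename(columns=prot_name, index=prot_name))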
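
As a rough illustration of how passing a function to rename relabels both axes, the sketch below applies the same regular expression to two of the labels from the question. The helper name short_name and the 2x2 frame are assumptions made for this example; the numbers are copied from the first two rows of the output above.

```python
import re
import pandas as pd

def short_name(label):
    """Hypothetical helper: keep the text between the accession and ' OS='."""
    match = re.search(r'^[^ ]+ (.*) OS=', label)
    # Return the label unchanged when the pattern does not match
    return match.group(1) if match else label

labels = [
    "sp|Q9BYW2|SETD2_HUMAN Histone-lysine N-methyltransferase SETD2 "
    "OS=Homo sapiens OX=9606 GN=SETD2 PE=1 SV=3",
    "sp|P42858|HD_HUMAN Huntingtin OS=Homo sapiens OX=9606 GN=HTT PE=1 SV=2",
]

# 2x2 excerpt of the matrix shown in the question
small = pd.DataFrame([[2564, 409], [409, 3142]], index=labels, columns=labels)

# rename accepts a callable and applies it to every label on each axis
renamed = small.rename(columns=short_name, index=short_name)
print(list(renamed.index))
# ['Histone-lysine N-methyltransferase SETD2', 'Huntingtin']
```

The values are untouched; only the row and column labels are shortened, which keeps the heatmap axes readable.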
