python-3.x Pandas: indexing on a multi-dimensional key

Posted by ncgqoxb0 on 2023-02-01 in Python

I am working with a CSV file that looks like this:

strain  contig          homology
1A42    ctg.s1.000000F  Chr
1A42    ctg.s1.000001F  pSymA
1A42    ctg.s1.3        pSymB
1A42    ctg.s2.000000F  Other
4B41    ctg.s1.000000F  Chr
4B41    ctg.s1.3        pSymA
4B41    ctg.s1.1        pSymB
7B22    ctg.s2.12       other
7B22    ctg.s1.000000F  Chr
7B22    ctg.s1.3        pSymA
7B22    ctg.s1.1        pSymB
8A52    ctg.s1.0        pSymB
8A52    ctg.s1.4        Chr
8A52    ctg.s1.2        pSymA

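In the contig column, some strings are repeated across different strains in the strain column. For example, ctg.s1.000000F is present in 1A42, 4B41, and 7B22.
I wrote the following lines to define a function that, given a strain name and the CSV file as input, prints the corresponding value in the homology column for each contig value: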

import pandas as pd

def myfunction(strain, csv):
    with open(csv, 'r') as h:
        h_df = pd.read_csv(h, index_col=False, dtype='unicode', on_bad_lines='skip', sep=";")
        match = h_df.loc[(h_df == strain).any(axis=1), 'contig']
        for element in match:
            contig = h_df.loc[(h_df == element).any(axis=1), 'homology']
            print(element, contig)

# mycsv holds the path to the CSV file
myfunction('1A42', mycsv)

It sort of works, but it returns the whole column of homology values rather than only the values associated with "1A42".
How can I fix this? Thanks.

Answer #1 (8hhllhi2)

Try the following (simpler) approach:

import pandas as pd

def myfunction(strain, csv):
    with open(csv, 'r') as h:
        h_df = pd.read_csv(h, index_col=False, dtype='unicode', on_bad_lines='skip', sep=";")
    # Keep only the rows whose strain column equals the requested strain
    print(h_df[h_df['strain'] == strain][['homology', 'contig']])
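
To see why the loop in the question picked up values from other strains, here is a small self-contained sketch. It assumes a ';'-separated excerpt of the question's table (the inline SAMPLE string is only an illustration, not the real file) and compares matching on the contig name alone with filtering on the strain column:

```python
import io
import pandas as pd

# Assumption: a ';'-separated excerpt of the table shown in the question.
SAMPLE = """strain;contig;homology
1A42;ctg.s1.000000F;Chr
1A42;ctg.s1.000001F;pSymA
1A42;ctg.s1.3;pSymB
4B41;ctg.s1.000000F;Chr
4B41;ctg.s1.3;pSymA
"""

df = pd.read_csv(io.StringIO(SAMPLE), sep=";", dtype=str)

# Matching on the contig name alone also returns rows from other strains,
# because the same contig name is reused across strains.
by_contig_only = df.loc[df["contig"] == "ctg.s1.3", "homology"].tolist()

# Filtering on the strain column keeps only that strain's rows.
by_strain = df.loc[df["strain"] == "1A42", ["contig", "homology"]]

print(by_contig_only)
print(by_strain.to_string(index=False))
```

Since contig names are not unique across strains, the strain has to be part of the filter condition.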

Answer #2 (w1e3prcc)

If you only want to print the homology values that correspond to the strain of interest, you can use the following code:

import pandas as pd

def myfunction(strain, csv):
    with open(csv, 'r') as h:
        h_df = pd.read_csv(h, index_col=False, dtype='unicode', on_bad_lines='skip', sep=";")
        # Contigs that belong to the requested strain
        match = h_df.loc[(h_df['strain'] == strain), 'contig']
        for element in match:
            # Match on both contig and strain, since contig names repeat across strains
            contig = h_df.loc[(h_df['contig'] == element) & (h_df['strain'] == strain), 'homology']
            print(element, contig.iloc[0])

myfunction('1A42', 'mycsv')
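
Another option, not taken from the answers above, is to treat (strain, contig) as a two-level key and let pandas do the lookup through a MultiIndex. This is a minimal sketch assuming the same ';'-separated layout. The SAMPLE data and the helper name homology_for are hypothetical, introduced only for illustration:

```python
import io
import pandas as pd

# Assumption: a ';'-separated excerpt of the table shown in the question.
SAMPLE = """strain;contig;homology
1A42;ctg.s1.000000F;Chr
1A42;ctg.s1.000001F;pSymA
1A42;ctg.s1.3;pSymB
4B41;ctg.s1.000000F;Chr
4B41;ctg.s1.3;pSymA
"""

df = pd.read_csv(io.StringIO(SAMPLE), sep=";", dtype=str)

# Use (strain, contig) as the index so that each pair identifies one row.
# Sorting the index keeps label lookups efficient.
indexed = df.set_index(["strain", "contig"])["homology"].sort_index()


def homology_for(strain):
    """Return a {contig: homology} dict for one strain (hypothetical helper)."""
    return indexed.loc[strain].to_dict()


# Look up a single (strain, contig) pair.
single = indexed.loc[("1A42", "ctg.s1.3")]

print(single)
for contig, homology in homology_for("1A42").items():
    print(contig, homology)
```

This only behaves as shown if each (strain, contig) pair occurs once in the file. With duplicated pairs, the single-pair lookup would return a Series instead of one value.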
