python-3.x Pandas: indexing a multidimensional key

ncgqoxb0 posted on 2023-02-01 in Python


I am working with a CSV file that looks like this:

strain  contig          homology
1A42    ctg.s1.000000F  Chr
1A42    ctg.s1.000001F  pSymA
1A42    ctg.s1.3        pSymB
1A42    ctg.s2.000000F  Other
4B41    ctg.s1.000000F  Chr
4B41    ctg.s1.3        pSymA
4B41    ctg.s1.1        pSymB
7B22    ctg.s2.12       other
7B22    ctg.s1.000000F  Chr
7B22    ctg.s1.3        pSymA
7B22    ctg.s1.1        pSymB
8A52    ctg.s1.0        pSymB
8A52    ctg.s1.4        Chr
8A52    ctg.s1.2        pSymA

In the contig column, some strings are repeated across different strains of the strain column; for example, ctg.s1.000000F appears in 1A42, 4B41 and 7B22.
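This can be checked directly in pandas. The sketch below is a minimal illustration, assuming the real file is `;`-separated (as the question's `read_csv(..., sep=";")` call suggests); the sample rows are loaded from an in-memory string with `io.StringIO`. It shows that `contig` on its own is not a unique key, while the pair `(strain, contig)` is.

```python
import io

import pandas as pd

# Sample rows from the question; assumes the real file is ';'-separated.
DATA = """strain;contig;homology
1A42;ctg.s1.000000F;Chr
1A42;ctg.s1.000001F;pSymA
1A42;ctg.s1.3;pSymB
1A42;ctg.s2.000000F;Other
4B41;ctg.s1.000000F;Chr
4B41;ctg.s1.3;pSymA
4B41;ctg.s1.1;pSymB
7B22;ctg.s2.12;other
7B22;ctg.s1.000000F;Chr
7B22;ctg.s1.3;pSymA
7B22;ctg.s1.1;pSymB
8A52;ctg.s1.0;pSymB
8A52;ctg.s1.4;Chr
8A52;ctg.s1.2;pSymA
"""

df = pd.read_csv(io.StringIO(DATA), sep=";", dtype=str)

# keep=False flags every occurrence of a repeated contig, not just the later ones
shared = sorted(df.loc[df["contig"].duplicated(keep=False), "contig"].unique())
print(shared)  # ['ctg.s1.000000F', 'ctg.s1.1', 'ctg.s1.3']

# The (strain, contig) pair, by contrast, never repeats, so it can act as a key
pair_is_unique = not df.duplicated(subset=["strain", "contig"]).any()
print(pair_is_unique)  # True
```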
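I wrote the following lines to define a function that, given a strain name and the CSV file as input, prints the corresponding value in the homology column for each contig value: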

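import pandas as pd

def myfunction(strain, csv):
    with open(csv, 'r') as h:
        h_df = pd.read_csv(h, index_col=False, dtype='unicode', on_bad_lines='skip', sep=";")
        # rows where any column equals the strain name -> that strain's contigs
        match = h_df.loc[(h_df == strain).any(axis=1), 'contig']
        for element in match:
            # rows where any column equals the contig name, in every strain
            contig = h_df.loc[(h_df == element).any(axis=1), 'homology']
            print(element, contig)
myfunction('1A42', mycsv)  # mycsv: path to the CSV file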

It actually works, but it returns the whole column of homology values rather than only the ones related to "1A42".
How can I do this? Thanks.
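The extra values come from the second lookup: `(h_df == element).any(axis=1)` only tests whether a row contains the contig name, so a contig shared by several strains matches a row in each of them, and printing the resulting Series shows all of those rows. A minimal reproduction on the sample data, again assuming a `;`-separated file loaded here from an in-memory string:

```python
import io

import pandas as pd

# Sample rows from the question; assumes the real file is ';'-separated.
DATA = """strain;contig;homology
1A42;ctg.s1.000000F;Chr
1A42;ctg.s1.000001F;pSymA
1A42;ctg.s1.3;pSymB
1A42;ctg.s2.000000F;Other
4B41;ctg.s1.000000F;Chr
4B41;ctg.s1.3;pSymA
4B41;ctg.s1.1;pSymB
7B22;ctg.s2.12;other
7B22;ctg.s1.000000F;Chr
7B22;ctg.s1.3;pSymA
7B22;ctg.s1.1;pSymB
8A52;ctg.s1.0;pSymB
8A52;ctg.s1.4;Chr
8A52;ctg.s1.2;pSymA
"""

df = pd.read_csv(io.StringIO(DATA), sep=";", dtype=str)
strain = "1A42"
element = "ctg.s1.000000F"  # one of 1A42's contigs, shared with 4B41 and 7B22

# The question's second lookup: any row containing the contig, from any strain
too_many = df.loc[(df == element).any(axis=1), "homology"]
print(too_many)  # a Series with one row per strain that has this contig

# Adding the strain to the condition leaves exactly one row
just_one = df.loc[(df["contig"] == element) & (df["strain"] == strain), "homology"]
print(element, just_one.iloc[0])
```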


Source: https://stackoverflow.com/questions/75286529/pandas-indexing-a-multidimensional-key

2 answers


8hhllhi2 #1

Try the following (simpler) approach:

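import pandas as pd

def myfunction(strain, csv):
    with open(csv, 'r') as h:
        h_df = pd.read_csv(h, index_col=False, dtype='unicode', on_bad_lines='skip', sep=";")
    # keep only the rows of the requested strain and show their homology/contig pairs
    print(h_df[h_df['strain'] == strain][['homology', 'contig']])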
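Because this function only prints, a variant that returns the filtered rows may be easier to reuse. The following is a minimal sketch, not the answerer's code: `homology_for_strain` is a hypothetical helper name, and the sample data is assumed to be `;`-separated and loaded from an in-memory string.

```python
import io

import pandas as pd

# Sample rows from the question; assumes the real file is ';'-separated.
DATA = """strain;contig;homology
1A42;ctg.s1.000000F;Chr
1A42;ctg.s1.000001F;pSymA
1A42;ctg.s1.3;pSymB
1A42;ctg.s2.000000F;Other
4B41;ctg.s1.000000F;Chr
4B41;ctg.s1.3;pSymA
4B41;ctg.s1.1;pSymB
7B22;ctg.s2.12;other
7B22;ctg.s1.000000F;Chr
7B22;ctg.s1.3;pSymA
7B22;ctg.s1.1;pSymB
8A52;ctg.s1.0;pSymB
8A52;ctg.s1.4;Chr
8A52;ctg.s1.2;pSymA
"""


def homology_for_strain(df: pd.DataFrame, strain: str) -> pd.DataFrame:
    """Hypothetical helper: contig/homology rows of one strain, renumbered from 0."""
    # One .loc call: boolean mask on the strain column plus the two columns of interest
    rows = df.loc[df["strain"] == strain, ["contig", "homology"]]
    return rows.reset_index(drop=True)


df = pd.read_csv(io.StringIO(DATA), sep=";", dtype=str)
result = homology_for_strain(df, "1A42")

# Mimic the question's "contig homology" print format, one line per contig
for contig, homology in result.itertuples(index=False):
    print(contig, homology)
```

Using a single `.loc[mask, columns]` instead of the chained `h_df[mask][columns]` selects rows and columns in one step; the result is the same here.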

Answered 2023-02-01

w1e3prcc #2

If you only want to print the homology values that correspond to the strain of interest, you can use the following code:

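import pandas as pd

def myfunction(strain, csv):
    with open(csv, 'r') as h:
        h_df = pd.read_csv(h, index_col=False, dtype='unicode', on_bad_lines='skip', sep=";")
        # contigs that belong to the requested strain
        match = h_df.loc[(h_df['strain'] == strain), 'contig']
        for element in match:
            # filter on both contig and strain so rows from other strains are excluded
            contig = h_df.loc[(h_df['contig'] == element) & (h_df['strain'] == strain), 'homology']
            # .iloc[0] prints the single matching value instead of a whole Series
            print(element, contig.iloc[0])
myfunction('1A42', 'mycsv')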
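Given the question's title, another option (not part of either answer) is to turn the pair `(strain, contig)` into a two-level `MultiIndex`, so that one lookup selects either all contigs of a strain or a single homology value. A minimal sketch, assuming the same `;`-separated sample data loaded from an in-memory string:

```python
import io

import pandas as pd

# Sample rows from the question; assumes the real file is ';'-separated.
DATA = """strain;contig;homology
1A42;ctg.s1.000000F;Chr
1A42;ctg.s1.000001F;pSymA
1A42;ctg.s1.3;pSymB
1A42;ctg.s2.000000F;Other
4B41;ctg.s1.000000F;Chr
4B41;ctg.s1.3;pSymA
4B41;ctg.s1.1;pSymB
7B22;ctg.s2.12;other
7B22;ctg.s1.000000F;Chr
7B22;ctg.s1.3;pSymA
7B22;ctg.s1.1;pSymB
8A52;ctg.s1.0;pSymB
8A52;ctg.s1.4;Chr
8A52;ctg.s1.2;pSymA
"""

df = pd.read_csv(io.StringIO(DATA), sep=";", dtype=str)

# Two-level key; verify_integrity=True raises ValueError if a pair repeats
indexed = df.set_index(["strain", "contig"], verify_integrity=True).sort_index()

# Fixing only the first level returns one strain's contigs, indexed by contig
one_strain = indexed.xs("1A42", level="strain")["homology"]
for contig, homology in one_strain.items():
    print(contig, homology)

# A full (strain, contig) tuple addresses a single homology value
one_value = indexed.loc[("4B41", "ctg.s1.3"), "homology"]
print(one_value)  # pSymA
```

`verify_integrity=True` guards the assumption that each `(strain, contig)` pair occurs only once; if it ever repeated, a full-tuple lookup would return several rows again instead of one value.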

Answered 2023-02-01
