python-3.x Pandas: indexing by a multi-dimensional key

ncgqoxb0  posted on 2023-02-01  in Python

I am working with a CSV file that looks like this:

    strain contig homology
    1A42 ctg.s1.000000F Chr
    1A42 ctg.s1.000001F pSymA
    1A42 ctg.s1.3 pSymB
    1A42 ctg.s2.000000F Other
    4B41 ctg.s1.000000F Chr
    4B41 ctg.s1.3 pSymA
    4B41 ctg.s1.1 pSymB
    7B22 ctg.s2.12 other
    7B22 ctg.s1.000000F Chr
    7B22 ctg.s1.3 pSymA
    7B22 ctg.s1.1 pSymB
    8A52 ctg.s1.0 pSymB
    8A52 ctg.s1.4 Chr
    8A52 ctg.s1.2 pSymA

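In the contig column, some strings are repeated across different strains in the strain column; for example, ctg.s1.000000F occurs in 1A42, 4B41 and 7B22.
I wrote the following lines to define a function that, given a strain name and the CSV file as input, prints the corresponding homology value for each contig value: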

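    import pandas as pd

    def myfunction(strain, csv):
        with open(csv, 'r') as h:
            h_df = pd.read_csv(h, index_col=False, dtype='unicode', on_bad_lines='skip', sep=";")
            # contigs from every row in which any cell equals the strain name
            match = h_df.loc[(h_df == strain).any(axis=1), 'contig']
            for element in match:
                # rows in which any cell equals the contig name, whatever the strain
                contig = h_df.loc[(h_df == element).any(axis=1), 'homology']
                print(element, contig)

    myfunction('1A42', mycsv)  # mycsv holds the path to the CSV file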

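It does run, but for each contig it returns the homology values from the whole column, not only the ones that belong to "1A42".
How can I do this? Thanks.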

8hhllhi2  1#

Try the following (simpler) approach:

    import pandas as pd

    def myfunction(strain, csv):
        with open(csv, 'r') as h:
            h_df = pd.read_csv(h, index_col=False, dtype='unicode', on_bad_lines='skip', sep=";")
        # keep only the rows of the requested strain, then show the two columns
        print(h_df[h_df['strain'] == strain][['homology', 'contig']])
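
To see why the lookup in the question picks up other strains, here is a small self-contained sketch. The inline data is a shortened copy of the table from the question, and the `;` separator is an assumption taken from the `sep=";"` argument in the `read_csv` call. The names `SAMPLE`, `loose` and `strict` are only for illustration.

```python
import io

import pandas as pd

# Shortened copy of the question's table. The ';' separator is an assumption
# based on the sep=";" argument used in the question.
SAMPLE = """strain;contig;homology
1A42;ctg.s1.000000F;Chr
1A42;ctg.s1.000001F;pSymA
1A42;ctg.s1.3;pSymB
4B41;ctg.s1.000000F;Chr
4B41;ctg.s1.3;pSymA
7B22;ctg.s1.3;pSymA
"""

h_df = pd.read_csv(io.StringIO(SAMPLE), sep=";", dtype=str)

# Lookup as in the question: any cell equal to the contig name, in any strain.
loose = h_df.loc[(h_df == "ctg.s1.3").any(axis=1), "homology"]

# Lookup restricted to one strain: both conditions must hold on the same row.
strict = h_df.loc[
    (h_df["strain"] == "1A42") & (h_df["contig"] == "ctg.s1.3"), "homology"
]

print(loose.tolist())   # ['pSymB', 'pSymA', 'pSymA']
print(strict.tolist())  # ['pSymB']
```

The first lookup matches the contig name in every strain that has it, which is where the extra values come from. Filtering on the `strain` column first, as in the function above, avoids that.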

w1e3prcc  2#

If you only want to print the homology values that correspond to the strain of interest, you can use the following code:

    import pandas as pd

    def myfunction(strain, csv):
        with open(csv, 'r') as h:
            h_df = pd.read_csv(h, index_col=False, dtype='unicode', on_bad_lines='skip', sep=";")
        # contigs that belong to the requested strain
        match = h_df.loc[(h_df['strain'] == strain), 'contig']
        for element in match:
            # match on contig AND strain so other strains are excluded
            contig = h_df.loc[(h_df['contig'] == element) & (h_df['strain'] == strain), 'homology']
            print(element, contig.iloc[0])

    myfunction('1A42', 'mycsv')
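
Since the title asks about indexing on a multi-part key, another option (not taken from either answer, just a sketch) is to make (strain, contig) the index with `set_index` and select by strain with `.loc`. The helper name `homology_by_strain`, the inline `SAMPLE` data and the `;` separator are assumptions for illustration.

```python
import io

import pandas as pd

# Shortened sample data. The ';' separator is assumed from the question's
# read_csv call.
SAMPLE = """strain;contig;homology
1A42;ctg.s1.000000F;Chr
1A42;ctg.s1.3;pSymB
4B41;ctg.s1.000000F;Chr
4B41;ctg.s1.3;pSymA
"""


def homology_by_strain(csv_source, strain):
    """Return a Series mapping contig -> homology for one strain.

    Hypothetical helper: csv_source may be a path or a file-like object.
    """
    df = pd.read_csv(csv_source, sep=";", dtype=str)
    # Use the (strain, contig) pair as a two-level index over the homology column.
    indexed = df.set_index(["strain", "contig"])["homology"].sort_index()
    # Selecting on the first level leaves a Series indexed by contig only.
    return indexed.loc[strain]


result = homology_by_strain(io.StringIO(SAMPLE), "1A42")
for contig, homology in result.items():
    print(contig, homology)
```

With this layout the same contig name in different strains is never mixed up, because the strain is part of the key. A strain that is not in the file raises a `KeyError`, so it may be worth checking for that first.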
