ValueError: Can't specify both mapper_raw and mapper in Python

mrphzbgm  posted on 2021-07-13  in  Hadoop

I'm trying to read an .fna file with mrjob in Python.
This is my load_read.py program. All of the code works fine when I don't use mrjob.

    from mrjob.job import MRJob
    from Bio import SeqIO
    from Bio.Seq import Seq
    import re
    from operator import itemgetter
    import sys


    def format_read(read):
        z = re.split('[|={,]+', read.description)
        return read.seq, z[3]


    class LoadMetaRead(MRJob):
        def mapper_raw(self, file_path, file_uri):
            from Bio import SeqIO
            from Bio.Seq import Seq
            seqs = list(SeqIO.parse(file_path, type='fna'))
            is_paired_end = False
            if len(seqs) > 2 and seqs[0].id[-1:] != seqs[1].id[-1:]:
                is_paired_end = True
            label_list = dict()
            label_index = 0
            for i in range(0, len(seqs), 2 if is_paired_end else 1):
                read, label = format_read(seqs[i])
                if is_paired_end:
                    read2, _ = format_read(seqs[i + 1])
                    read += read2
                if label not in label_list:
                    label_list[label] = label_index
                    label_index += 1
                yield str(i), str(read), str(label_list[label])

        def mapper(self, _, line):
            yield 'read', line

        def reducer(self, key, values):
            yield key, values

        combiner = reducer


    if __name__ == '__main__':
        LoadMetaRead.run()

Sample data file, R4.fna:

    >r1.1 |SOURCES={GI=15668172,fw,1146130-1146958}|ERRORS={52_1:A,78_1:G,78_2:G,78_3:G,641_1:G}|SOURCE_1="Methanocaldococcus jannaschii DSM 2661 chromosome" (392b1054a4bf536ea1cc349545ace50120973c3a)
    AAACCCTCTTCCACGAACCCTCTTGAAAATCCCCCACATCCACAAAATAAATCAAATAAATTTCA
    ACATTATCACCAAAAGGGTAAAAGGTTATTTAAAAAATAAAATAAATTTAAAAATTTAAATTAAA
    TACCAAAAAAGCCAAATAACTTATTGTGATTCTTGAGCTTTCTTTAACTCTGCCTTCATATCTTG
    ATAGACTTTAGTCCATTTTAATTTTCTTGGATTTCTTCCCATTCTGTAGCTTTTCTCACATTTGG
    ATGAGCAGAAATATAATACAGTCCCATCTTTTTCTACGACCATTTTTCCTTTTCCTGGCTCAATT
    TCATAACCACAAAAGCTGCATGTTCTCCATTCTGGCATAGCTATCCCCCTTTAATAGTGTTTCAG
    TGATTTTAAAATAATTTAAGATTAAATTATTTATCTTCTTCTGTCTAATGGTCTTGCTTCTCTCT
    CTGTTTCTCTTAACATAATAATGTCTCCAACTTTAACTGGACCTTTAACGTTTCTAACTAAAACT
    CTTCCAGTATCTTTTCCACCTAAGATTTTACATCTAACTTGTATAATTCCTCCAGTAACCCCTGT
    TCTACCAATGACTTCAATAACTTCAGCAGCTACTGCTTCCTTATAAACAAATTCATCTTCCGATC
    CTCATCACCTAATATTAATGAAGGTTTAAAATTTATAAAAAAGTTAGTAGTAGTGTTTCATAATT
    TATATAATAATAACTATATACTATTGATTGATGGTTAAATAGCGTTCTAATAATTTACTGCTTCA
    AAACATTTACCTTTTCAATTAATACCTTTAACTCTTCAGCATCTCCTTCGTTG
    >r2.1 |SOURCES={GI=15668172,bw,239211-239971}|ERRORS={113:-,217_1:C,281_1:G,627_1:G,717_1:T}|SOURCE_1="Methanocaldococcus jannaschii DSM 2661 chromosome" (392b1054a4bf536ea1cc349545ace50120973c3a)
    TAGCATGTAAATCCCTTATTTCTTAATTTCTCCCAGAATTATTTCTATTGCTTTATCAACTGCCT
    TGGCAACCTCTTCAGACAACCCTGGTTTTATGTCTGGCATTGTAAATTTTTACCTTGACAACCAA
    TAACCACGACTTCTATGCCTTTATTATGTAAATCTTTGAGAAATGGGGCTAATGGAACGTTATGG
    GCATCGAAAGAATATTTTTTAACTATTCGGTAATTCATCAACATCTATCTTTTTTATTGTTCCAG
    GTTCTAAATCAAAATCAATGGCGATCAACAACAATAATCTTTTTTATATCTTCATCAACCAACGT
    CATTAAATAGTATGCTCCACTTGCCCCAGCATCTATAACTTCAACGTTATCTGGCAAGTTCATTT
    TTTCTAATTTGCTAACAACCTCACATCCAAAGCCATCATCTCCAAACAACAGATTTCCACAACCA
    ACAATTAATATATCCTTCTTTTTCATTTTATCACTTATTTAGCATTTCTTTATATTTTTTAGCCT
    CTTCTTTAGGATTTTGTGATTGATAGATTGCCCTTCCAACAATGACGTAATCATTCTCATCTAAA
    ATATTTAAAATATCCTCAATCTTCCCTCCCTGAGCTCCGACTCCGTGGTGTTATTACTGGCAATT
    CTGCAATTTCTTTAATTTCTTTAAGCCTTTCAGGCCTTGTTGATGGAGCAACTATAGCATCAACT
    TTTAGTTTTTTTAGCCATCTCTGACAATTTATCTGCTATTGGCTGTAG
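
For readers following along, here is a small sketch of what the split pattern in format_read() pulls out of a header like the first one in the sample. It uses only the standard library; the `description` string is the r1.1 header copied from the sample without the leading ">", on the assumption that this is the text Biopython hands over as the record's description.

```python
import re

# Header of the first sample record, without the leading ">".
description = (
    'r1.1 |SOURCES={GI=15668172,fw,1146130-1146958}'
    '|ERRORS={52_1:A,78_1:G,78_2:G,78_3:G,641_1:G}'
    '|SOURCE_1="Methanocaldococcus jannaschii DSM 2661 chromosome" '
    '(392b1054a4bf536ea1cc349545ace50120973c3a)'
)

# Same pattern as format_read(): split on runs of '|', '=', '{' or ','.
fields = re.split('[|={,]+', description)

# Index 3 is what format_read() returns as the label.
label = fields[3]

print(fields[:5])  # -> ['r1.1 ', 'SOURCES', 'GI', '15668172', 'fw']
print(label)       # -> 15668172
```

So for this sample the label is the GI number, and both records shown share the same one.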

When I run the program with this command:

    python load_read.py R4.fna

it raises the following error:

    ValueError: Can't specify both mapper_raw and mapper

Do you know how to fix it?

zynd9foi  #1

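So I found out that I can't define both mapper_raw() and mapper() in the same job; I need to define only one of them. I used mapper_raw() because I read the whole file at once rather than line by line.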

    # Imports and format_read() are the same as in the question.
    class LoadMetaRead(MRJob):
        # Only mapper_raw() is defined; there is no mapper() in this class.
        def mapper_raw(self, file_path, file_uri):
            from Bio import SeqIO
            from Bio.Seq import Seq
            # 'fasta' is the Biopython format name used for .fna files.
            seqs = list(SeqIO.parse(file_path, 'fasta'))
            is_paired_end = False
            if len(seqs) > 2 and seqs[0].id[-1:] != seqs[1].id[-1:]:
                is_paired_end = True
            label_list = dict()
            label_index = 0
            for i in range(0, len(seqs), 2 if is_paired_end else 1):
                read, label = format_read(seqs[i])
                if is_paired_end:
                    read2, _ = format_read(seqs[i + 1])
                    read += read2
                if label not in label_list:
                    label_list[label] = label_index
                    label_index += 1
                # Emit a single (key, value) pair per read.
                yield None, (str(read), str(label_list[label]))

        def reducer(self, key, values):
            for value in values:
                yield key, str(value)

This code works as expected.
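
To show why reading the whole file suits this input better than a line-by-line mapper(), here is a minimal sketch that uses only the standard library. The FASTA text is a shortened, made-up stand-in for R4.fna, and `parse_fasta` is a hypothetical helper written for this illustration; it is not part of mrjob or Biopython.

```python
# Shortened, made-up stand-in for R4.fna (two records, two sequence lines each).
FASTA_TEXT = """>r1.1 |SOURCES={GI=15668172,fw,1146130-1146958}
AAACCC
TCTTCC
>r2.1 |SOURCES={GI=15668172,bw,239211-239971}
TAGCAT
GTAAAT
"""


def parse_fasta(text):
    """Yield (header, sequence) pairs, joining each record's sequence lines."""
    header, chunks = None, []
    for line in text.splitlines():
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line.strip():
            chunks.append(line.strip())
    # Emit the final record once the input is exhausted.
    if header is not None:
        yield header, "".join(chunks)


# What a line-oriented mapper would receive: isolated lines with no record context.
lines = FASTA_TEXT.splitlines()

# What a whole-file reader can reconstruct: complete records.
records = list(parse_fasta(FASTA_TEXT))

print(len(lines))     # -> 6
print(len(records))   # -> 2
print(records[0][1])  # -> AAACCCTCTTCC
```

A mapper called once per line would see six unrelated strings here, while the paired-end check and label lookup in the job above need whole records, which is what parsing the file in mapper_raw() provides.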
