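I am trying to replace the one-letter amino acid codes in variant names with their 3-letter codes (easier to read). Almost everything works, but a few variants come out wrong. Below are my code and notes. Thank you.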
x <- c("p.G12C", "p.F121S", "p.P124S", "p.P124L", "p.E13D",
       "p.E203K", "p.Q209P", "p.Q209P", "p.Q209L")
aa3 <- c("Ala", "Arg", "Asn", "Asp", "Cys", "Glu", "Gln", "Gly", "His",
         "Ile", "Leu", "Lys", "Met", "Phe", "Pro", "Ser", "Thr", "Trp",
         "Tyr", "Val")
aa1 <- c("A", "R", "N", "D", "C", "E", "Q", "G", "H", "I", "L", "K",
         "M", "F", "P", "S", "T", "W", "Y", "V")
# Start from the input and apply one substitution per amino acid,
# feeding each result into the next pass.
xy <- x
for (i in seq_along(aa1)) {
  xy <- gsub(aa1[i], aa3[i], xy, ignore.case = FALSE)
}
Output
# Note that E, F and Q get unusual 3-letter replacements.
# I could not figure out what is causing this.
xy
[1] "p.Gly12Cys" "p.Prohe121Ser" "p.Pro124Ser" "p.Pro124Leu"
"p.Glylu13Asp" "p.Glylu203Lys" "p.Glyln209Pro" "p.Glyln209Pro" "p.Glyln209Leu"
Expected output
"p.Gly12Cys" "p.Phe121Ser" "p.Pro124Ser" "p.Pro124Leu" "p.Glu13Asp"
"p.Glu203Lys" "p.Gln209Pro" "p.Gln209Pro" "p.Gln209Leu"
Errors
outputs "p.Prohe121Ser" instead of "p.Phe121Ser"
"p.Glylu13Asp" instead of "p.Glu13Asp"
4 Answers
a6b3iqyw2#
We can use mgsub.
Update
Or use gsubfn.
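The calls themselves are not shown here. As a rough base R sketch of the underlying idea (this is not the mgsub or gsubfn call, just an illustration using regmatches), every one-letter code can be located once in the original strings and replaced in a single pass, so a replacement such as "Phe" is never rescanned for "P". The variable names hits and codes are assumptions made for this example, and the input is a shortened version of the question's vector.

```r
x <- c("p.G12C", "p.F121S", "p.E13D", "p.Q209L")
aa3 <- c("Ala", "Arg", "Asn", "Asp", "Cys", "Glu", "Gln", "Gly", "His",
         "Ile", "Leu", "Lys", "Met", "Phe", "Pro", "Ser", "Thr", "Trp",
         "Tyr", "Val")
aa1 <- c("A", "R", "N", "D", "C", "E", "Q", "G", "H", "I", "L", "K",
         "M", "F", "P", "S", "T", "W", "Y", "V")

# Character class matching exactly the known one-letter codes (upper case only,
# so the lower-case "p" prefix is left alone).
codes <- paste0("[", paste(aa1, collapse = ""), "]")

# Find all matches in the untouched input.
hits <- gregexpr(codes, x)

# Replace each matched letter with its 3-letter code in one step.
xy <- x
regmatches(xy, hits) <- lapply(regmatches(x, hits),
                               function(m) aa3[match(m, aa1)])
xy
```

Because the lookup runs against the original strings rather than against partially rewritten ones, the order of aa1 no longer matters.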
eblbsuwk3#
Here is a base R solution:
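You basically split each string into its components, e.g. "p.Q209L" is split into p., Q, 209 and L. Then you swap the one-letter amino acid codes for their 3-letter versions using a reference vector; alternatively, following akrun's approach, you can drop ref[x] (and two extra lines!) and use aa3[match(x, aa1)] instead. Finally, paste everything back together.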
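The split, swap and paste steps described above might look roughly like the following in base R. This is a sketch under the assumption that every variant has the form p.<one letter><digits><one letter>; the names pattern, parts and to3 are made up for this example, and strings that do not match the pattern are not handled.

```r
x <- c("p.G12C", "p.F121S", "p.P124S", "p.P124L", "p.E13D",
       "p.E203K", "p.Q209P", "p.Q209P", "p.Q209L")
aa3 <- c("Ala", "Arg", "Asn", "Asp", "Cys", "Glu", "Gln", "Gly", "His",
         "Ile", "Leu", "Lys", "Met", "Phe", "Pro", "Ser", "Thr", "Trp",
         "Tyr", "Val")
aa1 <- c("A", "R", "N", "D", "C", "E", "Q", "G", "H", "I", "L", "K",
         "M", "F", "P", "S", "T", "W", "Y", "V")

# Capture groups: prefix, reference residue, position, variant residue.
pattern <- "^(p\\.)([A-Z])([0-9]+)([A-Z])$"

# One character vector per input: full match followed by the four groups.
parts <- regmatches(x, regexec(pattern, x))

# Look up the 3-letter code for a one-letter code.
to3 <- function(code) aa3[match(code, aa1)]

# Swap the two residues and paste the pieces back together.
xy <- vapply(parts,
             function(p) paste0(p[2], to3(p[3]), p[4], to3(p[5])),
             character(1))
xy
```

Since each residue is translated exactly once from its own capture group, the 3-letter codes are never scanned again for further one-letter matches.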