R: Compare set1 columns with set2 columns to return, for each set2 column, the name of the set1 column that matches a condition

gab6jxml posted on 2023-04-03 in Other

I have a data frame like this:

df = read.table(text="ID    S   R1  R2  Sa1 Sa2 Sa4 Sa5 Sa6 Sa7 Sa8 Sa9 Sa10    Sa11
    Chr18_1635988   CC  GG  GG  CC  GG  GG  GG  CC  GG  GG  CC  GG  GG
    Chr18_1636023   AA  TT  TT  AA  TT  TT  TT  AA  AT  TT  AA  TT  TT
    Chr18_1639152   TT  CC  CC  TT  CC  CC  CC  CC  CC  CC  TT  CC  TC
    Chr18_1642235   CC  AA  AA  CC  AA  AA  CA  CC  CA  AA  CC  AA  AA
    Chr18_1643643   --  AA  CC  CC  CC  AA  AA  --  CA  CA  CA  CC  CC
    Chr18_1643660   --  CC  GG  CC  GG  CC  CC  CC  CC  CC  CC  GG  GG
    Chr18_1656020   AA  TT  TT  AA  TT  TT  TT  AA  AA  AT  AA  AA  AT
    Chr18_1657597   CC  TT  TT  CC  TT  TT  TT  CC  CC  CT  CC  TT  TT
    Chr18_1657618   GG  TT  TT  GG  TT  TT  TT  GG  GG  GT  GG  TT  TT", header=T, stringsAsFactors=F)
Set 1 is columns 2:4 (S, R1, R2); set 2 is the columns whose names start with "Sa".
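A minimal base-R sketch of how the two sets can be selected: set 1 by position and set 2 by the "Sa" name prefix. The one-row frame is only a stand-in for df (the first row of the question's data, keeping three Sa columns), and `set1`/`set2` are illustrative names.

```r
# One-row stand-in for df: first row of the question's data, three Sa columns
df <- data.frame(ID = "Chr18_1635988", S = "CC", R1 = "GG", R2 = "GG",
                 Sa1 = "CC", Sa2 = "GG", Sa4 = "GG",
                 stringsAsFactors = FALSE)

# Set 1: columns 2 to 4
set1 <- names(df)[2:4]                             # c("S", "R1", "R2")

# Set 2: columns whose names are "Sa" followed by digits
set2 <- grep("^Sa\\d+$", names(df), value = TRUE)  # c("Sa1", "Sa2", "Sa4")
```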

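I want to compare set 1 with set 2 and add a "result" row that labels each set 2 column with the name of a set 1 column, according to these conditions: if the value in any row of a set 2 column matches the value in the same row of column "S", regardless of the other rows, put the name "S" in that column's cell of the "result" row; then compare columns "R1" and "R2" with the set 2 columns that are still unassigned. So the expected result is: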

# expected: df plus a "result" row (NA under S, R1 and R2)
final = rbind(df, c("result", NA, NA, NA,                                    # ID, S, R1, R2
                    "S", "R2", "R1", "R1", "S", "S", "R1", "S", "S", "R2"))  # Sa1 ... Sa11
Answer 1 (iezvtpos):

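I think you are better off defining the logic in a small helper function. It returns "S" if any value in the column matches S; otherwise it returns "R1" (or "R2") if every homozygous call (CC, AA, GG, TT) in the column matches R1 (or R2), and NA if nothing matches: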

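library(dplyr)  # case_when(), summarize(), across(), bind_rows()

# s, r1, r2: the S, R1 and R2 columns; d: one Sa<x> column
# (the \(...) lambda syntax needs R >= 4.1)
f <- \(s,r1,r2,d) {
  d_indx = d %in% c("CC","AA", "GG", "TT")   # keep only homozygous calls
  case_when(any(d==s)~"S",                    # any row matches S
    all(d[d_indx] == r1[d_indx])~"R1",        # all homozygous calls match R1
    all(d[d_indx] == r2[d_indx])~"R2"         # all homozygous calls match R2
  )
}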

Now just call the function on each Sa<x> column:

set2 = names(df)[grepl("^Sa\\d+",names(df))]
result = summarize(df, across(all_of(set2), ~f(S,R1,R2,.x)))
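If you would rather avoid the dplyr dependency, here is a base-R sketch of the same rule. `classify()` is a hypothetical name that mirrors the logic of `f()` using plain `if` statements, and `vapply()` applies it to every Sa column; on the question's data it should give the same labels as the output shown.

```r
# The question's data
df <- read.table(text = "ID S R1 R2 Sa1 Sa2 Sa4 Sa5 Sa6 Sa7 Sa8 Sa9 Sa10 Sa11
Chr18_1635988 CC GG GG CC GG GG GG CC GG GG CC GG GG
Chr18_1636023 AA TT TT AA TT TT TT AA AT TT AA TT TT
Chr18_1639152 TT CC CC TT CC CC CC CC CC CC TT CC TC
Chr18_1642235 CC AA AA CC AA AA CA CC CA AA CC AA AA
Chr18_1643643 -- AA CC CC CC AA AA -- CA CA CA CC CC
Chr18_1643660 -- CC GG CC GG CC CC CC CC CC CC GG GG
Chr18_1656020 AA TT TT AA TT TT TT AA AA AT AA AA AT
Chr18_1657597 CC TT TT CC TT TT TT CC CC CT CC TT TT
Chr18_1657618 GG TT TT GG TT TT TT GG GG GT GG TT TT",
                 header = TRUE, stringsAsFactors = FALSE)

homo <- c("CC", "AA", "GG", "TT")

# Hypothetical base-R counterpart of f(): same tests, same order
classify <- function(d, s, r1, r2) {
  keep <- d %in% homo                          # ignore heterozygous or "--" calls
  if (any(d == s)) return("S")                 # any row matches S
  if (all(d[keep] == r1[keep])) return("R1")   # all homozygous calls match R1
  if (all(d[keep] == r2[keep])) return("R2")   # all homozygous calls match R2
  NA_character_                                # no rule matched
}

set2 <- grep("^Sa\\d+$", names(df), value = TRUE)

# One label per Sa column; the result is a named character vector
result <- vapply(df[set2], classify, character(1),
                 s = df$S, r1 = df$R1, r2 = df$R2)
result
```

One behaviour carries over from `f()`: `all()` of an empty vector is `TRUE`, so a column with no homozygous calls and no S match would be labelled "R1". The question does not say whether that is the desired outcome.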

Output:

Sa1 Sa2 Sa4 Sa5 Sa6 Sa7 Sa8 Sa9 Sa10 Sa11
1   S  R2  R1  R1   S   S  R1   S    S   R2

If you really want to append it to the bottom of the original frame, you can do this:

bind_rows(df, result %>% mutate(ID="result"))
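Without dplyr, the same append can be sketched with `rbind()`, whose data-frame method matches columns by name. The toy frame below keeps only the first two rows and the first two Sa columns of the question's data, `result` holds the labels for those two columns as they appear in the answer's output, and `new_row` is a hypothetical helper name.

```r
# Toy frame: first two rows and first two Sa columns of the question's data
df <- data.frame(ID  = c("Chr18_1635988", "Chr18_1636023"),
                 S   = c("CC", "AA"),
                 R1  = c("GG", "TT"),
                 R2  = c("GG", "TT"),
                 Sa1 = c("CC", "AA"),
                 Sa2 = c("GG", "TT"),
                 stringsAsFactors = FALSE)

# Labels for the Sa columns (values as in the answer's output)
result <- c(Sa1 = "S", Sa2 = "R2")

# One-row frame with the same columns, blanked to NA
new_row <- df[1, ]
new_row[] <- NA_character_
new_row$ID <- "result"
new_row[names(result)] <- as.list(result)   # fill only the Sa columns

# rbind.data.frame matches columns by name, so column order need not agree
out <- rbind(df, new_row)
rownames(out) <- NULL   # reset row names to 1, 2, 3
out
```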

Output:

ID    S   R1   R2 Sa1 Sa2 Sa4 Sa5 Sa6 Sa7 Sa8 Sa9 Sa10 Sa11
1  Chr18_1635988   CC   GG   GG  CC  GG  GG  GG  CC  GG  GG  CC   GG   GG
2  Chr18_1636023   AA   TT   TT  AA  TT  TT  TT  AA  AT  TT  AA   TT   TT
3  Chr18_1639152   TT   CC   CC  TT  CC  CC  CC  CC  CC  CC  TT   CC   TC
4  Chr18_1642235   CC   AA   AA  CC  AA  AA  CA  CC  CA  AA  CC   AA   AA
5  Chr18_1643643   --   AA   CC  CC  CC  AA  AA  --  CA  CA  CA   CC   CC
6  Chr18_1643660   --   CC   GG  CC  GG  CC  CC  CC  CC  CC  CC   GG   GG
7  Chr18_1656020   AA   TT   TT  AA  TT  TT  TT  AA  AA  AT  AA   AA   AT
8  Chr18_1657597   CC   TT   TT  CC  TT  TT  TT  CC  CC  CT  CC   TT   TT
9  Chr18_1657618   GG   TT   TT  GG  TT  TT  TT  GG  GG  GT  GG   TT   TT
10        result <NA> <NA> <NA>   S  R2  R1  R1   S   S  R1   S    S   R2
