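I have an R data frame containing specific combinations of characters (df1 below) and another data frame with a list of sampleIDs and whether each sample has each character (df2). For each row in df1, I am trying to add every sampleID from df2 that contains all of that row's entries. A sampleID should be recorded whenever it has all the entries of a given combination; it does not matter if it has additional entries as well. The same sampleID can therefore appear in multiple rows of df1. (In the real dataset, not every row in df1 has exactly 3 entries.)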
df1 <- data.frame(entry1 = c("A","B","C"),
                  entry2 = c("D","E","F"),
                  entry3 = c("G","H","I"))
df2 <- data.frame(sampleID = c("1001","1002","1003","1004","1005"),
                  "A" = c("A","0","0","A","A"),
                  "B" = c("B","B","B","0","0"),
                  "C" = c("0","0","0","C","C"),
                  "D" = c("D","0","D","0","0"),
                  "E" = c("E","E","0","0","0"),
                  "F" = c("0","0","0","F","F"),
                  "G" = c("G","0","0","G","0"),
                  "H" = c("H","H","H","H","0"),
                  "I" = c("0","0","I","0","0"))
The desired output would look like this:
df1.2 <- data.frame(entry1 = c("A","B","C"),
                    entry2 = c("D","E","F"),
                    entry3 = c("G","H","I"),
                    sampleID.1 = c("1001","1001",""),
                    sampleID.2 = c("","1002",""))
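Here, row/combination 1 is satisfied by sampleID 1001, row/combination 2 is satisfied by sampleIDs 1001 and 1002, and no sampleID has combination 3.
I tried iterating over the rows of df2 with a for loop, but could not correctly add the sampleIDs to df1. There may be a better strategy. I am also open to transforming the data frames.
Thanks for any suggestions.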
2 Answers
Answer #1 (nwlls2ji)
One approach is to cross_join the data, use pivot_wider to create the sampleID table, and finish with rename_with to clean up the names.
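The code for this answer is not shown here, so the following is only a minimal sketch of how the cross_join / pivot_wider / rename_with steps might fit together, not necessarily the answerer's exact code. It assumes dplyr >= 1.1.0 (for cross_join) and tidyr. The helper has_all and the columns rn, ok and k are names made up for illustration. Empty or NA entries in df1 are skipped, on the assumption that this is how rows with fewer than 3 entries are stored.

```r
library(dplyr)
library(tidyr)

df1 <- data.frame(entry1 = c("A","B","C"),
                  entry2 = c("D","E","F"),
                  entry3 = c("G","H","I"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(sampleID = c("1001","1002","1003","1004","1005"),
                  A = c("A","0","0","A","A"),
                  B = c("B","B","B","0","0"),
                  C = c("0","0","0","C","C"),
                  D = c("D","0","D","0","0"),
                  E = c("E","E","0","0","0"),
                  F = c("0","0","0","F","F"),
                  G = c("G","0","0","G","0"),
                  H = c("H","H","H","H","0"),
                  I = c("0","0","I","0","0"),
                  stringsAsFactors = FALSE)

entry_cols  <- names(df1)                      # combination columns
marker_cols <- setdiff(names(df2), "sampleID") # presence columns

# Hypothetical helper: TRUE if every non-empty entry occurs among the sample's values
has_all <- function(entries, present) {
  entries <- entries[!is.na(entries) & entries != ""]
  all(entries %in% present)
}

matches <- df1 %>%
  mutate(rn = row_number()) %>%        # remember the original df1 row
  cross_join(df2) %>%                  # every combination paired with every sample
  rowwise() %>%
  mutate(ok = has_all(c_across(all_of(entry_cols)),
                      c_across(all_of(marker_cols)))) %>%
  ungroup() %>%
  filter(ok) %>%
  select(rn, sampleID) %>%
  group_by(rn) %>%
  mutate(k = row_number()) %>%         # running index of matches within each df1 row
  ungroup() %>%
  pivot_wider(names_from = k, values_from = sampleID) %>%
  rename_with(~ paste0("sampleID.", .x), .cols = -rn)

result <- df1 %>%
  mutate(rn = row_number()) %>%
  left_join(matches, by = "rn") %>%    # rows without any match get NA
  select(-rn) %>%
  mutate(across(starts_with("sampleID."), ~ replace_na(.x, "")))
```

The cross join grows as nrow(df1) * nrow(df2), so for large inputs it may be worth filtering df2 down to the relevant columns first.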
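Answer #2 (vtwuwzda)
If you are interested in using loops, you could try the following.
You can construct a nested loop that iterates over the rows of both data frames and stores the row number and sampleID in a list whenever all values in a row of df1 are present in a row of df2. You can then convert the list to a matrix with 2 columns, the matching row number rn and the sampleID. Finally, use pivot_wider to reshape it into wide format and join the result back to the original df1 data.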