clin.info$Sample.ID contains duplicates. If an ID has more than one pair of duplicates, I want to keep only the first pair.
n_occur <- data.frame(table(clin.info$Sample.ID))
multiple.duplicates <- n_occur[n_occur$Freq > 2,]
if (multiple.duplicates$Var1 %in% clin.info$Sample.ID) {
  clin.info <- clin.info %>%
    group_by(Sample.ID) %>%
    distinct
}
Traceback:
Error in if (multiple.duplicates$Var1 %in% clin.info$Sample.ID) { :
argument is of length zero
Data:
> dput(clin.info)
structure(list(Sample.ID = c("TCGA.B2.3924.01", "TCGA.B2.3924.01",
"TCGA.B2.3924.01", "TCGA.B2.3924.01", "TCGA.B2.5635.01", "TCGA.B2.5635.01",
"TCGA.B2.5635.01", "TCGA.B2.5635.01", "TCGA.B2.5635.01", "TCGA.B2.5635.01",
"TCGA.A3.3357.01", "TCGA.A3.3357.01", "TCGA.A3.3367.01", "TCGA.A3.3367.01",
"TCGA.A3.3387.01", "TCGA.A3.3387.01", "TCGA.B0.4698.01", "TCGA.B0.4698.01",
"TCGA.B0.4710.01", "TCGA.B0.4710.01"), age = c("73", "73", "73",
"73", "74", "74", "74", "74", "74", "74", "62", "62", "72", "72",
"49", "49", "75", "75", "75", "75")), row.names = c(67L, 68L,
69L, 70L, 71L, 72L, 73L, 74L, 75L, 76L, 1L, 2L, 3L, 4L, 5L, 6L,
7L, 8L, 9L, 10L), class = "data.frame")
> dput(multiple.duplicates)
structure(list(Var1 = structure(6:7, levels = c("TCGA.A3.3357.01",
"TCGA.A3.3367.01", "TCGA.A3.3387.01", "TCGA.B0.4698.01", "TCGA.B0.4710.01",
"TCGA.B2.3924.01", "TCGA.B2.5635.01"), class = "factor"), Freq = c(4L,
6L)), row.names = 6:7, class = "data.frame")
Expected output:
Based on multiple.duplicates, two Sample.ID values have more than one pair of duplicates. For those two Sample.ID values, keep only the first pair of rows in clin.info.
2 Answers
dz6r00yl1#
Created on 2023-05-28 with reprex v2.0.2
Input data:
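The question's clin.info can be rebuilt compactly with rep(), and one base R way to keep the first pair per ID is to number the rows within each Sample.ID and keep positions 1 and 2. This is a minimal sketch under that reading of "first pair", not necessarily the code this answer originally contained; the names ids, reps, ages, pos and first.pairs are illustrative.

```r
# Rebuild the question's clin.info (same values as its dput output).
ids  <- c("TCGA.B2.3924.01", "TCGA.B2.5635.01", "TCGA.A3.3357.01",
          "TCGA.A3.3367.01", "TCGA.A3.3387.01", "TCGA.B0.4698.01",
          "TCGA.B0.4710.01")
reps <- c(4, 6, 2, 2, 2, 2, 2)   # how many rows each Sample.ID has
ages <- c("73", "74", "62", "72", "49", "75", "75")

clin.info <- data.frame(Sample.ID = rep(ids, times = reps),
                        age       = rep(ages, times = reps),
                        row.names = c(67:76, 1:10))

# Position of each row within its Sample.ID group (1, 2, 3, ...).
pos <- ave(seq_along(clin.info$Sample.ID), clin.info$Sample.ID,
           FUN = seq_along)

# Keep only the first two rows (the first pair) of every Sample.ID.
first.pairs <- clin.info[pos <= 2, ]
```

IDs that already have exactly two rows pass through unchanged, so no separate check for "multiple duplicates" is required.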
xuo3flqw2#
You can use the following code:
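The answer's own code is not shown here, so what follows is a hedged dplyr sketch rather than the original. It assumes "first pair" means the first two rows of each Sample.ID and uses a shortened copy of the question's data. Note that `if` expects a single TRUE/FALSE, which is why the question's vectorised `%in%` test fails; a grouped filter avoids the `if` entirely.

```r
library(dplyr)

# Shortened copy of the question's data: 4, 6 and 2 rows per ID.
clin.info <- data.frame(
  Sample.ID = rep(c("TCGA.B2.3924.01", "TCGA.B2.5635.01", "TCGA.A3.3357.01"),
                  times = c(4, 6, 2)),
  age       = rep(c("73", "74", "62"), times = c(4, 6, 2))
)

# Within each Sample.ID keep rows 1 and 2; IDs with only two rows
# are left untouched, so no conditional is needed.
first.pairs <- clin.info %>%
  group_by(Sample.ID) %>%
  filter(row_number() <= 2) %>%
  ungroup()
```

Here `row_number()` counts rows within each group, so the filter drops the third and later rows of any ID that appears more than twice.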