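I'm working on a project and am looking for some help making my code run more efficiently. I've searched for similar questions but can't find any that are quite as granular as this one. The solution I've come up with is very clunky. I'm sure there must be a more efficient way to do this, for example with dplyr, data.table, etc.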
**Problem:** I have 3 columns of data: 'ids', 'x.group', and 'time'. For each 'x.group', I need to extract the first 3 unique 'ids' that appear in each 'time' block.
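However, I do not want to include any 'ids' or 'x.group' values equal to "0". The output at the bottom of my code gives the correct values, but in my opinion it gets there in a rather awkward way.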
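Note: In the code example below I'm using x.groups = ['A','B','0'], but in my actual project these can take on many values, so they won't always be 'A' or 'B'. '0' is always present, though (e.g., I could have ['A','K','0'] or ['M','W','0'], etc.). You can find the example dataset at the bottom of this post.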
# find x.groups
xs <- unique(myDF$x.group)[unique(myDF$x.group) != "0"]
# DF without '0's as x.group entries
ps <- unique(myDF[which(myDF$x.group %in% xs) , c("ids","x.group","time")])
first3.x1.t1 <- ps[ps$x.group == xs[1] & ps$ids != "0" & ps$time == "1", ]$ids[1:3]
first3.x2.t1 <- ps[ps$x.group == xs[2] & ps$ids != "0" & ps$time == "1", ]$ids[1:3]
first3.x1.t2 <- ps[ps$x.group == xs[1] & ps$ids != "0" & ps$time == "2", ]$ids[1:3]
first3.x2.t2 <- ps[ps$x.group == xs[2] & ps$ids != "0" & ps$time == "2", ]$ids[1:3]
first3.x1.t3 <- ps[ps$x.group == xs[1] & ps$ids != "0" & ps$time == "3", ]$ids[1:3]
first3.x2.t3 <- ps[ps$x.group == xs[2] & ps$ids != "0" & ps$time == "3", ]$ids[1:3]
# First 3 unique ids from time block 1 for each x.group
> first3.x1.t1; first3.x2.t1;
[1] "2" "17" "11"
[1] "5" "10" "4"
# First 3 unique ids from time block 2 for each x.group
> first3.x1.t2; first3.x2.t2;
[1] "9" "6" "16"
[1] "8" "13" "7"
# First 3 unique ids from time block 3 for each x.group
> first3.x1.t3; first3.x2.t3;
[1] "11" "2" "10"
[1] "1" "3" "13"
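Because the real data can contain any number of x.group values, the six hand-written assignments above could in principle be collapsed into a single grouped operation. The following is only a minimal base R sketch under that assumption. It rebuilds the example data so that it runs on its own, and the names `keep`, `groups`, and `first3` are illustrative. Unlike `[1:3]`, `head(..., 3)` returns fewer than three ids instead of padding with `NA` when a block has fewer than three unique ids.

```r
# Rebuild the example data (same values as in the Data section)
ids <- c("2","0","15","5","17","10","4","2","3","11","11","18","10","8","13","9","6","16","7","14",
         "16","7","11","12","14","5","1","11","3","2","10","17","3","13","10","17","2","10","16","10")
x.group <- c("A","A","0","B","A","B","B","A","B","A","A","0","B","B","B","A","A","A","B","B",
             "A","A","0","B","A","B","B","A","B","A","A","0","B","B","B","A","A","A","B","B")
time <- c(rep("1",13), rep("2",13), rep("3",14))
myDF <- data.frame(ids, x.group, time, stringsAsFactors = FALSE)

# Drop every row whose id or x.group is "0"
keep <- myDF[myDF$x.group != "0" & myDF$ids != "0", ]

# Split the ids by (x.group, time); list names look like "A.1", "B.1", "A.2", ...
groups <- split(keep$ids, list(keep$x.group, keep$time))

# Within each block, keep the first 3 unique ids in order of appearance
first3 <- lapply(groups, function(v) head(unique(v), 3))

first3[["A.1"]]  # "2"  "17" "11"
first3[["B.3"]]  # "1"  "3"  "13"
```

This works for any set of group labels, since nothing refers to `xs[1]` or `xs[2]` explicitly. The "A.1" style names are ambiguous only if a group label itself contains a dot.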
Data:
# create data frame
ids <- c("2","0","15","5","17","10","4","2","3","11","11","18","10","8","13","9","6","16","7","14",
"16","7","11","12","14","5","1","11","3","2","10","17","3","13","10","17","2","10","16","10")
x.group <- c("A","A","0","B","A","B","B","A","B","A","A","0","B","B","B","A","A","A","B","B",
"A","A","0","B","A","B","B","A","B","A","A","0","B","B","B","A","A","A","B","B")
time <- c(rep("1",13), rep("2",13), rep("3",14))
myDF <- as.data.frame(cbind(ids, x.group, time), stringsAsFactors = FALSE)
> myDF
ids x.group time
1 2 A 1
2 0 A 1
3 15 0 1
4 5 B 1
5 17 A 1
6 10 B 1
7 4 B 1
8 2 A 1
9 3 B 1
10 11 A 1
11 11 A 1
12 18 0 1
13 10 B 1
14 8 B 2
15 13 B 2
16 9 A 2
17 6 A 2
18 16 A 2
19 7 B 2
20 14 B 2
21 16 A 2
22 7 A 2
23 11 0 2
24 12 B 2
25 14 A 2
26 5 B 2
27 1 B 3
28 11 A 3
29 3 B 3
30 2 A 3
31 10 A 3
32 17 0 3
33 3 B 3
34 13 B 3
35 10 B 3
36 17 A 3
37 2 A 3
38 10 A 3
39 16 B 3
40 10 B 3
3 Answers
u7up0aaq1#
This returns a nested data frame. You can unnest it as follows:
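As a rough illustration of a nest-then-unnest approach, here is a minimal sketch assuming dplyr (1.0 or later, for the `.groups` argument) and tidyr are installed. It is not necessarily the code this answer originally used. The object names `nested` and `flat` are illustrative, and the example data is rebuilt so the snippet is self-contained.

```r
library(dplyr)
library(tidyr)

# Rebuild the example data from the question
ids <- c("2","0","15","5","17","10","4","2","3","11","11","18","10","8","13","9","6","16","7","14",
         "16","7","11","12","14","5","1","11","3","2","10","17","3","13","10","17","2","10","16","10")
x.group <- c("A","A","0","B","A","B","B","A","B","A","A","0","B","B","B","A","A","A","B","B",
             "A","A","0","B","A","B","B","A","B","A","A","0","B","B","B","A","A","A","B","B")
time <- c(rep("1",13), rep("2",13), rep("3",14))
myDF <- data.frame(ids, x.group, time, stringsAsFactors = FALSE)

# One row per (x.group, time); the ids column is a list column (nested)
nested <- myDF %>%
  filter(x.group != "0", ids != "0") %>%
  group_by(x.group, time) %>%
  summarise(ids = list(head(unique(ids), 3)), .groups = "drop")

# Expand the list column back into one row per id
flat <- nested %>% unnest(ids)
```

In `flat`, each (x.group, time) pair contributes up to three rows, in the order the ids first appeared in the data.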
bnl4lu3b2#
yiytaume3#
Here is a data.table solution, which I think should be the fastest; it could be made faster still by avoiding a call to .SD for each group.
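To make the two variants mentioned above concrete, here is a minimal sketch assuming the data.table package is installed. It is not necessarily this answer's original code. The names `dt`, `u`, `first3_sd`, `idx`, and `first3_fast` are illustrative. The first variant uses `.SD` per group; the second collects row indices with `.I` and subsets once, which is a common way to avoid materialising `.SD` for every group.

```r
library(data.table)

# Rebuild the example data from the question
ids <- c("2","0","15","5","17","10","4","2","3","11","11","18","10","8","13","9","6","16","7","14",
         "16","7","11","12","14","5","1","11","3","2","10","17","3","13","10","17","2","10","16","10")
x.group <- c("A","A","0","B","A","B","B","A","B","A","A","0","B","B","B","A","A","A","B","B",
             "A","A","0","B","A","B","B","A","B","A","A","0","B","B","B","A","A","A","B","B")
time <- c(rep("1",13), rep("2",13), rep("3",14))
dt <- data.table(ids, x.group, time)

# Drop the "0" rows, then drop duplicated (ids, x.group, time) rows
u <- unique(dt[x.group != "0" & ids != "0"])

# Variant 1: take the first 3 rows of .SD within each group
first3_sd <- u[, head(.SD, 3), by = .(x.group, time)]

# Variant 2: collect the first 3 row indices per group via .I, then subset once
idx <- u[, head(.I, 3), by = .(x.group, time)]$V1
first3_fast <- u[idx]
```

Both variants return the same ids per group. Groups appear in order of first appearance in the data rather than sorted; adding `keyby` instead of `by` would sort them.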