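I would like to know whether there is a way to group a data frame by two variables and then add new rows whenever a given string (such as a name) is missing from a group.
I have a list of dummy names (name_list) that I want to make sure appear in the grouped data, regardless of whether they were there in the first place (based on the name column). For example, I want to group by date and drill, but you will notice that Person A and Person B are missing from 2023-01-01 / Activity 1. I want every date/drill combination to contain every name in name_list; any name that has to be added should get a duration of 0, to indicate that it was absent from that date/drill combination. I hope that makes sense. Thanks.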
library(tidyverse)
# List of names to be joined within grouping variables.
name_list <- c(paste("Person", LETTERS[1:8]))
set.seed(10)
name <- c(name_list[3:8], name_list[1:6], name_list[2:7], name_list[3:8])
date <- rep(seq(as.Date('2023/01/01'), as.Date('2023/01/02'), by = "day"),
each = 12)
drill <- rep(paste("Activity", 1:2), each = 6, times = 2)
duration <- rep(c(5, 8), each = 6, times = 2)
df <- data.frame(name, date, drill, duration)
       name       date      drill duration
1  Person C 2023-01-01 Activity 1        5
2  Person D 2023-01-01 Activity 1        5
3  Person E 2023-01-01 Activity 1        5
4  Person F 2023-01-01 Activity 1        5
5  Person G 2023-01-01 Activity 1        5
6  Person H 2023-01-01 Activity 1        5
7  Person A 2023-01-01 Activity 2        8
8  Person B 2023-01-01 Activity 2        8
9  Person C 2023-01-01 Activity 2        8
10 Person D 2023-01-01 Activity 2        8
11 Person E 2023-01-01 Activity 2        8
12 Person F 2023-01-01 Activity 2        8
13 Person B 2023-01-02 Activity 1        5
14 Person C 2023-01-02 Activity 1        5
15 Person D 2023-01-02 Activity 1        5
16 Person E 2023-01-02 Activity 1        5
17 Person F 2023-01-02 Activity 1        5
18 Person G 2023-01-02 Activity 1        5
19 Person C 2023-01-02 Activity 2        8
20 Person D 2023-01-02 Activity 2        8
21 Person E 2023-01-02 Activity 2        8
22 Person F 2023-01-02 Activity 2        8
23 Person G 2023-01-02 Activity 2        8
24 Person H 2023-01-02 Activity 2        8
1 Answer
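I think you are looking for the complete() function.
Edit: As @LMc pointed out, complete() on its own only works if every value of name already appears somewhere in the data. Converting the name column to a factor whose levels are all the possible names in name_list solves this.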
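The answer's code did not survive in this copy, so what follows is only a minimal sketch of the approach it describes, not necessarily the original author's exact code. It assumes the df and name_list from the question (rebuilt here so the block runs on its own), loads dplyr and tidyr directly rather than the whole tidyverse, and stores the output in a variable called result, which is a name chosen for illustration.

```r
library(dplyr)
library(tidyr)

# Rebuild the example data from the question.
name_list <- paste("Person", LETTERS[1:8])
df <- data.frame(
  name = c(name_list[3:8], name_list[1:6], name_list[2:7], name_list[3:8]),
  date = rep(seq(as.Date("2023-01-01"), as.Date("2023-01-02"), by = "day"),
             each = 12),
  drill = rep(paste("Activity", 1:2), each = 6, times = 2),
  duration = rep(c(5, 8), each = 6, times = 2)
)

# Make name a factor whose levels are every name in name_list, so that
# complete() expands over all levels and not only the names present in df.
# nesting(date, drill) keeps only the date/drill pairs that occur in the data.
# Rows created by the expansion get duration = 0 instead of NA.
result <- df %>%
  mutate(name = factor(name, levels = name_list)) %>%
  complete(nesting(date, drill), name, fill = list(duration = 0))
```

With the example data, each of the four date/drill combinations should end up with all eight names, and the added rows carry a duration of 0. If you would rather keep name as a character column, grouping first and passing the names explicitly, as in df %>% group_by(date, drill) %>% complete(name = name_list, fill = list(duration = 0)), should give an equivalent set of rows.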