How do I add new rows to a grouped data frame in R?

Posted by xvw2m8pv on 2023-10-13 in Other

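I would like to know whether there is a way to group a data frame by two variables and then add new rows whenever a given string (such as a name) is absent from a group.
I have a list of dummy names (`name_list`) that I want to make sure appears in the grouped data, whether or not those names were there to begin with (based on the `name` column). For example, I want to group by `date` and `drill`, but you will notice that `Person A` and `Person B` are missing from 2023-01-01 / Activity 1. I want every `date`/`drill` combination to contain every name in `name_list`, and any name that has to be added should get a `duration` of 0, which essentially marks it as not present in that `date`/`drill` combination. Hopefully that makes sense. Thanks.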

    library(tidyverse)

    # List of names to be joined within grouping variables.
    name_list <- c(paste("Person", LETTERS[1:8]))

    set.seed(10)
    name <- c(name_list[3:8], name_list[1:6], name_list[2:7], name_list[3:8])
    date <- rep(seq(as.Date('2023/01/01'), as.Date('2023/01/02'), by = "day"),
                each = 12)
    drill <- rep(paste("Activity", 1:2), each = 6, times = 2)
    duration <- rep(c(5, 8), each = 6, times = 2)

    df <- data.frame(name, date, drill, duration)

           name       date      drill duration
    1  Person C 2023-01-01 Activity 1        5
    2  Person D 2023-01-01 Activity 1        5
    3  Person E 2023-01-01 Activity 1        5
    4  Person F 2023-01-01 Activity 1        5
    5  Person G 2023-01-01 Activity 1        5
    6  Person H 2023-01-01 Activity 1        5
    7  Person A 2023-01-01 Activity 2        8
    8  Person B 2023-01-01 Activity 2        8
    9  Person C 2023-01-01 Activity 2        8
    10 Person D 2023-01-01 Activity 2        8
    11 Person E 2023-01-01 Activity 2        8
    12 Person F 2023-01-01 Activity 2        8
    13 Person B 2023-01-02 Activity 1        5
    14 Person C 2023-01-02 Activity 1        5
    15 Person D 2023-01-02 Activity 1        5
    16 Person E 2023-01-02 Activity 1        5
    17 Person F 2023-01-02 Activity 1        5
    18 Person G 2023-01-02 Activity 1        5
    19 Person C 2023-01-02 Activity 2        8
    20 Person D 2023-01-02 Activity 2        8
    21 Person E 2023-01-02 Activity 2        8
    22 Person F 2023-01-02 Activity 2        8
    23 Person G 2023-01-02 Activity 2        8
    24 Person H 2023-01-02 Activity 2        8
Answer 1, by xwbd5t1u

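I think you are looking for the `complete()` function from tidyr.

Edit:

As @LMc pointed out, `complete()` on its own only works when every value of `name` is already present somewhere in the data. Converting the `name` column to a factor whose levels are all the possible names in `name_list` solves this, because `complete()` expands a factor to its full set of levels.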

    df %>%
      mutate(name = factor(name, levels = name_list)) %>%
      complete(name, date, drill, fill = list(duration = 0))

    # A tibble: 32 × 4
       name     date       drill      duration
       <fct>    <date>     <chr>         <dbl>
     1 Person A 2023-01-01 Activity 1        0
     2 Person A 2023-01-01 Activity 2        8
     3 Person A 2023-01-02 Activity 1        0
     4 Person A 2023-01-02 Activity 2        0
     5 Person B 2023-01-01 Activity 1        0
     6 Person B 2023-01-01 Activity 2        8
     7 Person B 2023-01-02 Activity 1        5
     8 Person B 2023-01-02 Activity 2        0
     9 Person C 2023-01-01 Activity 1        5
    10 Person C 2023-01-01 Activity 2        8
    11 Person C 2023-01-02 Activity 1        5
    12 Person C 2023-01-02 Activity 2        8
    13 Person D 2023-01-01 Activity 1        5
    14 Person D 2023-01-01 Activity 2        8
    15 Person D 2023-01-02 Activity 1        5
    16 Person D 2023-01-02 Activity 2        8
    17 Person E 2023-01-01 Activity 1        5
    18 Person E 2023-01-01 Activity 2        8
    19 Person E 2023-01-02 Activity 1        5
    20 Person E 2023-01-02 Activity 2        8
    21 Person F 2023-01-01 Activity 1        5
    22 Person F 2023-01-01 Activity 2        8
    23 Person F 2023-01-02 Activity 1        5
    24 Person F 2023-01-02 Activity 2        8
    25 Person G 2023-01-01 Activity 1        5
    26 Person G 2023-01-01 Activity 2        0
    27 Person G 2023-01-02 Activity 1        5
    28 Person G 2023-01-02 Activity 2        8
    29 Person H 2023-01-01 Activity 1        5
    30 Person H 2023-01-01 Activity 2        0
    31 Person H 2023-01-02 Activity 1        0
    32 Person H 2023-01-02 Activity 2        8
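
A possible variation, not part of the original answer: instead of converting `name` to a factor, the full set of names can be passed to `complete()` directly, and `nesting()` can limit the expansion to the `date`/`drill` pairs that actually occur in the data. This is only a sketch under the assumption that you want existing `date`/`drill` combinations rather than every possible cross of the two; the `result` variable name is just for illustration.

```r
library(dplyr)
library(tidyr)

name_list <- paste("Person", LETTERS[1:8])

# Same example data as in the question
df <- data.frame(
  name     = c(name_list[3:8], name_list[1:6], name_list[2:7], name_list[3:8]),
  date     = rep(as.Date(c("2023-01-01", "2023-01-02")), each = 12),
  drill    = rep(paste("Activity", 1:2), each = 6, times = 2),
  duration = rep(c(5, 8), each = 6, times = 2)
)

# name = name_list supplies every name, whether or not it appears in df.
# nesting(date, drill) keeps only date/drill pairs that are present in df.
# fill sets duration to 0 for the rows that complete() adds.
result <- df %>%
  complete(name = name_list, nesting(date, drill), fill = list(duration = 0))
```

On this example data all four `date`/`drill` pairs exist, so the rows should match the factor-based approach, with `name` left as a character column. The two approaches would differ only if some `date`/`drill` pair never occurred in `df`.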