R: Filling in the Blanks Between Missing Rows

hgqdbh6s, posted on 2023-01-28 in Other

I have the following dataset:

    name = c("john", "john", "john", "sarah", "sarah", "peter", "peter", "peter", "peter")
    year = c(2010, 2011, 2014, 2010, 2015, 2011, 2012, 2013, 2015)
    age = c(21, 22, 25, 55, 60, 61, 62, 63, 65)
    gender = c("male", "male", "male", "female", "female", "male", "male", "male", "male")
    country_of_birth = c("australia", "australia", "australia", "uk", "uk", "mexico", "mexico", "mexico", "mexico")
    my_data = data.frame(name, year, age, gender, country_of_birth)

       name year age gender country_of_birth
    1  john 2010  21   male        australia
    2  john 2011  22   male        australia
    3  john 2014  25   male        australia
    4 sarah 2010  55 female               uk
    5 sarah 2015  60 female               uk
    6 peter 2011  61   male           mexico
    7 peter 2012  62   male           mexico
    8 peter 2013  63   male           mexico
    9 peter 2015  65   male           mexico

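As we can see, some people in this dataset have missing years. Assume that the first row for each person is their earliest year and the last row is their latest year.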

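For each person in this dataset, I want to "fill in" the rows for the missing years. For example, in each missing row:

  • I want the "age" variable to increase by 1 (e.g. in 2012 John should be 23, and in 2013 John should be 24)
  • I want the "gender" variable to stay the same
  • I want the "country_of_birth" variable to stay the same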

Here is the R code I used:

    library(tidyr)
    library(dplyr)

    my_data %>%
      group_by(name) %>%
      complete(year = full_seq(year, period = 1)) %>%
      fill(year, age, gender, country_of_birth, .direction = "downup") %>%
      mutate(real_age = age - (row_number() - 1)) %>%
      ungroup()

The code runs and appears to add the missing rows, but the age variable is not filled in correctly:

    # A tibble: 16 x 6
       name   year   age gender country_of_birth real_age
       <chr> <dbl> <dbl> <chr>  <chr>               <dbl>
     1 john   2010    21 male   australia              21
     2 john   2011    22 male   australia              21
     3 john   2012    22 male   australia              20
     4 john   2013    22 male   australia              19
     5 john   2014    25 male   australia              21
     6 peter  2011    61 male   mexico                 61
     7 peter  2012    62 male   mexico                 61
     8 peter  2013    63 male   mexico                 61
     9 peter  2014    63 male   mexico                 60
    10 peter  2015    65 male   mexico                 61
    11 sarah  2010    55 female uk                     55
    12 sarah  2011    55 female uk                     54
    13 sarah  2012    55 female uk                     53
    14 sarah  2013    55 female uk                     52
    15 sarah  2014    55 female uk                     51
    16 sarah  2015    60 female uk                     55

So far I have been trying different variations of mutate(real_age = age - (row_number() - 1)) to fix this, but nothing seems to work.
Can someone please show me how to fix this?
Thanks!

s4n0splo #1

One approach would be:

    library(dplyr)
    library(tidyr)

    my_data %>%
      group_by(name) %>%
      # add one row per year between each person's first and last year
      complete(year = first(year):last(year)) %>%
      # where age is missing, count up from the person's first recorded age
      mutate(age = ifelse(is.na(age), first(age) + row_number() - 1, age)) %>%
      # carry gender and country_of_birth down into the new rows
      fill(c(gender, country_of_birth), .direction = "down")

       name   year   age gender country_of_birth
       <chr> <dbl> <dbl> <chr>  <chr>
     1 john   2010    21 male   australia
     2 john   2011    22 male   australia
     3 john   2012    23 male   australia
     4 john   2013    24 male   australia
     5 john   2014    25 male   australia
     6 peter  2011    61 male   mexico
     7 peter  2012    62 male   mexico
     8 peter  2013    63 male   mexico
     9 peter  2014    64 male   mexico
    10 peter  2015    65 male   mexico
    11 sarah  2010    55 female uk
    12 sarah  2011    56 female uk
    13 sarah  2012    57 female uk
    14 sarah  2013    58 female uk
    15 sarah  2014    59 female uk
    16 sarah  2015    60 female uk
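
A possible variant, offered only as a sketch: rather than counting rows with row_number(), the missing ages can be derived from the year itself. This assumes that age rises by exactly 1 per calendar year and that each person's earliest year has a recorded age, which holds for the sample data but may not hold elsewhere. The name `filled` is just an illustrative variable, not something from the question.

```r
library(dplyr)
library(tidyr)

# Same sample data as in the question
my_data <- data.frame(
  name = c("john", "john", "john", "sarah", "sarah", "peter", "peter", "peter", "peter"),
  year = c(2010, 2011, 2014, 2010, 2015, 2011, 2012, 2013, 2015),
  age = c(21, 22, 25, 55, 60, 61, 62, 63, 65),
  gender = c("male", "male", "male", "female", "female", "male", "male", "male", "male"),
  country_of_birth = c("australia", "australia", "australia", "uk", "uk",
                       "mexico", "mexico", "mexico", "mexico")
)

filled <- my_data %>%
  group_by(name) %>%
  # one row per year between each person's earliest and latest year
  complete(year = full_seq(year, period = 1)) %>%
  # keep recorded ages; fill gaps as first age + years elapsed
  # (assumption: age increases by exactly 1 per year)
  mutate(age = coalesce(age, first(age) + (year - first(year)))) %>%
  # constant attributes are carried down into the new rows
  fill(gender, country_of_birth, .direction = "down") %>%
  ungroup()

print(filled)
```

Because the offset comes from `year` rather than from the row position, the result does not depend on the rows within a group being contiguous, and `coalesce()` leaves any age that was actually recorded untouched.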