R: Filling in the gaps between missing rows

hgqdbh6s  posted on 2023-01-28 in Other

I have the following dataset:

name = c("john", "john", "john", "sarah", "sarah", "peter", "peter", "peter", "peter")
year = c(2010, 2011, 2014, 2010, 2015, 2011, 2012, 2013, 2015)
age = c(21, 22, 25, 55, 60, 61, 62, 63, 65)
gender = c("male", "male", "male", "female", "female", "male", "male", "male", "male" )
country_of_birth = c("australia", "australia", "australia", "uk", "uk", "mexico", "mexico", "mexico", "mexico")

my_data = data.frame(name, year, age, gender, country_of_birth)

  name year age gender country_of_birth
1  john 2010  21   male        australia
2  john 2011  22   male        australia
3  john 2014  25   male        australia
4 sarah 2010  55 female               uk
5 sarah 2015  60 female               uk
6 peter 2011  61   male           mexico
7 peter 2012  62   male           mexico
8 peter 2013  63   male           mexico
9 peter 2015  65   male           mexico

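As we can see, some years are missing for some of the people in this dataset. Assume that, for each person, the first row holds the earliest year and the last row holds the latest year.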

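For each person in this dataset, I want to "fill in" the rows between the missing years. For example, in each missing row:
  • I want the "age" variable to increase by 1 (e.g. in 2012 John should be 23, and in 2013 John should be 24)
  • I want the "gender" variable to stay the same
  • I want the "country_of_birth" variable to stay the same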
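Put another way, the expected age in any year is the first recorded age plus the number of years elapsed since the first recorded year. A minimal base R sketch of that rule for John only (the names `john_years`, `john_ages`, `all_years` and `expected_age` are made up for illustration):

```r
# John's recorded rows from the dataset above
john_years <- c(2010, 2011, 2014)
john_ages  <- c(21, 22, 25)

# Every year between the first and last recorded year
all_years <- seq(min(john_years), max(john_years))

# Expected age: first recorded age plus the number of years elapsed
expected_age <- john_ages[1] + (all_years - john_years[1])

data.frame(year = all_years, age = expected_age)
```

This should give ages 21 through 25 for 2010 through 2014, which is the result wanted for every person in the full dataset.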

Here is the R code I used:

library(tidyr)
library(dplyr)
my_data %>% 
    group_by(name) %>% 
    complete(year = full_seq(year, period = 1)) %>% 
    fill(year, age, gender, country_of_birth, .direction = "downup") %>%
    mutate(real_age= age - (row_number() - 1)) %>%
    ungroup

This code runs and does seem to add the missing rows, but the age variable is not filled in correctly:

# A tibble: 16 x 6
   name   year   age gender country_of_birth real_age
   <chr> <dbl> <dbl> <chr>  <chr>               <dbl>
 1 john   2010    21 male   australia              21
 2 john   2011    22 male   australia              21
 3 john   2012    22 male   australia              20
 4 john   2013    22 male   australia              19
 5 john   2014    25 male   australia              21
 6 peter  2011    61 male   mexico                 61
 7 peter  2012    62 male   mexico                 61
 8 peter  2013    63 male   mexico                 61
 9 peter  2014    63 male   mexico                 60
10 peter  2015    65 male   mexico                 61
11 sarah  2010    55 female uk                     55
12 sarah  2011    55 female uk                     54
13 sarah  2012    55 female uk                     53
14 sarah  2013    55 female uk                     52
15 sarah  2014    55 female uk                     51
16 sarah  2015    60 female uk                     55
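A note on the `age` column in this output: `fill()` only copies the nearest non-missing value into the gaps, so John's 2012 and 2013 rows both receive 22, and subtracting a row index from that copied value cannot recover 23 and 24. A minimal sketch of that behaviour, using a hypothetical two-column toy frame (`toy`) rather than the full dataset:

```r
library(tidyr)

# John's rows after complete(): the two new years have no age yet
toy <- data.frame(year = 2010:2014, age = c(21, 22, NA, NA, 25))

# "downup" fills downward first, then upward for any leading gaps
filled_toy <- fill(toy, age, .direction = "downup")

filled_toy$age
```

Both gaps end up holding the value from 2011, which matches the repeated 22 in the table.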

So far I have tried different variations of mutate(real_age = age - (row_number() - 1)) to fix this, but nothing has worked.
Can someone tell me how to fix this?
Thanks!

Answer #1 (s4n0splo)

One approach:

library(dplyr)
library(tidyr)

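my_data %>% 
  group_by(name) %>% 
  # add a row for every year between each person's first and last year
  complete(year = first(year):last(year)) %>% 
  # the new rows have NA for age: count up from the person's first recorded age
  mutate(age = ifelse(is.na(age), first(age) + row_number() - 1, age)) %>% 
  # carry the constant columns down into the new rows
  fill(c(gender, country_of_birth), .direction = "down")

   name   year   age gender country_of_birth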
   <chr> <dbl> <dbl> <chr>  <chr>           
 1 john   2010    21 male   australia       
 2 john   2011    22 male   australia       
 3 john   2012    23 male   australia       
 4 john   2013    24 male   australia       
 5 john   2014    25 male   australia       
 6 peter  2011    61 male   mexico          
 7 peter  2012    62 male   mexico          
 8 peter  2013    63 male   mexico          
 9 peter  2014    64 male   mexico          
10 peter  2015    65 male   mexico          
11 sarah  2010    55 female uk              
12 sarah  2011    56 female uk              
13 sarah  2012    57 female uk              
14 sarah  2013    58 female uk              
15 sarah  2014    59 female uk              
16 sarah  2015    60 female uk
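A variant of the same idea, offered only as a sketch: compute the missing ages from the year offset instead of the row position, so the result does not depend on the rows being consecutive or in any particular order. It assumes each person's earliest year has a recorded age. The name `filled` is introduced here for illustration, and `coalesce()` keeps any age that was already recorded.

```r
library(dplyr)
library(tidyr)

# Dataset from the question, rebuilt so the example is self-contained
my_data <- data.frame(
  name = c("john", "john", "john", "sarah", "sarah",
           "peter", "peter", "peter", "peter"),
  year = c(2010, 2011, 2014, 2010, 2015, 2011, 2012, 2013, 2015),
  age = c(21, 22, 25, 55, 60, 61, 62, 63, 65),
  gender = c("male", "male", "male", "female", "female",
             "male", "male", "male", "male"),
  country_of_birth = c("australia", "australia", "australia", "uk", "uk",
                       "mexico", "mexico", "mexico", "mexico")
)

filled <- my_data %>%
  group_by(name) %>%
  # one row per year between each person's earliest and latest year
  complete(year = full_seq(year, period = 1)) %>%
  # constant columns are copied down into the new rows
  fill(gender, country_of_birth, .direction = "down") %>%
  # missing ages: first recorded age plus years elapsed since the first year
  mutate(age = coalesce(age, first(age) + (year - first(year)))) %>%
  ungroup()

filled
```

On this dataset it should produce the same 16 rows as the table above.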
