R语言 创建从0到表的变量值的新列

nfeuvbwi  于 2023-04-03  发布在  其他
关注(0)|答案(5)|浏览(122)

**可复制的tibble:**我有一个类似于下面所示的数据库。不同的是,我使用的数据库要大得多。

general_tibble <- tibble(gender = c("female", "female", "male"),
                             age = c(18, 19,18),
                             age_partner = c(22,20,17),
                             max_age = c(60, 60, 65), 
                             nrs =c(42,41,47))

general_tibble导致:

gender age age_partner max_age nrs
1 female  18          22      60  42
2 female  19          20      60  41
3   male  18          17      65  47

**问题:**如何从上一个表创建一个新表,该表接受nrs的值,并创建一个名为n的列变量,该变量从0到nrs中的值?

为了进一步说明,在general_tibble的第1行中,列nrs等于42,因此列将从0到42,在第2行中,nrs等于41,因此列将从0到41,并且对于第3行也是如此。
我目前正在使用下面的代码。它可以工作,但是当general_tibble太大时,代码执行得非常慢。

general_list <- list()

for(i in 1:NROW(general_tibble)){
  general_list[[i]] <- data.frame(general_tibble[i, ], 
                             n = 0:general_tibble[[i, "nrs"]])
}

则我bind_rows()general_list得到general_binded

general_binded <- bind_rows(general_list)

general_binded[c(1:5, 38:42),]结果:

gender age age_partner max_age nrs  n
1  female  18          22      60  42  0
2  female  18          22      60  42  1
3  female  18          22      60  42  2
4  female  18          22      60  42  3
5  female  18          22      60  42  4
38 female  18          22      60  42 37
39 female  18          22      60  42 38
40 female  18          22      60  42 39
41 female  18          22      60  42 40
42 female  18          22      60  42 41

**附言:**在for循环中,我使用data.frame()而不是tibble(),因为我想循环使用行。如果你有一些关于tibble或dataframe的建议,我很乐意接受。

hlswsv35

hlswsv351#

最简单的方法是使用tidyr::expand()函数在nrs列上扩展general_tibble

library(tidyverse)

general_tibble %>% 
        group_by_all()%>% 
        expand(n = 0:nrs)

#> # A tibble: 133 x 6
#> # Groups:   gender, age, age_partner, max_age, nrs [3]
#>    gender   age age_partner max_age   nrs     n
#>    <chr>  <dbl>       <dbl>   <dbl> <dbl> <int>
#>  1 female    18          22      60    42     0
#>  2 female    18          22      60    42     1
#>  3 female    18          22      60    42     2
#>  4 female    18          22      60    42     3
#>  5 female    18          22      60    42     4
#>  6 female    18          22      60    42     5
#>  7 female    18          22      60    42     6
#>  8 female    18          22      60    42     7
#>  9 female    18          22      60    42     8
#> 10 female    18          22      60    42     9
#> # ... with 123 more rows

创建于2019-05-21由reprex package(v0.2.1)
只使用 * base R * 函数的另一个想法:

expanded_vars <- do.call(rbind,lapply(general_tibble$nrs, 
                                              function(x) expand.grid(x, 0:x)))
names(expanded_vars) <- c("nrs", "n")

merge(y = expanded_vars, x = general_tibble, by = "nrs", all = TRUE)
h22fl7wq

h22fl7wq2#

使用data.tabletidyverse的一个好处是,你不需要考虑你所做的操作是mutateexpand还是summarize,你只需要把你想要的放在df[i, j, k]j部分,不管解析成多少行,你都得到了。

library(data.table)
setDT(general_tibble)

general_tibble[, .(n = seq(0, nrs))
               , by = names(general_tibble)]

#      gender age age_partner max_age nrs  n
#   1: female  18          22      60  42  0
#   2: female  18          22      60  42  1
#   3: female  18          22      60  42  2
#   4: female  18          22      60  42  3
#   5: female  18          22      60  42  4
#  ---                                      
# 129:   male  18          17      65  47 43
# 130:   male  18          17      65  47 44
# 131:   male  18          17      65  47 45
# 132:   male  18          17      65  47 46
# 133:   male  18          17      65  47 47
deikduxw

deikduxw3#

我们可以使用uncount

library(tidyverse)
general_tibble %>% 
   mutate(grp = row_number(), nrsN = nrs + 1) %>% 
   uncount(nrsN) %>%
   group_by(grp) %>% 
   mutate(n = row_number() - 1) %>%
   ungroup %>%
   select(-grp)
# A tibble: 133 x 6
#   gender   age age_partner max_age   nrs     n
#   <chr>  <dbl>       <dbl>   <dbl> <dbl> <dbl>
# 1 female    18          22      60    42     0
# 2 female    18          22      60    42     1
# 3 female    18          22      60    42     2
# 4 female    18          22      60    42     3
# 5 female    18          22      60    42     4
# 6 female    18          22      60    42     5
# 7 female    18          22      60    42     6
# 8 female    18          22      60    42     7
# 9 female    18          22      60    42     8
#10 female    18          22      60    42     9
# … with 123 more rows

另一个选项是unnest

general_tibble %>% 
   mutate(n = map(nrs+1, ~  seq(.x) - 1)) %>%
   unnest
bvuwiixz

bvuwiixz4#

使用dplyrtidyr,您还可以:

general_tibble %>%
 group_by(rowid = row_number()) %>%
 mutate(n = nrs) %>%
 complete(n = seq(0, n, 1)) %>%
 fill(everything(), .direction = "up") %>%
 ungroup() %>%
 select(-rowid)

       n gender   age age_partner max_age   nrs
   <dbl> <chr>  <dbl>       <dbl>   <dbl> <dbl>
 1     0 female    18          22      60    42
 2     1 female    18          22      60    42
 3     2 female    18          22      60    42
 4     3 female    18          22      60    42
 5     4 female    18          22      60    42
 6     5 female    18          22      60    42
 7     6 female    18          22      60    42
 8     7 female    18          22      60    42
 9     8 female    18          22      60    42
10     9 female    18          22      60    42
8yparm6h

8yparm6h5#

一种方法与基地R(减去tibble包)。
首先,按nrs组拆分。其次,按nrs值扩展每个 Dataframe 的行。第三,创建一个表示0的id列:无论行数是多少。第四,将其恢复为tibble

library(tibble)

df <- tibble(
  gender      = c("female", "female", "male"),
  age         = c(18, 19, 18),
  age_partner = c(22, 20, 17),
  max_age     = c(60, 60, 65), 
  nrs         = c(42, 41, 47)
  )

nrs_split <- split(df, df$nrs)
df_list <- lapply(nrs_split, function(i) i[rep(seq_len(nrow(i)), each=i$nrs + 1), ])
df_renum <- lapply(df_list, function(i) {i$id <- 0:rle(i$nrs)$values; return(i)})
df <- do.call("rbind", df_renum)
df
#> # A tibble: 133 x 6
#>    gender   age age_partner max_age   nrs    id
#>  * <chr>  <dbl>       <dbl>   <dbl> <dbl> <int>
#>  1 female    19          20      60    41     0
#>  2 female    19          20      60    41     1
#>  3 female    19          20      60    41     2
#>  4 female    19          20      60    41     3
#>  5 female    19          20      60    41     4
#>  6 female    19          20      60    41     5
#>  7 female    19          20      60    41     6
#>  8 female    19          20      60    41     7
#>  9 female    19          20      60    41     8
#> 10 female    19          20      60    41     9
#> # … with 123 more rows

相关问题