Efficiently concatenating tibbles output from a for loop

7gcisfzg, posted 2023-04-18 in Other

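I have some complex analysis code inside a for loop, and I'm trying to compile the output into a single data frame / tibble. Following the answer here, I avoid modifying the main output tibble inside the loop; instead I try to append each iteration's output to a list (or something similar) and combine everything afterwards. But I'm having trouble with the exact syntax and with getting the right dimensions.
Specifically, I have something like this: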

months <- c("Jan", "Feb", "Mar", "Apr")

iter_out <- list() # Container for output

## Iterate over each month of data
for (i in 1:length(months)) {

  ## Subset data
  iter_data <- data %>% filter(month < months[[i]])
  ## Note that actual operation is more complex than this, so simply grouping by month and doing a grouped apply won't work

  ## Run analysis  
  output_1 <- run_first_analysis(iter_data)
  output_2 <- run_second_analysis(iter_data)

  iter_out <- c(iter_out, list(months[[i]], output_1, output_2))
}

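(Note that, to complicate things further, output_1 and output_2 are themselves lists, and I'd like to keep them that way.)
From the loop (or from iter_out), I'd like to get output like this: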

# A tibble: 4 × 3
  month  output_1   output_2 
  <chr>   <list>     <list>
1 "Jan" <list [3]> <list [3]>
2 "Feb" <list [3]> <list [3]>
3 "Mar" <list [3]> <list [3]>
4 "Apr" <list [3]> <list [3]>

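That is, output with list columns, where the first column (or the row names) is the month. However, all I get is everything in a 12 x 1 tibble.
What is the best way to combine the iteration outputs into the shape I want?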

Answer 1 (to94eoyn)

If you want to create nested data, you can use a tibble with list columns. It looks like this:

library(dplyr)
months <- c("Jan", "Feb", "Mar", "Apr")

iter_out <- tibble(month = character(), output_1 = list(), output_2 = list()) # Container for output

## Iterate over each month of data
for (i in 1:length(months)) {
  
  ## Subset data
  iter_data <- data %>% filter(month < months[[i]])
  ## Note that actual operation is more complex than this, so simply grouping by month and doing a grouped apply won't work
  
  ## Run analysis  
  output_1 <- run_first_analysis(iter_data)
  output_2 <- run_second_analysis(iter_data)
  
  iter_out <- bind_rows(iter_out, tibble(month = months[[i]], output_1 = list(output_1), output_2 = list(output_2)))
}
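The extra list() around each output is what keeps one row per month. As a minimal sketch (the three-element list here is a made-up stand-in for an analysis result, not the real output), compare the row counts with and without the wrapper:

```r
library(tibble)

# Hypothetical analysis result: a list with three elements
output_1 <- list(1, 2, 3)

# Without the extra list(): each element becomes its own row,
# and the length-1 month value is recycled across those rows
unwrapped <- tibble(month = "Jan", output_1 = output_1)
nrow(unwrapped)  # 3

# With list(): the whole result sits in a single cell of a list column
wrapped <- tibble(month = "Jan", output_1 = list(output_1))
nrow(wrapped)    # 1
```
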

But I prefer to use lapply rather than a for loop:

library(dplyr)
months <- c("Jan", "Feb", "Mar", "Apr")

lapply(months, function(x){
  ## Subset data
  iter_data <- data %>% filter(month < x)
  ## Note that actual operation is more complex than this, so simply grouping by month and doing a grouped apply won't work
  
  ## Run analysis  
  output_1 <- run_first_analysis(iter_data)
  output_2 <- run_second_analysis(iter_data)
  
  tibble(month = x, output_1 = list(output_1), output_2 = list(output_2))
}) %>% 
  bind_rows()
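
To try the lapply pattern end to end, here is a self-contained sketch. The toy data, the month_num column, and the bodies of run_first_analysis() and run_second_analysis() are all assumptions made up for illustration; the question does not show the real ones.

```r
library(dplyr)

# Hypothetical stand-ins for the real data and analysis functions
toy_data <- tibble(month_num = rep(1:4, each = 3), value = 1:12)
run_first_analysis <- function(d) {
  list(n = nrow(d), total = sum(d$value), mean = mean(d$value))
}
run_second_analysis <- function(d) {
  list(min = min(d$value), max = max(d$value), spread = diff(range(d$value)))
}

months <- c("Jan", "Feb", "Mar", "Apr")

result <- lapply(seq_along(months), function(i) {
  # Subset: all rows up to and including month i (assumed rule, for illustration)
  iter_data <- toy_data %>% filter(month_num <= i)

  # One-row tibble; list() keeps each analysis result in a single cell
  tibble(
    month    = months[[i]],
    output_1 = list(run_first_analysis(iter_data)),
    output_2 = list(run_second_analysis(iter_data))
  )
}) %>%
  bind_rows()

dim(result)  # 4 rows, 3 columns
```

Individual results can then be pulled back out with, for example, result$output_1[[2]].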

Answer 2 (ulmd4ohb)

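The answer you linked to warns against growing your output on every iteration of the for loop. The fix is not to avoid data frames altogether, but to initialise a container for them. Your solution grows a list instead of a data frame, so it will still be inefficient for a large number of iterations.
To make your code more efficient, initialise an empty list of the correct length, then fill that list with values inside the for loop: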

library(dplyr)  # provides %>%, filter() and tibble()

months <- c("Jan", "Feb", "Mar", "Apr")

# Create an empty list with one slot per month (length 4)
iter_out <- vector("list", length(months))

## Iterate over each month of data
for (i in 1:length(months)) {
  
  ## Subset data
  iter_data <- data %>% filter(month < months[[i]])
  ## Note that actual operation is more complex than this, so simply grouping by month and doing a grouped apply won't work
  
  ## Run analysis  
  output_1 <- run_first_analysis(iter_data)
  output_2 <- run_second_analysis(iter_data)
  
  # Set the relevant item in the list to your data
  iter_out[[i]] <- tibble(
    month = months[[i]], 
    output_1 = list(output_1), 
    output_2 = list(output_2)
  )
}

The resulting list can easily be combined with dplyr::bind_rows() or purrr::list_rbind() (I've created a very basic example list to demonstrate this).

library(tidyverse)

iter_out <- list(
  tibble(month = "x", output_1 = list(list()), output_2 = list(list())),
  tibble(month = "x", output_1 = list(list()), output_2 = list(list()))
)

iter_out
#> [[1]]
#> # A tibble: 1 × 3
#>   month output_1   output_2  
#>   <chr> <list>     <list>    
#> 1 x     <list [0]> <list [0]>
#> 
#> [[2]]
#> # A tibble: 1 × 3
#>   month output_1   output_2  
#>   <chr> <list>     <list>    
#> 1 x     <list [0]> <list [0]>

list_rbind(iter_out)
#> # A tibble: 2 × 3
#>   month output_1   output_2  
#>   <chr> <list>     <list>    
#> 1 x     <list [0]> <list [0]>
#> 2 x     <list [0]> <list [0]>

bind_rows(iter_out)
#> # A tibble: 2 × 3
#>   month output_1   output_2  
#>   <chr> <list>     <list>    
#> 1 x     <list [0]> <list [0]>
#> 2 x     <list [0]> <list [0]>
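
As for the 12 elements seen in the question, a likely explanation is that c(iter_out, list(a, b, c)) appends three separate elements per iteration, so four months give 4 × 3 = 12 with nothing tying each month to its outputs. A minimal base R sketch (the out object is a made-up stand-in for an analysis result):

```r
months <- c("Jan", "Feb", "Mar", "Apr")
out <- list(1, 2, 3)  # hypothetical stand-in for an analysis result

# Pattern from the question: three top-level elements are appended per iteration
flat <- list()
for (m in months) {
  flat <- c(flat, list(m, out, out))
}
length(flat)     # 12

# One slot per iteration: each month's values stay grouped together
grouped <- vector("list", length(months))
for (i in seq_along(months)) {
  grouped[[i]] <- list(month = months[[i]], output_1 = out, output_2 = out)
}
length(grouped)  # 4
```
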
