Split a dataframe into multiple dataframes to run a function that only takes two-column dataframes

Posted by gkn4icbw on 2022-12-05 in Other


I want to perform a column-wise operation in R on column pairs. The function I actually want to use is not the one shown here, because it would complicate this example.
I have a dataframe:

df <- data.frame(p1 = c(-5, -4, 2, 0, -2, 1, 3, 4, 2, 7)
                 ,p2 = c(0, 1, 2, 0, -2, 1, 3, 3, 2, 0))

and a vector of the same length as the df :

tocompare <- c(0, 0, 2, 0, 2, 4, 16, 12, 6, 9)

I want to run a function that compares each column of df to the tocompare object. The steps I need to take are:

Make a two-element list. The first element is a two-column dataframe x, in which the first column comes from df and the second column is the tocompare object. The second element is a number (this is needed for my actual function to work; I appreciate that it is not needed in this example). This number is constant across all iterations of the process: it is the number of rows in df, which equals the length of tocompare. In this example, it's 10.

library(dplyr) # provides %>% and select()

data1 <- list(x = cbind(df %>% select(1), tocompare), N = length(tocompare))

# select(1) is used rather than df[,1] so that the column header is kept

Compare the two columns of the first element (called x ) of the data1 list. The function that I use in real life is not cor ; this simplified example captures the problem. I wrote my_function in such a way that it needs the data1 object created above.

my_function <- function(data1){
  x <- data1[[1]]
  cr <- cor(x[,1], x[,2])
  header <- colnames(x)[1]
  print(c(header, cr))
}

cr_df1 <- my_function(data1)

I can do the same for the second df column:

data2 <- list(x = cbind(df %>% select(2), tocompare), N = length(tocompare))
cr_df2 <- my_function(data2)

And make a dataframe of final results:

final_df <- rbind(cr_df1, cr_df2) %>% 
`rownames<-`(NULL) %>% 
`colnames<-`(c("p", "R")) %>% 
as.data.frame()

The output will look like this:

> final_df 
   p         R
1 p1 0.7261224
2 p2 0.6233169

I would like to do this on a dataframe with thousands of columns. The bit I don't know is how to split the single dataframe into multiple two-column dataframes and then run my_function on these many small dataframes to return a single output. I think I would be able to do it with a loop and with transposing the df , but maybe there is a better way (I feel I should try to use map here)?

Source: https://stackoverflow.com/questions/74641351/split-a-dataframe-into-multiple-to-run-a-function-that-only-takes-two-column-dat

2 Answers


mspsb9vt1#

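Instead of a loop, you can use map to apply your function iteratively. To split the dataframe into columns, simply select one column at a time. 1:ncol(df) generates the sequence of column numbers. So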

library(tidyverse)
map(1:ncol(df), function(column_number) df %>% select(all_of(column_number)))
#> [[1]]
#>    p1
#> 1  -5
#> 2  -4
#> 3   2
#> 4   0
#> 5  -2
#> 6   1
#> 7   3
#> 8   4
#> 9   2
#> 10  7
#> 
#> [[2]]
#>    p2
#> 1   0
#> 2   1
#> 3   2
#> 4   0
#> 5  -2
#> 6   1
#> 7   3
#> 8   3
#> 9   2
#> 10  0

To make the function work on these columns, first change it so that it returns a dataframe

my_function2 <- function(data1){
x <- data1[[1]]
cr <- cor(x[,1], x[,2])
header <- colnames(x)[1]
tibble(header = header, cr = cr)
}

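Then wrap it up with map, but use map_df so that each iteration is bound as a row of a dataframe

compare_fn <- function(df, compare_list, my_function){
    map_df(1:ncol(df),
           # N is taken from the compare_list argument rather than the global tocompare
           function(column_number) my_function(list(x = cbind(df %>% select(all_of(column_number)), compare_list),
                N = length(compare_list))))
}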

and run it with

compare_fn(df, tocompare, my_function2)
#> # A tibble: 2 × 2
#>   header    cr
#>   <chr>  <dbl>
#> 1 p1     0.726
#> 2 p2     0.623
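
As of purrr 1.0.0, map_df() is superseded in favour of map() followed by list_rbind(). The same wrapper can be sketched in that style, iterating over column names instead of column numbers. This is only a sketch, assuming purrr 1.0.0 or later; compare_by_name and pair_cor are hypothetical names that stand in for the wrapper and for the real function.

```r
library(purrr)
library(tibble)

# Data from the question
df <- data.frame(p1 = c(-5, -4, 2, 0, -2, 1, 3, 4, 2, 7),
                 p2 = c(0, 1, 2, 0, -2, 1, 3, 3, 2, 0))
tocompare <- c(0, 0, 2, 0, 2, 4, 16, 12, 6, 9)

# Hypothetical stand-in for the real function: expects list(x = <two-column df>, N = <count>)
pair_cor <- function(data1) {
  x <- data1$x
  tibble(header = colnames(x)[1], cr = cor(x[[1]], x[[2]]))
}

# Hypothetical wrapper: build one two-column dataframe per column name,
# apply fn to each, then stack the one-row results
compare_by_name <- function(df, compare_vec, fn) {
  pieces <- map(names(df), function(nm) {
    fn(list(x = cbind(df[nm], compare_vec), N = length(compare_vec)))
  })
  list_rbind(pieces)
}

result <- compare_by_name(df, tocompare, pair_cor)
result
```

Indexing with df[nm] keeps the column as a one-column dataframe, so the column header survives without select().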

Edit: if the function in the map_df call takes too long and you want to parallelize it with furrr, you can use

library(furrr)
plan(multisession, workers = 10) # or however many threads you have available

compare_fn2 <- function(df, compare_list, my_function){
    future_map(1:ncol(df),
           # N is taken from the compare_list argument rather than the global tocompare
           function(column_number) my_function(list(x = cbind(df %>% select(all_of(column_number)), compare_list),
                N = length(compare_list))))
}

#Since future_map() returns a list instead of a dataframe
#add bind_rows() to group each element into a dataframe

compare_fn2(df, tocompare, my_function2) %>% bind_rows
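
furrr also provides future_map_dfr(), which row-binds the results itself, so the separate bind_rows() step can be dropped. A minimal sketch, assuming furrr and dplyr are installed and that two local workers are available; pair_cor is a hypothetical stand-in for the real function.

```r
library(furrr)

# Data from the question
df <- data.frame(p1 = c(-5, -4, 2, 0, -2, 1, 3, 4, 2, 7),
                 p2 = c(0, 1, 2, 0, -2, 1, 3, 3, 2, 0))
tocompare <- c(0, 0, 2, 0, 2, 4, 16, 12, 6, 9)

# Hypothetical stand-in for the real function: expects list(x = <two-column df>, N = <count>)
pair_cor <- function(data1) {
  x <- data1$x
  data.frame(header = colnames(x)[1], cr = cor(x[[1]], x[[2]]))
}

plan(multisession, workers = 2) # assumption: two workers; adjust to your machine

# Each column index is processed on a worker; results are bound by row
res <- future_map_dfr(seq_along(df), function(j) {
  pair_cor(list(x = cbind(df[j], tocompare), N = length(tocompare)))
})

plan(sequential) # shut the background workers down again
res
```

For a cheap function such as cor(), the overhead of starting workers will likely outweigh any gain; parallelizing is only worth trying when each call is slow.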


unguejic2#

A more general approach is to use split.default(),

lapply(split.default(df, seq(ncol(df))), function(i) cbind(i, tocompare))

$`1`
   p1 tocompare
1  -5         0
2  -4         0
3   2         2
4   0         0
5  -2         2
6   1         4
7   3        16
8   4        12
9   2         6
10  7         9

$`2`
   p2 tocompare
1   0         0
2   1         0
3   2         2
4   0         0
5  -2         2
6   1         4
7   3        16
8   3        12
9   2         6
10  0         9

Then apply the function to each element of the list.
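
That last step could look like the following in base R. It is a sketch only: pair_cor is a hypothetical stand-in for the real function, written to accept the two-element list described in the question and to return a one-row dataframe.

```r
# Data from the question
df <- data.frame(p1 = c(-5, -4, 2, 0, -2, 1, 3, 4, 2, 7),
                 p2 = c(0, 1, 2, 0, -2, 1, 3, 3, 2, 0))
tocompare <- c(0, 0, 2, 0, 2, 4, 16, 12, 6, 9)

# Hypothetical stand-in for the real function: expects list(x = <two-column df>, N = <count>)
pair_cor <- function(data1) {
  x <- data1$x
  data.frame(p = colnames(x)[1], R = cor(x[, 1], x[, 2]))
}

# One two-column dataframe per original column
pairs <- lapply(split.default(df, seq(ncol(df))), function(i) cbind(i, tocompare))

# Wrap each piece in the two-element list, apply the function, and stack the rows
results <- lapply(pairs, function(x) pair_cor(list(x = x, N = length(tocompare))))
final_df <- do.call(rbind, results)
rownames(final_df) <- NULL
final_df
```

This needs no packages beyond base R, and do.call(rbind, ...) plays the role that map_df or bind_rows plays in a tidyverse version.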
