R语言 lapply很慢，有没有更快的方法？

xxls0lw8 于 11个月前发布在其他

关注(0)|答案(2)|浏览(129)

我有一个相当大的数据集，它被组织成如下的列表：

set.seed(0)
v <- rnorm(5000)
names(v) <- seq(1001, 6000, 1)
dates <- seq.Date(as.Date('2023-01-01'), by='day', length.out=365)
ls <- list()
ls <- sapply(dates, function(d) {ls[[length(ls) + 1]] <- v; ls})
names(ls) <- dates
str(ls[1:5])

List of 5
 $ 2023-01-01: Named num [1:5000] 1.263 -0.326 1.33 1.272 0.415 ...
  ..- attr(*, "names")= chr [1:5000] "1001" "1002" "1003" "1004" ...
 $ 2023-01-02: Named num [1:5000] 1.263 -0.326 1.33 1.272 0.415 ...
  ..- attr(*, "names")= chr [1:5000] "1001" "1002" "1003" "1004" ...
 $ 2023-01-03: Named num [1:5000] 1.263 -0.326 1.33 1.272 0.415 ...
  ..- attr(*, "names")= chr [1:5000] "1001" "1002" "1003" "1004" ...
 $ 2023-01-04: Named num [1:5000] 1.263 -0.326 1.33 1.272 0.415 ...
  ..- attr(*, "names")= chr [1:5000] "1001" "1002" "1003" "1004" ...
 $ 2023-01-05: Named num [1:5000] 1.263 -0.326 1.33 1.272 0.415 ...
  ..- attr(*, "names")= chr [1:5000] "1001" "1002" "1003" "1004" ...

字符串
如您所见，这是一年中每天的5000个数据点，总计1，825，000个数据点。我们将它们称为“x”。我想对每个数据点x执行以下操作：max（c（0.5 - x，0））。以下代码可以工作，但需要很长时间。

new <- sapply(names(ls), function(d)
          lapply(names(v), function(n) max(c(0.5 - ls[[d]][n], 0))))
rownames(new) <- names(v)
new[1:5, 1:5]

     2023-01-01 2023-01-02 2023-01-03 2023-01-04 2023-01-05
1001 0          0          0          0          0         
1002 0.8262334  0.8262334  0.8262334  0.8262334  0.8262334 
1003 0          0          0          0          0         
1004 0          0          0          0          0         
1005 0.08535857 0.08535857 0.08535857 0.08535857 0.08535857

型
有更快的路吗？？

来源：https://stackoverflow.com/questions/77370954/lapply-is-very-slow-is-there-a-faster-way

2条答案

按热度按时间

svmlkihl1#

使用pmax代替lapply

new <- sapply(ls, \(v) pmax(0.5 - v, 0))

字符串

赞(0）回复(0）举报 11个月前

hlswsv352#

您可以尝试：

m <- 0.5 - do.call(cbind, ls)
m[m < 0] <- 0

字符串
该公式给出：

head(m, c(5,5))

     2023-01-01 2023-01-02 2023-01-03 2023-01-04 2023-01-05
1001 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
1002 0.82623336 0.82623336 0.82623336 0.82623336 0.82623336
1003 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
1004 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
1005 0.08535857 0.08535857 0.08535857 0.08535857 0.08535857

型
基准：

bench::mark(a = sapply(ls, \(v) pmax(0.5 - v, 0)),
            b = {
              m <- do.call(cbind, ls)
              pmax(0.5 - m, 0)
            },
            c = {
              m <- 0.5 - do.call(cbind, ls)
              m[m < 0] <- 0
              m
            })

# A tibble: 3 × 13
  expression      min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result   memory             time       gc      
  <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm> <list>   <list>             <list>     <list>  
1 a           247.6ms    248ms      4.04     305MB     4.04     1     1      248ms <dbl[…]> <Rprofmem>         <bench_tm> <tibble>
2 b           113.1ms    114ms      8.77     229MB     0        5     0      570ms <dbl[…]> <Rprofmem [4 × 3]> <bench_tm> <tibble>
3 c            99.5ms    106ms      9.16     165MB     0        5     0      546ms <dbl[…]> <Rprofmem [5 × 3]> <bench_tm> <tibble>

型

赞(0）回复(0）举报 11个月前

我来回答

R语言 lapply很慢，有没有更快的方法？

2条答案

相关问题

热门标签

最新问答