R语言 lapply很慢,有没有更快的方法?

xxls0lw8  于 11个月前  发布在  其他
关注(0)|答案(2)|浏览(129)

我有一个相当大的数据集,它被组织成如下的列表:

set.seed(0)
v <- rnorm(5000)
names(v) <- seq(1001, 6000, 1)
dates <- seq.Date(as.Date('2023-01-01'), by='day', length.out=365)
ls <- list()
ls <- sapply(dates, function(d) {ls[[length(ls) + 1]] <- v; ls})
names(ls) <- dates
str(ls[1:5])

List of 5
 $ 2023-01-01: Named num [1:5000] 1.263 -0.326 1.33 1.272 0.415 ...
  ..- attr(*, "names")= chr [1:5000] "1001" "1002" "1003" "1004" ...
 $ 2023-01-02: Named num [1:5000] 1.263 -0.326 1.33 1.272 0.415 ...
  ..- attr(*, "names")= chr [1:5000] "1001" "1002" "1003" "1004" ...
 $ 2023-01-03: Named num [1:5000] 1.263 -0.326 1.33 1.272 0.415 ...
  ..- attr(*, "names")= chr [1:5000] "1001" "1002" "1003" "1004" ...
 $ 2023-01-04: Named num [1:5000] 1.263 -0.326 1.33 1.272 0.415 ...
  ..- attr(*, "names")= chr [1:5000] "1001" "1002" "1003" "1004" ...
 $ 2023-01-05: Named num [1:5000] 1.263 -0.326 1.33 1.272 0.415 ...
  ..- attr(*, "names")= chr [1:5000] "1001" "1002" "1003" "1004" ...

字符串
如您所见,这是一年中每天的5000个数据点,总计1,825,000个数据点。我们将它们称为“x”。我想对每个数据点x执行以下操作:max(c(0.5 - x,0))。以下代码可以工作,但需要很长时间。

new <- sapply(names(ls), function(d)
          lapply(names(v), function(n) max(c(0.5 - ls[[d]][n], 0))))
rownames(new) <- names(v)
new[1:5, 1:5]

     2023-01-01 2023-01-02 2023-01-03 2023-01-04 2023-01-05
1001 0          0          0          0          0         
1002 0.8262334  0.8262334  0.8262334  0.8262334  0.8262334 
1003 0          0          0          0          0         
1004 0          0          0          0          0         
1005 0.08535857 0.08535857 0.08535857 0.08535857 0.08535857


有更快的路吗??

svmlkihl

svmlkihl1#

使用pmax代替lapply

new <- sapply(ls, \(v) pmax(0.5 - v, 0))

字符串

hlswsv35

hlswsv352#

您可以尝试:

m <- 0.5 - do.call(cbind, ls)
m[m < 0] <- 0

字符串
该公式给出:

head(m, c(5,5))

     2023-01-01 2023-01-02 2023-01-03 2023-01-04 2023-01-05
1001 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
1002 0.82623336 0.82623336 0.82623336 0.82623336 0.82623336
1003 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
1004 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
1005 0.08535857 0.08535857 0.08535857 0.08535857 0.08535857


基准:

bench::mark(a = sapply(ls, \(v) pmax(0.5 - v, 0)),
            b = {
              m <- do.call(cbind, ls)
              pmax(0.5 - m, 0)
            },
            c = {
              m <- 0.5 - do.call(cbind, ls)
              m[m < 0] <- 0
              m
            })

# A tibble: 3 × 13
  expression      min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result   memory             time       gc      
  <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm> <list>   <list>             <list>     <list>  
1 a           247.6ms    248ms      4.04     305MB     4.04     1     1      248ms <dbl[…]> <Rprofmem>         <bench_tm> <tibble>
2 b           113.1ms    114ms      8.77     229MB     0        5     0      570ms <dbl[…]> <Rprofmem [4 × 3]> <bench_tm> <tibble>
3 c            99.5ms    106ms      9.16     165MB     0        5     0      546ms <dbl[…]> <Rprofmem [5 × 3]> <bench_tm> <tibble>

相关问题