How to make the purrr map function run faster?

Posted by ecbunoof on 2023-04-18 in Other


I am using the map function from the purrr package to apply the segmented function (from the segmented package), as follows:

library(purrr)
library(dplyr)
library(tidyr)     # nest() is provided by tidyr, not dplyr
library(segmented)

# Data frame is nested to create list column
by_veh28_101 <- df101 %>% 
  filter(LCType=="CFonly", Lane %in% c(1,2,3)) %>% 
  group_by(Vehicle.ID2) %>% 
  nest() %>% 
  ungroup()

# Functions:
segf2 <- function(df){
  try(segmented(lm(svel ~ Time, data=df), seg.Z = ~Time,
                psi = list(Time = df$Time[which(df$dssvel != 0)]),
                control = seg.control(seed=2)),
      silent=TRUE)
}

segf2p <- function(df){
  try(segmented(lm(PrecVehVel ~ Time, data=df), seg.Z = ~Time,
                psi = list(Time = df$Time[which(df$dspsvel != 0)]),
                control = seg.control(seed=2)),
      silent=TRUE)
}  

# map function:
models8_101 <- by_veh28_101 %>% 
  mutate(segs = map(data, segf2),
         segsp = map(data, segf2p))

The object by_veh28_101 contains 2457 tibbles. The last step, the one using map, takes 16 minutes to complete. Is there any way to make it faster?
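Since df101 itself is not shown, here is a minimal reproducible stand-in for trying out the answers below. It is a sketch under assumptions: the simulated data (speed rising with slope 2, then levelling off after a breakpoint at Time = 5), the values of the dssvel indicator, and the names `sim_vehicle`, `df_sim`, `by_veh_sim` and `models_sim` are all hypothetical. It reuses segf2 from the question and times the sequential `map()` step so alternatives can be compared against it.

```r
library(dplyr)
library(tidyr)     # nest()
library(purrr)
library(segmented)

set.seed(1)

# Hypothetical stand-in for df101: speed rises with slope 2, then
# levels off at 10 after a breakpoint at Time = 5.
sim_vehicle <- function(id) {
  Time <- seq(0, 10, by = 0.1)
  svel <- ifelse(Time < 5, 2 * Time, 10) + rnorm(length(Time), sd = 0.2)
  # Hypothetical indicator: non-zero only at the candidate breakpoint
  dssvel <- ifelse(abs(Time - 5) < 1e-8, 1, 0)
  tibble(Vehicle.ID2 = id, Time = Time, svel = svel, dssvel = dssvel)
}

df_sim <- bind_rows(map(1:20, sim_vehicle))

# Nest one tibble per vehicle, as in the question
by_veh_sim <- df_sim %>%
  group_by(Vehicle.ID2) %>%
  nest() %>%
  ungroup()

# Same fitting function as segf2 in the question
segf2 <- function(df) {
  try(segmented(lm(svel ~ Time, data = df), seg.Z = ~Time,
                psi = list(Time = df$Time[which(df$dssvel != 0)]),
                control = seg.control(seed = 2)),
      silent = TRUE)
}

# Time the sequential map() step
timing <- system.time(
  models_sim <- by_veh_sim %>% mutate(segs = map(data, segf2))
)
print(timing)
```

Each element of `models_sim$segs` is either a fitted "segmented" object or a "try-error" if the fit failed, mirroring the behaviour of the original pipeline.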

Source: https://stackoverflow.com/questions/41005938/how-to-make-purrr-map-function-run-faster

2 Answers


azpvetkf1#

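You can use the function future_map instead of map.
This function comes from the furrr package and is a parallel alternative to the map family of functions. See the package README for details.
Because the code in your question is not reproducible, I could not prepare a benchmark comparing map and future_map.
The code using future_map is as follows: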

library(tidyverse)
library(segmented)
library(future)
library(furrr)

# Data frame stuff....

# Your functions....

# future_map function

# This distributes the work over the different cores of your computer.
# You set a "plan" for how the code should run. `multisession` runs the
# work in background R sessions and works on Windows, macOS and Linux.
# (The older `multiprocess` strategy is deprecated in the future package.)

plan(multisession)

# segmented() can draw random numbers, so request parallel-safe RNG streams
models8_101 <- by_veh28_101 %>% 
  mutate(segs = future_map(data, segf2, .options = furrr_options(seed = TRUE)),
         segsp = future_map(data, segf2p, .options = furrr_options(seed = TRUE)))
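To check whether the parallel version pays off before running it on all 2457 tibbles, a small self-contained timing comparison can help. This is a sketch under assumptions: `slow_fit()` is a hypothetical stand-in that uses `Sys.sleep()` to imitate an expensive model fit, the simulated `groups` list replaces the real nested data column, and `workers = 2` is an arbitrary choice.

```r
library(purrr)
library(future)
library(furrr)

# Hypothetical stand-in for an expensive per-group fit such as segf2():
# Sys.sleep() imitates the cost, lm() produces a real result.
slow_fit <- function(df) {
  Sys.sleep(0.05)
  coef(lm(y ~ x, data = df))
}

# Hypothetical simulated groups in place of the nested data column
set.seed(42)
groups <- lapply(1:40, function(i) {
  x <- runif(50)
  data.frame(x = x, y = 3 * x + rnorm(50))
})

# Sequential baseline with purrr::map()
t_seq <- system.time(res_seq <- map(groups, slow_fit))

# Parallel run: multisession starts background R sessions
plan(multisession, workers = 2)
t_par <- system.time(res_par <- future_map(groups, slow_fit))
plan(sequential)  # shut the background sessions down again

# Elapsed wall-clock seconds for each approach
print(c(sequential = t_seq[["elapsed"]], parallel = t_par[["elapsed"]]))
```

Both runs should return the same coefficients. When the per-group function is very cheap, the cost of starting workers and shipping data to them can outweigh the gain, so timing on a subset first is worthwhile.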

2023-04-18

mum43rcc2#

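Well, I just rewrote a purrr::map loop (filtering more than 40,000 vector elements in a list through some logical tests) in some simple Rcpp. Before, it took more than 2 minutes to finish; now it finishes in a few seconds (about 2 to 3 seconds, to be precise). FYI.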
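The answer does not show its code, so the following is only a sketch of the general idea. It assumes a list of numeric vectors and a hypothetical logical test (keep the vectors whose values all lie strictly between `lo` and `hi`); the names `lst`, `keep_r()` and `keep_cpp()` are illustrative. `Rcpp::cppFunction()` compiles the C++ loop at run time and needs a working C++ toolchain.

```r
library(purrr)
library(Rcpp)

# Hypothetical data: a list of 40,000 short numeric vectors
set.seed(7)
lst <- replicate(40000, rnorm(5), simplify = FALSE)

# purrr version: one R-level function call per list element
keep_r <- function(lst, lo, hi) {
  lst[map_lgl(lst, function(v) all(v > lo & v < hi))]
}

# Same hypothetical test as a single compiled C++ loop
cppFunction('
List keep_cpp(List lst, double lo, double hi) {
  int n = lst.size();
  std::vector<int> idx;  // positions of vectors that pass the test
  for (int i = 0; i < n; ++i) {
    NumericVector v = lst[i];
    bool ok = true;
    for (int k = 0; k < v.size() && ok; ++k) {
      ok = (v[k] > lo) && (v[k] < hi);
    }
    if (ok) idx.push_back(i);
  }
  int m = idx.size();
  List out(m);
  for (int j = 0; j < m; ++j) out[j] = lst[idx[j]];
  return out;
}')

print(system.time(res_r <- keep_r(lst, -2, 2)))
print(system.time(res_cpp <- keep_cpp(lst, -2, 2)))
```

Any speed-up comes from replacing tens of thousands of R-level function calls with one compiled loop; how large it is depends on the real test. NA handling is not covered by this sketch.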

2023-04-18