Fastest method for spatial differencing (st_difference) on a large dataset

juud5qan  posted on 2023-03-27  in  Other

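I have a large polygon dataset. For each polygon I want to create an inward buffer of a unique width, then difference that buffer against the original polygon. I'm using the HydroLAKES database from HydroSHEDS, https://www.hydrosheds.org/products/hydrolakes (Lake polygons shapefile, 820 MB), cropped to the extent of Canada, Alaska, and the conterminous United States. After cropping to my area of interest there are ~966,000 rows/observations. Differencing the buffers from the original polygons takes a very long time (>1.5 hours on my computer, even with parallel processing; I have 8 cores and 32 GB of RAM), so I'd like to know the most efficient way to do this. Perhaps I'm not parallelizing effectively. I'm happy to use the data.table package for faster results if necessary.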

library(dplyr)
library(sf)
library(parallel)
library(doParallel)

# generate dataset
hydrolakes_na <- read_sf("HydroLAKES_polys_v10_shp/HydroLAKES_polys_v10_shp/HydroLAKES_polys_v10.shp")
hydrolakes_na <- hydrolakes_na %>% filter(Country %in% c('Canada', 'United States of America'))
hydrolakes_na <- st_transform(hydrolakes_na, 3161) # using st_crs(3161)
ext_of_interest <- st_bbox(c(xmin=-2370387.108629, xmax=3349430.12726586, ymin=9663917.12113056, ymax=16719362.4659996), crs=st_crs(3161)) # desired spatial extent 
hydrolakes_crop <- st_crop(hydrolakes_na, ext_of_interest)

# inward buffer width (in meters)
set.seed(111)
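# replace = TRUE is required: there are far more lakes than the 901 values in 100:1000
hydrolakes_crop = hydrolakes_crop %>% mutate(inbuff_width = sample(100:1000, nrow(hydrolakes_crop), replace = TRUE))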

# without parallel processing (note the '-' preceding .$inbuff_width, which makes the buffer inward)
hydrolakes_diff = hydrolakes_crop %>% st_buffer(-.$inbuff_width)
lake_diff = st_difference(st_geometry(hydrolakes_crop), st_union(st_geometry(hydrolakes_diff)))

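# with parallel processing
cl <- parallel::makeCluster(4)
doParallel::registerDoParallel(cl)

# one task per lake: subtract that lake's inward buffer from the lake itself
lake_diff = foreach(i=seq_len(nrow(hydrolakes_crop)), .packages=c("sf")) %dopar% {
  st_difference(st_geometry(hydrolakes_crop[i,]), st_union(st_geometry(hydrolakes_diff[i,])))
}

parallel::stopCluster(cl) # release the worker processes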

mjqavswn1#

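I realized I could solve this by simply converting the polygons to linestrings and then buffering only the inner side of each linestring with singleSide = TRUE.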

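# polygon boundaries as lines
hydrolakes_line = hydrolakes_crop %>% st_cast("MULTILINESTRING")
# a negative distance with singleSide = TRUE buffers one side of each line only
hydrolakes_innerbuff = hydrolakes_line %>% st_buffer(-.$inbuff_width, singleSide=TRUE)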
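
If the ring polygons are still wanted from st_difference itself, one option that avoids building the union of every buffer is to difference each lake against its own buffer only. The following is a minimal sketch on toy squares, not something timed on HydroLAKES. pairwise_difference and sq are hypothetical helpers rather than sf functions, and the sketch assumes that each inward buffer lies entirely inside its own lake and that lakes do not overlap.

```r
library(sf)

# Hypothetical helper (not part of sf): subtract y[i] from x[i] only,
# instead of subtracting the union of all of y from every x[i].
pairwise_difference <- function(x, y) {
  stopifnot(length(x) == length(y))
  out <- Map(function(a, b) st_difference(a, b), x, y)
  st_sfc(unname(out), crs = st_crs(x))
}

# Hypothetical helper: a 10 x 10 square with its lower-left corner at (x0, y0)
sq <- function(x0, y0) {
  st_polygon(list(rbind(
    c(x0, y0), c(x0 + 10, y0), c(x0 + 10, y0 + 10), c(x0, y0 + 10), c(x0, y0)
  )))
}

# Toy data: two squares standing in for lakes, each with its own buffer width
lakes  <- st_sfc(sq(0, 0), sq(20, 0))
widths <- c(1, 2)

inner <- st_buffer(lakes, -widths)          # inward buffers, one width per lake
rings <- pairwise_difference(lakes, inner)  # lake minus its own inward buffer
ring_area <- as.numeric(st_area(rings))
```

On the real data the equivalent call would be pairwise_difference(st_geometry(hydrolakes_crop), st_geometry(hydrolakes_diff)). Lakes narrower than twice their buffer width produce an empty inward buffer, in which case the difference should simply return the lake unchanged.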
