R语言根据多边形快速过滤SF点网格

ev7lccsx 于 2023-11-14 发布在其他

关注(0)|答案(3)|浏览(208)

我想制作网格（在 x 和 y 坐标的 Dataframe 的意义上）在美国或美国的地区，抛出原始网格中超出美国边界的任何点。我有一些代码，似乎可以工作，但当我将网格增量缩小到1 km时，它非常慢（1e3）左右。如何才能更快地完成这件事？也许有一种方法可以构建我需要的简单功能集合，而不需要lapply或循环，或者也许这可以用MULTIPOINT来完成，而不是POINT s的简单功能集合。

library(sf)

crs.us.atlas = 2163 # https://epsg.io/2163

us = read_sf(paste0("/vsizip/",
    "/tmp/us.zip")) # from: https://www2.census.gov/geo/tiger/GENZ2019/shp/cb_2019_us_state_500k.zip
# Filter down to the lower 48 + DC.
us = us[us$STUSPS %in% setdiff(c(state.abb, "DC"), c("AK", "HI")),]
us = st_transform(us, crs = crs.us.atlas)
l = as.list(st_bbox(us))

increment = 1e5

g = expand.grid(
    x = seq(l$xmin, l$xmax, by = increment),
    y = seq(l$ymin, l$ymax, by = increment))

message("Running the slow part")
print(system.time(g <- g[0 < sapply(FUN = length, st_intersects(
    st_as_sfc(crs = crs.us.atlas, lapply(1 : nrow(g), function(i)
        st_point(as.numeric(g[i, c("x", "y")])))),
    us)),]))

print(nrow(g))

字符串

来源：https://stackoverflow.com/questions/65687204/quickly-filter-down-a-grid-of-sf-points-according-to-a-polygon

3条答案

按热度按时间

ru9i0ody1#

我会解决这个问题如下。首先，加载包。tmap只用于Map，你可以很容易地忽略它

library(sf)
#> Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1
library(sfheaders)
library(tmap)

字符串
下载和读入数据

temp_zip <- tempfile(fileext = ".zip")
download.file(
  "https://www2.census.gov/geo/tiger/GENZ2019/shp/cb_2019_us_state_500k.zip", 
  destfile = temp_zip
)
unzip(temp_zip, exdir = tempdir())
us <- st_read(file.path(tempdir(), "cb_2019_us_state_500k.shp"))
#> Reading layer `cb_2019_us_state_500k' from data source `C:\Users\Utente\AppData\Local\Temp\RtmpCakscO\cb_2019_us_state_500k.shp' using driver `ESRI Shapefile'
#> Simple feature collection with 56 features and 9 fields
#> geometry type:  MULTIPOLYGON
#> dimension:      XY
#> bbox:           xmin: -179.1489 ymin: -14.5487 xmax: 179.7785 ymax: 71.36516
#> geographic CRS: NAD83

型
过滤到较低的48 + DC。

us = us[us$STUSPS %in% setdiff(c(state.abb, "DC"), c("AK", "HI")),]
us = st_transform(us, crs = 2163)

型
定义增量、网格和sfc对象，关键是创建sfc_point对象，为后续操作做准备

increment = 1e5
g = expand.grid(
  x = seq(st_bbox(us)$xmin, st_bbox(us)$xmax, by = increment),
  y = seq(st_bbox(us)$ymin, st_bbox(us)$ymax, by = increment)
)
g_sfc <- sfc_point(as.matrix(g)) %>% 
  st_set_crs(2163)

型
查找美国境内包含的点的ID

my_ids <- unlist(st_contains(us, g_sfc))

型
可视化结果

tm_shape(g_sfc) + 
  tm_dots(col = "grey", size = 0.05) + 
tm_shape(g_sfc[my_ids]) + 
  tm_dots(col = "darkred", size = 0.05) + 
tm_shape(us) + 
  tm_borders(lwd = 1.3)

型
x1c 0d1x的数据
重复1 e3（但我不会添加任何情节，因为这几乎是1300万点）

increment = 1e3
g = expand.grid(
  x = seq(st_bbox(us)$xmin, st_bbox(us)$xmax, by = increment),
  y = seq(st_bbox(us)$ymin, st_bbox(us)$ymax, by = increment)
)

型
生成数据大约需要20秒

system.time({
  g_sfc <- sfc_point(as.matrix(g)) %>% 
    st_set_crs(2163)
})
#>    user  system elapsed 
#>   16.29    0.92   17.27

80秒内找到美国境内各点的身份证

system.time({
  my_ids <- unlist(st_contains(us, g_sfc))
})
#>    user  system elapsed 
#>   67.75    8.32   80.86

创建于2021-01-13由reprex package（v0.3.0）
如果你需要更高效的东西，我建议你polyclip。

赞(0）回复(0）举报 2023-11-14

b5lpy0ml2#

或者，您可以使用fasterize包以所需的分辨率围绕形状文件创建光栅网格，然后使用raster::rasterToPoints函数提取网格坐标。
这几乎可以立即收集xy位置。转换回sf对象需要大约10秒左右。

library(sf)
library(fasterize)
library(raster)

crs.us.atlas = 2163

# https://www2.census.gov/geo/tiger/GENZ2019/shp/cb_2019_us_state_500k.zip
us = read_sf(list.files(pattern='.shp$'))
us = us[us$STUSPS %in% setdiff(c(state.abb, "DC"), c("AK", "HI")),]
us = st_transform(us, crs = crs.us.atlas)

increment = 1e3

# create the empty raster to for fasterize base
usrast = raster::raster(us, resolution=rep(increment, 2))

# rasterize the shapefile
r = fasterize::fasterize(us, usrast)

# extract raster cell coordinates
coords = rasterToPoints(r)

# convert coordinates to sf
shp = coords %>%
    as.data.frame() %>%
    st_as_sf(crs=crs.us.atlas, coords=c('x', 'y'))

字符串

的数据

赞(0）回复(0）举报 2023-11-14

fykwrbwg3#

sf有一个函数st_make_grid可以帮助解决这个问题（在旧版本的sf中，它甚至会自动裁剪到多边形），但奇怪的是，在撰写本文时它是quite slow。
我可以通过使用st_as_sf将g转换为简单的点特征集合来获得合理的性能，而无需额外的依赖性：

g = st_as_sf(crs = crs.us.atlas, coords = c(1, 2), g)

字符串
然后，在@agila使用unlist之后，慢的部分变成了

print(system.time(g <- g[unlist(st_intersects(us, g)),]))

型
在increment = 1e3的情况下，在高端服务器上需要3分钟。

赞(0）回复(0）举报 2023-11-14

我来回答

R语言根据多边形快速过滤SF点网格

3条答案

相关问题

热门标签

最新问答

R语言 根据多边形快速过滤SF点网格

3条答案

相关问题

热门标签

最新问答

R语言根据多边形快速过滤SF点网格