R - Read in a list of files from a list of zip archives without unzipping them

Posted by ih99xse1 on 2023-02-06 in Other

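I am trying to read in a list of shapefiles from a list of zip archives without actually unzipping the archives. Yes, I know the archive will be decompressed behind the scenes; what I want to avoid is seeing the extracted files in Windows Explorer.
The example is fully reproducible: download all the files from this GitHub repository and set your working directory to the folder you downloaded them to.
I would also like to do this tidyverse-style, using pipes and without saving intermediate objects.
The code I am currently trying to run looks like this: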

library(tidyverse)
library(magrittr)
library(sf)

list.files() %>% 
  map(unzip, list = T) %>% 
  map(filter, grepl(".shp$", Name)) %>% 
  map(~ .x %$% Name) %>% 
  map2(.x = ., .y = list.files(), .f = ~st_read(unzip(zipfile = .y, files = .x)))

However, this does not work. Why?

EDIT: To keep the example small, I suppose you could also download just two of the files from the repository above.

Answer #1 (kyks70gy)

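GDAL's /vsizip virtual file system driver is very handy here. GDAL reads the shapefile directly from inside the archive, so nothing is extracted to disk: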

library(sf)
#> Linking to GEOS 3.9.3, GDAL 3.5.2, PROJ 8.2.1; sf_use_s2() is TRUE
library(dplyr)
library(stringr)
library(purrr)

(file_list <- list.files(pattern = "\\.zip$"))
#> [1] "tl_2019_01_place.zip" "tl_2019_02_place.zip"
sf_list <- file_list %>% 
  # resulting list will have names without ".zip"
  set_names(str_remove(.,"\\.zip$")) %>%  
  map( ~ st_read(paste0("/vsizip/", .x)))
#> Reading layer `tl_2019_01_place' from data source `/vsizip/tl_2019_01_place.zip' using driver `ESRI Shapefile'
#> Simple feature collection with 586 features and 16 fields
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: -88.4442 ymin: 30.19825 xmax: -84.96303 ymax: 34.99807
#> Geodetic CRS:  NAD83

#> Reading layer `tl_2019_02_place' from data source `/vsizip/tl_2019_02_place.zip' using driver `ESRI Shapefile'
#> Simple feature collection with 354 features and 16 fields
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: -176.6967 ymin: 51.81049 xmax: 173.4299 ymax: 71.34019
#> Geodetic CRS:  NAD83

# 1st sf in the list:
sf_list$tl_2019_01_place %>% select(NAME, geometry)
#> Simple feature collection with 586 features and 1 field
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: -88.4442 ymin: 30.19825 xmax: -84.96303 ymax: 34.99807
#> Geodetic CRS:  NAD83
#> First 10 features:
#>           NAME                       geometry
#> 1        Berry MULTIPOLYGON (((-87.6391 33...
#> 2      Fayette MULTIPOLYGON (((-87.85507 3...
#> 3       Gu-Win MULTIPOLYGON (((-87.88578 3...
#> 4     Ashville MULTIPOLYGON (((-86.30442 3...
#> 5     Margaret MULTIPOLYGON (((-86.46153 3...
#> 6    Odenville MULTIPOLYGON (((-86.38406 3...
#> 7  Littleville MULTIPOLYGON (((-87.68859 3...
#> 8      Ragland MULTIPOLYGON (((-86.18473 3...
#> 9   Fort Payne MULTIPOLYGON (((-85.74184 3...
#> 10    Sylvania MULTIPOLYGON (((-85.85684 3...

# 2nd sf in the list:
sf_list$tl_2019_02_place %>% select(NAME, geometry)
#> Simple feature collection with 354 features and 1 field
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: -176.6967 ymin: 51.81049 xmax: 173.4299 ymax: 71.34019
#> Geodetic CRS:  NAD83
#> First 10 features:
#>           NAME                       geometry
#> 1   Whale Pass MULTIPOLYGON (((-133.1884 5...
#> 2    Utqiagvik MULTIPOLYGON (((-156.9255 7...
#> 3    Anchorage MULTIPOLYGON (((-150.4199 6...
#> 4  Toksook Bay MULTIPOLYGON (((-165.2769 6...
#> 5       Angoon MULTIPOLYGON (((-134.6313 5...
#> 6     Kaktovik MULTIPOLYGON (((-143.6574 7...
#> 7   Point Hope MULTIPOLYGON (((-166.8401 6...
#> 8        Homer MULTIPOLYGON (((-151.655 59...
#> 9     Kachemak MULTIPOLYGON (((-151.4731 5...
#> 10       Kenai MULTIPOLYGON (((-151.3526 6...

Created on 2023-02-05 with reprex v2.0.2
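
The archives above each hold a single shapefile at the top level, so pointing /vsizip/ at the zip file is enough. If an archive keeps its shapefile in a subfolder, or holds several, GDAL also accepts a path to a member inside the archive, appended after the zip file name. The following is a minimal sketch of that path construction. `vsizip_path()` and `shp_members()` are hypothetical helpers written for this illustration (they are not part of sf or GDAL), and the member names are made up:

```r
# Hypothetical helper (not part of sf or GDAL): build a /vsizip/ path,
# optionally pointing at one member inside the archive.
vsizip_path <- function(zipfile, member = NULL) {
  base <- paste0("/vsizip/", zipfile)
  if (is.null(member)) base else paste0(base, "/", member)
}

# Hypothetical helper: keep only the .shp entries from a vector of member
# names, e.g. the Name column returned by unzip(zipfile, list = TRUE).
shp_members <- function(member_names) {
  member_names[grepl("\\.shp$", member_names, ignore.case = TRUE)]
}

# Made-up archive listing, for illustration only
members <- c("data/places.shp", "data/places.shx", "data/places.dbf", "README.txt")
vsizip_path("archive.zip", shp_members(members))
#> [1] "/vsizip/archive.zip/data/places.shp"
```

With a real archive, the member names would come from `unzip(zipfile, list = TRUE)$Name`, and the resulting string would be passed to `st_read()`. This has not been run against the files in the question, so treat it as a starting point.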

Answer #2 (lawou6xi)

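You can define a small function that downloads the zip file, unzips it, reads the shapefile into memory, deletes the temporary files, and returns the sf object.
The following function does all of that: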

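read_online_zip_sf <- function(url) {
  # Scratch directory; removed again when the function exits, even on error
  dir.create("~/zipdir", showWarnings = FALSE)
  on.exit(unlink("~/zipdir", recursive = TRUE), add = TRUE)
  f <- tempfile(tmpdir = "~/zipdir", fileext = ".zip")
  # mode = "wb" keeps the binary archive intact on Windows
  download.file(url, f, mode = "wb")
  # Extract every member so the .shp keeps its .shx/.dbf/.prj sidecar files
  unzip(f, exdir = "~/zipdir/files")
  # The data are read into memory here, before the cleanup in on.exit() runs
  sf::st_read("~/zipdir/files")
}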

So now, without any mucking about in the file explorer, we can do:

url <- paste0("https://github.com/generalpiston/geojson-us-city-boundaries/",
              "raw/master/shapes/tl_2019_02_place.zip")

mysf <- read_online_zip_sf(url)
#> Reading layer `tl_2019_02_place' from data source 
#>   `C:\Users\Administrator\Documents\zipdir\files' using driver `ESRI Shapefile'
#> Simple feature collection with 354 features and 16 fields
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: -176.6967 ymin: 51.81049 xmax: 173.4299 ymax: 71.34019
#> Geodetic CRS:  NAD83

This appears to be a shapefile of Alaskan city boundaries. For completeness, let's plot it:

library(ggplot2)
library(rnaturalearth)

usa <- ne_countries(50, country = "United States of America", 
                    returnclass = "sf")

ggplot(usa) + 
  geom_sf() + 
  geom_sf(data = mysf, fill = "red", alpha = 0.5) +
  coord_sf(xlim = c(-180, -131), ylim = c(51, 72)) +
  theme_minimal()

Created on 2023-02-05 with reprex v2.0.2
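
The question is about archives that are already on disk, so the same idea can be sketched without the download step. A shapefile is a set of files (the .shp plus at least .shx and .dbf), so extracting only the .shp member, as the question's pipeline does, likely leaves `st_read()` without the sidecar files it needs. That call also extracts into the working directory. The sketch below extracts every member into a throwaway folder under `tempdir()` instead. `read_zip_sf()` is a hypothetical helper name, and the code assumes each archive holds exactly one shapefile at its top level:

```r
# Hypothetical helper: read the shapefile inside a local zip archive via a
# throwaway directory under tempdir(), so nothing lands in the working
# directory. Assumes one shapefile at the top level of the archive.
read_zip_sf <- function(zipfile) {
  exdir <- tempfile("shp_")
  dir.create(exdir)
  # Remove the extracted files when the function exits, even on error
  on.exit(unlink(exdir, recursive = TRUE), add = TRUE)
  # Extract all members, including the .shx/.dbf/.prj sidecar files
  unzip(zipfile, exdir = exdir)
  sf::st_read(exdir)
}

# Usage, assuming the zip files sit in the working directory:
# zips <- list.files(pattern = "\\.zip$")
# sf_list <- lapply(setNames(zips, sub("\\.zip$", "", zips)), read_zip_sf)
```

Compared with the /vsizip approach, this still writes the extracted files to disk for a moment, but only inside R's session temp directory.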
