R: How do I automatically combine multiple shapefiles and keep the file name as a column?

Posted by x33g5p2x on 2023-06-27 in Other

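I have multiple point shapefiles, each containing two columns: "id" and "Measurement". The "id" column holds a unique identifier for each point, and these identifiers are consistent across all the individual shapefiles, meaning the same id refers to exactly the same point in every file.
My goal is to merge these shapefiles into a single file. To avoid any confusion, I would like to rename the "Measurement" column to the original file name of each shapefile. That way I can tell exactly which file each measurement came from.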

Shapefile_1.shp

| id | Measurement |
|----|-------------|
| 1  | 10.5        |
| 2  | 12.3        |
| 3  | 9.8         |

Shapefile_2.shp

| id | Measurement |
|----|-------------|
| 1  | 8.2         |
| 2  | 11.7        |
| 3  | 10.1        |

Desired output

| id | Shapefile_1 | Shapefile_2 |
|----|-------------|-------------|
| 1  | 10.5        | 8.2         |
| 2  | 12.3        | 11.7        |
| 3  | 9.8         | 10.1        |

Thanks!

Answer #1 (vsmadaxz)

With {dplyr}, you can do the following:

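library(dplyr)

# plain data frames stand in for the two imported shapefiles
Shapefile_1.shp <- data.frame(id = 1:3,
                              Measurement = c(10.5, 12.3, 9.8))

Shapefile_2.shp <- data.frame(id = 1:3,
                              Measurement = c(8.2, 11.7, 10.1))

# left_join() suffixes the clashing "Measurement" columns with .x and .y;
# rename them after the source files
left_join(Shapefile_1.shp, Shapefile_2.shp, by = "id") |>
  rename(Shapefile_1 = Measurement.x, Shapefile_2 = Measurement.y)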
#>   id Shapefile_1 Shapefile_2
#> 1  1        10.5         8.2
#> 2  2        12.3        11.7
#> 3  3         9.8        10.1

Created on 2023-06-24 with reprex v2.0.2
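
The two-table join above needs one `rename()` entry per file, which does not scale well. A minimal sketch of how the same idea might be generalized to any number of tables: rename the measurement column to the list name first, then join everything on "id". The list `meas_tables` and its contents are hypothetical stand-ins for the imported shapefiles, not part of the original answer.

```r
library(dplyr)
library(purrr)

# Hypothetical stand-ins for the imported shapefiles, keyed by file name
meas_tables <- list(
  Shapefile_1 = data.frame(id = 1:3, Measurement = c(10.5, 12.3, 9.8)),
  Shapefile_2 = data.frame(id = 1:3, Measurement = c(8.2, 11.7, 10.1))
)

# Rename "Measurement" to the list name, so the join creates no .x/.y suffixes
renamed <- imap(meas_tables, function(tbl, nm) {
  rename(tbl, !!nm := Measurement)
})

# Join all tables on "id", one after another
combined <- reduce(renamed, full_join, by = "id")
combined
```

`full_join()` is used here so that an id missing from one file would still be kept (with `NA` in that file's column); `left_join()` would behave the same when every file holds the same ids, as the question states.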

Answer #2 (3bygqnnd)

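If the sf objects are in a named list, we can simply bind the rows with purrr::list_rbind() and store the name of each sf object in an additional column. If a wide format is really needed, tidyr::pivot_wider() will produce it.
First, let's prepare a reprex: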

library(sf)
library(dplyr)
library(tidyr)
library(purrr)

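# generate some example datasets based on nc.shp,
# each dataset with a different mean value
point_meas_dataset <- function(points, mean, sd){
  # st_sf() identifies the geometry column by its class (sfc), not its name,
  # so the misspelled column name "geomerty" still works and shows up in the output
  st_sf(id = seq_along(points),
        measurement = rnorm(length(points), mean, sd),
        geomerty = points)
}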
nc_c <- st_read(system.file("shape/nc.shp", package="sf")) %>% 
  st_centroid() %>% 
  st_geometry()

set.seed(123)
shapefile_1 <- point_meas_dataset(nc_c, 10, 1)
shapefile_2 <- point_meas_dataset(nc_c, 12, 1)
shapefile_3 <- point_meas_dataset(nc_c, 14, 1)

# keep sf objects in a named list, e.g. use list.files() + lapply() + st_read()
# to import a list of shapefiles
meas_shapes <- list("shapefile_1.shp" = shapefile_1,
                    "shapefile_2.shp" = shapefile_2,
                    "shapefile_3.shp" = shapefile_3)
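
The list.files() + lapply() + st_read() import mentioned in the comment above could look roughly like the following sketch. The helper `read_shapefiles()`, the temporary demo folder, and the short column name `meas` are assumptions made for illustration; in practice the folder would be wherever the real shapefiles live.

```r
library(sf)

# Hypothetical helper: read every .shp in `dir` into a list named after the files
read_shapefiles <- function(dir) {
  paths <- list.files(dir, pattern = "\\.shp$", full.names = TRUE)
  shapes <- lapply(paths, st_read, quiet = TRUE)
  # use the file name without directory or extension as the list name
  names(shapes) <- tools::file_path_sans_ext(basename(paths))
  shapes
}

# Demo only: write two tiny point layers to a fresh temporary folder
demo_dir <- tempfile("meas_demo_")
dir.create(demo_dir)
pts <- st_sfc(st_point(c(0, 0)), st_point(c(1, 1)), crs = 4326)
st_write(st_sf(id = 1:2, meas = c(10.5, 12.3), geometry = pts),
         file.path(demo_dir, "Shapefile_1.shp"), quiet = TRUE)
st_write(st_sf(id = 1:2, meas = c(8.2, 11.7), geometry = pts),
         file.path(demo_dir, "Shapefile_2.shp"), quiet = TRUE)

# Read them back as a named list of sf objects
demo_shapes <- read_shapefiles(demo_dir)
names(demo_shapes)
```

Dropping the ".shp" extension from the names is a choice made here so that later column names match the "Shapefile_1" style asked for in the question; using `basename(paths)` alone would keep the extension, as in the list above.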

Combine the sf objects stored in meas_shapes, keeping the shapefile name as an additional identifier:

# row-bind all list elements, to form a long table, store names in "shapefile" column
meas_long <- list_rbind(meas_shapes, names_to = "shapefile") %>% 
  st_as_sf()
meas_long
#> Simple feature collection with 300 features and 3 fields
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: -84.05986 ymin: 34.07671 xmax: -75.8095 ymax: 36.49111
#> Geodetic CRS:  NAD27
#> First 10 features:
#>          shapefile id measurement                   geomerty
#> 1  shapefile_1.shp  1    9.439524  POINT (-81.49823 36.4314)
#> 2  shapefile_1.shp  2    9.769823 POINT (-81.12513 36.49111)
#> 3  shapefile_1.shp  3   11.558708 POINT (-80.68573 36.41252)
#> 4  shapefile_1.shp  4   10.070508 POINT (-76.02719 36.40714)
#> 5  shapefile_1.shp  5   10.129288 POINT (-77.41046 36.42236)
#> 6  shapefile_1.shp  6   11.715065 POINT (-76.99472 36.36142)
#> 7  shapefile_1.shp  7   10.460916 POINT (-76.23402 36.40122)
#> 8  shapefile_1.shp  8    8.734939 POINT (-76.70446 36.44428)
#> 9  shapefile_1.shp  9    9.313147 POINT (-78.11042 36.39693)
#> 10 shapefile_1.shp 10    9.554338 POINT (-80.23429 36.40042)

# with long (tidy) format, the actual number of input shapefiles does not matter and it's 
# easy to aggregate over each feature (i.e. name of the shapefile);
# or use grouping & faceting by shapefile name when using ggplot
meas_long %>% 
  st_drop_geometry() %>% 
  group_by(shapefile) %>% 
  summarise(meas_mean = mean(measurement),
            meas_sd = sd(measurement))
#> # A tibble: 3 × 3
#>   shapefile       meas_mean meas_sd
#>   <chr>               <dbl>   <dbl>
#> 1 shapefile_1.shp      10.1   0.913
#> 2 shapefile_2.shp      11.9   0.967
#> 3 shapefile_3.shp      14.1   0.950

# but we can pivot to wide as well; and perhaps drop geometry column 
# if dataframe / tibble is desired and actual coordinates are not needed anymore:
meas_long %>%
  st_drop_geometry() %>% 
  pivot_wider(names_from = "shapefile", values_from = "measurement")
#> # A tibble: 100 × 4
#>       id shapefile_1.shp shapefile_2.shp shapefile_3.shp
#>    <int>           <dbl>           <dbl>           <dbl>
#>  1     1            9.44            11.3            16.2
#>  2     2            9.77            12.3            15.3
#>  3     3           11.6             11.8            13.7
#>  4     4           10.1             11.7            14.5
#>  5     5           10.1             11.0            13.6
#>  6     6           11.7             12.0            13.5
#>  7     7           10.5             11.2            13.2
#>  8     8            8.73            10.3            13.4
#>  9     9            9.31            11.6            15.7
#> 10    10            9.55            12.9            13.9
#> # ℹ 90 more rows

Created on 2023-06-24 with reprex v2.0.2
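
The wide table above drops the geometry. If the merged result should still be a spatial layer, one possible approach is to pivot the attribute tables and then join the result back onto the points of one input layer. This is only a sketch: the three demo points are made up, and it relies on the question's premise that the ids refer to the same points in every file.

```r
library(sf)
library(dplyr)
library(tidyr)
library(purrr)

# Hypothetical demo data: two layers sharing the same three points
pts <- st_sfc(st_point(c(0, 0)), st_point(c(1, 1)), st_point(c(2, 2)), crs = 4326)
demo_shapes <- list(
  Shapefile_1 = st_sf(id = 1:3, measurement = c(10.5, 12.3, 9.8), geometry = pts),
  Shapefile_2 = st_sf(id = 1:3, measurement = c(8.2, 11.7, 10.1), geometry = pts)
)

# Long table of attributes only, then one column per source file
demo_wide <- demo_shapes %>%
  map(st_drop_geometry) %>%
  list_rbind(names_to = "shapefile") %>%
  pivot_wider(names_from = "shapefile", values_from = "measurement")

# Re-attach the point geometry from the first layer (ids assumed identical across files)
demo_wide_sf <- demo_shapes[[1]] %>%
  select(id) %>%
  left_join(demo_wide, by = "id")
demo_wide_sf
```

The result is an sf object with one measurement column per input file, which could then be written out with `st_write()` if a single merged file is the goal.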
