I have data from eight survey sites (A to H) and nine survey years (1999, 2000, 2001, 2006, 2008, 2011, 2013, 2016, 2019).
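library(dplyr)
library(tidyr)   # gather(), spread(), separate()
library(tibble)  # column_to_rownames(), rownames_to_column()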
# create column names for years
years <- c(1999, 2000, 2001, 2006, 2008, 2011, 2013, 2016, 2019)
# create an empty data frame with sites A:H in the first column
df <- data.frame(Site = c("A", "B", "C", "D", "E", "F", "G", "H"),
                 matrix(NA, nrow = 8, ncol = length(years),
                        dimnames = list(NULL, years)))
# remove the X before the year in the column names
colnames(df)[-1] <- years
# generate random numbers for each site and year
set.seed(123) # for reproducibility
for (i in 1:nrow(df)) {
  df[i, 2:ncol(df)] <- runif(length(years))
}
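I calculated the Euclidean distances between all site-year combinations, stored the resulting matrix as a data frame, and split it into a list of data frames, one per Site: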
# gather to long form to calculate dist.
df = df %>% gather (key = "Year", value = "Value", c(2:last_col()))
# calculate dist and set as a df
dist.df <- df %>%
  mutate(YrSi = paste(substr(Year, 3, 4), Site)) %>%
  select(-Year, -Site) %>%
  column_to_rownames(var = "YrSi") %>%
  dist() %>%
  as.matrix() %>%
  as.data.frame()
#split dist.df to a list of dfs per site
dist.df.list = dist.df %>%
  rownames_to_column("YrSi") %>%
  separate(YrSi, c("Year", "Site"), sep = " ") %>%
  mutate(Year = as.numeric(ifelse(Year == "99", sprintf("19%s", Year), sprintf("20%s", Year)))) %>% # Change to yyyy
  gather(key = "YrSi", value = "Dist", c(3:last_col())) %>%
  separate(YrSi, c("Year2", "Site2"), sep = " ") %>%
  mutate(Year2 = as.numeric(ifelse(Year2 == "99", sprintf("19%s", Year2), sprintf("20%s", Year2)))) %>% # Change to yyyy
  arrange(Site, Year, Site2, Year2) %>%
  spread(key = "Year2", value = "Dist") %>%
  group_by(Site, Site2) %>%
  subset(Site == Site2) %>%
  relocate(Year, .after = Site2) %>%
  group_split()
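I want to create a new data frame called result.df whose first column is Site, followed by one column for each survey year starting from 2000 (see below). Each column should show, for a given site, the distance between that survey year and the previous survey. For example, the 2000 column would show the distance between 1999 and 2000 for each site; the 2008 column would show the distance between 2008 and 2006; and so on. Put simply, I want to extract the diagonal, as highlighted in the image: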
To make this more manageable, I gathered the data into long format, so instead of one column per year there is a Year column and a Year2 column:
dist.df.list = dist.df %>%
  rownames_to_column("YrSi") %>%
  separate(YrSi, c("Year", "Site"), sep = " ") %>%
  mutate(Year = as.numeric(ifelse(Year == "99", sprintf("19%s", Year), sprintf("20%s", Year)))) %>% # Change to yyyy
  gather(key = "YrSi", value = "Dist", c(3:last_col())) %>%
  separate(YrSi, c("Year2", "Site2"), sep = " ") %>%
  mutate(Year2 = as.numeric(ifelse(Year2 == "99", sprintf("19%s", Year2), sprintf("20%s", Year2)))) %>% # Change to yyyy
  arrange(Site, Year, Site2, Year2) %>%
  spread(key = "Year2", value = "Dist") %>%
  group_by(Site, Site2) %>%
  subset(Site == Site2) %>%
  relocate(Year, .after = Site2) %>%
  gather(key = "Year2", value = "dist", c(4:last_col())) %>%
  group_split()
Now I generate result.df:
# Initialize an empty data frame to store the results
result.df <- data.frame(Site = character(), stringsAsFactors = FALSE)
# Loop through each data frame in the list
for(i in 1:length(dist.df.list)) {
  # Extract the site name
  site <- dist.df.list[[i]]$Site[1]
  # Initialize a new row for the site in the result data frame
  new.row <- data.frame(Site = site, stringsAsFactors = FALSE)
  # Loop through each survey year and extract the distance between consecutive years
  for(j in c(2000, 2001, 2006, 2008, 2011, 2013, 2016, 2019)) {
    col.name <- as.character(j)
    if(col.name %in% colnames(dist.df.list[[i]])) {
      # Extract the distance value from the test data frame
      dist <- dist.df.list[[i]] %>%
        filter(Year2 == j) %>%
        select(dist) %>%
        pull()
      # If the distance value is missing, set it to NA
      if(is.na(dist)) {
        new.row[[col.name]] <- NA
      } else {
        # Otherwise, add the distance value to the new row
        new.row[[col.name]] <- dist
      }
    } else {
      # If the distance column doesn't exist, set the value to NA
      new.row[[col.name]] <- NA
    }
  }
  # Add the new row to the result data frame
  result.df <- rbind(result.df, new.row)
}
result.df
  Site 2000 2001 2006 2008 2011 2013 2016 2019
1    A   NA   NA   NA   NA   NA   NA   NA   NA
2    B   NA   NA   NA   NA   NA   NA   NA   NA
3    C   NA   NA   NA   NA   NA   NA   NA   NA
4    D   NA   NA   NA   NA   NA   NA   NA   NA
5    E   NA   NA   NA   NA   NA   NA   NA   NA
6    F   NA   NA   NA   NA   NA   NA   NA   NA
7    G   NA   NA   NA   NA   NA   NA   NA   NA
8    H   NA   NA   NA   NA   NA   NA   NA   NA
Why am I getting NA instead of the distances? Is there a simpler way to do this?
2 Answers
Answer 1 (s71maibg)
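It looks like you want to compare each year's value with the value from the previous survey year; the tricky part is that the years are not evenly spaced. One way to handle this is to treat the years as a factor and then convert that factor to an integer, so 2001 gets year_index 3, 2006 gets year_index 4, and so on. We can then join each row to the row that has the same site and a year_index one lower. This makes the approach much shorter.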
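As a small illustration of the indexing step only, the mapping from unevenly spaced years to consecutive integers might look like this. The name year_index comes from the answer; the lookup table itself is just an assumed way of displaying it.

```r
# Survey years from the question (unevenly spaced)
years <- c(1999, 2000, 2001, 2006, 2008, 2011, 2013, 2016, 2019)

# factor() sorts the distinct years; as.integer() returns each year's rank
year_index <- as.integer(factor(years))

# Lookup table: 2001 maps to 3, 2006 maps to 4, and so on
data.frame(Year = years, year_index = year_index)
```

Because the index is consecutive, "previous survey" always means year_index - 1, regardless of how many calendar years lie in between.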
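Start from the initial df in the first block, before it is reshaped. In the result, the first row matches the values highlighted in the question.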
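The self-join described in this answer can be sketched roughly as follows. This is a reconstruction under assumptions, not necessarily the answerer's exact code: the names long, Value_prev and Dist are illustrative, pivot_longer()/pivot_wider() stand in for gather()/spread(), and the distance is taken as the absolute difference, which is what the Euclidean distance reduces to when dist() is run on a single Value column as in the question.

```r
library(dplyr)
library(tidyr)

# Rebuild the wide data frame from the question (same seed, same fill order)
years <- c(1999, 2000, 2001, 2006, 2008, 2011, 2013, 2016, 2019)
df <- data.frame(Site = LETTERS[1:8],
                 matrix(NA_real_, nrow = 8, ncol = length(years)))
colnames(df)[-1] <- years
set.seed(123)
for (i in seq_len(nrow(df))) df[i, -1] <- runif(length(years))

# Long form with a consecutive index for the unevenly spaced years
long <- df %>%
  pivot_longer(-Site, names_to = "Year", values_to = "Value") %>%
  mutate(Year = as.integer(Year),
         year_index = as.integer(factor(Year)))

# Shift the index of a copy by one so each row meets its previous survey
result.df <- long %>%
  inner_join(long %>% mutate(year_index = year_index + 1L),
             by = c("Site", "year_index"),
             suffix = c("", "_prev")) %>%
  mutate(Dist = abs(Value - Value_prev)) %>%  # 1-D Euclidean distance
  select(Site, Year, Dist) %>%
  pivot_wider(names_from = Year, values_from = Dist)

result.df
```

The inner join drops 1999 automatically because it has no earlier survey to pair with, so the result has one row per site and one column per survey year from 2000 onward.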
Answer 2 (bq3bfh9z)
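In base R, we can do this easily by subtracting two equally sized data sets, obtained by dropping columns from the start and from the end of the original.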
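A minimal sketch of that idea, assuming the original wide df from the question (rebuilt here so the example runs on its own) and assuming the distance is the absolute difference between consecutive survey columns:

```r
# Rebuild the wide data frame from the question (same seed, same fill order)
years <- c(1999, 2000, 2001, 2006, 2008, 2011, 2013, 2016, 2019)
df <- data.frame(Site = LETTERS[1:8],
                 matrix(NA_real_, nrow = 8, ncol = length(years)))
colnames(df)[-1] <- years
set.seed(123)
for (i in seq_len(nrow(df))) df[i, -1] <- runif(length(years))

# df[-(1:2)]          drops Site and 1999, leaving 2000..2019
# df[-c(1, ncol(df))] drops Site and 2019, leaving 1999..2016
# Subtracting them pairs each survey year with the survey before it
result.df <- cbind(df[1], abs(df[-(1:2)] - df[-c(1, ncol(df))]))

result.df
```

The column names of the result come from the first operand, so each column is labelled with the later year of the pair, which is the layout the question asks for.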