Replace NA values with the first value in all directions in R

Posted by nmpmafwu on 2023-06-19


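I want to fill the missing values in a table with the non-NA value from the nearest date, regardless of whether that date lies before or after the reference date. That is, a table such as: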

date         value
03.03.2023        1
04.03.2023       NA
06.03.2023        4
09.03.2023       NA 
10.03.2023        3

should be filled in as:

date         value
03.03.2023        1
04.03.2023        1
06.03.2023        4
09.03.2023        3 
10.03.2023        3

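Explanation: 03.03 is closer to 04.03, so LOCF (last observation carried forward) is used there. 10.03 is closer to 09.03, so NOCB (next observation carried backward, i.e. na.locf with fromLast = TRUE) is used there.
A conflict can arise when an NA value has two non-NA neighbours, one on each side, that are equally far from the reference date. In that case I want LOCF to take precedence.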
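A minimal base-R sketch of the rule described above, written for illustration rather than taken from the question or the answers: for every row it finds the position of the previous non-NA value (the LOCF donor) and of the next one (the NOCB donor), measures both distances in days, and takes the closer donor, with ties going to LOCF. The helper name fill_nearest is hypothetical, and the sketch assumes the rows are already sorted by date.

```r
# Hypothetical helper: fill each NA in `values` with the non-NA value whose
# date is nearest. Assumes `dates` is sorted in ascending order.
# When both neighbours are equally far away, the earlier one wins (LOCF).
fill_nearest <- function(values, dates) {
  n <- length(values)
  pos <- seq_len(n)
  has_value <- !is.na(values)
  if (!any(has_value)) return(values)

  # LOCF donor: position of the last non-NA value at or before each row (0 = none)
  prev_pos <- cummax(ifelse(has_value, pos, 0L))
  # NOCB donor: position of the next non-NA value at or after each row (n + 1 = none)
  next_pos <- rev(cummin(rev(ifelse(has_value, pos, n + 1L))))

  day <- as.numeric(dates)  # days since 1970-01-01 for Date input
  prev_dist <- ifelse(prev_pos > 0L, day - day[pmax(prev_pos, 1L)], Inf)
  next_dist <- ifelse(next_pos <= n, day[pmin(next_pos, n)] - day, Inf)

  # "<=" sends ties to the LOCF donor
  donor <- ifelse(prev_dist <= next_dist, prev_pos, next_pos)
  values[!has_value] <- values[donor[!has_value]]
  values
}

# The example table from the question
d <- data.frame(
  date = as.Date(c("03.03.2023", "04.03.2023", "06.03.2023", "09.03.2023", "10.03.2023"),
                 format = "%d.%m.%Y"),
  value = c(1, NA, 4, NA, 3)
)
d$value <- fill_nearest(d$value, d$date)
d$value
# [1] 1 1 4 3 3
```

Carrying row positions forward and backward, rather than the values themselves, is what allows the two candidate distances to be compared before a donor is chosen.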
My current code applies a rigid na.locf twice (once in the normal direction and once with fromLast = TRUE), which is not very flexible:

library(dplyr)  # group_by(), arrange(), mutate(), filter(), select(), relocate()
library(zoo)    # na.locf()
library(readr)  # write_csv()

read.csv("path/to/merged_data.csv",
         colClasses = c("Date", "numeric", "numeric", "numeric", "character")) %>%
  group_by(field_id) %>%
  arrange(date) %>%
  mutate(
    # na.rm = FALSE keeps leading (or, with fromLast, trailing) NAs, so na.locf()
    # returns a vector of the same length as the column it fills
    Nearest_l8_locf = ifelse(!is.na(NDVI_l7) & is.na(NDVI_l8), na.locf(NDVI_l8, na.rm = FALSE), NDVI_l8),
    Nearest_s2_locf = ifelse(!is.na(NDVI_l7) & is.na(NDVI_s2), na.locf(NDVI_s2, na.rm = FALSE), NDVI_s2),
    Nearest_l8_locb = ifelse(!is.na(NDVI_l7) & is.na(NDVI_l8), na.locf(NDVI_l8, na.rm = FALSE, fromLast = TRUE), NDVI_l8),
    Nearest_s2_locb = ifelse(!is.na(NDVI_l7) & is.na(NDVI_s2), na.locf(NDVI_s2, na.rm = FALSE, fromLast = TRUE), NDVI_s2)
  ) %>%
  filter(!is.na(NDVI_l7)) %>%
  select(-NDVI_l8, -NDVI_s2) %>%
  relocate(field_id, .after = last_col()) %>%
  write_csv(file.path(results, "merged_data_interpolated.csv"))

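In my real data, the reference dates are all dates on which the column NDVI_l7 is not NA, and the NAs are filled in two other columns (NDVI_l8 and NDVI_s2). The data are also grouped by the column field_id, because the dates repeat for each of these IDs.
How can I adapt the code so that each NA is filled with the value from the nearest date, regardless of where that value sits in the column?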
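For the setup described in the question (several field_id groups, reference rows where NDVI_l7 is observed, gaps to fill in NDVI_l8 and NDVI_s2), one possible base-R sketch fills each column separately within each group and only then keeps the reference rows. The helpers fill_nearest_fi and fill_by_group and the small merged data frame are hypothetical; only the column names come from the question. This variant locates the surrounding non-NA dates with findInterval(), and the earlier date wins ties, as requested.

```r
# Hypothetical helper: nearest-date fill via findInterval().
# Assumes `dates` is sorted ascending; ties go to the earlier date (LOCF).
fill_nearest_fi <- function(values, dates) {
  day <- as.numeric(dates)
  known <- which(!is.na(values))
  gaps <- which(is.na(values))
  if (length(known) == 0L || length(gaps) == 0L) return(values)

  # left: index into `known` of the last known date <= each gap date (0 = none)
  left <- findInterval(day[gaps], day[known])
  right <- pmin(left + 1L, length(known))
  has_left <- left > 0L
  has_right <- left < length(known)
  left_dist <- ifelse(has_left, day[gaps] - day[known[pmax(left, 1L)]], Inf)
  right_dist <- ifelse(has_right, day[known[right]] - day[gaps], Inf)

  # "<=" sends ties to the earlier (LOCF) donor
  donor <- ifelse(left_dist <= right_dist, known[pmax(left, 1L)], known[right])
  values[gaps] <- values[donor]
  values
}

# Hypothetical wrapper mirroring the question's setup: fill each column within
# each group, then keep only the rows where NDVI_l7 is observed.
fill_by_group <- function(df, cols, group = "field_id") {
  df <- df[order(df[[group]], df$date), ]
  rows_by_group <- split(seq_len(nrow(df)), df[[group]])
  for (col in cols) {
    for (rows in rows_by_group) {
      df[[col]][rows] <- fill_nearest_fi(df[[col]][rows], df$date[rows])
    }
  }
  df[!is.na(df$NDVI_l7), ]
}

# Small made-up data set using the question's column names
merged <- data.frame(
  date     = as.Date(c("2023-04-01", "2023-04-03", "2023-04-08",
                       "2023-04-01", "2023-04-05", "2023-04-06")),
  NDVI_l7  = c(0.40, 0.42, NA, 0.50, NA, 0.55),
  NDVI_l8  = c(NA, NA, 0.45, 0.52, 0.60, NA),
  NDVI_s2  = c(0.38, NA, 0.44, NA, 0.58, NA),
  field_id = c("A", "A", "A", "B", "B", "B")
)
fill_by_group(merged, c("NDVI_l8", "NDVI_s2"))
```

In this toy data, the 3 April row of field A receives NDVI_s2 = 0.38 from 1 April (2 days away) rather than 0.44 from 8 April (5 days away). Filling happens on all rows before filtering, so rows without an NDVI_l7 value can still act as donors.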

Source: https://stackoverflow.com/questions/76454103/replace-na-values-with-the-first-value-in-all-directions-in-r

2 Answers


qqrboqgw (answer #1)

I managed to write a function that does what I need.

# Function to find the nearest value to a given date

find_nearest_value <- function(x, target_date) {
  if (length(which(!is.na(x))) == 0) {
    return(NA)
  }
  idx <- max(which(!is.na(x) & !is.na(target_date) & target_date >= x))
  if (is.na(idx)) {
    idx <- min(which(!is.na(x) & !is.na(target_date) & target_date <= x))
  }
  return(x[idx])
}

# Apply function (needs dplyr and readr; lubridate is called via ::)
library(dplyr)
library(readr)
read.csv("path/to/merged_data.csv",
         colClasses = c("Date", "numeric", "numeric", "numeric", "character")) %>%
  group_by(field_id, year = lubridate::year(date)) %>%
  arrange(date) %>%
  mutate(
    Nearest_l8 = ifelse(!is.na(NDVI_l7) & is.na(NDVI_l8), find_nearest_value(NDVI_l8, date), NDVI_l8),
    Nearest_s2 = ifelse(!is.na(NDVI_l7) & is.na(NDVI_s2), find_nearest_value(NDVI_s2, date), NDVI_s2)
  ) %>%
  ungroup() %>%
  filter(!is.na(NDVI_l7)) %>%
  select(-NDVI_l8, -NDVI_s2, -year) %>%
  relocate(field_id, .after = last_col()) %>%
  write_csv(file.path(results, "merged_data_function_year.csv"))

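Note that I added an extra step to the code that also groups by year. This is because the resulting values were not normally distributed. My particular data are affected by seasonality and only cover April to July. Restricting the function to work within a single year solved this problem.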

Answered 2023-06-19

yzuktlbb (answer #2)

Another variant using only base R.
d is your example data:

d <- structure(list(date = structure(c(19419, 19420, 19422, 19425, 
19426), class = "Date"), value = c(1L, NA, 4L, NA, 3L)), row.names = c(NA, 
5L), class = "data.frame")

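Convert the date column to class Date (only needed when the dates are stored as dd.mm.yyyy strings; in the dput() output above they already have class Date):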

d$date <- as.Date(d$date, '%d.%m.%Y')

Use the dist() function to find the nearest neighbours:

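impute_from_neighbours <- function(values, dates){
  na_pos <- which(is.na(values))
  non_na_pos <- which(!is.na(values))
  if (length(na_pos) == 0 || length(non_na_pos) == 0) return(values)
  # Pairwise distances (in days) between all dates
  dists <- dist(as.numeric(dates)) |> as.matrix()
  # Rows: candidate donors (non-NA values only); columns: positions to fill.
  # drop = FALSE keeps a matrix even when there is only one NA.
  # which.min() returns the first minimum, so with date-sorted rows the earlier date wins ties.
  closest <- apply(dists[non_na_pos, na_pos, drop = FALSE], 2, which.min)
  values[na_pos] <- values[non_na_pos[closest]]
  values
}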

d$value <- impute_from_neighbours(d$value, d$date)

Output:

> d
        date value
1 2023-03-03     1
2 2023-03-04     1
3 2023-03-06     4
4 2023-03-09     3
5 2023-03-10     3

Answered 2023-06-19
