R: Fill in NA values using the average of the values before and after the NA gap

5m1hhzi4  posted on 2023-01-22  in Other

Consider the following dataset:

df1 <- tibble::tribble(~Place,  ~Year,  ~Cake,  ~Coffee,    ~Tea,
"Local Cafe 1", 2022,   50, 100,    30,
"Local Cafe 1", 2021,   50, NA, 30,
"Local Cafe 1", 2020,   50, 80, NA,
"Local Cafe 1", 2019,   50, 70, 20,
"Local Cafe 1", 2018,   NA, 60, 20,
"Local Cafe 2", 2022,   60, NA, 40,
"Local Cafe 2", 2021,   NA, 50, NA,
"Local Cafe 2", 2020,   40, 40, NA,
"Local Cafe 2", 2019,   30, 30, NA,
"Local Cafe 3", 2022,   30, 40, NA,
"Local Cafe 3", 2021,   NA, NA, NA)

Here is the same dataset represented visually (image not reproduced here):

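Expected behavior:
1. If a single NA lies between two values, take their average (e.g. the sequence 30, NA, 40 => NA becomes (30 + 40)/2 = 35).
2. If the gap is longer than one NA, grow in equal steps (e.g. the sequence 30, NA, NA, 60 => 30, 40, 50, 60).
3. If the NA comes first (e.g. the 2021 value is 100 and 2022 is NA), continue the sequence if one is available (e.g. 10, 20, 30, NA => 10, 20, 30, 40); or, if there is no sequence to go on, use the last available value (e.g. 30, NA => 30, 30).

  • If the NA comes last, the same logic applies: NA, 10, 20, 30 => 0, 10, 20, 30; or NA, 20 => 20, 20.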
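To make rules 1 and 2 concrete, here is a minimal base R sketch on a plain numeric vector. The helper name `interp_inner` is hypothetical (it is not part of the question); it relies on `approx()` from the stats package and only touches NAs that have known values on both sides.

```r
# Hypothetical helper: linearly interpolate NAs that sit between known values.
interp_inner <- function(v) {
  known <- which(!is.na(v))
  # approx() needs at least two known points; otherwise return the input unchanged
  if (length(known) < 2) return(v)
  # rule = 1 leaves positions outside the known range as NA
  approx(x = known, y = v[known], xout = seq_along(v), rule = 1)$y
}

interp_inner(c(30, NA, 40))      # rule 1: the gap becomes (30 + 40) / 2
interp_inner(c(30, NA, NA, 60))  # rule 2: equal steps across the gap
```

Leading and trailing NAs are deliberately left alone here, since rule 3 treats them differently.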

Desired output:

The orange cells are the filled-in values. I would appreciate any suggestions for a good solution :)

svmlkihl1#

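We can group by 'Place', apply na.approx (from zoo) to the columns 'Cake' through 'Tea' to interpolate the interior NAs linearly, and then use fill (with .direction = "downup") to replace any NAs left at the ends with the nearest non-NA value in the group.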

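library(dplyr)
library(tidyr)
library(zoo)
df1 %>%
   # process each cafe separately
   group_by(Place) %>% 
   # interpolate interior NAs linearly; na.rm = FALSE keeps leading/trailing NAs in place
   mutate(across(Cake:Tea, ~ na.approx(.x, na.rm = FALSE))) %>% 
   # fill the remaining edge NAs from the nearest non-NA value (down first, then up)
   fill(Cake:Tea, .direction = "downup") %>% 
   ungroup()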
  • Output
# A tibble: 11 × 5
   Place         Year  Cake Coffee   Tea
   <chr>        <dbl> <dbl>  <dbl> <dbl>
 1 Local Cafe 1  2022    50    100    30
 2 Local Cafe 1  2021    50     90    30
 3 Local Cafe 1  2020    50     80    25
 4 Local Cafe 1  2019    50     70    20
 5 Local Cafe 1  2018    50     60    20
 6 Local Cafe 2  2022    60     50    40
 7 Local Cafe 2  2021    50     50    40
 8 Local Cafe 2  2020    40     40    40
 9 Local Cafe 2  2019    30     30    40
10 Local Cafe 3  2022    30     40    NA
11 Local Cafe 3  2021    30     40    NA
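
Note that the output above carries the nearest known value into the ends (for example, Coffee for Local Cafe 2 in 2022 is filled with 50), whereas rule 3 in the question asks for the sequence to be continued where possible. If that behavior is wanted, something along the following lines might work. This is only a sketch in base R: the helper name `fill_linear` is hypothetical, and it assumes "continuing the sequence" means reusing the step between the two nearest known values.

```r
# Hypothetical helper: interpolate interior NAs and extend the ends linearly.
fill_linear <- function(v) {
  n <- length(v)
  known <- which(!is.na(v))
  if (length(known) == 0) return(v)                  # nothing to work from
  if (length(known) == 1) return(rep(v[known], n))   # one value: repeat it (30, NA => 30, 30)

  # interior gaps: linear interpolation between the known points
  out <- approx(x = known, y = v[known], xout = seq_len(n), rule = 1)$y

  first <- known[1]
  last  <- known[length(known)]

  # leading NAs: walk backwards using the step between the first two known values
  if (first > 1) {
    step <- (v[known[2]] - v[first]) / (known[2] - first)
    idx  <- seq_len(first - 1)
    out[idx] <- v[first] - step * (first - idx)
  }

  # trailing NAs: walk forwards using the step between the last two known values
  if (last < n) {
    prev <- known[length(known) - 1]
    step <- (v[last] - v[prev]) / (last - prev)
    idx  <- (last + 1):n
    out[idx] <- v[last] + step * (idx - last)
  }
  out
}

fill_linear(c(10, 20, 30, NA))
fill_linear(c(NA, 10, 20, 30))
fill_linear(c(NA, 20))
```

In the grouped pipeline it could presumably replace both the na.approx and fill steps, e.g. `mutate(across(Cake:Tea, fill_linear))` after `group_by(Place)`. Columns that are entirely NA within a group (such as Tea for Local Cafe 3) stay NA.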
