R: Subset a data frame on a year column, keeping the nearest value below and the furthest value above another year column

jdzmm42g  posted on 2022-12-20  in Other

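How can I subset a data frame on PipeYear so that, for each PropertyYearBuilt, I keep the nearest PipeYear below it and the furthest PipeYear above it? The data can be created with the following R code: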

    df <- read.table(text="
    PipeID PricePipe PipeYear PropertyYearBuilt Distance_to_property
    a 500 2010 2013 1.5
    b 600 2007 2008 2.5
    c 700 2009 2008 3.0
    d 800 1998 2000 4.2
    e 900 2003 2000 4.5
    f 200 2014 2013 5.0
    g 100 2011 2013 5.5
    h 850 2018 2008 7.0", header = TRUE)

Thanks!

xxls0lw8  #1

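The answer is similar to one I posted here. It needs a recent dplyr (join_by() and closest() arrived in dplyr 1.1.0), but this time the upper value is simply the max PipeYear within each property group.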

    library(tidyverse)  # only dplyr (>= 1.1.0) is actually used

    df <- read.table(text="
    PipeID PricePipe PipeYear PropertyYearBuilt Distance_to_property
    a 500 2010 2013 1.5
    b 600 2007 2008 2.5
    c 700 2009 2008 3.0
    d 800 1998 2000 4.2
    e 900 2003 2000 4.5
    f 200 2014 2013 5.0
    g 100 2011 2013 5.5
    h 850 2018 2008 7.0", header = TRUE) |>
      # one ID per property, derived from its build year
      mutate(PropertyID = as.numeric(as.factor(PropertyYearBuilt)))

    bind_rows(
      # lower row: the pipe whose PipeYear is closest to, and not after,
      # the build year of the same property
      df |>
        select(PropertyYearBuilt, PropertyID) |>
        unique() |>
        left_join(
          df |> select(-PropertyYearBuilt),
          join_by(PropertyID, closest(PropertyYearBuilt >= PipeYear))
        ),
      # upper row: the pipe with the maximum PipeYear for each property
      df |>
        group_by(PropertyYearBuilt) |>
        filter(PipeYear == max(PipeYear))
    ) |>
      arrange(PropertyID, PipeYear)
    #>   PropertyYearBuilt PropertyID PipeID PricePipe PipeYear Distance_to_property
    #> 1              2000          1      d       800     1998                  4.2
    #> 2              2000          1      e       900     2003                  4.5
    #> 3              2008          2      b       600     2007                  2.5
    #> 4              2008          2      h       850     2018                  7.0
    #> 5              2013          3      g       100     2011                  5.5
    #> 6              2013          3      f       200     2014                  5.0
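
If a dplyr version with join_by() is not available, the same selection can be sketched in base R. This is only a rough equivalent under a few assumptions, not the answer's method: the helper name pick_rows is made up for illustration, "below" is taken to include pipes laid in the build year itself (as with `>=` in the join above), and ties on PipeYear are resolved by keeping the first matching row, whereas the dplyr filter would keep all tied rows.

```r
# Same example data as in the question.
df <- read.table(text = "
PipeID PricePipe PipeYear PropertyYearBuilt Distance_to_property
a 500 2010 2013 1.5
b 600 2007 2008 2.5
c 700 2009 2008 3.0
d 800 1998 2000 4.2
e 900 2003 2000 4.5
f 200 2014 2013 5.0
g 100 2011 2013 5.5
h 850 2018 2008 7.0", header = TRUE)

# pick_rows() is a hypothetical helper: for the rows of one property it keeps
# the latest pipe at or before the build year and the latest pipe overall.
pick_rows <- function(g) {
  below <- g[g$PipeYear <= g$PropertyYearBuilt, , drop = FALSE]
  lower <- below[which.max(below$PipeYear), , drop = FALSE]  # 0 rows if none
  upper <- g[which.max(g$PipeYear), , drop = FALSE]
  unique(rbind(lower, upper))  # drop the duplicate if both are the same pipe
}

# Apply the helper to each property, then stack and sort the pieces.
res <- do.call(rbind, lapply(split(df, df$PropertyYearBuilt), pick_rows))
res <- res[order(res$PropertyYearBuilt, res$PipeYear), ]
rownames(res) <- NULL
res
```

On the example data this keeps pipes d, e, b, h, g and f, in that order, matching the rows in the dplyr output above. Unlike the left join, a property with no pipe at or before its build year would simply contribute no lower row rather than a row of NA values.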