Filter a data frame based on another column and the number of associated non-zero values

Posted by lokaqttq on 2024-01-03 in Other

Suppose I have the following data frame named df:

    df <- data.frame("id" = c(1,1,1,2,2,2,3,3,3,4,4,4,5,5),
                     "relation" = c(1,2,3,1,2,3,1,2,3,1,2,3,1,2),
                     "salary" = c(20,10,0,30,0,0,10,0,0,40,45,42,15,0))

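I want to extract two data frames, depending on whether a household (id) is a two-earner or a one-earner household. Households in which at least two members have a non-zero salary count as two-earner; households with exactly one non-zero salary count as one-earner. My expected output is: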

    one-earner:
      id relation salary
    1  2        1     30
    2  2        2      0
    3  2        3      0
    4  3        1     10
    5  3        2      0
    6  3        3      0
    7  5        1     15
    8  5        2      0

    two-earner:
      id relation salary
    1  1        1     20
    2  1        2     10
    3  1        3      0
    4  4        1     40
    5  4        2     45
    6  4        3     42


I tried the code below, but I don't know how to select households by their number of non-zero salaries:

    two_earner <- df %>%
      group_by(address) %>%
      filter(all(salary >= 2))

    one_earner <- df %>%
      group_by(address) %>%
      filter(all(salary == 1))

hlswsv35  1#

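    library(dplyr)

    df |>
      # Count non-zero salaries per household (id) and label every row of it.
      # The .by argument needs dplyr 1.1.0 or later.
      mutate(group = ifelse(sum(salary > 0) >= 2, "two-earners", "one-earner"), .by = id) |>
      # Split into a named list with one data frame per label.
      split(~ group)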

I prefer base R's split() over dplyr's group_split() because the former keeps the group names.

Output

This returns a named list:

    $`one-earner`
       id relation salary      group
    4   2        1     30 one-earner
    5   2        2      0 one-earner
    6   2        3      0 one-earner
    7   3        1     10 one-earner
    8   3        2      0 one-earner
    9   3        3      0 one-earner
    13  5        1     15 one-earner
    14  5        2      0 one-earner

    $`two-earners`
       id relation salary       group
    1   1        1     20 two-earners
    2   1        2     10 two-earners
    3   1        3      0 two-earners
    10  4        1     40 two-earners
    11  4        2     45 two-earners
    12  4        3     42 two-earners
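
For readers who want two separate data frames, closer to the attempt in the question, one possible variant filters each household on its count of non-zero salaries. This is only a sketch and not part of the answer above: it assumes the grouping column is id (the question's code groups by address, which does not exist in df), and it reuses the names one_earner and two_earner from the question.

```r
library(dplyr)

# Sample data from the question.
df <- data.frame(
  id       = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5),
  relation = c(1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2),
  salary   = c(20, 10, 0, 30, 0, 0, 10, 0, 0, 40, 45, 42, 15, 0)
)

# sum(salary > 0) counts the non-zero salaries within each household.
# Keep households with at least two earners.
two_earner <- df %>%
  group_by(id) %>%
  filter(sum(salary > 0) >= 2) %>%
  ungroup()

# Keep households with exactly one earner.
one_earner <- df %>%
  group_by(id) %>%
  filter(sum(salary > 0) == 1) %>%
  ungroup()
```

Because filter() evaluates the condition once per group here, a single TRUE or FALSE keeps or drops the whole household. A household with no non-zero salary at all would end up in neither result.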

xkftehaa  2#

I think you can test whether mean(salary > 0) > 1/2 within each id:

    > split(df, with(df, ave(salary, id, FUN=\(x) mean(x > 0) > 1/2))) |>
    +   setNames(c('one', 'two'))
    $one
       id relation salary
    4   2        1     30
    5   2        2      0
    6   2        3      0
    7   3        1     10
    8   3        2      0
    9   3        3      0
    13  5        1     15
    14  5        2      0

    $two
       id relation salary
    1   1        1     20
    2   1        2     10
    3   1        3      0
    10  4        1     40
    11  4        2     45
    12  4        3     42


Data:
    > dput(df)
    structure(list(id = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5,
    5), relation = c(1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2), salary = c(20,
    10, 0, 30, 0, 0, 10, 0, 0, 40, 45, 42, 15, 0)), class = "data.frame", row.names = c(NA,
    -14L))
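
Note that a test on mean(salary > 0) classifies households by the share of earners, which agrees with the count-based definition in the question on this sample but not in general. For example, a hypothetical single-person household with a non-zero salary has a share of 1 and would be placed in the two-earner group. A minimal base R sketch that counts non-zero salaries directly might look like the following; n_earners is an illustrative name, not something from the answers.

```r
# Sample data from the question.
df <- data.frame(
  id       = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5),
  relation = c(1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2),
  salary   = c(20, 10, 0, 30, 0, 0, 10, 0, 0, 40, 45, 42, 15, 0)
)

# Number of non-zero salaries in each household, repeated on every row
# of that household (ave() returns a vector as long as its input).
n_earners <- ave(as.numeric(df$salary > 0), df$id, FUN = sum)

# Exactly one earner versus two or more earners.
one_earner <- df[n_earners == 1, ]
two_earner <- df[n_earners >= 2, ]
```

Households with no earner at all fall into neither data frame, which matches the definitions given in the question.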
