Summarise + case_when with n()

n53p2ov0  posted on 2023-09-27 in Other

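I would like to know what I am doing wrong.
I am trying to use case_when() inside summarise() to get one summary row per id, depending on the number of rows for each id.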

library(dplyr, warn.conflicts = FALSE)
mock <- tibble::tribble(~id, ~name, ~year,
                1, "xy", 2022,
                1, "xyz", 2021,
                2, "aaa", NA,
                3, "xaa", 2021)

mock %>% 
  group_by(id) %>% 
  summarise(
    condition = case_when(
      n() > 1 ~ "problem",
      .default = NA_character_
    ),
    name2 = case_when(
      n() == 1 ~ name,
      .default = NA_character_
    )
  )
#> Warning: Returning more (or less) than 1 row per `summarise()` group was deprecated in
#> dplyr 1.1.0.
#> ℹ Please use `reframe()` instead.
#> ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()`
#>   always returns an ungrouped data frame and adjust accordingly.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
#> generated.
#> `summarise()` has grouped output by 'id'. You can override using the `.groups`
#> argument.
#> # A tibble: 4 × 3
#> # Groups:   id [3]
#>      id condition name2
#>   <dbl> <chr>     <chr>
#> 1     1 problem   <NA> 
#> 2     1 problem   <NA> 
#> 3     2 <NA>      aaa  
#> 4     3 <NA>      xaa

Created on 2023-09-09 with reprex v2.0.2
But I only want:

#> # A tibble: 3 × 3
#>      id condition name2
#>   <dbl> <chr>     <chr>
#> 2     1 problem   <NA> 
#> 3     2 <NA>      aaa  
#> 4     3 <NA>      xaa

oo7oh9g9 #1

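case_when is meant for iterating down a column and building a new vector from existing values in other columns. That is not what you are trying to do. You are trying to conditionally select a single output based on the group size, which is always a length-one integer. In effect, the result of the n() comparison is recycled to a vector the same length as the group, because name has one element per row of the group. If you want the output of summarize to be length 1, you should use if and else, not case_when or if_else.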

mock %>% 
  group_by(id) %>% 
  summarize(
    condition = if(n() > 1) 'problem' else NA_character_, 
    name2     = if(n() == 1) name else NA_character_
  )
#> # A tibble: 3 x 3
#>      id condition name2
#>   <dbl> <chr>     <chr>
#> 1     1 problem   <NA> 
#> 2     2 <NA>      aaa  
#> 3     3 <NA>      xaa

Created on 2023-09-09 with reprex v2.0.2
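
To make the length argument concrete, here is a minimal sketch outside of summarise(). The names n_rows, res_case_when and res_if are illustrative only, and n_rows stands in for what n() would return for id 1. It assumes dplyr is installed.

```r
library(dplyr, warn.conflicts = FALSE)

name   <- c("xy", "xyz")  # the two names belonging to id 1
n_rows <- length(name)    # stand-in for n() inside that group

# case_when(): the length-1 condition is recycled to the size of `name`,
# so the result has one element per row of the group
res_case_when <- case_when(n_rows == 1 ~ name, TRUE ~ NA_character_)
length(res_case_when)
#> [1] 2

# if / else: exactly one branch is evaluated and returned as-is
res_if <- if (n_rows == 1) name else NA_character_
length(res_if)
#> [1] 1
```

The two-element result is what makes summarise() emit two rows for id 1 in the question.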

zengzsys #2

You can still use case_when like this.
Using first() or [1] gets around the problem explained by @Allan Cameron.

library(dplyr)

mock %>% 
  group_by(id) %>% 
  summarise(
    condition = case_when(
      n() > 1 ~ "problem",
      TRUE ~ NA_character_
    ),
    name2 = case_when(
      # n() == 1 ~ name[1],
      n() == 1 ~ first(name),
      TRUE ~ NA_character_
    ),
    .groups = 'drop'
  )

   id condition name2
  <dbl> <chr>     <chr>
1     1 problem   NA   
2     2 NA        aaa  
3     3 NA        xaa
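
As a quick check of the same idea, the sketch below rebuilds the mock data from the question and uses if_else() instead of case_when(). This is a variation, not part of the answer above: once every input is length 1 (thanks to first()), the vectorised helpers also return a single value per group. The object name res is illustrative.

```r
library(dplyr, warn.conflicts = FALSE)

mock <- tibble::tribble(~id, ~name, ~year,
                        1, "xy", 2022,
                        1, "xyz", 2021,
                        2, "aaa", NA,
                        3, "xaa", 2021)

res <- mock %>%
  group_by(id) %>%
  summarise(
    # every argument is length 1, so each result is length 1
    condition = if_else(n() > 1, "problem", NA_character_),
    name2     = if_else(n() == 1, first(name), NA_character_),
    .groups = "drop"
  )

# one output row per distinct id
nrow(res) == n_distinct(mock$id)
#> [1] TRUE
```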

wko9yo5t #3

Try this:

within(mock, {
  # if/else on the group size works for ids with any number of rows
  condition <- ave(name, id, FUN=\(x) if (length(x) > 1) 'problem' else NA)
  name1 <- replace(name, !is.na(condition), NA)
  rm(name, year)
  }) |> unique()
#   id name1 condition
# 1  1  <NA>   problem
# 3  2   aaa      <NA>
# 4  3   xaa      <NA>
Data:
mock <- structure(list(id = c(1, 1, 2, 3), name = c("xy", "xyz", "aaa", 
"xaa"), year = c(2022, 2021, NA, 2021)), row.names = c(NA, -4L
), class = "data.frame")
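
The reason the base R version ends with unique() can be seen from how ave() behaves: it returns one value per input row, not one per group. A minimal sketch, using a cut-down copy of the data and an illustrative variable name group_size:

```r
mock <- data.frame(id   = c(1, 1, 2, 3),
                   name = c("xy", "xyz", "aaa", "xaa"))

# ave() applies FUN within each id and writes the result back to every row
# of that id, so the output is as long as the input
group_size <- ave(seq_along(mock$id), mock$id, FUN = length)
group_size
#> [1] 2 2 1 1
```

Both rows of id 1 therefore carry the same values once name is blanked out, and unique() collapses them into a single row.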
