R: How do I extract values from one column based on positive or negative values in another column?

Posted by dy1byipe on 2023-03-20 in Other

I am getting mixed results on two different datasets. The first dput output works fine, but the second does not.

structure(list(ID = c("4116fa25f9789e2ce647d5920e9500b3", 
"4116fa25f9789e2ce647d5920e9500b3", 
"4116fa25f9789e2ce647d5920e9500b3", "4116fa25f9789e2ce647d5920e9500b3", 
"4116fa25f9789e2ce647d5920e9500b3", "4116fa25f9789e2ce647d5920e9500b3", 
"4116fa25f9789e2ce647d5920e9500b3", "4116fa25f9789e2ce647d5920e9500b3"
), Home = c("Milwaukee Brewers", "Milwaukee Brewers", "Milwaukee Brewers", 
"Milwaukee Brewers", "Milwaukee Brewers", "Milwaukee Brewers", 
"Milwaukee Brewers", "Milwaukee Brewers"), Away = c("Los Angeles Angels", 
"Los Angeles Angels", "Los Angeles Angels", "Los Angeles Angels", 
"Los Angeles Angels", "Los Angeles Angels", "Los Angeles Angels", 
"Los Angeles Angels"), Team = c("Los Angeles Angels", "Milwaukee Brewers", 
"Los Angeles Angels", "Milwaukee Brewers", "Los Angeles Angels", 
"Milwaukee Brewers", "Los Angeles Angels", "Milwaukee Brewers"
), Price = c(-190, 160, -175, 150, -170, 145, -170, 143), Points = 
c(1.5, 
-1.5, 1.5, -1.5, 1.5, -1.5, 1.5, -1.5)), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -8L), groups = 
structure(list(
ID = c("4116fa25f9789e2ce647d5920e9500b3", 
"4116fa25f9789e2ce647d5920e9500b3"
), Team = c("Los Angeles Angels", "Milwaukee Brewers"), .rows = 
structure(list(
    c(1L, 3L, 5L, 7L), c(2L, 4L, 6L, 8L)), ptype = integer(0), class = 
c("vctrs_list_of", 
"vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -2L), .drop = TRUE))

Here is the second dput output:

structure(list(ID = c("2f95b45e6f5446c06d55e2eb646da6fd", 
"2f95b45e6f5446c06d55e2eb646da6fd", 
"2f95b45e6f5446c06d55e2eb646da6fd", "2f95b45e6f5446c06d55e2eb646da6fd", 
"2f95b45e6f5446c06d55e2eb646da6fd", "2f95b45e6f5446c06d55e2eb646da6fd", 
"2f95b45e6f5446c06d55e2eb646da6fd", "2f95b45e6f5446c06d55e2eb646da6fd"
), Home = c("Baltimore Orioles", "Baltimore Orioles", "Baltimore Orioles", 
"Baltimore Orioles", "Baltimore Orioles", "Baltimore Orioles", 
"Baltimore Orioles", "Baltimore Orioles"), Away = c("Toronto Blue Jays", 
"Toronto Blue Jays", "Toronto Blue Jays", "Toronto Blue Jays", 
"Toronto Blue Jays", "Toronto Blue Jays", "Toronto Blue Jays", 
"Toronto Blue Jays"), Team = c("Baltimore Orioles", "Toronto Blue Jays", 
"Baltimore Orioles", "Toronto Blue Jays", "Baltimore Orioles", 
"Toronto Blue Jays", "Baltimore Orioles", "Toronto Blue Jays"
), Price = c(-175, 145, 155, -180, -170, 145, 158, -190), Points = 
c(1.5, 
-1.5, -1.5, 1.5, 1.5, -1.5, -1.5, 1.5)), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -8L), groups = 
structure(list(
ID = c("2f95b45e6f5446c06d55e2eb646da6fd", 
"2f95b45e6f5446c06d55e2eb646da6fd"
), Team = c("Baltimore Orioles", "Toronto Blue Jays"), .rows = 
structure(list(
    c(1L, 3L, 5L, 7L), c(2L, 4L, 6L, 8L)), ptype = integer(0), class = 
c("vctrs_list_of", 
"vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -2L), .drop = TRUE))

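I have a function that creates a final column whose values are calculated from the Price column. For the first dput, the result looks like this: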

# A tibble: 2 × 7
# Groups:   ID [1]
  ID                               Home              Away               Team               Price Points Value
  <chr>                            <chr>             <chr>              <chr>              <dbl>  <dbl> <dbl>
1 4116fa25f9789e2ce647d5920e9500b3 Milwaukee Brewers Los Angeles Angels Los Angeles Angels  -170    1.5 0.014
2 4116fa25f9789e2ce647d5920e9500b3 Milwaukee Brewers Los Angeles Angels Milwaukee Brewers    160   -1.5 0.014

When I run the same function on the second dput, I get:

# A tibble: 2 × 7
# Groups:   ID [1]
  ID                               Home              Away              Team              Price Points  Value
  <chr>                            <chr>             <chr>             <chr>             <dbl>  <dbl>  <dbl>
1 2f95b45e6f5446c06d55e2eb646da6fd Baltimore Orioles Toronto Blue Jays Baltimore Orioles   158   -1.5 -0.257
2 2f95b45e6f5446c06d55e2eb646da6fd Baltimore Orioles Toronto Blue Jays Toronto Blue Jays   145   -1.5 -0.257

Here is the code I am currently using:

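df %>%
  group_by(ID, Team) %>%
  # keep the single highest Price per ID/Team pair
  slice_max(Price, with_ties = FALSE) %>%
  arrange(ID) %>%
  group_by(ID) %>%
  # value_fn is a placeholder name for my own function (not shown in this post)
  mutate(Value = value_fn(Price[1], Price[2]))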

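The final output for the second dput should have four rows, but only two are shown. The problem seems to be that in the second dput each team has both positive and negative prices, whereas in the first dput each team's prices are either all negative or all positive. Every pair of prices should consist of one positive and one negative number, as in the first dput. Any ideas?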

ozxc1zmp (answer #1)

You can use group_by on Var_D and then summarise to return the maximum value of Var_C, like this:

library(dplyr)
df %>%
  group_by(Var_D) %>%
  summarise(max_value = max(Var_C))
#> # A tibble: 2 × 2
#>   Var_D max_value
#>   <dbl>     <int>
#> 1  -1.5       165
#> 2   1.5      -165

Created on 2023-03-15 with reprex v2.0.2
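The snippet above does not include its input data, so here is a minimal self-contained sketch of the same group_by/summarise pattern. The values in `df` are made up for illustration; only the column names Var_C and Var_D come from the answer, and the result name `res` is likewise an assumption.

```r
library(dplyr)

# Hypothetical input: only the column names are taken from the answer
df <- tibble(
  Var_C = c(-190L, 160L, -165L, 165L),
  Var_D = c(1.5, -1.5, 1.5, -1.5)
)

# One row per distinct Var_D value, holding the largest Var_C in that group
res <- df %>%
  group_by(Var_D) %>%
  summarise(max_value = max(Var_C))

print(res)
```

Note that summarise keeps only the grouping column and the summary; if the other columns of the winning row are needed, slice_max is likely the better fit.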

envsm3lx (answer #2)

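If there are several positive or several negative Var_D values, this variant groups them by sign and keeps only one row per sign. I also use coalesce in case Var_D is 0: Var_D/abs(Var_D) is then NaN, and coalesce replaces it with 1, so a zero is grouped with the positive Var_D values.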

df %>%
  group_by(sign = coalesce(Var_D/abs(Var_D), 1)) %>%
  slice_max(Var_C, n = 1) %>%
  ungroup()
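As a rough sketch of how the sign-grouping idea might carry over to the question's own columns, the example below groups the second dataset by ID, Team and the sign of Price, then keeps the highest price in each group. This is one possible reading of the question, not a confirmed solution: the helper column `price_sign` and the result name `best` are made up here, Home and Away are left out for brevity, and base R's `sign()` is swapped in for the division used above (it returns 0 for a zero price, which would form its own group).

```r
library(dplyr)

# Reduced copy of the second dput from the question (Home and Away omitted)
df2 <- tibble(
  ID     = "2f95b45e6f5446c06d55e2eb646da6fd",
  Team   = rep(c("Baltimore Orioles", "Toronto Blue Jays"), times = 4),
  Price  = c(-175, 145, 155, -180, -170, 145, 158, -190),
  Points = c(1.5, -1.5, -1.5, 1.5, 1.5, -1.5, -1.5, 1.5)
)

# Assumption: one "best" row is wanted per team and per price sign
best <- df2 %>%
  group_by(ID, Team, price_sign = sign(Price)) %>%
  slice_max(Price, n = 1, with_ties = FALSE) %>%
  ungroup()

print(best)
```

With this grouping the second dataset yields four rows (one negative and one positive price per team) instead of two. How those rows should then be paired for the Value calculation depends on the asker's function, which the post does not show.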
