I'm getting mixed results from two different datasets. The first dput output works fine, but the second does not.
structure(list(ID = c("4116fa25f9789e2ce647d5920e9500b3",
"4116fa25f9789e2ce647d5920e9500b3",
"4116fa25f9789e2ce647d5920e9500b3", "4116fa25f9789e2ce647d5920e9500b3",
"4116fa25f9789e2ce647d5920e9500b3", "4116fa25f9789e2ce647d5920e9500b3",
"4116fa25f9789e2ce647d5920e9500b3", "4116fa25f9789e2ce647d5920e9500b3"
), Home = c("Milwaukee Brewers", "Milwaukee Brewers", "Milwaukee Brewers",
"Milwaukee Brewers", "Milwaukee Brewers", "Milwaukee Brewers",
"Milwaukee Brewers", "Milwaukee Brewers"), Away = c("Los Angeles Angels",
"Los Angeles Angels", "Los Angeles Angels", "Los Angeles Angels",
"Los Angeles Angels", "Los Angeles Angels", "Los Angeles Angels",
"Los Angeles Angels"), Team = c("Los Angeles Angels", "Milwaukee Brewers",
"Los Angeles Angels", "Milwaukee Brewers", "Los Angeles Angels",
"Milwaukee Brewers", "Los Angeles Angels", "Milwaukee Brewers"
), Price = c(-190, 160, -175, 150, -170, 145, -170, 143), Points =
c(1.5,
-1.5, 1.5, -1.5, 1.5, -1.5, 1.5, -1.5)), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -8L), groups =
structure(list(
ID = c("4116fa25f9789e2ce647d5920e9500b3",
"4116fa25f9789e2ce647d5920e9500b3"
), Team = c("Los Angeles Angels", "Milwaukee Brewers"), .rows =
structure(list(
c(1L, 3L, 5L, 7L), c(2L, 4L, 6L, 8L)), ptype = integer(0), class =
c("vctrs_list_of",
"vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -2L), .drop = TRUE))
Here is the second dput output:
structure(list(ID = c("2f95b45e6f5446c06d55e2eb646da6fd",
"2f95b45e6f5446c06d55e2eb646da6fd",
"2f95b45e6f5446c06d55e2eb646da6fd", "2f95b45e6f5446c06d55e2eb646da6fd",
"2f95b45e6f5446c06d55e2eb646da6fd", "2f95b45e6f5446c06d55e2eb646da6fd",
"2f95b45e6f5446c06d55e2eb646da6fd", "2f95b45e6f5446c06d55e2eb646da6fd"
), Home = c("Baltimore Orioles", "Baltimore Orioles", "Baltimore Orioles",
"Baltimore Orioles", "Baltimore Orioles", "Baltimore Orioles",
"Baltimore Orioles", "Baltimore Orioles"), Away = c("Toronto Blue Jays",
"Toronto Blue Jays", "Toronto Blue Jays", "Toronto Blue Jays",
"Toronto Blue Jays", "Toronto Blue Jays", "Toronto Blue Jays",
"Toronto Blue Jays"), Team = c("Baltimore Orioles", "Toronto Blue Jays",
"Baltimore Orioles", "Toronto Blue Jays", "Baltimore Orioles",
"Toronto Blue Jays", "Baltimore Orioles", "Toronto Blue Jays"
), Price = c(-175, 145, 155, -180, -170, 145, 158, -190), Points =
c(1.5,
-1.5, -1.5, 1.5, 1.5, -1.5, -1.5, 1.5)), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -8L), groups =
structure(list(
ID = c("2f95b45e6f5446c06d55e2eb646da6fd",
"2f95b45e6f5446c06d55e2eb646da6fd"
), Team = c("Baltimore Orioles", "Toronto Blue Jays"), .rows =
structure(list(
c(1L, 3L, 5L, 7L), c(2L, 4L, 6L, 8L)), ptype = integer(0), class =
c("vctrs_list_of",
"vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -2L), .drop = TRUE))
I have a function that creates a final column whose values are calculated from the Price column.
# A tibble: 2 × 7
# Groups:   ID [1]
  ID                               Home              Away               Team               Price Points Value
  <chr>                            <chr>             <chr>              <chr>              <dbl>  <dbl> <dbl>
1 4116fa25f9789e2ce647d5920e9500b3 Milwaukee Brewers Los Angeles Angels Los Angeles Angels  -170    1.5 0.014
2 4116fa25f9789e2ce647d5920e9500b3 Milwaukee Brewers Los Angeles Angels Milwaukee Brewers    160   -1.5 0.014
When I run this function on the second dput, I get:
# A tibble: 2 × 7
# Groups:   ID [1]
  ID                               Home              Away              Team              Price Points  Value
  <chr>                            <chr>             <chr>             <chr>             <dbl>  <dbl>  <dbl>
1 2f95b45e6f5446c06d55e2eb646da6fd Baltimore Orioles Toronto Blue Jays Baltimore Orioles   158   -1.5 -0.257
2 2f95b45e6f5446c06d55e2eb646da6fd Baltimore Orioles Toronto Blue Jays Toronto Blue Jays   145   -1.5 -0.257
Here is the syntax I'm currently using:
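library(dplyr)

df %>%
  group_by(ID, Team) %>%
  slice_max(Price, with_ties = FALSE) %>%  # keep the highest Price per ID and Team
  arrange(ID) %>%
  group_by(ID) %>%
  # my_function is a placeholder for the asker's function, which is not shown
  mutate(Value = my_function(Price[1], Price[2]))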
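The final output for the second dput should have four rows, but only two are shown. The problem seems to be that each team in the second dput has both positive and negative prices, whereas in the first dput each team's prices are either all negative or all positive. Every pair of prices should consist of one positive and one negative value, as in dput 1. Any ideas?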
2 Answers
ozxc1zmp1#
You can use group_by on Var_D and then summarise to return the max value of Var_C, as follows:
Created on 2023-03-15 with reprex v2.0.2
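The code for this answer did not survive in the text above. A minimal sketch of the described approach might look like the following, assuming a data frame `df` with columns `Var_C` and `Var_D` (the answer's names). The sample values are invented for illustration only.

```r
library(dplyr)

# Hypothetical sample data; only the column names come from the answer.
df <- tibble(
  Var_D = c(1, 1, 2, 2),
  Var_C = c(1, 5, 3, 2)
)

# One row per Var_D group, holding the largest Var_C in that group.
result <- df %>%
  group_by(Var_D) %>%
  summarise(Var_C = max(Var_C), .groups = "drop")

print(result)
```

Note that summarise collapses each group to a single row, so any column not used for grouping or aggregated explicitly is dropped.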
envsm3lx2#
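If there are multiple positive or negative Var_D values, this variant groups them together by sign and uses only one per sign. If Var_D is 0, I also use coalesce, which here groups it with any positive Var_D values.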
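The original code for this variant is also missing from the text. One way the described idea might look is sketched below, under assumptions: `Var_C` and `Var_D` are the answer's column names, the sample data is invented, and the helper column `sign_group`, the `na_if()`/`coalesce()` combination, and the choice of which row to keep per sign are guesses at the approach, not the answerer's code.

```r
library(dplyr)

# Hypothetical sample data; only the column names come from the answer.
df <- tibble(
  Var_C = c(10, 20, 30, 40, 50),
  Var_D = c(-3, -1, 0, 2, 5)
)

result <- df %>%
  # sign() returns -1, 0 or 1; na_if() turns 0 into NA and
  # coalesce() then replaces that NA with 1, so zeros join the positives.
  mutate(sign_group = coalesce(na_if(sign(Var_D), 0), 1)) %>%
  group_by(sign_group) %>%
  # Keep one row per sign; taking the largest Var_D is an assumption.
  slice_max(Var_D, n = 1, with_ties = FALSE) %>%
  ungroup()

print(result)
```

Applied to the question's data, grouping by ID, Team and the sign of Price in the same way would give four rows for the second dput, since each team there has both positive and negative prices.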