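I'm trying to summarise the frequency data `f` that occur in "middle" `position`s, i.e., between the first and the last `position`. My approach is to filter these data, run `summarise`, and then join the new data back to the dataframe they were filtered from. This works fine for the training data: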
library(tidyverse)

df %>%
  group_by(rowid) %>%
  # summarize frequencies for middle positions:
  filter(position != first(position) & position != last(position)) %>%
  # summarise:
  summarize(across(position),
            middle_position = mean(f, na.rm = TRUE),
            word = str_c(word, collapse = " ")
  ) %>%
  left_join(df, ., by = c("rowid", "position"))
However, when I apply it to the actual data, I get the following error message:
Error in `left_join()`:
! Join columns must be present in data.
✖ Problem with `position`.
Run `rlang::last_error()` to see where the error occurred.
> rlang::last_error()
<error/rlang_error>
Error in `left_join()`:
! Join columns must be present in data.
✖ Problem with `position`.
---
Backtrace:
1. ... %>% left_join(bnc_X, ., by = c("rowid", "position"))
3. dplyr:::left_join.data.frame(bnc_X, ., by = c("rowid", "position"))
Run `rlang::last_trace()` to see the full context.
> rlang::last_trace()
<error/rlang_error>
Error in `left_join()`:
! Join columns must be present in data.
✖ Problem with `position`.
---
Backtrace:
▆
1. ├─... %>% left_join(bnc_X, ., by = c("rowid", "position"))
2. ├─dplyr::left_join(bnc_X, ., by = c("rowid", "position"))
3. └─dplyr:::left_join.data.frame(bnc_X, ., by = c("rowid", "position"))
4. └─dplyr:::join_mutate(...)
5. └─dplyr:::join_cols(...)
6. └─dplyr:::standardise_join_by(...)
7. └─dplyr:::check_join_vars(by$x, x_names, error_call = error_call)
8. └─rlang::abort(bullets, call = error_call)
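The main problem seems to be the variable `position`: why isn't it recognized? I have spent hours trying to solve this without success and would be grateful for any help!
Data: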
df <- data.frame(
  size = c(3,3,3,
           3,3,3,
           4,4,4,4,
           5,5,5,5,5,
           3,3,3),
  rowid = c(1,1,1,2,2,2,3,3,3,3,4,4,4,4,4,5,5,5),
  turn = c(rep("How are you?",3),
           rep("I'm fine.",3),
           rep("How's the weather?",4),
           rep("It's really very cold.",5),
           rep("I love you",3)),
  word = c("how","are","you",
           "i","'m","fine",
           "how","'s","the","weather",
           "it","'s","really", "very","cold",
           "i","love","you"),
  f = c(400,300,250,
        600,555,1,
        400,500,700,20,
        390,500,177,200,35,
        600,199,400),
  position = c(1,2,3,
               1,2,3,
               1,2,3,4,
               1,2,3,4,5,
               1,2,3)
)
1 Answer
This works for me in `data.table`. No join is needed.
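The code that went with this answer is not preserved here, so what follows is only a minimal sketch of how a join-free `data.table` version might look, not the answerer's actual solution. It assumes the toy data from the question; the column names `is_middle` and `middle_words` are hypothetical and introduced purely for illustration. The idea is to flag the middle rows within each `rowid` and then assign the summaries by reference to those rows only, so first and last positions stay `NA`, much as they would after the `left_join()`.

```r
library(data.table)

# Toy data from the question (only the columns needed here).
dt <- data.table(
  rowid = c(1,1,1,2,2,2,3,3,3,3,4,4,4,4,4,5,5,5),
  word = c("how","are","you",
           "i","'m","fine",
           "how","'s","the","weather",
           "it","'s","really","very","cold",
           "i","love","you"),
  f = c(400,300,250,
        600,555,1,
        400,500,700,20,
        390,500,177,200,35,
        600,199,400),
  position = c(1,2,3,
               1,2,3,
               1,2,3,4,
               1,2,3,4,5,
               1,2,3)
)

# Flag rows whose position is neither the first nor the last one
# within its rowid (hypothetical helper column `is_middle`).
dt[, is_middle := position != position[1L] & position != position[.N],
   by = rowid]

# Summarise over the middle rows only and write the results back by
# reference; rows outside the subset keep NA in the new columns.
dt[is_middle == TRUE,
   `:=`(middle_position = mean(f, na.rm = TRUE),
        middle_words = paste(word, collapse = " ")),
   by = rowid]

print(dt)
```

Because `:=` modifies `dt` in place, there is no second table to join against, so a missing or differently named join column cannot trigger the `left_join()` error. On the real data it would still be worth checking `names(bnc_X)` first, since the sketch also relies on a column called `position`.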