我有几个工作表,我从excel导入。虽然这些工作表是相似的,但有一些差异,由于手动输入。我试图过滤出行,有“总计”和任何超出该行。我的逻辑工作的df1和df3,但我不知道如何让它工作的df2。有人能帮忙吗?
df1<-structure(list(...1 = structure(c(1630022400, 1630108800, 1630195200,
1630281600, 1630368000, NA), tzone = "UTC", class = c("POSIXct",
"POSIXt")), `Vinayak Trading` = c(1984.31, NA, NA, NA, NA, 2916.17
)), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"
))
df2<-structure(list(...1 = c("44526", "44527", "44528", "44529", "44530",
"Total"), `Vinayak Trading` = c(NA, NA, NA, NA, NA, 0)), row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))
df3<-structure(list(...1 = c("44680", "44681", NA, "Total", NA, NA
), `Vinayak Trading` = c(NA, NA, NA, 2736.42, NA, NA)), row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))
transform <- function(df) {
names(df)[1] <- "Date"
df <- df %>%
filter(row_number() < which(is.na(Date))) %>% #To tackle sheets where Total is not present
filter(row_number() < which(Date=="Total")) %>% #To remove Total in sheets where it is present
select(-Total) #To remove the Total column
}
df1 <- transform(df1)# Desired reults
df2 <- transform(df2)# Error due to no NAs - don't know how to handle
df3 <- transform(df3)#Desired result with warning
1条答案
按热度按时间vsaztqbk1#
我们可以使用
dplyr::cumany
来删除"Total"
(或NA)以及其他任何内容。rename(Date = 1)
作为names(df)[1] <- "Date"
的内联替换,它看起来更像管道;== NA
不返回真/假,但%in% NA
* 返回 *;我们可以使用is.na(Date) | Date == "Total"
,或者我们可以使用Date %in% c("Total", NA)
,其结果是可以预期的;cumany
是“累积任意”,意味着当一个值返回真时,那么所有后续值也将为真,参见cumany(c(F,T,F))
;其反义词见cumall(c(T,F,T))
;对于以R为底数的等价物,cumany使用cumsum(cond) > 0
,cumall使用cumsum(!cond) == 0
;以及select(-any_of("Total"))
,因为如果列存在,它将删除该列,否则不执行任何操作(您的示例数据中没有包含它,所以我认为最好是安全的)。