如何使用dplyr过滤掉数据框中的特定行?

wztqucjr  于 2022-12-30  发布在  其他
关注(0)|答案(1)|浏览(165)

我有几个工作表,我从excel导入。虽然这些工作表是相似的,但有一些差异,由于手动输入。我试图过滤出行,有“总计”和任何超出该行。我的逻辑工作的df1和df3,但我不知道如何让它工作的df2。有人能帮忙吗?

df1<-structure(list(...1 = structure(c(1630022400, 1630108800, 1630195200, 
1630281600, 1630368000, NA), tzone = "UTC", class = c("POSIXct", 
"POSIXt")), `Vinayak Trading` = c(1984.31, NA, NA, NA, NA, 2916.17
)), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"
))

df2<-structure(list(...1 = c("44526", "44527", "44528", "44529", "44530", 
"Total"), `Vinayak Trading` = c(NA, NA, NA, NA, NA, 0)), row.names = c(NA, 
-6L), class = c("tbl_df", "tbl", "data.frame"))

df3<-structure(list(...1 = c("44680", "44681", NA, "Total", NA, NA
), `Vinayak Trading` = c(NA, NA, NA, 2736.42, NA, NA)), row.names = c(NA, 
-6L), class = c("tbl_df", "tbl", "data.frame"))

transform <- function(df) {
  names(df)[1] <- "Date"
  df <- df %>% 
  filter(row_number() < which(is.na(Date))) %>%  #To tackle sheets where Total is not present
  filter(row_number() < which(Date=="Total")) %>% #To remove Total in sheets where it is present
  select(-Total) #To remove the Total column 
}

  df1 <- transform(df1)# Desired reults
  df2 <- transform(df2)# Error due to no NAs - don't know how to handle
  df3 <- transform(df3)#Desired result with warning
vsaztqbk

vsaztqbk1#

我们可以使用dplyr::cumany来删除"Total"(或NA)以及其他任何内容。

transform2 <- function(df) {
  df %>%
    rename(Date = 1) %>%
    filter(!cumany(Date %in% c("Total", NA))) %>%
    select(-any_of("Total"))
}
 
transform2(df1)
# # A tibble: 5 × 2
#   Date                `Vinayak Trading`
#   <dttm>                          <dbl>
# 1 2021-08-27 00:00:00             1984.
# 2 2021-08-28 00:00:00               NA 
# 3 2021-08-29 00:00:00               NA 
# 4 2021-08-30 00:00:00               NA 
# 5 2021-08-31 00:00:00               NA 
transform2(df2)
# # A tibble: 5 × 2
#   Date  `Vinayak Trading`
#   <chr>             <dbl>
# 1 44526                NA
# 2 44527                NA
# 3 44528                NA
# 4 44529                NA
# 5 44530                NA
transform2(df3)
# # A tibble: 2 × 2
#   Date  `Vinayak Trading`
#   <chr>             <dbl>
# 1 44680                NA
# 2 44681                NA
  • 我们可以使用rename(Date = 1)作为names(df)[1] <- "Date"的内联替换,它看起来更像管道;
  • == NA不返回真/假,但%in% NA * 返回 *;我们可以使用is.na(Date) | Date == "Total",或者我们可以使用Date %in% c("Total", NA),其结果是可以预期的;
  • cumany是“累积任意”,意味着当一个值返回真时,那么所有后续值也将为真,参见cumany(c(F,T,F));其反义词见cumall(c(T,F,T));对于以R为底数的等价物,cumany使用cumsum(cond) > 0,cumall使用cumsum(!cond) == 0;以及
  • 我使用select(-any_of("Total")),因为如果列存在,它将删除该列,否则不执行任何操作(您的示例数据中没有包含它,所以我认为最好是安全的)。

相关问题