如何使用dplyr过滤掉数据框中的特定行？

wztqucjr 于 2022-12-30 发布在其他

关注(0)|答案(1)|浏览(165)

我有几个工作表，我从excel导入。虽然这些工作表是相似的，但有一些差异，由于手动输入。我试图过滤出行，有“总计”和任何超出该行。我的逻辑工作的df1和df3，但我不知道如何让它工作的df2。有人能帮忙吗？

df1<-structure(list(...1 = structure(c(1630022400, 1630108800, 1630195200, 
1630281600, 1630368000, NA), tzone = "UTC", class = c("POSIXct", 
"POSIXt")), `Vinayak Trading` = c(1984.31, NA, NA, NA, NA, 2916.17
)), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"
))

df2<-structure(list(...1 = c("44526", "44527", "44528", "44529", "44530", 
"Total"), `Vinayak Trading` = c(NA, NA, NA, NA, NA, 0)), row.names = c(NA, 
-6L), class = c("tbl_df", "tbl", "data.frame"))

df3<-structure(list(...1 = c("44680", "44681", NA, "Total", NA, NA
), `Vinayak Trading` = c(NA, NA, NA, 2736.42, NA, NA)), row.names = c(NA, 
-6L), class = c("tbl_df", "tbl", "data.frame"))

transform <- function(df) {
  names(df)[1] <- "Date"
  df <- df %>% 
  filter(row_number() < which(is.na(Date))) %>%  #To tackle sheets where Total is not present
  filter(row_number() < which(Date=="Total")) %>% #To remove Total in sheets where it is present
  select(-Total) #To remove the Total column 
}

  df1 <- transform(df1)# Desired reults
  df2 <- transform(df2)# Error due to no NAs - don't know how to handle
  df3 <- transform(df3)#Desired result with warning

r

来源：https://stackoverflow.com/questions/74940302/how-to-filter-out-specific-rows-in-a-data-frame-using-dplyr

1条答案

按热度按时间

vsaztqbk1#

我们可以使用dplyr::cumany来删除"Total"（或NA）以及其他任何内容。

transform2 <- function(df) {
  df %>%
    rename(Date = 1) %>%
    filter(!cumany(Date %in% c("Total", NA))) %>%
    select(-any_of("Total"))
}
 
transform2(df1)
# # A tibble: 5 × 2
#   Date                `Vinayak Trading`
#   <dttm>                          <dbl>
# 1 2021-08-27 00:00:00             1984.
# 2 2021-08-28 00:00:00               NA 
# 3 2021-08-29 00:00:00               NA 
# 4 2021-08-30 00:00:00               NA 
# 5 2021-08-31 00:00:00               NA 
transform2(df2)
# # A tibble: 5 × 2
#   Date  `Vinayak Trading`
#   <chr>             <dbl>
# 1 44526                NA
# 2 44527                NA
# 3 44528                NA
# 4 44529                NA
# 5 44530                NA
transform2(df3)
# # A tibble: 2 × 2
#   Date  `Vinayak Trading`
#   <chr>             <dbl>
# 1 44680                NA
# 2 44681                NA

我们可以使用rename(Date = 1)作为names(df)[1] <- "Date"的内联替换，它看起来更像管道;
== NA不返回真/假，但%in% NA * 返回 *;我们可以使用is.na(Date) | Date == "Total"，或者我们可以使用Date %in% c("Total", NA)，其结果是可以预期的;
cumany是“累积任意”，意味着当一个值返回真时，那么所有后续值也将为真，参见cumany(c(F,T,F));其反义词见cumall(c(T,F,T));对于以R为底数的等价物，cumany使用cumsum(cond) > 0，cumall使用cumsum(!cond) == 0;以及
我使用select(-any_of("Total"))，因为如果列存在，它将删除该列，否则不执行任何操作（您的示例数据中没有包含它，所以我认为最好是安全的）。

赞(0）回复(0）举报 2022-12-30

我来回答

如何使用dplyr过滤掉数据框中的特定行？

1条答案

相关问题

热门标签

最新问答