R: Filter out the rows that come before a specific condition

nzkunb0c  posted on 2023-06-27 in Other

I have the following dataset:

ID<-rep(c("A","B"),times=c(3,4))
Departure<-c("TRUE","FALSE","TRUE","TRUE","FALSE","FALSE","TRUE")
Date<-c("Jan 1","Jan 2","Jan 3","Jan 1","Jan 2","Jan 3","Jan 4")

data<-data.frame(ID,Departure,Date)
data

ID Departure  Date
A      TRUE Jan 1
A     FALSE Jan 2
A      TRUE Jan 3
B      TRUE Jan 1
B     FALSE Jan 2
B     FALSE Jan 3
B      TRUE Jan 4

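I want to subset this dataset using the following two conditions:
1. The dataset must keep only the rows from the last FALSE value in the Departure column onward, including that row.
2. Condition 1 must hold for each ID.
The resulting dataset would look like this: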

ID Departure  Date
A     FALSE Jan 2
A      TRUE Jan 3
B     FALSE Jan 3
B      TRUE Jan 4

Any suggestions on how to do this, preferably with dplyr?

slhcrj9b  #1

Base R

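data[
  ave(as.logical(data$Departure), data$ID, FUN = function(x) {
    n_false <- cumsum(!x)    # running count of FALSE values within the ID
    n_false == max(n_false)  # TRUE from the last FALSE onward
  }),
]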

  ID Departure  Date
2  A     FALSE Jan 2
3  A      TRUE Jan 3
6  B     FALSE Jan 3
7  B      TRUE Jan 4
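
To see why the cumsum comparison selects the right rows, it can help to trace it on a single ID. The sketch below uses a made-up vector `x` (not taken from the question) and only base R:

```r
# Hypothetical Departure values for one ID, already converted to logical
x <- c(TRUE, FALSE, FALSE, TRUE, TRUE)

# Running count of FALSE values seen so far
false_count <- cumsum(!x)
false_count
# 0 1 2 2 2

# The count reaches its maximum at the last FALSE and stays there,
# so this marks the last FALSE and every row after it
keep <- false_count == max(false_count)
keep
# FALSE FALSE  TRUE  TRUE  TRUE
```

If an ID contains no FALSE at all, the count stays at 0 for every row, so all of its rows are kept.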

ne5o7dgx  #2

Another approach, using data.table

Note that I prepared your data with logical TRUE/FALSE values, and that I use dates instead of month-day strings.
Data

ID <- rep(c("A","B"),times=c(3,4))
Departure <- as.logical(c("TRUE","FALSE","TRUE","TRUE","FALSE","FALSE","TRUE"))
Date <- lubridate::parse_date_time(c("Jan 1","Jan 2","Jan 3","Jan 1","Jan 2","Jan 3","Jan 4"), "md")

data <- data.frame(ID, Departure, Date)

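Solution

First we extract the most recent FALSE record for each ID, then we join that result back to the data so that only records dated on or after it are kept.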

library(data.table)
setDT(data)

results <- data[!Departure, .SD[.N], ID][data, .(ID, Departure = i.Departure, Date), on = .(ID, Date <= Date), nomatch = 0]

Result

results

   ID Departure       Date
1:  A     FALSE 2023-01-02
2:  A      TRUE 2023-01-03
3:  B     FALSE 2023-01-03
4:  B      TRUE 2023-01-04

Extension

If you really want the original date format back, just format the dates again afterwards:

results[, Date := format(Date, "%b %d")]

results

   ID Departure   Date
1:  A     FALSE Jan 02
2:  A      TRUE Jan 03
3:  B     FALSE Jan 03
4:  B      TRUE Jan 04
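
The two steps described in this answer can also be written out separately, which may be easier to read than the single non-equi join. This is a simplified variant rather than the join used above; the names `dt`, `last_false`, `last_false_date`, `merged`, and `result` are made up for illustration, and the year 2023 is an assumption because the question only gives month and day.

```r
library(data.table)

dt <- data.table(
  ID        = rep(c("A", "B"), times = c(3, 4)),
  Departure = c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE),
  # The year is an arbitrary assumption; the question only gives month and day
  Date      = as.Date(c("2023-01-01", "2023-01-02", "2023-01-03",
                        "2023-01-01", "2023-01-02", "2023-01-03", "2023-01-04"))
)

# Step 1: date of the most recent FALSE record per ID
last_false <- dt[Departure == FALSE, .(last_false_date = max(Date)), by = ID]

# Step 2: attach that date to every row of the same ID,
# then keep only the rows dated on or after it
merged <- merge(dt, last_false, by = "ID")
result <- merged[Date >= last_false_date, .(ID, Departure, Date)]
result
```

Because `merge` uses an inner join by default, an ID without any FALSE record drops out of `result` entirely.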

chy5wohz  #3

This was a fun challenge! I'm sure there are cleverer ways to do it, but here is my approach (using dplyr):

library(dplyr)

group_by(data, ID) %>%
  mutate(row  = row_number(),
         rowF = ifelse(Departure == FALSE, row, NA),  # row index of each FALSE
         maxF = max(rowF, na.rm = TRUE)) %>%          # row index of the last FALSE
  filter(row >= maxF) %>%
  select(-c(row:maxF)) %>%
  # remove the above line if you want to see how the sausage was made
  ungroup()

Output:

# A tibble: 4 x 3
  ID    Departure Date 
  <chr> <chr>     <chr>
1 A     FALSE     Jan 2
2 A     TRUE      Jan 3
3 B     FALSE     Jan 3
4 B     TRUE      Jan 4

jyztefdp  #4

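You can check whether the sum of the negated Departure column equals the cumsum of the negated Departure column. Note that this column is a character column in your example data and needs to be converted to logical: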

library(dplyr)

data |>
  mutate(Departure = as.logical(Departure)) |>
  filter(sum(!Departure) == cumsum(!Departure) & any(!Departure), .by = ID)

  ID Departure  Date
1  A     FALSE Jan 2
2  A      TRUE Jan 3
3  B     FALSE Jan 3
4  B      TRUE Jan 4

I'm not sure whether & any(!Departure) is needed in the filter condition, but it ensures that only groups containing a FALSE value are kept.
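
The effect of the extra condition can be checked on a group that never contains FALSE. The sketch below uses a made-up vector `x` and base R only, evaluating the two versions of the filter condition outside of dplyr:

```r
# Hypothetical group in which Departure is never FALSE
x <- c(TRUE, TRUE, TRUE)

# Without the any() guard: 0 == c(0, 0, 0), so every row passes
without_guard <- sum(!x) == cumsum(!x)
without_guard
# TRUE TRUE TRUE

# With the guard: any(!x) is FALSE, so no row passes
with_guard <- sum(!x) == cumsum(!x) & any(!x)
with_guard
# FALSE FALSE FALSE
```

So the guard only matters for IDs without any FALSE: with it such IDs are dropped, without it they are kept whole.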

vkc1a9a2  #5

This works for the data you provided, although I'm not sure it works for other examples (the rules around ID aren't clear to me), so you may want to check it:

library(dplyr)

data %>%
  # convert the Departure column from character to logical
  mutate(Departure = Departure == "TRUE") %>%
  filter((Departure & !lag(Departure) & ID == lag(ID)) |
         (!Departure & lead(Departure) & ID == lead(ID)))
# Keep a row if Departure is TRUE, the previous Departure is FALSE, and the previous row has the same ID,
# or
# if Departure is FALSE, the next Departure is TRUE, and the next row has the same ID.
# Note: this keeps every adjacent FALSE/TRUE pair within an ID, so it matches the requested result
# only when the last FALSE is followed by exactly one TRUE and no earlier FALSE is directly followed by a TRUE.

# A tibble: 4 × 3
  ID    Departure Date 
  <chr> <lgl>     <chr>
1 A     FALSE     Jan 2
2 A     TRUE      Jan 3
3 B     FALSE     Jan 3
4 B     TRUE      Jan 4

cfh9epnr  #6

Using rle on the reversed Departure column, by ID.

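by(data, data$ID, \(x) {
  # rows to keep = length of the trailing TRUE run (if any) + the last FALSE row
  n <- with(rle(rev(x$Departure)),
            sum(lengths[seq_len(which.max(values == 'FALSE') - 1)]) + 1)
  tail(x, n)
}) |>
  do.call(what='rbind')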
#     ID Departure  Date
# A.2  A     FALSE Jan 2
# A.3  A      TRUE Jan 3
# B.6  B     FALSE Jan 3
# B.7  B      TRUE Jan 4

Note that if the Departure column is a real boolean,

data$Departure <- as.logical(data$Departure)

it simplifies to

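by(data, data$ID, \(x) {
  # rows to keep = length of the trailing TRUE run (if any) + the last FALSE row
  n <- with(rle(rev(x$Departure)),
            sum(lengths[seq_len(which.max(!values) - 1)]) + 1)
  tail(x, n)
}) |>
  do.call(what='rbind')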
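
For readers unfamiliar with rle, here is a small trace of what it returns on a reversed group. It is only a sketch: `x` holds the Departure values of ID "B" from the question as logicals, and `r`, `first_false_run`, and `n_keep` are names made up for illustration.

```r
# Departure values for ID "B" from the question, as logicals
x <- c(TRUE, FALSE, FALSE, TRUE)

# Run-length encoding of the reversed vector: runs are listed from the end
r <- rle(rev(x))
r$values
# TRUE FALSE  TRUE
r$lengths
# 1 2 1

# Index of the first FALSE run, counted from the end of the group
first_false_run <- which.max(!r$values)
first_false_run
# 2

# Trailing TRUE rows (the runs before that index) plus the last FALSE row
n_keep <- sum(r$lengths[seq_len(first_false_run - 1)]) + 1
n_keep
# 2

tail(x, n_keep)
# FALSE  TRUE
```

In this group the trailing TRUE run has length 1, so the run index and the number of rows to keep happen to be the same value.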
