R: Filter rows by matching strings row-wise

juzqafwq  posted on 2023-10-13  in Other

Suppose I have the following data frame:

df <- data.frame(Var1=c("1  2  3","1  6","2  5  9","1  5  3"),Var2 = c("1  2","1  6  0  5","3  7","1  5"),Var3=c("2  8","1  3","6  19","1  3"))

     Var1       Var2  Var3
1 1  2  3       1  2  2  8
2    1  6 1  6  0  5  1  3
3 2  5  9       3  7 6  19
4 1  5  3       1  5  1  3

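I want to keep only the rows in which no number is repeated across columns.
That is, if any number appears in at least two columns of a given row, that row should be dropped. In this case, the result should be: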

     Var1 Var2  Var3
3 2  5  9 3  7 6  19

Since the number of columns in my df will grow, I would like to filter these rows using the across() function.
Many thanks.


bybem2ql  #1

Using a regex:

# Word boundaries on both sides of the captured number, so "2" cannot match inside "12"
pattern <- "\\b(\\d+)\\b.+\\b\\1\\b"
df[!grepl(pattern, do.call(paste, df)), ]

#      Var1 Var2  Var3
# 3 2  5  9 3  7 6  19
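
To illustrate how the backreference detects a repeated number, here is a minimal sketch on hypothetical row strings (they are not taken from the question beyond the first two). It compares two forms of the pattern: one with a word boundary on both sides of the captured number, and one without the leading boundaries. The names `rows`, `strict`, and `loose` are made up for this example, and `perl = TRUE` is an assumption chosen so that the regex semantics are those of PCRE.

```r
# Hypothetical row strings, standing in for do.call(paste, df)
rows <- c(
  "1  2  3 1  2 2  8",   # 1 and 2 each occur more than once
  "2  5  9 3  7 6  19",  # no number is repeated
  "12  3 2  5 6  7"      # "2" occurs once on its own and once inside "12"
)

# Captured number delimited by word boundaries on both sides
strict <- "\\b(\\d+)\\b.+\\b\\1\\b"
# Same idea without the leading boundaries, for comparison
loose <- "(\\d+).+\\b\\1\\b"

has_repeat_strict <- grepl(strict, rows, perl = TRUE)
has_repeat_loose  <- grepl(loose, rows, perl = TRUE)

print(data.frame(rows, strict = has_repeat_strict, loose = has_repeat_loose))
```

In the third string the form without leading boundaries can capture the "2" inside "12" and then find the standalone "2", so it flags a row that has no repeated number; the bounded form does not.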

The same with dplyr:

library(dplyr)

df |>
  rowwise() |>
  filter(!grepl(pattern, paste0(c_across(everything()), collapse = " "))) |>
  ungroup()

t98cgbkg  #2

You can try apply together with anyDuplicated, like this:

> df[!apply(df, 1, \(x) anyDuplicated(scan(text = x, quiet = TRUE))), ]
     Var1 Var2  Var3
3 2  5  9 3  7 6  19
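
To make the one-liner easier to follow, here is a minimal sketch of what happens for a single row. The vectors `row1` and `row3` are hand-copied from rows 1 and 3 of the example data; the variable names are made up for this illustration.

```r
# One row as apply(df, 1, ...) passes it: a character vector with one string per column
row1 <- c("1  2  3", "1  2", "2  8")
row3 <- c("2  5  9", "3  7", "6  19")

# scan() reads every whitespace-separated number from all of the strings
nums1 <- scan(text = row1, quiet = TRUE)
nums3 <- scan(text = row3, quiet = TRUE)

# anyDuplicated() returns the index of the first duplicate, or 0 if there is none
dup1 <- anyDuplicated(nums1)
dup3 <- anyDuplicated(nums3)

print(nums1)
print(dup1)
print(dup3)
```

Because `!` turns 0 into TRUE and any positive index into FALSE, negating the result of apply keeps exactly the rows without a repeated number.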

ttvkxqim  #3

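Odd data structure. Here I pivot it to long format, split the strings into further rows, drop every original row that contains a duplicated number, and then put the data back into its initial shape: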

library(dplyr)
library(tidyr)
df |>
  mutate(
    row = row_number()
  ) |>
  pivot_longer(-row) |>
  separate_longer_delim(value, delim = stringr::regex(" +")) |>
  filter(!anyDuplicated(value), .by = row) |>
  # group by row as well, so that several surviving rows are not pasted together
  summarize(value = paste(value, collapse = "  "), .by = c(row, name)) |>
  pivot_wider(names_from = name, values_from = value) |>
  select(-row)
# # A tibble: 1 × 3
#   Var1    Var2  Var3 
#   <chr>   <chr> <chr>
# 1 2  5  9 3  7  6  19

fcwjkofz  #4

library(tidyverse)

df <- data.frame(
  Var1 = c(
    "1  2  3", "1  6",
    "2  5  9", "1  5  3"
  ),
  Var2 = c(
    "1  2", "1  6  0  5",
    "3  7", "1  5"
  ),
  Var3 = c("2  8", "1  3", "6  19", "1  3")
)
# take a string of numbers and spaces and get a vector of numbers
myfunc <- function(x) {
  parse_number(unlist(strsplit(x, split = " ", fixed = TRUE))) |>
    na.omit() |>
    as.integer()
}
# test 
myfunc("1 2 3") |> str()

# calculate
(df2 <- mutate(rowwise(df),
  fstring = paste0(c_across(everything()), collapse = " "),
  nums = list(myfunc(fstring))
))

# analyse
(keepvec <- map_lgl(df2$nums, \(x) !anyDuplicated(x)))

# final_result
df |> filter(keepvec)
