R: How to fuzzy match multiple columns (pairwise and across combinations)

Posted by nc1teljy on 2023-06-19 in Other

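Suppose I have four columns and want to fuzzy match them to find which values occur across pairs and larger combinations of columns. Pairs are straightforward (e.g. 1-2, 1-3, 1-4, 2-3, 2-4, 3-4), but I would also like to see whether a string is shared by columns 1-2-3, or whether any value appears in all four columns (for example, "dolor" should show up in all of them). Is there a systematic way to do this in R?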

col1 <- c("Lorem", "ipsum", "dolor", "sit", "amet", "consectetur", "adipiscing", "elit", "sed", "do")

col2 <- c("Lorem", "ipsum", "Dolor", "adipiscing", "elite", "sed", "doo")

col3 <- c("dolore", "adipiscing", "sed", "doo")

col4 <- c("ipsun", "dolor", "sit", "amet", "consecteture", "adipiscing", "elit", "sed", "do")
Answer 1 (xggvc2p6):

# first, make a vector of all of the unique values of the columns
all_values <- unique(c(col1, col2, col3, col4))

# to get the values in all four columns
in_all_four <- function(value) {
    # return TRUE if the value is in all four columns (exact match)
    all(value %in% col1, value %in% col2, value %in% col3, value %in% col4)
}

# now, use that function to filter the unique values
all_values[vapply(all_values, in_all_four, logical(1))]
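
The same exact-match idea can be extended to every pair and triple the question asks about. The following is a minimal sketch, assuming the four columns are collected in a named list; the names `cols`, `combos` and `shared` are illustrative and not part of the original answer. It uses base R `combn` to enumerate the column combinations and `Reduce(intersect, ...)` to find the values common to each one.

```r
# Columns from the question, repeated so the sketch runs on its own
cols <- list(
  col1 = c("Lorem", "ipsum", "dolor", "sit", "amet", "consectetur",
           "adipiscing", "elit", "sed", "do"),
  col2 = c("Lorem", "ipsum", "Dolor", "adipiscing", "elite", "sed", "doo"),
  col3 = c("dolore", "adipiscing", "sed", "doo"),
  col4 = c("ipsun", "dolor", "sit", "amet", "consecteture",
           "adipiscing", "elit", "sed", "do")
)

# Every combination of 2, 3 and 4 column names, flattened into one list
combos <- unlist(
  lapply(2:length(cols), function(k) combn(names(cols), k, simplify = FALSE)),
  recursive = FALSE
)

# Values shared (by exact match) among all columns of each combination
shared <- lapply(combos, function(nms) Reduce(intersect, cols[nms]))
names(shared) <- vapply(combos, paste, character(1), collapse = "-")

shared[["col1-col2-col3"]]
```

Each element of `shared` is named after its combination (for example `"col2-col3"`), so a single lookup returns the values common to those columns. Because `intersect` compares strings exactly, near-matches such as "dolor" and "Dolor" are not counted here.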

# to get the columns that each value is in

in_columns <- function(value) {
    # return a vector of the indices of the columns a value is in
    which(sapply(list(col1, col2, col3, col4), function(x) value %in% x))
}

# create a data frame with one row per value; in_columns is a list-column
library(tibble)
df <- tibble(
    value = all_values,
    in_columns = lapply(all_values, in_columns))

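The resulting df tibble has one row per unique value; its in_columns list-column holds the indices (1 to 4) of the columns that contain that value.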
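
Note that `%in%` matches strings exactly, so "dolor", "Dolor" and "dolore" count as three different values. A minimal sketch of a fuzzy variant follows, assuming that "fuzzy" means a case-insensitive edit distance of at most 1. That threshold and the helper names `fuzzy_in` and `fuzzy_in_columns` are assumptions for illustration, not part of the original answer. It relies on `adist` from the `utils` package, which ships with base R.

```r
# Columns from the question, repeated so the sketch runs on its own
cols <- list(
  col1 = c("Lorem", "ipsum", "dolor", "sit", "amet", "consectetur",
           "adipiscing", "elit", "sed", "do"),
  col2 = c("Lorem", "ipsum", "Dolor", "adipiscing", "elite", "sed", "doo"),
  col3 = c("dolore", "adipiscing", "sed", "doo"),
  col4 = c("ipsun", "dolor", "sit", "amet", "consecteture",
           "adipiscing", "elit", "sed", "do")
)

# TRUE if `value` is within `max_dist` edits of any entry in `column`,
# ignoring case (max_dist = 1 is an assumed, illustrative threshold)
fuzzy_in <- function(value, column, max_dist = 1) {
  any(adist(value, column, ignore.case = TRUE) <= max_dist)
}

# Indices of the columns that contain a fuzzy match for `value`
fuzzy_in_columns <- function(value, max_dist = 1) {
  which(vapply(cols, fuzzy_in, logical(1),
               value = value, max_dist = max_dist))
}

# Unique values that have a fuzzy match in every column
all_values <- unique(unlist(cols, use.names = FALSE))
in_every_column <- all_values[vapply(
  all_values,
  function(v) length(fuzzy_in_columns(v)) == length(cols),
  logical(1)
)]

fuzzy_in_columns("ipsum")
```

With this threshold "dolor" is found in all four columns, as the question expects. A fixed edit distance is loose for very short strings (it treats "do" and "doo" as the same), so the threshold may need tuning, or scaling by string length, for real data.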

Let me know if you need anything else! :-)
