R - Select columns of a data frame based on substrings/patterns in values

Posted by nhaq1z21 on 2023-02-27 in Other


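I often work with large datasets that are awkwardly labelled, with column headers in one row and sub-headers in the row below. When imported into R, this means most of the data is automatically converted to character, but that is not the problem, since I can correct it later. The problem is trying to use the sub-headers, which end up in the first row of the data, to select certain columns for further analysis.
Here is some example data: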

df <- data.frame(col1 = c("10='strongly agree' 0='strongly disagree'", "3", "5"),
                 col2 = c("5='far too much' 3='just right' 1='far too little'", "2", "1"),
                 col3 = c("5='far too thick' 3='just right' 1='far too thin'", "4", "5"),
                 col4 = c("10='strongly agree' 0='strongly disagree'", "8", "7"),
                 col5 = c("1='Yes' 2='No'", "1", "1"))

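I would like to be able to select columns based on (sub)strings/patterns found in the first row. The real data has hundreds of columns, so doing this in a few simple lines of code would be far more efficient than selecting columns by hand.
Using the example data, there are a couple of selections I might want to make:
1. Select columns whose first row exactly matches "10='strongly agree' 0='strongly disagree'"
2. Select columns that have "1", "3" and "5", or both "Yes" and "No", somewhere in the string in the first row
In the cases above, the output would be the same as the following: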

df_1 <- data.frame(col1 = c("10='strongly agree' 0='strongly disagree'", "3", "5"),
                 col4 = c("10='strongly agree' 0='strongly disagree'", "8", "7"))

df_2 <- data.frame(col2 = c("5='far too much' 3='just right' 1='far too little'", "2", "1"),
                 col3 = c("5='far too thick' 3='just right' 1='far too thin'", "4", "5"),
                 col5 = c("1='Yes' 2='No'", "1", "1"))

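It would also be useful to know how to check anywhere in a column rather than just the first row, but this is not essential.
Thanks in advance!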

r

Source: https://stackoverflow.com/questions/75511516/r-select-columns-of-dataframe-based-on-substrings-patterns-in-values

1 Answer

enyaitl31#

We can write custom predicate functions and use them with tidyselect in dplyr::select(where(...)):

library(dplyr)

is_10_to_0 <- function(x) {
  dplyr::first(x) == "10='strongly agree' 0='strongly disagree'"
}

is_5_3_1_or_Yes_No <- function(x) {
  grepl("(.*5.*3.*1.*)|(.*Yes.*No.*)", dplyr::first(x))
}

df %>% 
  select(where(is_10_to_0))
#>                                        col1
#> 1 10='strongly agree' 0='strongly disagree'
#> 2                                         3
#> 3                                         5
#>                                        col4
#> 1 10='strongly agree' 0='strongly disagree'
#> 2                                         8
#> 3                                         7

df %>% 
  select(where(is_5_3_1_or_Yes_No))
#>                                                 col2
#> 1 5='far too much' 3='just right' 1='far too little'
#> 2                                                  2
#> 3                                                  1
#>                                                col3           col5
#> 1 5='far too thick' 3='just right' 1='far too thin' 1='Yes' 2='No'
#> 2                                                 4              1
#> 3                                                 5              1
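
For comparison, here is a minimal base R sketch of the same two selections that does not need dplyr. It assumes the sub-headers always sit in the first row; the names first_row, df_1 and df_2 are chosen here for illustration (df_1 and df_2 mirror the expected outputs in the question), and the regular expression is a shortened form of the one used above.

```r
# Sample data from the question (the first row holds the sub-headers).
df <- data.frame(
  col1 = c("10='strongly agree' 0='strongly disagree'", "3", "5"),
  col2 = c("5='far too much' 3='just right' 1='far too little'", "2", "1"),
  col3 = c("5='far too thick' 3='just right' 1='far too thin'", "4", "5"),
  col4 = c("10='strongly agree' 0='strongly disagree'", "8", "7"),
  col5 = c("1='Yes' 2='No'", "1", "1")
)

# Pull the first value of every column into a named character vector.
first_row <- vapply(df, function(x) as.character(x[1]), character(1))

# Selection 1: exact match on the sub-header.
df_1 <- df[, first_row == "10='strongly agree' 0='strongly disagree'", drop = FALSE]

# Selection 2: "5 ... 3 ... 1" in that order, or "Yes ... No".
df_2 <- df[, grepl("5.*3.*1|Yes.*No", first_row), drop = FALSE]

names(df_1)
names(df_2)
```

Using drop = FALSE keeps the result a data frame even when only one column matches.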

Data from the OP

df <- data.frame(col1 = c("10='strongly agree' 0='strongly disagree'", "3", "5"),
                 col2 = c("5='far too much' 3='just right' 1='far too little'", "2", "1"),
                 col3 = c("5='far too thick' 3='just right' 1='far too thin'", "4", "5"),
                 col4 = c("10='strongly agree' 0='strongly disagree'", "8", "7"),
                 col5 = c("1='Yes' 2='No'", "1", "1"))

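If you want to check whether the string occurs in any row rather than only the first, we can rewrite the function with any() as follows: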

# Named after the pattern it actually tests; checks every row of the column.
is_5_3_1_or_Yes_No <- function(x) {
  any(grepl("(.*5.*3.*1.*)|(.*Yes.*No.*)", x))
}
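
Note that the regular expression in the answer only matches when 5, 3 and 1 appear in that order. If the order of the labels can vary, one option is to test each substring separately. The following is a sketch under that assumption; first_has_all() is a hypothetical helper introduced here, not part of dplyr or tidyselect, and base R's Filter() is used so that the example runs without extra packages.

```r
# Sample data from the question (the first row holds the sub-headers).
df <- data.frame(
  col1 = c("10='strongly agree' 0='strongly disagree'", "3", "5"),
  col2 = c("5='far too much' 3='just right' 1='far too little'", "2", "1"),
  col3 = c("5='far too thick' 3='just right' 1='far too thin'", "4", "5"),
  col4 = c("10='strongly agree' 0='strongly disagree'", "8", "7"),
  col5 = c("1='Yes' 2='No'", "1", "1")
)

# Hypothetical helper: builds a predicate that is TRUE when the first value of
# a column contains every given literal substring, in any order.
first_has_all <- function(...) {
  needles <- c(...)
  function(x) {
    first_value <- as.character(x[1])
    all(vapply(needles,
               function(n) grepl(n, first_value, fixed = TRUE),
               logical(1)))
  }
}

has_1_3_5  <- first_has_all("1", "3", "5")
has_yes_no <- first_has_all("Yes", "No")

# Filter() keeps the columns of a data frame for which the predicate is TRUE.
df_2 <- Filter(function(x) has_1_3_5(x) || has_yes_no(x), df)

names(df_2)
```

The same predicates can also be passed to dplyr::select(where(...)) in place of the functions above. Because the match is on plain substrings, "1" also matches inside "10"; in the sample data this does no harm, since col1 and col4 contain neither "3" nor "5".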

Created on 2023-02-20 by the reprex package (v2.0.1)
