R: How to select only continuous numeric columns

Posted by iyfjxgzm on 2023-07-31 in Other


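This may be more of a theoretical question than a coding question.
I am trying to write a Shiny app that loops through the continuous numeric columns of a data frame and runs tests on those columns. The app lets users upload their own data frame, so I don't know in advance what it will look like. I know I can select only the numeric columns with the dplyr package like this: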

library(dplyr)
data <- data %>%
        select(where(is.numeric))

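This works, but it also keeps discrete numeric columns. I can't think of a good way to select only the continuous columns.
I've thought about selecting only columns where the number of repeats of the mode is less than a certain proportion of the number of rows in the data frame. Or perhaps requiring the number of unique values to be greater than the number of repeats of the mode. But neither approach seems like it would generalize well. They also wouldn't get rid of ID columns.
I'd appreciate any ideas, thanks.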
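
For illustration, the mode-frequency idea described above might look roughly like this in base R. The helper name `mode_share`, the toy data, and the 5% threshold are all assumptions made up for this sketch. It also shows the limitation mentioned in the question: an ID column passes the check.

```r
# Hypothetical helper: share of non-missing rows taken by the most frequent value.
mode_share <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) return(NA_real_)
  max(table(x)) / length(x)
}

# Toy data, made up for this sketch.
set.seed(1)
df <- data.frame(
  id     = 1:100,                                    # identifier, all values unique
  height = rnorm(100, mean = 170, sd = 10),          # continuous measurement
  rating = sample(1:5, 100, replace = TRUE),         # discrete scale
  group  = sample(letters[1:3], 100, replace = TRUE) # non-numeric column
)

# Assumed threshold: keep numeric columns whose mode covers under 5% of rows.
threshold <- 0.05
keep <- sapply(df, function(x) is.numeric(x) && mode_share(x) < threshold)
names(df)[keep]
# [1] "id"     "height"
```

The discrete `rating` column is dropped, but `id` survives alongside `height`, so a heuristic like this would need a separate rule for identifiers.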

Source: https://stackoverflow.com/questions/66846937/r-how-to-select-only-continuous-numeric-columns

4 Answers


3npbholx1#

There is a library, schoolmath, that contains the is.decimal and is.whole functions:

library(schoolmath)
x <- c(1, 1.5)
any(is.decimal(x))
# [1] TRUE

So you can process your data frame with apply:

# apply() converts df to a matrix first, so this assumes every column is numeric
decimal_cols <- apply(df, 2, function(x) any(is.decimal(x)))

The positions where the result is TRUE are the columns that contain decimal values.
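
If installing schoolmath is not an option, a similar check can be sketched in base R. This version also skips non-numeric columns, which matters for user-uploaded data because `apply()` would coerce a mixed data frame to a character matrix. The helper name `has_fraction` and the sample data are assumptions for illustration only.

```r
# Hypothetical helper: TRUE if a numeric vector contains any non-whole value.
has_fraction <- function(x) {
  is.numeric(x) && any(x %% 1 != 0, na.rm = TRUE)
}

# Sample data, made up for this sketch.
df <- data.frame(
  count = c(1, 2, 3),
  price = c(1.5, 2, 3.25),
  label = c("a", "b", "c")
)

# sapply() works column by column, so each column keeps its own type.
decimal_cols <- sapply(df, has_fraction)
names(df)[decimal_cols]
# [1] "price"
```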


wb1gzix02#

How about defining is_continuous:

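# Pick one of these two definitions (NAs are ignored):
is_discrete   <- function(vec) is.numeric(vec) && all(vec %% 1 == 0, na.rm = TRUE)
# Variant with a tolerance: the distance to the nearest integer must be tiny for every element
is_discrete   <- function(vec, tolerance = 0.000001) is.numeric(vec) && all(pmin(vec %% 1, 1 - vec %% 1) < tolerance, na.rm = TRUE)

# and then:
is_continuous <- function(vec) is.numeric(vec) && !is_discrete(vec)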

After that, you can do the following:

library(dplyr)
data <- data %>%
        select(where(is_continuous))
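
On the choice between an exact and a tolerance-based whole-number check: floating-point arithmetic can leave tiny residues on values that are whole in exact arithmetic, so the exact test can misclassify a column. A small sketch with a made-up vector; the names `is_whole_exact` and `is_whole_tol` are hypothetical and chosen so they do not clash with the definitions in this answer.

```r
# Exact check: every value must be exactly divisible by 1.
is_whole_exact <- function(x) all(x %% 1 == 0)

# Tolerance check: every value must lie within `tolerance` of an integer.
is_whole_tol <- function(x, tolerance = 1e-6) {
  r <- x %% 1
  all(pmin(r, 1 - r) < tolerance)
}

x <- c(1, sqrt(2)^2, 3)  # sqrt(2)^2 is not exactly 2 in floating point

is_whole_exact(x)  # FALSE
is_whole_tol(x)    # TRUE
```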


wpx232ag3#

Have you considered converting the discrete variables to factors? Here is an example that may contain the solution you are looking for:

library(dplyr)

head(mtcars)

                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

Then I convert cyl to a factor and select only the numeric columns, which leaves out the factor cyl:

mtcars %>%
  as_tibble() %>%
  mutate(cyl = as.factor(cyl)) %>%
  select(where( ~ !is.factor(.x) && is.numeric(.x))) %>%
  slice_head(n = 5)

# A tibble: 5 x 10
    mpg  disp    hp  drat    wt  qsec    vs    am  gear  carb
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1  21     160   110  3.9   2.62  16.5     0     1     4     4
2  21     160   110  3.9   2.88  17.0     0     1     4     4
3  22.8   108    93  3.85  2.32  18.6     1     1     4     1
4  21.4   258   110  3.08  3.22  19.4     1     0     3     1
5  18.7   360   175  3.15  3.44  17.0     0     0     3     2

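I edited my code to use only the select function. However, I am assuming that your discrete variables take a limited range of values, like cyl. It would probably help if you could share a sample of your data so we can see exactly what it looks like.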


y53ybaqx4#

You can also try:

hot.deck::is.discrete( mtcars$mpg, cutoff = 7 )
hot.deck::is.discrete( mtcars$carb, cutoff = 7 )

To identify all variables with 7 or more levels:

names(mtcars)[as.logical(sapply( mtcars ,  function(X){!hot.deck::is.discrete(X , cutoff = 7 )}))]
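
For readers without hot.deck installed, a comparable unique-value cutoff can be sketched in base R. This is not the package's implementation; the helper `n_unique` and the exact boundary rule (more than 7 distinct values counts as continuous) are assumptions for this sketch.

```r
# Hypothetical helper: number of distinct non-missing values in a vector.
n_unique <- function(x) length(unique(x[!is.na(x)]))

# Assumed rule: a numeric column with more than `cutoff` distinct values
# is treated as continuous.
cutoff <- 7
is_cont <- sapply(mtcars, function(x) is.numeric(x) && n_unique(x) > cutoff)
continuous_cols <- names(mtcars)[is_cont]
continuous_cols
# [1] "mpg"  "disp" "hp"   "drat" "wt"   "qsec"
```

As with the other count-based heuristics in this thread, an ID column with many distinct values would still be classified as continuous.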
