R: Bin a vector by width

Posted by 8ftvxx2r on 2023-01-03 in Other

I have a vector of a continuous variable, for example:

x <- c(0.9000000,  1.2666667,  4.0000000,  5.7333333, 19.7333333, 35.7666667, 44.0000000,  4.4333333,  0.4666667,  0.7000000,  0.9333333,  1.0000000,  1.0000000,  1.0000000,  1.2000000,  1.2333333, 1.2666667,  1.4333333,  1.7000000,  4.0666667,  1.9000000,  2.1000000,  0.9333333,  1.2666667,  3.7333333,  0.9333333,  2.7666667,  3.1333333,  3.9333333,  5.0333333,  6.0666667,  8.2333333)

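I would like to bin this vector by width (equal number of values) into three groups (low, medium and high), so that the low group holds the lowest third of all values.
Then I would like to merge the low and medium bins, so that I end up with a categorical vector with two levels: "Not high", which will be the lowest 66% of the values, and "High", which will be the highest 33%.
I have looked around and cannot find any predefined function that does this.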

Source: https://stackoverflow.com/questions/72055187/bin-a-vector-by-width

5 Answers

hof1towb1#

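You can use cut to split and label a numeric vector. In effect, you are looking for a break point that sits two-thirds of the way along the sorted values of x, so you can do the following: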

break_point <- sort(x)[round(2 * length(x)/3)]

break_point
#> [1] 3.733333

Any value above 3.733333 is "high". So we can do:

y <- cut(x, breaks = c(-Inf, break_point, Inf), labels = c('not high', 'high'))

If we put this in a data frame alongside x, we can see that y correctly labels the highest values:

data.frame(x, y)
#>             x        y
#> 1   0.9000000 not high
#> 2   1.2666667 not high
#> 3   4.0000000     high
#> 4   5.7333333     high
#> 5  19.7333333     high
#> 6  35.7666667     high
#> 7  44.0000000     high
#> 8   4.4333333     high
#> 9   0.4666667 not high
#> 10  0.7000000 not high
#> 11  0.9333333 not high
#> 12  1.0000000 not high
#> 13  1.0000000 not high
#> 14  1.0000000 not high
#> 15  1.2000000 not high
#> 16  1.2333333 not high
#> 17  1.2666667 not high
#> 18  1.4333333 not high
#> 19  1.7000000 not high
#> 20  4.0666667     high
#> 21  1.9000000 not high
#> 22  2.1000000 not high
#> 23  0.9333333 not high
#> 24  1.2666667 not high
#> 25  3.7333333 not high
#> 26  0.9333333 not high
#> 27  2.7666667 not high
#> 28  3.1333333 not high
#> 29  3.9333333     high
#> 30  5.0333333     high
#> 31  6.0666667     high
#> 32  8.2333333     high

You can see that about 2/3 of the cases are "not high" and about 1/3 are "high":

table(y) / length(x)
#> y
#> not high     high 
#>  0.65625  0.34375

The "not high" group cannot contain exactly 2/3 of the values, because the vector has length 32, which is not divisible by 3.
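
The question also asks for the intermediate low/medium/high step before merging. Here is a minimal base R sketch of that, assuming tercile break points from quantile() are acceptable; the names breaks, three and two are only illustrative and do not come from the answer above:

```r
# Example data copied from the question.
x <- c(0.9000000, 1.2666667, 4.0000000, 5.7333333, 19.7333333, 35.7666667,
       44.0000000, 4.4333333, 0.4666667, 0.7000000, 0.9333333, 1.0000000,
       1.0000000, 1.0000000, 1.2000000, 1.2333333, 1.2666667, 1.4333333,
       1.7000000, 4.0666667, 1.9000000, 2.1000000, 0.9333333, 1.2666667,
       3.7333333, 0.9333333, 2.7666667, 3.1333333, 3.9333333, 5.0333333,
       6.0666667, 8.2333333)

# Break points at the minimum, the two terciles and the maximum.
breaks <- quantile(x, probs = c(0, 1/3, 2/3, 1))

# include.lowest = TRUE keeps the minimum value inside the first bin.
three <- cut(x, breaks = breaks, labels = c("low", "medium", "high"),
             include.lowest = TRUE)

# Assigning the same name to two levels merges them into one.
two <- three
levels(two) <- c("not high", "not high", "high")

table(three)
table(two)
```

Note that cut() stops with an error if the break points are not unique, which can happen with heavily tied data, so this is a sketch for data like the example rather than a general solution.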

rdlzhqv92#

You can use quantile():

y <- ifelse(x < quantile(x, 2/3), "not high", "high")

proportions(table(y))

#     high not high 
#  0.34375  0.65625
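
By default quantile() interpolates between neighbouring sorted values (type 7), so its cut point need not be an observed value. The sketch below reproduces that interpolation by hand for the example data; the names s, h, lo and manual are illustrative only:

```r
# Example data copied from the question.
x <- c(0.9000000, 1.2666667, 4.0000000, 5.7333333, 19.7333333, 35.7666667,
       44.0000000, 4.4333333, 0.4666667, 0.7000000, 0.9333333, 1.0000000,
       1.0000000, 1.0000000, 1.2000000, 1.2333333, 1.2666667, 1.4333333,
       1.7000000, 4.0666667, 1.9000000, 2.1000000, 0.9333333, 1.2666667,
       3.7333333, 0.9333333, 2.7666667, 3.1333333, 3.9333333, 5.0333333,
       6.0666667, 8.2333333)

n <- length(x)
s <- sort(x)

# Cut point as computed by quantile() with its default type = 7.
q <- unname(quantile(x, 2/3))

# The same value by hand: fractional position in the sorted vector,
# then linear interpolation between the two neighbouring values.
h <- (n - 1) * (2/3) + 1
lo <- floor(h)
manual <- s[lo] + (h - lo) * (s[lo + 1] - s[lo])

all.equal(q, manual)

# The cut point falls between two observed values, so the split is the
# same as when cutting at sort(x)[21].
y <- ifelse(x < q, "not high", "high")
table(y)
```

If the cut point happens to equal a tied value, whether you use < or <= decides which group those ties land in, so check the group sizes with table().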

qeeaahzv3#

You can use santoku::chop_equally():

library(santoku)
chopped <- santoku::chop_equally(x, 3, labels = c("low", "medium", "high"))
data.frame(x, chopped)
            x chopped
1   0.9000000     low
2   1.2666667  medium
3   4.0000000    high
4   5.7333333    high
5  19.7333333    high
6  35.7666667    high
7  44.0000000    high
8   4.4333333    high
9   0.4666667     low
10  0.7000000     low
...

You can then regroup the factor (useful if you also want to keep the low/medium/high version):

library(forcats)
chopped2 <- forcats::fct_collapse(chopped,
                                  "High" = "high",
                                  other_level = "Not high")
data.frame(x, chopped2)
            x chopped2
1   0.9000000 Not high
2   1.2666667 Not high
3   4.0000000     High
4   5.7333333     High
5  19.7333333     High
6  35.7666667     High
7  44.0000000     High
8   4.4333333     High
9   0.4666667 Not high
10  0.7000000 Not high
...

Alternatively, if you only need the "High"/"Not high" version, use chop_quantiles():

chopped2 <- santoku::chop_quantiles(x, .66, 
                                    labels = c("Not high", "High"))
data.frame(x, chopped2)
            x chopped2
1   0.9000000 Not high
2   1.2666667 Not high
3   4.0000000     High
4   5.7333333     High
5  19.7333333     High
6  35.7666667     High
7  44.0000000     High
8   4.4333333     High
9   0.4666667 Not high
10  0.7000000 Not high
...

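You said you want to bin "by width (equal number of values)". The groupings above are by equal number of values, i.e. each of the 3 categories holds 1/3 of the data. If you instead want to bin by width, i.e. into intervals of equal width, use santoku::chop_evenly():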

chopped3 <- santoku::chop_evenly(x, 3, labels = c("low", "medium", "high"))
data.frame(x, chopped3)
            x chopped3
1   0.9000000      low
2   1.2666667      low
3   4.0000000      low
4   5.7333333      low
5  19.7333333   medium
6  35.7666667     high
7  44.0000000     high
8   4.4333333      low
9   0.4666667      low
10  0.7000000      low
...
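
For comparison, an equal-width split does not need an extra package. This is a minimal base R sketch, not part of the santoku answer: passing a single number as breaks to cut() divides the range of x into that many intervals of equal length. The name equal_width is illustrative:

```r
# Example data copied from the question.
x <- c(0.9000000, 1.2666667, 4.0000000, 5.7333333, 19.7333333, 35.7666667,
       44.0000000, 4.4333333, 0.4666667, 0.7000000, 0.9333333, 1.0000000,
       1.0000000, 1.0000000, 1.2000000, 1.2333333, 1.2666667, 1.4333333,
       1.7000000, 4.0666667, 1.9000000, 2.1000000, 0.9333333, 1.2666667,
       3.7333333, 0.9333333, 2.7666667, 3.1333333, 3.9333333, 5.0333333,
       6.0666667, 8.2333333)

# breaks = 3 asks cut() for three intervals of equal length over range(x).
equal_width <- cut(x, breaks = 3, labels = c("low", "medium", "high"))

# With skewed data like this, most values end up in the first interval.
table(equal_width)
head(data.frame(x, equal_width), 10)
```

Equal-width bins can be very unbalanced in size, which is why the equal-count versions above are usually what people mean by low/medium/high thirds.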

Note: I am the maintainer of the santoku package.

wh6knrhe4#

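This is what I found in the R documentation; maybe it helps?
bin(x, bin)
About bin:
"... using "cut::n" to cut the vector into n equal parts, b) using "cut::a]b[" to create the following bins: [min, a], ]a, b[, [b, max] ..."
This uses the fixest library (https://rdrr.io/cran/fixest/), although from what I checked it only works for integers, sorry.
Source: https://rdrr.io/cran/fixest/man/bin.html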

lbsnaicq5#

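As @Ishan_Tomar mentioned, you can use the bin function from the fixest package.
For numeric values, the syntax is bin(x, "cut::a]b["), where a, b (etc.) are numbers or percentiles, each followed by an opening or closing square bracket (note that the bin function is more general than this).
In your case you want to create two groups, the first containing 66% of the data, so you can write "cut::p66]" (p66 means the 66th percentile). This creates two groups: [p0, p66] and ]p66, p100].
Then, to give the groups custom names, just add the names to the vector, like this: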

# You can use the bin function:
x_bin = fixest::bin(x, c("cut::p66]", "Not high", "High"))

# Check it worked
table(x_bin)
#> x_bin
#> Not high     High 
#>       21       11
