What is the best way of under-sampling in R?

Posted by vdzxcuhz on 2023-05-20 in Other

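I have a dataset with attributes A, B, and C. C is a factor with two labels, zz and z. There are more z rows than zz rows, and I want to under-sample the dataset so that the new data contains the same number of zz and z rows. I cannot use any external packages. Preferably with the sample function.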

--------------------------------------------------
| Attribute A   | Attribute B     | Attribute C  |
--------------------------------------------------
| xx            | y1              | zz           |
--------------------------------------------------
| mm            | r1              | z            |
--------------------------------------------------
| ab            | 1r              | z            |
--------------------------------------------------
| ry            | cm              | zz           |
--------------------------------------------------
| ca            | rx              | z            |
--------------------------------------------------
| mm            | zr              | z            |
--------------------------------------------------

The result should be:

--------------------------------------------------
| Attribute A   | Attribute B     | Attribute C  |
--------------------------------------------------
| xx            | y1              | zz           |
--------------------------------------------------
| mm            | r1              | z            |
--------------------------------------------------
| ab            | 1r              | z            |
--------------------------------------------------
| ry            | cm              | zz           |
--------------------------------------------------

Here the probability of zz equals the probability of z, both 0.5.

Tags: r

Source: https://stackoverflow.com/questions/48981550/what-is-the-best-way-of-under-sampling-in-r

1 Answer

2wnc66cl1#

Assuming your data is in a data frame called data with columns A, B, and C, you could do the following:

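## row indices of the "z" and "zz" entries
z_ind <- which(data$C == "z")
zz_ind <- which(data$C == "zz")

## number of rows to sample per class: the size of the smaller class,
## so that every row of the minority class is kept
nsamp <- min(length(z_ind), length(zz_ind))
## a fixed number also works (e.g. nsamp <- 10), as long as it does not
## exceed the size of either class; otherwise sample() raises an error

## select `nsamp` rows with "z" and `nsamp` rows with "zz";
## sampling positions avoids the sample() pitfall where a single
## number x is treated as the range 1:x
pick_z <- z_ind[sample(length(z_ind), nsamp)]
pick_zz <- zz_ind[sample(length(zz_ind), nsamp)]

new_data <- data[c(pick_z, pick_zz), ]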
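
For reference, the same idea can be sketched as a reusable base-R function and tried on the question's example data. This is only a minimal sketch: the helper name undersample, the column names A, B, C, and the seed value are assumptions made for illustration, not part of the original question or answer.

```r
# Example data from the question (column names A, B, C are assumed).
data <- data.frame(
  A = c("xx", "mm", "ab", "ry", "ca", "mm"),
  B = c("y1", "r1", "1r", "cm", "rx", "zr"),
  C = factor(c("zz", "z", "z", "zz", "z", "z"))
)

# Hypothetical helper: keep n rows per class, where n is the size of the
# smallest class. Uses base R only.
undersample <- function(df, class_col) {
  # Row indices grouped by class label
  idx_by_class <- split(seq_len(nrow(df)), df[[class_col]], drop = TRUE)
  n <- min(lengths(idx_by_class))
  # Sample positions rather than values, so a class with a single row
  # is not misread by sample() as the range 1:x
  picked <- lapply(idx_by_class, function(idx) idx[sample(length(idx), n)])
  # Sort the selected indices to preserve the original row order
  df[sort(unlist(picked, use.names = FALSE)), , drop = FALSE]
}

set.seed(42)  # only to make the example reproducible
new_data <- undersample(data, "C")

# Class counts after under-sampling
table(new_data$C)
```

On this data the minority class zz has two rows, so the result has four rows with each label appearing twice. Which two z rows are kept depends on the random seed. Because the helper splits on whatever labels are present, it should also work for more than two classes.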

Answered 2023-05-20
