R: select random rows until a threshold on another column is reached

h22fl7wq posted on 2023-11-14 in Other


I have an sf object in R that looks like this:

Type   Value   Geometry 
A       1        ()
A       3        ()
B       2        ()
A       1        ()
C       4        ()

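The Geometry column stores the geometries of polygon features. I want to sample rows at random until the sum of the Value column reaches or exceeds a threshold (say 5).
For example, if rows 1, 4 and 5 were sampled (values 1 + 1 + 4 = 6), sampling would stop.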

Source: https://stackoverflow.com/questions/77461179/select-random-rows-until-threshold-value-from-other-column-is-reached

3 answers


aelbi1ox (answer #1)

Assumptions:
1. The same row cannot be selected more than once.
2. You want to keep the row that takes you over the limit of 5.

tidyverse version:

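library(dplyr)

df <- tibble(type = c('a', 'a', 'b', 'a', 'c'), value = c(1, 3, 2, 1, 4))
set.seed(42)
df[sample(nrow(df)),] |>                        # shuffle the rows
  mutate(cumsum = cumsum(value)) |>             # running total of value
  filter(lag(cumsum, 1, default = 0) < 5) |>    # keep rows while the total *before* them is < 5
  select(-cumsum)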

# A tibble: 2 × 2
  type  value
  <chr> <dbl>
1 a         1
2 c         4

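This shuffles the rows of the data frame, takes the cumulative sum of `value`, and filters out every row that comes after the cumulative sum has reached the limit of 5.
I used `lag` so that you also get the row that pushes the sum over the limit.
Four runs: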

# A tibble: 4 × 2
  type  value
  <chr> <dbl>
1 a         1
2 b         2
3 a         1
4 a         3

# A tibble: 2 × 2
  type  value
  <chr> <dbl>
1 c         4
2 a         1

# A tibble: 3 × 2
  type  value
  <chr> <dbl>
1 a         3
2 a         1
3 c         4

# A tibble: 4 × 2
  type  value
  <chr> <dbl>
1 b         2
2 a         1
3 a         1
4 c         4
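The role of `lag` is easiest to see on a fixed row order. The sketch below is plain base R (so it runs without dplyr) and is only an illustration: `vals` is a hypothetical value column that is assumed to be already shuffled, and 5 is the threshold from the question.

```r
# Hypothetical value column, assumed to be already in shuffled order
vals <- c(3, 1, 4, 2, 1)
threshold <- 5

running <- cumsum(vals)              # 3 4 8 10 11
prev    <- c(0, head(running, -1))   # like lag(running, 1, default = 0): 0 3 4 8 10

keep_no_lag <- running < threshold   # TRUE TRUE FALSE FALSE FALSE
keep_lag    <- prev < threshold      # TRUE TRUE TRUE FALSE FALSE

vals[keep_no_lag]  # 3 1    -> sum 4, still below the threshold
vals[keep_lag]     # 3 1 4  -> sum 8, includes the row that crosses 5
```

With non-negative values, filtering on the unlagged running sum always drops the row at which the sum reaches 5, so the kept rows never meet the threshold; the lagged test keeps everything up to and including that row.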

Base R version:

df[sample(nrow(df)),] |> {\(x) `[`(x, c(TRUE,head(cumsum(x$value) < 5, -1)),)}()

Explanation:

df[sample(nrow(df)),]

returns the data set with its rows in random order.

|> {\(x) `[`(x,  ....  ,)}()

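is one way to apply `[` subsetting inside the base pipe, by wrapping it in an anonymous function that is called immediately.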

c(TRUE,head(cumsum(x$value) < 5, -1))

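shifts the test "running sum < 5" down by one row and puts TRUE in front, so a row is kept when the running sum *before* it is still below 5. That keeps the rows whose sum stays below 5 plus the row that makes the sum reach 5 or more.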
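The one-liner can be wrapped in a small function so that the column and the threshold become parameters. This is a sketch, not part of the original answer: `sample_until()` and its arguments are hypothetical names, and it assumes a numeric, non-negative column. sf objects are subset with the same `[` row syntax and keep their geometry column, but the sketch below is only exercised on a plain data frame.

```r
# Hypothetical helper: shuffle the rows, then keep rows up to and including
# the first one at which the running sum of `col` reaches `threshold`.
# Assumes `col` is numeric and non-negative.
sample_until <- function(df, col, threshold) {
  shuffled <- df[sample(nrow(df)), , drop = FALSE]
  running  <- cumsum(shuffled[[col]])
  # Row i is kept if the sum of the rows *before* it is still below threshold
  keep <- c(TRUE, head(running < threshold, -1))
  shuffled[keep, , drop = FALSE]
}

df <- data.frame(type = c("a", "a", "b", "a", "c"), value = c(1, 3, 2, 1, 4))
set.seed(42)
picked <- sample_until(df, "value", 5)
picked
```

If the whole column sums to less than the threshold, every row is returned (in shuffled order), which matches the behaviour of the one-liner.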

2023-11-14

dm7nw8vv (answer #2)

You can use a while loop that checks the running sum on every iteration:

library(tidyverse)

df <- tibble(type = c('a', 'a', 'b', 'a', 'c'), value = c(1, 3, 2, 1, 4))
samples <- tibble()   # rows drawn so far
sample_sum <- 0

while (sample_sum < 5) {
  # draw one index from all rows; the same row can be drawn more than once
  ix <- sample(seq_len(nrow(df)), size = 1)
  samples <- bind_rows(samples, slice(df, ix))
  sample_sum <- sum(samples$value)
}
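Because each iteration draws from all rows, this loop samples with replacement, which differs from the no-repeat assumption in the first answer. Growing a tibble with `bind_rows()` on every iteration also gets slow on large data. A base R sketch that keeps the with-replacement behaviour but only collects indices and subsets once; `draw_indices()` is a hypothetical name, and it assumes non-negative values (the added guard stops the loop from running forever when no value is positive).

```r
# Hypothetical helper: draw row indices with replacement until the
# running total of `values` reaches `threshold`.
# Assumes non-negative values.
draw_indices <- function(values, threshold) {
  if (!any(values > 0)) stop("no positive values: threshold can never be reached")
  idx   <- integer(0)
  total <- 0
  while (total < threshold) {
    i     <- sample.int(length(values), size = 1)  # any row, repeats allowed
    idx   <- c(idx, i)
    total <- total + values[i]
  }
  idx
}

df <- data.frame(type = c("a", "a", "b", "a", "c"), value = c(1, 3, 2, 1, 4))
set.seed(1)
ix <- draw_indices(df$value, 5)
df[ix, ]  # subset once at the end; repeated indices give repeated rows
```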


2023-11-14

r6l8ljro (answer #3)

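I'm not sure this is the most efficient approach, but you can build the subset in a for loop: at each step pick one row, remove it from the pool of remaining rows, compute the sum of the values in the subset, and stop once the threshold is reached.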

df1 <- read.table(text = "Type   Value   Geometry 
A       1        ()
A       3        ()
B       2        ()
A       1        ()
C       4        ()", header = TRUE, stringsAsFactors = FALSE)

df1_step <- df1                                               # pool of rows not yet picked
df1_subset <- data.frame(matrix(ncol = ncol(df1), nrow = 0))  # rows picked so far

set.seed(123)

for(i in seq_len(nrow(df1))){
  sub_id <- sample(seq_len(nrow(df1_step)), size = 1)  # pick one remaining row
  df1_subset <- rbind(df1_subset, df1_step[sub_id,])   # add it to the sample
  df1_step <- df1_step[-sub_id,]                       # remove it from the pool
  if (sum(df1_subset$Value) >= 5) { break }            # stop once the threshold is reached
}

## sample
df1_subset
#>   Type Value Geometry
#> 3    B     2       ()
#> 2    A     3       ()

## rows that were not picked up
df1_step
#>   Type Value Geometry
#> 1    A     1       ()
#> 4    A     1       ()
#> 5    C     4       ()

Created on 2023-11-10 with reprex v2.0.2
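The loop can also be replaced by one shuffle followed by `which()`, which still returns the rows that were not picked (the counterpart of `df1_step`) without growing `df1_subset` row by row. This is a sketch rather than the answer's own code: `split_at_threshold()` is a hypothetical name, and it assumes a non-negative `Value` column.

```r
# Hypothetical helper: shuffle once, cut where the running sum first
# reaches the threshold, and return both the sample and the leftover rows.
split_at_threshold <- function(df, col, threshold) {
  ord <- sample(nrow(df))                              # random row order
  hit <- which(cumsum(df[[col]][ord]) >= threshold)    # positions at/after the crossing
  n   <- if (length(hit) > 0) hit[1] else nrow(df)     # use all rows if never reached
  picked <- ord[seq_len(n)]
  list(
    sample = df[picked, , drop = FALSE],
    rest   = df[-picked, , drop = FALSE]
  )
}

df1 <- read.table(text = "Type Value Geometry
A 1 ()
A 3 ()
B 2 ()
A 1 ()
C 4 ()", header = TRUE, stringsAsFactors = FALSE)

set.seed(123)
res <- split_at_threshold(df1, "Value", 5)
res$sample  # picked rows
res$rest    # rows that were not picked
```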

2023-11-14
