R语言 -1和1之间的随机数求和为0 [关闭]

qcbq4gxm  于 2023-03-27  发布在  其他
关注(0)|答案(6)|浏览(176)

已关闭。此问题需要details or clarity。当前不接受答案。
**想要改进此问题?**添加详细信息并通过editing this post阐明问题。

昨天关门了。
Improve this question
使用R,如何生成n随机数x_1,...,x_n,它们位于-11之间,并且总和为0
推广到另一个和和另一个范围又是什么呢?

ebdffaop

ebdffaop1#

第一个问题的简单解决方案

下面是第一个问题的简单解决方案。在-11之间模拟u_1,...,u_n。然后设置x_1 = (u_1-u_n)/2x_2 = (u_2-u_1)/2x_3 = (u_3-u_2)/2,...,x_n = (u_n-u_{n-1})/2

n <- 10L
u <- runif(n, -1, 1)
x <- c(u[1L]-u[n], diff(u)) / 2
sum(x)
summary(x)

泛化为任意和

现在,如何生成n随机数x_1,...,x_n,它们位于-11之间,并且总和为给定的数字s(在-nn之间)?

如前所述进行,但将u_i减去s/n,并将s/n添加到x_i

s <- 3
n <- 10L
u <- runif(n, -1, 1) - s/n
x <- c(u[1L]-u[n], diff(u))/2 + s/n
sum(x)
summary(x)

泛化到任意范围?

现在,如何生成n随机数x_1,...,x_n,它们位于两个给定的数字ab之间,并且总和为给定的数字s(在n*an*b之间)?

前面的方法可以推广到**a = -b**的情况:

a <- -4; b <- 4; s <- 10
n <- 10L
u <- runif(n, -(b-a)/2, (b-a)/2) - s/n
x <- c(u[1L]-u[n], diff(u))/2 + s/n
sum(x)
summary(x)

最后是DRS算法

Dirichlet-rescale algorithm解决了任意一对数字a < b的最后一个问题。
此外,模拟向量(x_1, ..., x_n)(n-1)维流形{sum x_i = s}a <= x_i <= b上具有***均匀分布***。
这是一个R实现,改编自Roger斯塔福德编写的Matlab实现(代码中给出了参考)。

# adapted from Roger Stafford's Matlab implementation
# Roger Stafford (2023). Random Vectors with Fixed Sum (https://www.mathworks.com/matlabcentral/fileexchange/9700-random-vectors-with-fixed-sum), MATLAB Central File Exchange. Retrieved March 23, 2023.
DRS <- function(n, s, a, b) {
  n <- as.integer(n)
  if(s < n*a || s > n*b || a >= b) {
    stop("Invalid parameters.")
  }
  # Rescale to a unit cube: 0 <= x(i) <= 1
  s <- (s - n*a) / (b - a)
  # Construct the transition probability table, t.
  #   t(i,j) will be used only in the region where j <= i + 1.
  k <- max(min(as.integer(floor(s)), n-1L), 0L) # Must have 0 <= k <= n-1
  s <- max(min(s, k+1L), k)                     # Must have k <= s <= k+1
  s1 <- s - (k:(k-n+1L))   # s1 will never be negative
  s2 <- ((k+n):(k+1L)) - s # s2 will never be negative
  w <- matrix(0, nrow = n, ncol = n+1L)
  w[1L, 2L] <- .Machine$double.xmax # Scale for full 'double' range
  t <- matrix(0, nrow = n-1L, ncol = n)
  tiny <- .Machine$double.eps # The smallest positive 'double' 
  for(i in 2L:n) {
    tmp1 <- w[i-1L, 2L:(i+1L)] * s1[1L:i] / i
    tmp2 <- w[i-1L, 1L:i] * s2[(n-i+1L):n] / i
    w[i, 2L:(i+1L)] <- tmp1 + tmp2
    tmp3 <- w[i, 2L:(i+1L)] + tiny               # In case tmp1 & tmp2 are both 0,
    tmp4 <- as.double(s2[(n-i+1L):n] > s1[1L:i]) #   then t is 0 on left & 1 on right
    t[i-1L, 1L:i] <- (tmp2 / tmp3) * tmp4 + (1 - tmp1 / tmp3) * (1 - tmp4)
  }
  # Derive the polytope volume v from the appropriate element in the bottom row of w.
  v <- n^(3/2) * (w[n, k+2L] / .Machine$double.xmax) * (b - a)^(n - 1L)
  # Now construct the vector x.
  x <- numeric(n)
  rt <- runif(n - 1L) # For random selection of simplex type
  rs <- runif(n - 1L) # For random location within a simplex
  j <- k + 1L # For indexing in the t table
  sm <- 0 # Start with sum zero
  pr <- 1 # Start with product 1
  for(i in (n-1L):1L) { # Work backwards in the t table
    e <- as.double(rt[n-i] <= t[i, j]) # Use rt to choose a transition
    sx <- rs[n-i] ^ (1L/i)             # Use rs to compute next simplex coordinate
    sm <- sm + (1 - sx) * pr * s / (i + 1L) # Update sum
    pr <- sx * pr                           # Update product
    x[n-i] <- sm + pr * e # Calculate x using simplex coordinates
    s <- s - e
    j <- j - e # Transition adjustment
  }
  x[n] <- sm + pr * s # Compute the last x
  # Randomly permute the order in x and rescale
  p <- order(runif(n)) # it is a random permutation
  a + (b - a) * x[p]
}

示例:

x <- DRS(n = 10L, s = 14, a = 1, b = 2)
sum(x)
summary(x)

编辑

有什么不对的:这不是DRS -请参阅给定链接中的幻灯片。DRS更一般地允许x_i的不同边界。

编辑

我刚刚发现这个方法是在R包Surrogate中实现的。

nwlqm0z1

nwlqm0z12#

***首先,我想说,下面的方法只是约束随机性的可能实现,但对于均匀分布属性或类似的东西来说并不安全。

  • ##递归

遵循递归的思想,我们可以首先生成具有更多边界约束的r,这可能会加快很多速度,并且***效率更高***

f <- function(s, n, a, b) {
  if (s < n * a || s > n * b) {
    stop("Invalid parameters.")
  }
  if (n == 1) {
    return(s)
  }
  r <- runif(1, max(a, s - (n - 1) * b), min(b, s - (n - 1) * a))
  c(r, Recall(s - r, n - 1, a, b))
}

我们可以看到

> (v <- f(s = 60, n = 30, a = 1, b = 3))
 [1] 1.544962 1.229845 2.013064 1.510149 2.933672 1.782947 1.650229 2.700521
 [9] 1.151468 1.758759 2.035019 1.355591 2.731922 2.918394 2.288166 2.198345
[17] 1.313646 2.312720 1.232810 1.591426 1.020105 2.788073 1.208734 2.929171
[25] 1.397976 2.044319 1.593190 2.961647 2.849886 2.953244

> summary(v)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  1.002   1.387   2.126   2.000   2.585   2.892

> length(v)
[1] 30

> sum(v)
[1] 60
  • ##拒绝采样

一种蛮力(低效)但懒惰的方法是拒绝抽样

f <- function(s, n, a, b) {
  repeat {
    v <- runif(n, a, b)
    x <- s * v / sum(v)
    if (all(x >= a & x <= b)) break
  }
  x
}

使得

> (v <- f(s = 60, n = 30, a = 1, b = 3))
 [1] 1.800257 1.706306 2.855300 2.177379 2.844279 2.293842 1.011653 2.820371
 [9] 2.803390 2.427355 1.892209 1.829180 2.240873 1.641185 2.267275 1.899986
[17] 1.042455 1.400519 2.612722 1.018635 2.024762 1.413173 1.376111 2.685723
[25] 1.886224 2.151509 1.598368 1.114850 2.303480 2.860629

> summary(v)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  1.012   1.609   1.962   2.000   2.396   2.861

> length(v)
[1] 30

> sum(v)
[1] 60
ej83mcc0

ej83mcc03#

Roger斯塔福德的算法不处理不相同的边界,因为它巧妙地利用了相同边界情况的对称性,以便在最坏的情况下有效地进行采样。
Dirichlet-Rescale(DRS)算法处理的是不同的边界,但本质上是生成-测试(在最坏的情况下效率不高),它有一个技巧:当我们在∑xi = s下采样xi ∈ [ai,bi]时,我们可以

  • 生成xi − ai ≥ 0,其中∑xi = s,并且测试xi ≤ bi;
  • 用∑xi = s生成bi − xi ≥ 0,并测试ai ≤ xi。

如果Σ(ai + bi)/2〈s,则第一选项更好,并且如果Σ(ai + bi)/2〉s,则第二选项更好(原始DRS呈现要复杂得多)。
DRS的另一个特性是它努力不消耗比它需要的更多的随机性,但在我看来这是一个错误,因为对于多次迭代样本来说可能会有奇怪的浮点效应,而随机数并没有那么昂贵。
在(测试不佳)R中:

DRS_upper <- function(s, u) {
    if (length(u) == 0 || any(u <= 0)) {
        stop("Invalid parameters.")
    }
    repeat {
        x <- diff(c(0, sort(runif(length(u) - 1, 0, s)), s))
        if (all(x <= u)) {
            return(x)
        }
    }
}

DRS <- function(s, a, b) {
    if (length(a) == 0 || length(b) == 0 || length(a) != length(b) || any(a >= b) ||
        s <= sum(a) || sum(b) <= s) {
        stop("Invalid parameters.")
    }
    if (s - sum(a) < sum(b) - s) {
        return(DRS_upper(s - sum(a), b - a) + a)
    }
    return(b - DRS_upper(sum(b) - s, b - a))
}

# Example
DRS(10, c(1, 2, 3), c(3, 4, 5))
gpfsuwkq

gpfsuwkq4#

这是一个基于狄利克雷采样的解决方案。

# Sample points uniformly on the simplex sum x_i = s, subject to
# constraints a < x_i < b, i = 1, ... m.

# First, transform to the [0, 1]^m cube by working with 
# y_i = (x_i - a)/(b - a)
# The sum becomes s' = (s - ma)/(b - a).

# If s' < 0 or s' > m there are no solutions, so quit.

# If s' < 1, the constraints don't matter, so rescale a Dirichlet:
#  z ~ Dirichlet(1, ..., 1), set y = z * s'

# If s' < m/2, use rejection sampling on a rescaled Dirichlet:
#  z ~ Dirichlet(1, ..., 1), Keep trying until y = z * s' is all
# within [0, 1].  All y values are automatically positive

# If s' > m/2 we'd reject too much, so work with 1 - y_i instead.

# At the end transform back to the original scale, 
# x = (b - a) y + a

# Uses brms and rgl packages

rRescaleDirichlet <- function(n, m, a, b, s) {
  s1 <- (s - m*a)/(b - a)
  if (s1 < 0 || s1 > m)
    stop("Parameters are out of range.")
  
  if (s1 > m/2) {
    flip <- TRUE
    s1 <- m - s1
  } else
    flip <- FALSE
  
  result <- matrix(NA, 0, m)
  oversample <- if (s1 < 1) 1 else 2
  n0 <- n
  while (n > 0) {
    z <- brms::rdirichlet(oversample*n, rep(1, m)) 
    y <- z * s1
    y <- y[apply(y, 1, function(x) all(x < 1)), ]
    keep <- min(n0 - nrow(result), nrow(y))
    result <- rbind(result, y[seq_len(keep),])
    n <- n0 - nrow(result)
  }
  if (flip)
    result <- 1 - result
  
  (b-a)*result + a
}

x <- rRescaleDirichlet(10000, 3, -1, 1, 0)
summary(x)
#>        V1                  V2                  V3           
#>  Min.   :-0.999990   Min.   :-0.999491   Min.   :-0.999842  
#>  1st Qu.:-0.426692   1st Qu.:-0.431709   1st Qu.:-0.423248  
#>  Median :-0.006093   Median : 0.001340   Median : 0.021912  
#>  Mean   :-0.005533   Mean   :-0.001905   Mean   : 0.007438  
#>  3rd Qu.: 0.405129   3rd Qu.: 0.420720   3rd Qu.: 0.425665  
#>  Max.   : 0.999463   Max.   : 0.999554   Max.   : 0.999787
rgl::plot3d(x)

创建于2023-03-24使用reprex v2.0.2

c3frrgcw

c3frrgcw5#

这里有一个简单的方法。

  • 生成0到1之间的n个浮点数
  • 按这些点的平均值左移
  • 将所有点从0向外拉伸以填充所需的空间。

下面是Ruby代码来实现这一点。下面代码中的mult处理拉伸。我使用的拉伸因子旨在保留极值点和范围之间的两个距离中更保守的一个,在加倍因子之后(考虑到范围加倍)

def f(n)
  arr = []
  n.times do
    arr.append(rand)
  end
  
  avg = arr.sum / n
  leftmost_pt = -1 + 2 * arr.min
  rightmost_pt = 2 * arr.max - 1
  mult = [leftmost_pt / (arr.min - avg), rightmost_pt / (arr.max - avg)].min

  0.upto(n-1) do |i|
    arr[i] -= avg
    arr[i] *= mult
  end
  
  puts arr.sort.to_s
end

样品结果:

> f(100)
[-0.9891868749476447, -0.9841797959702733, -0.848670830904512, -0.8483229277098323, -0.8411655155224181, -0.8324813381638559, -0.8181055804749083, -0.8079457473507614, -0.800987369937368, -0.7577576633783468, -0.7524936016842146, -0.7115445481559692, -0.6991425474171663, -0.6795757863364923, -0.6701781937582226, -0.6384927466389793, -0.6294964939183588, -0.6155664737630111, -0.5878014621149229, -0.5764696924978272, -0.5668841490309326, -0.5626603830488923, -0.5615953223333867, -0.5570822656913642, -0.5266628529945551, -0.48092505889101683, -0.46956100587764715, -0.4523341826710813, -0.4401174739476492, -0.43464401255527946, -0.42689187575599197, -0.4213913910405733, -0.38299537530720656, -0.3694495642896308, -0.2961361727287325, -0.28288156441036016, -0.23051721058762914, -0.22371836542884754, -0.20203604935005437, -0.16509606938681623, -0.15707841345756854, -0.14868405790953576, -0.11640501178558098, -0.09294898114518457, -0.09294832773848151, -0.02677332280186984, -0.010445732538003323, -0.007105127384422116, -0.004688040162969207, -0.0016748675082990008, 0.0070890488405637615, 0.015588732468742478, 0.06367463840208484, 0.06953757311859611, 0.08421785548733109, 0.0959447831422179, 0.11215579415565466, 0.16096550839479273, 0.16920990193823807, 0.1811855346946572, 0.18408755203001037, 0.18746303880386458, 0.23421156348587155, 0.24323063939140882, 0.24843662466506716, 0.26230264559033567, 0.2624774985463022, 0.263258301559409, 0.28314031656832084, 0.32070923475089447, 0.35435257074172066, 0.3748873647098375, 0.37923050659537216, 0.3852748462223294, 0.3923950709817138, 0.40799869447822296, 0.4844140183938689, 0.49290817630507755, 0.5020129260996314, 0.5152049976626806, 0.5338640544525307, 0.562752760795784, 0.6377096362138428, 0.6611913612619461, 0.6942175882927449, 0.7197736567323416, 0.7428400038132111, 0.7993062785764886, 0.8325002381387803, 0.8337771194652699, 0.8376220163644345, 0.8609672399947833, 0.8666284648034757, 0.8684073542091211, 0.8692456799466243, 0.926340770835069, 0.9455048113008502, 0.9486137984715667, 0.9500553302312981, 0.9730132962796711]
avg: 4.690909119475784e-17
swvgeqrz

swvgeqrz6#

取所有数字的和。计算平均值。从每个数字中减去平均值。除以新的最大绝对值。现在您有一个新的随机分布,其平均值为0.0,其值在[-1.0,1.0]中。

相关问题