Finding the distribution of the sample mean by the central limit theorem in R

Posted by kb5ga3dv on 2023-05-11 in Other


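Let X1, …, X25 be a random sample from a normal distribution with mean = 37 and standard deviation = 45. Let xbar be the sample mean. How is xbar distributed? I need to verify this using the central limit theorem.
Also compute P(xbar > 43.1).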

My attempt

for(i in 1:1000){
    x=rnorm(25,mean=37,sd=45)
    xbar=mean(x)
    z=(xbar-37)/(45/sqrt(25))   
 }

 z

But I can't find the distribution of xbar.

r

Source: https://stackoverflow.com/questions/20306262/finding-distribution-of-sample-mean-by-central-limit-theorem

4 Answers


bksxznpy1#

Drop the for loop and use replicate instead:

set.seed(1)
X <- replicate(1000, rnorm(25,mean=37,sd=45)) 
X_bar <- colMeans(X)
hist(X_bar) # this is what the distribution of X_bar looks like
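
The histogram only shows the shape. As a hedged sketch (not part of the original answer), the simulated means can also be compared with the theoretical result. Because the underlying data are normal, xbar is exactly N(37, 45/sqrt(25)) = N(37, 9), so P(xbar > 43.1) can be computed with pnorm and checked against the simulated proportion. The names n, mu, sigma, se, p_exact and p_sim are illustrative.

```r
set.seed(1)
n     <- 25
mu    <- 37
sigma <- 45
se    <- sigma / sqrt(n)  # standard deviation of the sample mean: 45 / 5 = 9

# 1000 simulated sample means, each from a sample of size 25
X_bar <- colMeans(replicate(1000, rnorm(n, mean = mu, sd = sigma)))

# theoretical upper-tail probability under N(37, 9)
p_exact <- pnorm(43.1, mean = mu, sd = se, lower.tail = FALSE)

# simulated estimate: proportion of sample means above 43.1
p_sim <- mean(X_bar > 43.1)

c(se = se, p_exact = p_exact, p_sim = p_sim)
```

p_exact comes out at about 0.249. p_sim depends on the seed, but with 1000 replications it should land close to that value.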


rur96b6h2#

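xbar <- numeric(1000)
for (i in 1:1000) {
  x <- rnorm(25, mean = 37, sd = 45)
  xbar[i] <- mean(x)  # save the sample mean from each iteration
}
hist(xbar)  # plot the histogram of xbar
# estimate the probability that xbar is bigger than 43.1:
# count the means above 43.1 (not their indices) and divide by the total
prob <- sum(xbar > 43.1) / length(xbar)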


7y4bm7vi3#

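Just thought I'd expand on this a little.
The central limit theorem states that the distribution of the mean is asymptotically N[mu, sd/sqrt(n)], where mu and sd are the mean and standard deviation of the underlying distribution, and n is the sample size used to calculate the mean. So, in the example below, data is a dataset of size 2500 drawn from N[37,45], arbitrarily split into 100 groups of 25. means is the dataset of the group means. Note that both the data and the means are (approximately) normally distributed, but the distribution of the means is much tighter (lower sigma). From the CLT, we expect sd(mean) ~ sd(data)/sqrt(25), which is roughly what we see.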

data  <- data.frame(sample=rep(1:100,each=25),x = rnorm(2500,mean=37,sd=45))
means <- aggregate(data$x,by=list(data$sample),mean)
# plot histograms
par(mfrow=c(1,2))
hist(data$x,main="",sub="Histogram of Underlying Data",xlim=c(-150,200))
hist(means$x,main="",sub="Histogram of Means", xlim=c(-150,200))
mtext("Underlying Data ~ N[37,45]",outer=T,line=-3)
c(sd.data=sd(data$x), sd.means=sd(means$x))
sd.data  sd.means 
43.548570  7.184518
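
For reference, the relation being checked here can be written out. This is a standard statement of the CLT for n independent draws with mean \mu and finite standard deviation \sigma. The notation \bar{X}_n is an assumption for illustration and is not used in the original answer.

```latex
\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad
\frac{\sqrt{n}\,(\bar{X}_n - \mu)}{\sigma} \xrightarrow{d} N(0, 1), \qquad
\operatorname{sd}(\bar{X}_n) = \frac{\sigma}{\sqrt{n}} = \frac{45}{\sqrt{25}} = 9 .
```

Using the sample values printed above, sd.data/sqrt(25) = 43.55/5 ≈ 8.71, against an observed sd.means of 7.18. The agreement is rough rather than exact, which is not surprising with only 100 group means.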

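But the real power of the CLT is that it says the distribution of the mean is asymptotically normal *regardless of the distribution of the underlying data*. This is shown here, where the underlying data are sampled from a *uniform distribution*. Again, sd(mean) ~ sd(data)/sqrt(25).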

data  <- data.frame(sample=rep(1:100,each=25),x = runif(2500,min=-150, max=200))
means <- aggregate(data$x,by=list(data$sample),mean)
# plot histograms
par(mfrow=c(1,2))
hist(data$x,main="",sub="Histogram of Underlying Data",xlim=c(-150,200))
hist(means$x,main="",sub="Histogram of Means", xlim=c(-150,200))
mtext("Underlying Data ~ U[-150,200]",outer=T,line=-3)
c(sd.data=sd(data$x), sd.means=sd(means$x))
sd.data sd.means 
99.7800  18.8176
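
As a rough cross-check of the uniform case (a sketch, not part of the original answer): a U[a, b] distribution has standard deviation (b - a)/sqrt(12), so the expected standard deviation of a mean of 25 draws can be computed directly and compared with simulation. The names a, b, n, sd_unif, se_theory and sim_means are illustrative, and 10000 simulated means are used instead of 100 only to reduce sampling noise.

```r
set.seed(1)
a <- -150
b <- 200
n <- 25

sd_unif   <- (b - a) / sqrt(12)  # theoretical sd of U[-150, 200]
se_theory <- sd_unif / sqrt(n)   # theoretical sd of the mean of 25 draws

# simulate 10000 sample means, each from 25 uniform draws
sim_means <- replicate(10000, mean(runif(n, min = a, max = b)))

c(sd_unif = sd_unif, se_theory = se_theory, se_sim = sd(sim_means))
```

sd_unif is about 101.04 and se_theory about 20.21, in line with the 99.78 and 18.82 printed above for the smaller sample.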