R: How do I manually create my statistical summary table?

icomxhvb, posted on 2023-01-15 in Other
mydata<-structure(list(Weight = c(66.2, 65.2, 69.8, 63.4, 67.4, 66.3, 
                         63.8, 67.8, 66.7, 66.2, 61.9, 66.9, 69.4, 60.8, 64.1, 62.8, 62.5, 
                         60.9, 61.3, 67.8), Age = c(68, 67, 65, 65, 63, 64, 68, 65, 65, 
                                                    71, 64, 65, 68, 61, 65, 62, 60, 66, 62, 58), 
               Sex = c("H", "H", 
                        "H", "H", "H", "H", "F", "F", "F", "F", "H", "H", "H", "F", "F", 
                        "F", "F", "F", "F", "F"),
               Group = c("G1", "G1", "G1", "G1", 
                          "G1", "G1", "G1", "G1", "G1", "G1", "G2", "G2", "G2", "G2", "G2",
                          "G2", "G2", "G2", "G2", "G2")), row.names = c(NA, -20L), 
          class = "data.frame")

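I want to summarize my data by building a table manually. My goal is to compare variables between two groups. I don't know of any package that returns the confidence interval of the mean difference and the p-value in table form. I have to export the results to Word through R Markdown, so they need to be in a table.
All the parameters I compute are shown below: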

confInt<-paste(round(t.test(mydata$Weight~mydata$Group)$conf.int[1],2),
               round(t.test(mydata$Weight~mydata$Group)$conf.int[2],2),sep = ";")
p.value<-round(t.test(mydata$Weight~mydata$Group)$p.value,3)

mean1<-mean(mydata$Weight[mydata$Group=="G1"])
mean2<-mean(mydata$Weight[mydata$Group=="G2"])

mean_diff<-(mean(mydata$Weight[mydata$Group=="G1"])-
mean(mydata$Weight[mydata$Group=="G2"]))

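The goal is to compute these parameters for every numeric variable with a loop or a function.

The per-variable statistics should then be stacked into one table with rbind.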

bhmjp9jg (answer #1)

We can write a function that takes the data frame mydata, the name of a numeric column col, and the name of the grouping column group:

summary_val <- function(mydata, col, group) {
  x <- mydata[[col]]
  group_data <- mydata[[group]]

  # Run the t-test once (Welch test, the t.test() default) and reuse the result
  tt <- t.test(x ~ group_data)
  confInt <- paste(round(tt$conf.int[1], 2),
                   round(tt$conf.int[2], 2), sep = ";")
  p.value <- round(tt$p.value, 3)

  # Group means; the labels "G1" and "G2" are hard-coded here
  mean1 <- mean(x[group_data == "G1"])
  mean2 <- mean(x[group_data == "G2"])

  # Mean difference (G1 - G2) followed by its confidence interval
  mean_diff <- mean1 - mean2
  diff <- paste0(mean_diff, "[", confInt, "]")
  return(data.frame(var = col, G1 = mean1, G2 = mean2, Diff.CI. = diff, p.value = p.value))
}

summary_val(mydata,"Weight","Group")

     var    G1    G2         Diff.CI. p.value
1 Weight 66.28 63.84 2.44[-0.01;4.89]   0.051
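
The interval from `t.test()` can be pasted next to `mean1 - mean2` because, with the formula interface, the test estimates "first level minus second level" of the grouping variable, and the levels are sorted alphabetically here (G1, then G2). A minimal check of that assumption, using the same Weight and Group values as in the question but rebuilt as plain vectors so the snippet runs on its own:

```r
# Same Weight and Group values as in the question, rebuilt as vectors
weight <- c(66.2, 65.2, 69.8, 63.4, 67.4, 66.3, 63.8, 67.8, 66.7, 66.2,
            61.9, 66.9, 69.4, 60.8, 64.1, 62.8, 62.5, 60.9, 61.3, 67.8)
group <- rep(c("G1", "G2"), each = 10)

tt <- t.test(weight ~ group)   # Welch two-sample t-test, run once

tt$estimate                    # the two group means, in level order (G1, G2)

# Difference implied by the test versus the manual difference of means
est_diff    <- unname(tt$estimate[1] - tt$estimate[2])
manual_diff <- mean(weight[group == "G1"]) - mean(weight[group == "G2"])
all.equal(est_diff, manual_diff)   # TRUE

# The confidence interval brackets the G1 - G2 difference
tt$conf.int[1] <= est_diff && est_diff <= tt$conf.int[2]   # TRUE
```

If the group labels sorted the other way round, the interval would describe the opposite difference, so the subtraction order would have to be flipped as well.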

Next, the names of the numeric columns can be extracted with:

num_var <- names(mydata)[unlist(lapply(mydata, is.numeric))]
num_var
[1] "Weight" "Age"

The summary table is then built with a for loop:

mysummary <- data.frame()
for(var in num_var){
  mysummary <- rbind(mysummary,summary_val(mydata,var,"Group"))
}
mysummary
     var    G1    G2                    Diff.CI. p.value
1 Weight 66.28 63.84            2.44[-0.01;4.89]   0.051
2    Age 66.10 63.10 2.99999999999999[0.43;5.57]   0.025
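
The `2.99999999999999` in the Age row appears because `mean_diff` is pasted without rounding and the subtraction is done in floating point. One possible remedy, sketched with a hypothetical helper `fmt_diff()` that is not part of the original answer, is to format the difference and both interval bounds with `sprintf()`:

```r
# Hypothetical helper (not in the original answer):
# builds "diff [lower; upper]" with a fixed number of decimals
fmt_diff <- function(diff, lower, upper, digits = 2) {
  fmt <- paste0("%.", digits, "f")                       # e.g. "%.2f"
  sprintf(paste0(fmt, " [", fmt, "; ", fmt, "]"), diff, lower, upper)
}

fmt_diff(66.1 - 63.1, 0.43, 5.57)   # "3.00 [0.43; 5.57]"
fmt_diff(2.44, -0.01, 4.89)         # "2.44 [-0.01; 4.89]"
```

Inside `summary_val()` this would stand in for the `paste0(mean_diff, "[", confInt, "]")` step; `sprintf()` also keeps trailing zeros, which `round()` followed by `paste()` drops.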

Alternatively, use do.call + lapply:

summary_val2 <- function(col,mydata,group){
  summary_val(mydata,col,group)
}

do.call(rbind,lapply(num_var,summary_val2,mydata,"Group"))
     var    G1    G2                    Diff.CI. p.value
1 Weight 66.28 63.84            2.44[-0.01;4.89]   0.051
2    Age 66.10 63.10 2.99999999999999[0.43;5.57]   0.025
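
The function above hard-codes the labels "G1" and "G2". As a rough sketch of how the same table could be built for any two-level grouping column, the hypothetical variant `summary_val_generic()` below (the name and the `digits` argument are assumptions, not part of the original answer) reads the labels from the data and rounds the difference:

```r
# Numeric columns and grouping column from the question, rebuilt compactly
mydata <- data.frame(
  Weight = c(66.2, 65.2, 69.8, 63.4, 67.4, 66.3, 63.8, 67.8, 66.7, 66.2,
             61.9, 66.9, 69.4, 60.8, 64.1, 62.8, 62.5, 60.9, 61.3, 67.8),
  Age    = c(68, 67, 65, 65, 63, 64, 68, 65, 65, 71,
             64, 65, 68, 61, 65, 62, 60, 66, 62, 58),
  Group  = rep(c("G1", "G2"), each = 10)
)

# Hypothetical variant of summary_val(): group labels come from the data
summary_val_generic <- function(data, col, group, digits = 2) {
  g <- factor(data[[group]])
  stopifnot(nlevels(g) == 2)        # a two-sample t-test needs exactly two groups
  x <- data[[col]]

  tt    <- t.test(x ~ g)            # Welch test, run once
  means <- tapply(x, g, mean)       # one mean per group, in level order

  fmt <- paste0("%.", digits, "f")
  out <- data.frame(
    var     = col,
    m1      = round(means[[1]], digits),
    m2      = round(means[[2]], digits),
    Diff.CI = sprintf(paste0(fmt, " [", fmt, "; ", fmt, "]"),
                      means[[1]] - means[[2]], tt$conf.int[1], tt$conf.int[2]),
    p.value = round(tt$p.value, 3)
  )
  names(out)[2:3] <- levels(g)      # label the mean columns with the group names
  out
}

# Apply to every numeric column and stack the one-row results
num_var <- names(mydata)[vapply(mydata, is.numeric, logical(1))]
tab <- do.call(rbind, lapply(num_var, function(v) {
  summary_val_generic(mydata, v, "Group")
}))
tab
```

Since the question asks for Word output through R Markdown, the resulting data frame could then be passed to `knitr::kable()` inside a chunk to render it as a table, assuming the knitr package is installed.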
