How to get a descriptive LaTeX table of a continuous variable stratified by groups in R

Posted by 5ssjco0h on 2023-01-15 in Other
dat <- data.frame(outcome = rnorm(25), 
         sex = sample(c("F", "M"),  25, replace = TRUE),
         age_group = sample(c(1, 2, 3), 25, replace = TRUE))
> head(dat)
  outcome sex age_group
1  1.1423   F         2
2  0.0998   M         1
3 -1.6305   F         2
4 -1.6759   F         1
5  0.3825   F         2
6  0.7274   F         3

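I have a dataset with a continuous outcome variable. I would like to produce a LaTeX table containing descriptive statistics for this variable, stratified by sex and age_group. I would like it to look like this (it does not have to show mean (SD), but I do want the results laid out by age group and sex):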

I tried the Hmisc package:

library(Hmisc)
output <- summaryM(outcome ~ sex + age_group, data = dat, test = TRUE)
latex(output, file = "")

But the output looks very different from what I want.

vybvopom #1

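I am more familiar with the gt package and strongly recommend learning how to use it.
Here is a solution that uses the gt package with your example code.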

# Install the package and load the dependencies. Here I'll be using dplyr to
# group by variables.
install.packages("gt")
library(gt)
library(dplyr)
dat <- data.frame(outcome = rnorm(25),
                  sex = sample(c("F", "M"), 25, replace = TRUE),
                  age_group = sample(c(1, 2, 3), 25, replace = TRUE))

# head() keeps only the first 6 rows, so the table below shows N = 6
head(dat) %>%
  # Group by the desired column
  group_by(sex) %>%
  # Create a gt table from the data frame
  gt() %>%
  # Rename columns
  cols_label(outcome = "",
             sex = "Sex",
             age_group = "Cohort") %>%
  # Add a table title
  # Note that the `md` function lets you write the title in Markdown syntax (which allows HTML)
  tab_header(title = md("Table 1: Descriptive Statistics (N = 6)")) %>%
  # Add a data source footnote
  tab_source_note(source_note = "Data: Stackoverflow question 7508787 [user: Adrian]") %>%
  # You can also customize the table's body and lines using the tab_options
  # and tab_style functions.
  tab_options(row.striping.include_table_body = FALSE) %>%
  tab_style(style = cell_borders(
      sides = c("top"),
      color = "black",
      weight = px(1),
      style = "solid"),
    locations = cells_body(
      columns = everything(),
      rows = everything()
    )) %>%
  # Finally, you can create summary rows with whichever statistics you want.
  summary_rows(
    groups = TRUE,
    columns = outcome,
    fns = list(
      average = "mean",
      total = "sum",
      SD = "sd")
  )
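
The pipeline above renders a table but never emits LaTeX, and it lists raw rows rather than one cell per sex and age group. The following is a minimal sketch of one way to get closer to the layout the question asks for, not a definitive solution. It assumes that tidyr is installed alongside dplyr and gt, that a "mean (SD)" string per cell is acceptable, and that a wide layout (one row per sex, one column per age group) is what is wanted. The names summary_wide and latex_tab, the "Age group " column prefix, the table title and the seed are all made up for illustration.

# Sketch only: stratified "mean (SD)" table exported as LaTeX via gt::as_latex()
library(dplyr)
library(tidyr)
library(gt)

set.seed(1)  # assumption: fixed seed so the example is reproducible
dat <- data.frame(outcome = rnorm(25),
                  sex = sample(c("F", "M"), 25, replace = TRUE),
                  age_group = sample(c(1, 2, 3), 25, replace = TRUE))

# One "mean (SD)" string per sex x age_group cell, then spread age groups into columns
summary_wide <- dat %>%
  group_by(sex, age_group) %>%
  summarise(stat = sprintf("%.2f (%.2f)", mean(outcome), sd(outcome)),
            .groups = "drop") %>%
  pivot_wider(names_from = age_group,
              values_from = stat,
              names_prefix = "Age group ",
              names_sort = TRUE)

# Build the gt table and convert it to LaTeX source
latex_tab <- summary_wide %>%
  gt() %>%
  cols_label(sex = "Sex") %>%
  tab_header(title = "Outcome by sex and age group, mean (SD)") %>%
  as_latex()

# Print the LaTeX code so it can be pasted into a document
cat(as.character(latex_tab))

A cell that contains a single observation has no defined standard deviation, so it will print as "NA" inside the parentheses. A combination of sex and age group that does not occur in the data shows up as a missing value in the wide table.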

The final table looks like this:
