如何在r中同时使用s()和I()?

pbpqsu0x  于 2023-05-20  发布在  其他
关注(0)|答案(1)|浏览(168)

我使用gam来拟合数据,从mgcv得到s()gam()。但是,当我尝试输入公式时:

y =  data.frame(n = c(1:5),b = runif(5),c = runif(5,min = 2,max = 3),d = c(5:9),e = scale(c(11:15)),f = c(12:16))

model1 = gam(n ~ b + c + I(s(sqrt(d),bs = 'tp',df=2)) +
                I(s(sqrt(e),bs = 'tp',df=2)) +
                I(s(sqrt(f),bs = 'tp',df=2)),
              family=quasipoisson(),
              data = y,
              method = 'REML',
              select = TRUE)

结果是:

Error in terms.formula(reformulate(term[i])) : 
  invalid model formula in ExtractVars

如何以正确的方式输入公式?我已经尝试了不同的方法,结果都失败了...请给予我一些建议。谢谢你。
编辑:我很确定这个错误与stats::formula有关。我想问一下同时使用s()I()有什么相关的规则吗?
会话信息为:

> sessionInfo()
R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] Hmisc_4.8-0       Formula_1.2-5     survival_3.4-0    lattice_0.20-45   mgcv_1.8-41       nlme_3.1-160      dlnm_2.4.7        rvest_1.0.3       plotly_4.10.1     data.table_1.14.8 arrow_11.0.0.2   
[12] magrittr_2.0.3    lubridate_1.9.2   forcats_1.0.0     stringr_1.5.0     dplyr_1.1.0       purrr_1.0.1       readr_2.1.4       tidyr_1.3.0       tibble_3.1.8      ggplot2_3.4.1     tidyverse_2.0.0  
[23] readxl_1.4.2     

loaded via a namespace (and not attached):
  [1] minqa_1.2.5             colorspace_2.1-0        deldir_1.0-6            sjlabelled_1.2.0        htmlTable_2.4.1         estimability_1.4.1      snakecase_0.11.0        base64enc_0.1-3        
  [9] rstudioapi_0.14.0-9000  bit64_4.0.5             fansi_1.0.4             mvtnorm_1.1-3           xml2_1.3.3              splines_4.2.2           cachem_1.0.6            knitr_1.42             
 [17] sjmisc_2.8.9            jsonlite_1.8.4          splitstackshape_1.4.8   nloptr_2.0.3            ggeffects_1.2.1         broom_1.0.4             cluster_2.1.4           png_0.1-8              
 [25] clipr_0.8.0             compiler_4.2.2          httr_1.4.5              sjstats_0.18.2          emmeans_1.8.5           backports_1.4.1         assertthat_0.2.1        Matrix_1.5-1           
 [33] fastmap_1.1.0           lazyeval_0.2.2          cli_3.6.0               htmltools_0.5.4         tools_4.2.2             coda_0.19-4             gtable_0.3.3            glue_1.6.2             
 [41] Rcpp_1.0.10             santoku_0.9.0           cellranger_1.1.0        vctrs_0.5.2             sjPlot_2.8.12           insight_0.19.1          lemon_0.4.6             xfun_0.37              
 [49] metR_0.14.0             openxlsx_4.2.5.2        lme4_1.1-31             timechange_0.2.0        lifecycle_1.0.3         MASS_7.3-58.1           scales_1.2.1            hms_1.1.3              
 [57] RColorBrewer_1.1-3      curl_5.0.0              memoise_2.0.1           gridExtra_2.3           padr_0.6.2              rpart_4.1.19            latticeExtra_0.6-30     stringi_1.7.12         
 [65] bayestestR_0.13.0       checkmate_2.1.0         boot_1.3-28.1           zip_2.2.2               repr_1.1.6              rlang_1.1.0             pkgconfig_2.0.3         tsModel_0.6-1          
 [73] htmlwidgets_1.6.2       bit_4.0.5               tidyselect_1.2.0        plyr_1.8.8              R6_2.5.1                generics_0.1.3          foreign_0.8-83          pillar_1.9.0           
 [81] withr_2.5.0             nnet_7.3-18             datawizard_0.7.1        performance_0.10.2      janitor_2.2.0           modelr_0.1.11           crayon_1.5.2            interp_1.1-3           
 [89] utf8_1.2.3              tzdb_0.3.0              extraInserts_0.0.0.9003 viridis_0.6.2           jpeg_0.1-10             grid_4.2.2              digest_0.6.31           xtable_1.8-4           
 [97] ggeasy_0.1.4            munsell_0.5.0           viridisLite_0.4.1       skimr_2.1.5
j0pj023g

j0pj023g1#

简短回答:这里不需要I(),如果需要,它可能在s()调用中。
出现错误的原因是模型公式的某个部分无效。

> model1 <- gam(n ~ b + c + I(s(sqrt(d),bs = 'tp',df=2)) +
+                 I(s(sqrt(e),bs = 'tp',df=2)) +
+                 I(s(sqrt(f),bs = 'tp',df=2)),
+               family=quasipoisson(),
+               data = y,
+               method = 'REML',
+               select = TRUE)
Error in terms.formula(reformulate(term[i])) : 
  invalid model formula in ExtractVars

查看traceback,我们可以看到:

8. terms.formula(reformulate(term[i]))
7. terms(reformulate(term[i]))
6. s(sqrt(d), bs = "tp", df = 2) at <text>#1
5. eval(parse(text = terms[i]), enclos = p.env, envir = mgcvns)

s()调用就是问题所在。
独立地尝试,看看我们是否得到不同的错误:

> s(e,bs = 'tp',df=2)
Error in terms.formula(reformulate(term[i])) : 
  invalid model formula in ExtractVars

还是个问题让我们查看mgcv::s()的文档

s(..., k=-1,fx=FALSE,bs="tp",m=NA,by=NA,xt=NULL,id=NULL,sp=NULL,pc=NULL)

它不需要自由度的论证。删除df =参数并删除I()

> model1 <- gam(n ~ b + c + s(sqrt(d),bs = 'tp') +
+                 s(sqrt(e),bs = 'tp') +
+                 s(sqrt(f),bs = 'tp'),
+               family=quasipoisson(),
+               data = y,
+               method = 'REML',
+               select = TRUE)
Error in smooth.construct.tp.smooth.spec(object, dk$data, dk$knots) : 
  A term has fewer unique covariate combinations than specified maximum degrees of freedom
In addition: Warning message:
In sqrt(e) : NaNs produced

它可以运行,尽管数据集太小,无法建立一个好的模型。

相关问题