How can I compute a correlation in R of one variable across its factor levels, matching by date

Posted by axr492tv on 2023-02-14 in Other


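I am trying to determine the correlations between different subsets of a variable (Concentration, see below) defined by the levels of a factor, in this case Lake = (A, B, C). In other words, I want to test the correlation between the concentrations measured at A and those measured at B, then between B and C, and between A and C.
The problem is that the factor-based subsets have different lengths, and I only want the correlation to include observations whose dates match exactly. I tried passing use = 'complete.obs' to the cor.test function, hoping that would do it, but it did not work.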

res <- cor.test(Data$Concentration[Data$Lake=="A"], 
            Data$Concentration[Data$Lake=="B"], 
            use='complete.obs', 
            method = "pearson")

But I get

Error in cor.test.default(Data$Concentration[Data$Lake=="A"],  : 
  'x' and 'y' must have the same length

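I tried searching but could not find a solution. This can probably be solved with melt/reshape, or maybe there is a simpler solution that I am not seeing. Thanks.
The data are below...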

structure(list(Lake = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L), .Label = c("A", "B", 
"C"), class = "factor"), Date = structure(c(2L, 3L, 4L, 5L, 7L, 
8L, 9L, 1L, 3L, 4L, 6L, 7L, 2L, 3L, 4L, 6L, 7L), .Label = c("1970-04-06", 
"1970-04-07", "1970-04-28", "1970-05-04", "1970-05-14", "1970-05-15", 
"1970-05-28", "1970-05-29", "1970-05-30"), class = "factor"), 
    Concentration = c(10L, 20L, 30L, 40L, 50L, 50L, 50L, 100L, 
    200L, 280L, 410L, 500L, 1L, 3L, 8L, 90L, 1200L)), .Names = c("Lake", 
"Date", "Concentration"), class = "data.frame", row.names = c(NA, 
-17L))

Source: https://stackoverflow.com/questions/62473889/how-can-i-complete-a-correlation-in-r-of-one-variable-across-its-factor-levels

2 Answers


falq053o1#

If you only need the correlations, you can do the following:

library(tidyr)
data_wide = Data %>% pivot_wider(names_from="Lake",values_from="Concentration")
data_wide

# A tibble: 9 x 4
  Date           A     B     C
  <fct>      <int> <int> <int>
1 1970-04-07    10    NA     1
2 1970-04-28    20   200     3
3 1970-05-04    30   280     8
4 1970-05-14    40    NA    NA
5 1970-05-28    50   500  1200
6 1970-05-29    50    NA    NA
7 1970-05-30    50    NA    NA
8 1970-04-06    NA   100    NA
9 1970-05-15    NA   410    90

cor(data_wide[,-1],use="p")
          A         B         C
A 1.0000000 0.9973327 0.8805841
B 0.9973327 1.0000000 0.8014733
C 0.8805841 0.8014733 1.0000000
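
Each entry in the matrix above is computed from a different number of matched dates, because use="p" (partial matching for "pairwise.complete.obs") drops incomplete rows pair by pair. The following base R sketch counts how many dates each pair shares. It rebuilds the question's data by hand, and the names `wide`, `ok`, and `n_pairs` are illustrative choices, not part of the original answer.

```r
# Rebuild the question's data in long format (values copied from the dput output).
Data <- data.frame(
  Lake = factor(rep(c("A", "B", "C"), times = c(7, 5, 5))),
  Date = c("1970-04-07", "1970-04-28", "1970-05-04", "1970-05-14",
           "1970-05-28", "1970-05-29", "1970-05-30",
           "1970-04-06", "1970-04-28", "1970-05-04", "1970-05-15", "1970-05-28",
           "1970-04-07", "1970-04-28", "1970-05-04", "1970-05-15", "1970-05-28"),
  Concentration = c(10, 20, 30, 40, 50, 50, 50,
                    100, 200, 280, 410, 500,
                    1, 3, 8, 90, 1200)
)

# Date x Lake matrix; date/lake combinations without a sample become NA.
wide <- with(Data, tapply(Concentration, list(Date, Lake), mean))

# 1 where a value is present, 0 where it is missing.
ok <- (!is.na(wide)) * 1L

# Entry [i, j] is the number of dates on which both lake i and lake j were sampled.
n_pairs <- crossprod(ok)
print(n_pairs)

# Pairwise correlations on the same matrix, for comparison with the output above.
r <- cor(wide, use = "pairwise.complete.obs")
print(r)
```

With this data the A and B pair shares only three dates, which is worth keeping in mind when reading a correlation as high as 0.997.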

If you need both the correlations and the p-values, as you would get from cor.test, a bit more coding is needed:

pw = combn(levels(Data$Lake),2)
pw
     [,1] [,2] [,3]
[1,] "A"  "A"  "B" 
[2,] "B"  "C"  "C" 

library(broom)
library(dplyr)
# run cor.test on each pair of lake columns and tidy the result into one row
pairwise_c = apply(pw,2,function(i){
  tidy(cor.test(data_wide[[i[1]]],data_wide[[i[2]]]))
})

cbind(data.frame(t(pw)),bind_rows(pairwise_c))

  X1 X2  estimate statistic    p.value parameter
1  A  B 0.9973327 13.663956 0.04650826         1
2  A  C 0.8805841  2.627897 0.11941589         2
3  B  C 0.8014733  1.895312 0.19852670         2
                                method alternative   conf.low conf.high
1 Pearson's product-moment correlation   two.sided         NA        NA
2 Pearson's product-moment correlation   two.sided -0.5238283 0.9974832
3 Pearson's product-moment correlation   two.sided -0.6948359 0.9956362


nukf8bse2#

Using dplyr/tidyr:

Data <- Data %>%
  pivot_wider(names_from="Lake", values_from="Concentration") %>%
  drop_na()

gives you

# A tibble: 3 x 4
  Date           A     B     C
  <fct>      <int> <int> <int>
1 1970-04-28    20   200     3
2 1970-05-04    30   280     8
3 1970-05-28    50   500  1200

Now get the desired correlation with

cor.test(Data$A, Data$B, method = "pearson")
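
Note that drop_na() keeps only the dates on which all three lakes have a value, so the A and C pair loses 1970-04-07 and the B and C pair loses 1970-05-15. If each pair should use every date that the two lakes share, one possible alternative is an inner join per pair. The sketch below uses base R merge(). The helper name `cor_matched` is hypothetical, and the data are rebuilt by hand from the question.

```r
# Rebuild the question's data in long format (values copied from the dput output).
Data <- data.frame(
  Lake = factor(rep(c("A", "B", "C"), times = c(7, 5, 5))),
  Date = c("1970-04-07", "1970-04-28", "1970-05-04", "1970-05-14",
           "1970-05-28", "1970-05-29", "1970-05-30",
           "1970-04-06", "1970-04-28", "1970-05-04", "1970-05-15", "1970-05-28",
           "1970-04-07", "1970-04-28", "1970-05-04", "1970-05-15", "1970-05-28"),
  Concentration = c(10, 20, 30, 40, 50, 50, 50,
                    100, 200, 280, 410, 500,
                    1, 3, 8, 90, 1200)
)

# Hypothetical helper: correlate two lakes using only the dates sampled at both.
cor_matched <- function(data, lake_x, lake_y) {
  x <- data[data$Lake == lake_x, c("Date", "Concentration")]
  y <- data[data$Lake == lake_y, c("Date", "Concentration")]
  # merge() defaults to an inner join, so unmatched dates are dropped.
  both <- merge(x, y, by = "Date", suffixes = c(".x", ".y"))
  test <- cor.test(both$Concentration.x, both$Concentration.y, method = "pearson")
  list(n = nrow(both), estimate = unname(test$estimate), p.value = test$p.value)
}

res_ab <- cor_matched(Data, "A", "B")
res_ac <- cor_matched(Data, "A", "C")
str(res_ab)
str(res_ac)
```

Because the two subsets are joined on Date before cor.test is called, the vectors always have the same length, which avoids the error reported in the question.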

