I have two data.frames in R:
set.seed(12345)
df1 <- data.frame(a=rnorm(20,0,0.4),
                  b=rnorm(20,0.3,0.8),
                  c=rnorm(20,-0.1,0.6),
                  d=rnorm(20,-0.23,0.3),
                  e=rnorm(20,0.2,0.4))
library(purrr)
df1 <- as.data.frame(map_df(df1, function(x) {x[sample(c(TRUE, NA), prob = c(0.8, 0.2),
                                                       size = length(x), replace = TRUE)]}))
rownames(df1) <- sample(LETTERS, 20, replace=FALSE)
df2 <- data.frame(one=rnorm(23,6,2),
                  two=rnorm(23,8,4),
                  three=rnorm(23,12,5),
                  four=rnorm(23,4,0.4),
                  five=rnorm(23,3,0.2))
df2 <- as.data.frame(map_df(df2, function(x) {x[sample(c(TRUE, NA), prob = c(0.7, 0.3),
                                                       size = length(x), replace = TRUE)]}))
rownames(df2) <- sample(LETTERS, 23, replace=FALSE)
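How can I systematically compute the Spearman correlation and p-value between every column of df1 and every column of df2, matching rows by their row IDs? That is, between column "a" in df1 and column "one" in df2, between column "a" in df1 and column "two" in df2, ..., and between column "e" in df1 and column "five" in df2, collecting the results in a new data.frame.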
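The requested output has one row per (df1 column, df2 column) pair, so 5 x 5 = 25 rows ordered as in df3. A minimal sketch of enumerating those pairs with base R's expand.grid; the column names are copied from the question, while cols1, cols2 and pairs are illustrative names:

```r
# Column names as in the question's df1 and df2
cols1 <- c("a", "b", "c", "d", "e")
cols2 <- c("one", "two", "three", "four", "five")

# expand.grid varies its first argument fastest, so listing `number` first
# keeps all pairs for one df1 column together, matching the layout of df3
pairs <- expand.grid(number = cols2, letter = cols1, stringsAsFactors = FALSE)
pairs <- pairs[, c("letter", "number")]
head(pairs, 6)
```

Each row of pairs then names one cor.test call to make.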
Expected result:
df3 <- data.frame(letter=c(rep("a", 5), rep("b", 2),"..."),
                  number=c("one","two","three","four","five","one","two", "..."),
                  Spearman.r=c(-0.6352, 0.0182, 0.5944, 0.3846, -0.6606, 0.1154, 0.2364, "..."),
                  p.value=c(0.0171, 0.9730, 0.0457, 0.2183, 0.0438, 0.7097, 0.4854, "..."))
*My attempt (unsuccessful):*
I tried the following, but the results differ from what I expected, and I don't know how to fix it!
library(dplyr)
# Create empty data.frame for results
df3 <- data.frame(letter = character(),
                  number = character(),
                  Spearman.r = numeric(),
                  p.value = numeric(),
                  stringsAsFactors = FALSE)
# Loop through each column in df1 and df2
for (col1 in colnames(df1)) {
  for (col2 in colnames(df2)) {
    # Check for missing values in both 'x' and 'y'
    valid_rows <- !is.na(df1[[col1]]) & !is.na(df2[[col2]])
    x <- df1[[col1]][valid_rows]
    y <- df2[[col2]][valid_rows]
    # Calculate Spearman correlation and p-value
    if (length(x) > 1 & length(y) > 1) {
      result <- cor.test(x, y, method = "spearman")
      # Append the results to df3
      df3 <- df3 %>%
        add_row(letter = col1,
                number = col2,
                Spearman.r = result$estimate,
                p.value = result$p.value)
    }
  }
}
I think the problem is that the row names are ignored, so rows are not matched by ID. How can this be fixed?
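A small toy example (hypothetical data, not the question's) that illustrates this diagnosis: indexing columns with [[ ]] pairs values by position, whereas intersecting the row names and indexing both frames by them pairs values by ID. The names x_df, y_df, positional, common and aligned are illustrative only.

```r
# Hypothetical toy frames: row names overlap only partly and are in different orders
x_df <- data.frame(a = c(1, 2, 3, 4), row.names = c("A", "B", "C", "D"))
y_df <- data.frame(one = c(40, 10, 30, 99), row.names = c("D", "A", "C", "Z"))

# Positional pairing (what the loop in the attempt effectively does):
# row "A" of x_df ends up next to row "D" of y_df, and so on
positional <- data.frame(a = x_df$a, one = y_df$one)

# Pairing by ID: keep only the shared row names, in x_df's order,
# and index both frames with them so each row refers to the same ID
common <- intersect(rownames(x_df), rownames(y_df))
aligned <- data.frame(id = common,
                      a = x_df[common, "a"],
                      one = y_df[common, "one"])
aligned
```

Applying the same indexing to the full frames (df1[common, ] and df2[common, ]) before correlating should make each pair of columns line up by row ID.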
1 answer
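I would put the element names of the required statistics in a vector, apply cor.test to the corresponding columns with Map, and rbind the results; it is more concise. Note that the two columns need to have the same length, so I subset df2 accordingly.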
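The answer's own code is not shown here, so this is only a sketch of the described approach under some assumptions. It recreates the question's data with lapply() standing in for the purrr::map_df() step (assumed to consume the random numbers in the same order), subsets both frames to their shared row IDs, enumerates the column pairs with expand.grid, Maps cor.test over them keeping the elements named in a vector, and rbinds the results. The helper names (add_na, spearman_stats, common, pairs, stats) and the guard returning NA when fewer than 3 complete pairs remain are illustrative choices, not the answerer's.

```r
# Recreate the question's data; lapply() stands in for purrr::map_df()
set.seed(12345)
df1 <- data.frame(a = rnorm(20, 0, 0.4), b = rnorm(20, 0.3, 0.8),
                  c = rnorm(20, -0.1, 0.6), d = rnorm(20, -0.23, 0.3),
                  e = rnorm(20, 0.2, 0.4))
add_na <- function(x, prob) {
  x[sample(c(TRUE, NA), prob = prob, size = length(x), replace = TRUE)]
}
df1[] <- lapply(df1, add_na, prob = c(0.8, 0.2))
rownames(df1) <- sample(LETTERS, 20, replace = FALSE)
df2 <- data.frame(one = rnorm(23, 6, 2), two = rnorm(23, 8, 4),
                  three = rnorm(23, 12, 5), four = rnorm(23, 4, 0.4),
                  five = rnorm(23, 3, 0.2))
df2[] <- lapply(df2, add_na, prob = c(0.7, 0.3))
rownames(df2) <- sample(LETTERS, 23, replace = FALSE)

# Subset both frames to the shared row IDs, in the same order
common <- intersect(rownames(df1), rownames(df2))
df1_m <- df1[common, , drop = FALSE]
df2_m <- df2[common, , drop = FALSE]

# All 25 column pairs; `number` varies fastest, as in the expected df3
pairs <- expand.grid(number = names(df2_m), letter = names(df1_m),
                     stringsAsFactors = FALSE)

# Names of the htest elements to keep from each cor.test result
stats <- c("estimate", "p.value")

spearman_stats <- function(l, n) {
  x <- df1_m[[l]]
  y <- df2_m[[n]]
  # Guard (arbitrary threshold): too few complete pairs gives NA instead of an error
  if (sum(complete.cases(x, y)) < 3) {
    return(c(estimate = NA_real_, p.value = NA_real_))
  }
  # cor.test drops incomplete (x, y) pairs itself
  unlist(cor.test(x, y, method = "spearman")[stats])
}

res <- Map(spearman_stats, pairs$letter, pairs$number)
df3 <- data.frame(pairs[c("letter", "number")], do.call(rbind, unname(res)))
names(df3) <- c("letter", "number", "Spearman.r", "p.value")
head(df3)
```

Because rows are matched by ID here, the numbers may not coincide with the expected values listed in the question; they are not reproduced above.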