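When scipy.stats.spearmanr(df1, df2) is applied to two DataFrames, you can unpack the output into two NumPy arrays: the correlations and the p-values. How can I put them back into a DataFrame with the same column names as the original DataFrames?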
correlation, p_value = scipy.stats.spearmanr(df1, df2)
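This produces two NumPy arrays, but I would like a DataFrame of correlations in which the p-value for each column sits right next to that column.
That may not be obvious, so here is some code that generates random data (as suggested by ouroboros1):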
import scipy.stats
import numpy as np
import pandas as pd
# Create array of 5 rows and 3 columns,
# filled with random integers from 1 to 9 (the upper bound 10 is exclusive)
data = np.random.randint(1,10,size=(5,3))
# Create Dataframe with data
df1 = pd.DataFrame(data, columns=['Col_1','Col_2','Col_3'])
# Set target column in new data frame
df2 = df1['Col_3'].to_frame().copy()
df1.drop(['Col_3'], axis=1, inplace=True)
# Obtain correlation coefficient for each
# value in column against value in every other column
correlation, p_value = scipy.stats.spearmanr(df1, df2)
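For reference, a small sketch of what the call returns. When two 2-D inputs are passed, spearmanr treats every column of both as a variable, so both outputs are square matrices whose rows and columns follow df1's columns and then df2's. The names features, target and labels, the normal random data and the seed are illustrative assumptions, not part of the question's setup.

```python
import numpy as np
import pandas as pd
import scipy.stats

# Assumed illustrative data: continuous values avoid ties and constant columns
rng = np.random.default_rng(0)
features = pd.DataFrame(rng.normal(size=(20, 2)), columns=["Col_1", "Col_2"])
target = pd.DataFrame(rng.normal(size=(20, 1)), columns=["Col_3"])

correlation, p_value = scipy.stats.spearmanr(features, target)

# Variables are ordered as the first input's columns, then the second input's
labels = list(features.columns) + list(target.columns)

print(labels)
print(correlation.shape, p_value.shape)
```

Note that with only two variables in total, spearmanr returns two scalars rather than matrices, so the matrix-based code in this question needs at least three columns overall.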
(Edit) I think I found a way, but there must be a shorter one:
# Concat frames that contain features with frame that
# contains target
correlation_frame = pd.concat([df1, df2], axis=1)
col_cor_list = list(correlation_frame)
# Create new frame containing all correlation coefficients
cor = pd.DataFrame(correlation, columns=col_cor_list)
cor
      Col_1     Col_2     Col_3
0  1.000000 -0.883883 -0.229416
1 -0.883883  1.000000  0.081111
2 -0.229416  0.081111  1.000000
# Get a dataframe for p-values
col_p_list = ['p_value' + str(x) for x in range(1, len(p_value) + 1)]
p_frame = pd.DataFrame(p_value, columns=col_p_list)
p_frame
       p_value1  p_value2  p_value3
0  1.404265e-24  0.046662  0.710482
1  4.666188e-02  0.000000  0.896840
2  7.104817e-01  0.896840  0.000000
# Combine values by alternating column names
# so p-values are placed correctly
alter_names = (list(sum(zip(col_cor_list, col_p_list), ())))
final = cor.join(p_frame)
results = final[alter_names]
results
      Col_1      p_value1     Col_2  p_value2     Col_3  p_value3
0  1.000000  1.404265e-24 -0.883883  0.046662 -0.229416  0.710482
1 -0.883883  4.666188e-02  1.000000  0.000000  0.081111  0.896840
2 -0.229416  7.104817e-01  0.081111  0.896840  1.000000  0.000000
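The interleaving step relies on sum(zip(...), ()), which concatenates the zipped pairs into one flat tuple. A possibly more readable equivalent, sketched here with the column names from the output above typed in by hand, uses itertools.chain:

```python
from itertools import chain

# Column names copied from the example above
col_cor_list = ["Col_1", "Col_2", "Col_3"]
col_p_list = ["p_value1", "p_value2", "p_value3"]

# zip pairs each correlation column with its p-value column;
# chain.from_iterable flattens those pairs into a single list
alter_names = list(chain.from_iterable(zip(col_cor_list, col_p_list)))

# Same ordering as the sum(zip(...), ()) idiom
same_as_sum = alter_names == list(sum(zip(col_cor_list, col_p_list), ()))
print(alter_names, same_as_sum)
```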
1 Answer
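Suggested refactored version:
Get the correlation and p_value.
Turn them into a df.
np.hstack stacks the correlation and p_value arrays horizontally, in that order. sorted then sorts the columns. The key is the consecutive number at the end of each column name, which we extract with rsplit. So the only variable here is n. For example, with n = 5 it produces: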
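A minimal sketch of the steps described in this answer, not necessarily the answerer's exact code. It assumes n is the total number of columns (the last one being the target), that the p-value columns are named p_value_<i> with an underscore so that rsplit can isolate the trailing number, and it uses seeded normal random data purely for illustration.

```python
import numpy as np
import pandas as pd
import scipy.stats

# Assumption: n is the total number of columns; n >= 3 is needed, because
# with only two variables spearmanr returns scalars instead of matrices
n = 5
rng = np.random.default_rng(0)
cols = [f"Col_{i}" for i in range(1, n + 1)]
df = pd.DataFrame(rng.normal(size=(30, n)), columns=cols)
df2 = df[[cols[-1]]]             # target column as its own frame
df1 = df.drop(columns=cols[-1])  # remaining feature columns

# get correlation and p_value
correlation, p_value = scipy.stats.spearmanr(df1, df2)

# turn into df: correlations on the left, p-values on the right
p_cols = [f"p_value_{i}" for i in range(1, n + 1)]  # assumed naming scheme
results = pd.DataFrame(np.hstack((correlation, p_value)), columns=cols + p_cols)

# Sort columns by their trailing number; sorted is stable, so each Col_i
# stays ahead of its p_value_i
ordered = sorted(results.columns, key=lambda c: int(c.rsplit("_", 1)[-1]))
results = results[ordered]

print(results.columns.tolist())
```

Because the sort key is the trailing number only, the result depends on the correlation columns being listed before the p-value columns when the frame is built.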