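I am trying to use the Friedman test to identify statistically significant proteins. In the sample dataset the values are medians (multiple trials were run, but they have been summarized). There are two separate media and a control for each medium. Data were collected on the amount of each protein in each medium over a given time period. My goal is to compare the proteins in the two media and determine which proteins differ statistically in abundance between the two media.
To do this, I tried a Friedman test followed by a post-hoc Dunn test. I have the code below, but I'm having trouble getting it to work. Could anyone offer help or insight into why my code isn't working? Suggestions for other tests that would suit my purposes would also be welcome.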

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
from scipy.stats import friedmanchisquare
from scikit_posthocs import posthoc_dunn

# Sample data
data = {
    'Protein': ['Protein1', 'Protein2', 'Protein3', 'Protein4', 'Protein5'],
    'Control_Medium1': [1.0, 1.1, 1.2, 1.0, 1.3],
    'Control_Medium2': [1.1, 1.0, 1.1, 1.2, 1.0],
    'Medium1_T1': [1.5, 1.5, 1.8, 2.0, 1.6],
    'Medium1_T2': [1.2, 1.3, 1.7, 1.9, 2.1],
    'Medium1_T3': [1.1, 1.6, 1.9, 1.4, 1.7],
    'Medium2_T1': [0.1, 1.2, 1.7, 1.8, 1.5],
    'Medium2_T2': [0.2, 1.4, 1.8, 1.7, 1.9],
    'Medium2_T3': [0.3, 1.5, 1.3, 1.7, 1.4],
}

df = pd.DataFrame(data)

# Perform Friedman test
control_medium1 = df['Control_Medium1']
control_medium2 = df['Control_Medium2']
medium1_t1 = df['Medium1_T1']
medium1_t2 = df['Medium1_T2']
medium1_t3 = df['Medium1_T3']
medium2_t1 = df['Medium2_T1']
medium2_t2 = df['Medium2_T2']
medium2_t3 = df['Medium2_T3']

_, p_value = friedmanchisquare(control_medium1, control_medium2, medium1_t1, medium1_t2, medium1_t3,
                               medium2_t1, medium2_t2, medium2_t3)

# Print Friedman test result
print('Friedman Test:')
print('p-value:', p_value)

# Perform pairwise comparisons with posthoc test
data_for_posthoc = [medium1_t1, medium1_t2, medium1_t3, medium2_t1, medium2_t2, medium2_t3]
labels_for_posthoc = ['Medium1_T1', 'Medium1_T2', 'Medium1_T3', 'Medium2_T1', 'Medium2_T2', 'Medium2_T3']

posthoc_result = sp.posthoc_dunn(data_for_posthoc)

# Apply Bonferroni correction to p-values
alpha = 0.05
corrected_p_values = np.multiply(posthoc_result, len(posthoc_result.columns))

# Get the significantly different proteins
significant_proteins = []

for protein in df.columns[1:]:
    if any(corrected_p_values[protein] < alpha):
        significant_proteins.append(protein)

print('Significant Proteins:', significant_proteins)

I've tried looking online for more information on coding these statistical tests, but I haven't been able to work out why my code doesn't work.

1 Answer

The root of the problem is repeated and severe confusion between lists, NumPy arrays and pandas dataframes. For this application you should basically never leave dataframes.

import numpy as np
import pandas as pd
from scikit_posthocs import posthoc_dunn
from scipy.stats import friedmanchisquare

df = pd.DataFrame({
    'Control_Medium1': [1.0, 1.1, 1.2, 1.0, 1.3],
    'Control_Medium2': [1.1, 1.0, 1.1, 1.2, 1.0],
    'Medium1_T1': [1.5, 1.5, 1.8, 2.0, 1.6],
    'Medium1_T2': [1.2, 1.3, 1.7, 1.9, 2.1],
    'Medium1_T3': [1.1, 1.6, 1.9, 1.4, 1.7],
    'Medium2_T1': [0.1, 1.2, 1.7, 1.8, 1.5],
    'Medium2_T2': [0.2, 1.4, 1.8, 1.7, 1.9],
    'Medium2_T3': [0.3, 1.5, 1.3, 1.7, 1.4],
}, index=pd.Index(name='Protein', data=['Protein1', 'Protein2', 'Protein3', 'Protein4', 'Protein5']))
df.columns.name = 'Medium'

_, p_value = friedmanchisquare(*(
    col for name, col in df.items()
))
print(f'Friedman Test: p-value = {p_value:.2e}')
print()

long_df = df.iloc[:, 2:].stack()
long_df.name = 'Concentration'
long_df = long_df.reset_index(level='Medium')
posthoc = posthoc_dunn(a=long_df, group_col='Medium', val_col='Concentration')  # , p_adjust='bonferroni')

# This is not equivalent to scikit-posthocs' Bonferroni correction!
posthoc *= posthoc.shape[1]

# Discard upper symmetric half
posthoc.values[np.triu_indices_from(posthoc, 0)] = np.nan
posthoc.index.name = 'MediumA'
posthoc.columns.name = 'MediumB'
posthoc = posthoc.stack()  # discards NaN
posthoc.name = 'P-value'

alpha = 1.0  # adjusted for demonstration
significant = posthoc[posthoc < alpha]
print('Significant media:')
print(significant.to_string())

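A finer point: your final loop looks suspect. Your post-hoc analysis does not compare pairs of proteins; it compares pairs of experimental groups (media). This is why the current output shows media names rather than protein names.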
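A minimal sketch of why media names appear, using only the standard library: a pairwise post-hoc test produces one p-value per unordered pair of groups, and here the groups are the six non-control columns from the question. The names `groups` and `pairs` are introduced purely for illustration.

```python
from itertools import combinations

# Column names taken from the question's sample data; the controls are
# excluded, as they were in the question's post-hoc call.
groups = ['Medium1_T1', 'Medium1_T2', 'Medium1_T3',
          'Medium2_T1', 'Medium2_T2', 'Medium2_T3']

# A pairwise post-hoc test yields one p-value per unordered pair of groups.
pairs = list(combinations(groups, 2))

print('Number of pairwise comparisons:', len(pairs))
for group_a, group_b in pairs:
    print(group_a, 'vs', group_b)
```

Every comparison is between two media columns, so no protein name can appear in the result. Testing individual proteins would need the replicate-level data rather than one median per protein.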
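Your correction factor is *not* equivalent to Bonferroni. You should use scikit-posthocs' built-in correction (`p_adjust='bonferroni'`), but I have left that commented out.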
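To make the difference concrete, here is a small sketch of the textbook Bonferroni adjustment next to the factor used in the question. It assumes the standard definition (multiply each p-value by the number of comparisons and cap the result at 1.0); I believe the library's built-in option behaves this way over the pairwise comparisons, but treat that as an assumption. The helper `bonferroni` and the raw p-values are hypothetical.

```python
def bonferroni(p_values, n_comparisons):
    """Multiply each p-value by the number of comparisons, capped at 1.0."""
    return [min(1.0, p * n_comparisons) for p in p_values]

k = 6                        # number of groups in the post-hoc test
n_pairs = k * (k - 1) // 2   # number of unordered pairs of groups

# Hypothetical raw p-values, for illustration only.
raw = [0.001, 0.02, 0.2]

adjusted = bonferroni(raw, n_pairs)
question_style = [p * k for p in raw]  # factor used in the question: k, uncapped

print('pairwise comparisons:', n_pairs)
print('Bonferroni-adjusted: ', adjusted)
print('question-style:      ', question_style)
```

With six groups there are 15 pairwise comparisons, so multiplying by the number of columns (6) under-corrects, and without the cap an "adjusted" p-value can exceed 1.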
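By your current metric there are no statistically significant records, so the code above uses alpha=1.0 purely to demonstrate the output.
It also seems suspect that you include your controls in the Friedman test but not in the Dunn test; I have left that as it was.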

2023-08-05
