scipy: How to run a t-test on multiple pandas columns

ukxgm1gy  posted on 2024-01-09  in Other

I want to write code (in a few lines) that runs a t-test of Product against Purchase_cost, Warranty_years, and service_cost at the same time.

    # dataset
    import pandas as pd
    from scipy.stats import ttest_ind
    data = {'Product': ['laptop', 'printer', 'printer', 'printer', 'laptop', 'printer', 'laptop', 'laptop', 'printer', 'printer'],
            'Purchase_cost': [120.09, 150.45, 300.12, 450.11, 200.55, 175.89, 124.12, 113.12, 143.33, 375.65],
            'Warranty_years': [3, 2, 2, 1, 4, 1, 2, 3, 1, 2],
            'service_cost': [5, 5, 10, 4, 7, 10, 4, 6, 12, 3]
            }
    df = pd.DataFrame(data)
    print(df)

This is my attempt for Product against Purchase_cost. I also want to run the t-test for Product against Warranty_years and Product against service_cost.

    # define samples
    group1 = df[df['Product'] == 'laptop']
    group2 = df[df['Product'] == 'printer']
    # perform independent two-sample t-test
    ttest_ind(group1['Purchase_cost'], group2['Purchase_cost'])

Answer #1 (wlp8pajw)

ttest_ind works on 2D (N-D) input, so you can pass all columns at once:

    cols = df.columns.difference(['Product'])
    # or with an explicit list
    # cols = ['Purchase_cost', 'Warranty_years', 'service_cost']
    group1 = df[df['Product'] == 'laptop']
    group2 = df[df['Product'] == 'printer']
    out = pd.DataFrame(ttest_ind(group1[cols], group2[cols]),
                       columns=cols, index=['statistic', 'pvalue'])

If it didn't, you could loop over your columns with a dictionary comprehension:

    out = pd.DataFrame({c: ttest_ind(group1[c], group2[c]) for c in cols},
                       index=['statistic', 'pvalue'])


Output:

               Purchase_cost  Warranty_years  service_cost
    statistic      -1.861113        3.513240     -0.919464
    pvalue          0.099760        0.007924      0.384738
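As a quick sanity check, the following minimal sketch compares the 2D call with the per-column loop on the question's data. It assumes the default `axis=0` of `ttest_ind`, which tests each column independently; the names `res_2d`, `stat_loop`, `p_loop` and `same` are only illustrative.

```python
import numpy as np
import pandas as pd
from scipy.stats import ttest_ind

# Same data as in the question
data = {'Product': ['laptop', 'printer', 'printer', 'printer', 'laptop',
                    'printer', 'laptop', 'laptop', 'printer', 'printer'],
        'Purchase_cost': [120.09, 150.45, 300.12, 450.11, 200.55,
                          175.89, 124.12, 113.12, 143.33, 375.65],
        'Warranty_years': [3, 2, 2, 1, 4, 1, 2, 3, 1, 2],
        'service_cost': [5, 5, 10, 4, 7, 10, 4, 6, 12, 3]}
df = pd.DataFrame(data)

cols = ['Purchase_cost', 'Warranty_years', 'service_cost']
group1 = df.loc[df['Product'] == 'laptop', cols]
group2 = df.loc[df['Product'] == 'printer', cols]

# One call on the 2D arrays: with axis=0 each column is tested separately
res_2d = ttest_ind(group1.to_numpy(), group2.to_numpy(), axis=0)

# One call per column, as in the dictionary comprehension
res_loop = [ttest_ind(group1[c], group2[c]) for c in cols]
stat_loop = np.array([r.statistic for r in res_loop])
p_loop = np.array([r.pvalue for r in res_loop])

# Both approaches should give the same statistics and p-values
same = bool(np.allclose(res_2d.statistic, stat_loop)
            and np.allclose(res_2d.pvalue, p_loop))
print(same)
```

Both approaches use the default `equal_var=True` (Student's t-test with pooled variance), so they should agree up to floating-point error.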

Generalizing to more pairs

If you have more products than just laptop/printer and want to compare all pairs, you can generalize with:

    from itertools import combinations
    cols = df.columns.difference(['Product'])
    g = df.groupby('Product')[cols]
    out = pd.concat({(a, b): pd.DataFrame(ttest_ind(g.get_group(a), g.get_group(b)),
                                          columns=cols, index=['statistic', 'pvalue'])
                     for a, b in combinations(df['Product'].unique(), 2)
                     }, names=['product1', 'product2'])


Example output with an extra category (phone):

                                 Purchase_cost  Warranty_years  service_cost
    product1 product2
    laptop   printer  statistic      -1.861113        3.513240     -0.919464
                      pvalue          0.099760        0.007924      0.384738
             phone    statistic      -1.945836        2.988072      2.766417
                      pvalue          0.109251        0.030515      0.039533
    printer  phone    statistic      -1.286968        0.423659      1.893370
                      pvalue          0.239026        0.684528      0.100178

  • If you have many combinations, note that you should probably post-process the results to account for multiple testing.
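One simple way to do that post-processing is a Bonferroni correction, which multiplies each p-value by the number of tests and caps the result at 1. The sketch below is only an illustration: it copies the p-values from the example output above, and the row labels, the names `p_bonf` and `significant`, and the 0.05 threshold are assumptions rather than part of the original answer. Bonferroni is conservative; other corrections (Holm, false discovery rate) reject more readily.

```python
import pandas as pd

# p-values copied from the pairwise example output (3 pairs x 3 columns)
pvals = pd.DataFrame(
    {'Purchase_cost': [0.099760, 0.109251, 0.239026],
     'Warranty_years': [0.007924, 0.030515, 0.684528],
     'service_cost': [0.384738, 0.039533, 0.100178]},
    index=['laptop vs printer', 'laptop vs phone', 'printer vs phone'])

# Bonferroni: multiply every p-value by the total number of tests, cap at 1
m = pvals.size
p_bonf = (pvals * m).clip(upper=1.0)

# Flag the tests that remain significant at the (assumed) 0.05 level
significant = p_bonf < 0.05

print(p_bonf)
print(significant)
```

With nine tests, even the smallest raw p-value here (0.007924) becomes roughly 0.071 after correction, so none of the comparisons stays below 0.05.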