pandas: Error in gseapy.gsea (Python): assert len(dat) > 1

ffscu2ro, posted on 2023-06-20 in Python

I'm trying to run GSEA on my samples (n=3), and I keep getting this error without understanding why.
Error:
      1 # Run GSEA
----> 2 gsea = gseapy.gsea(data=gsea_df_HDM, # the processed data matrix,
      3                  gene_sets=gene_sets, # pathways DataFrame converted to a dictionary
      4                  cls=gsea_df_condition["Condition"], # the metadata column
      5                  permutation_type='phenotype',
      6                  min_size=2, # minimum number of compounds in a pathway for it to be tested
      7                  permutation_num=100, # reduce number to speed up test
      8                  outdir=None,  # do not write output to disk
      9                  method='signal_to_noise', # Ranking metric
     10                  threads=4, seed= 7)

File ~/opt/anaconda3/envs/Python_3104/lib/python3.10/site-packages/gseapy/__init__.py:150, in gsea(data, gene_sets, cls, outdir, min_size, max_size, permutation_num, weighted_score_type, permutation_type, method, ascending, threads, figsize, format, graph_num, no_plot, seed, verbose, *arg, **kwarg)
    128     threads = kwarg["processes"]
    130 gs = GSEA(
    131     data,
    132     gene_sets,
   (...)
    148     verbose,
    149 )
--> 150 gs.run()
    152 return gs

...
--> 265 assert len(dat) > 1
    266 # filtering out gene sets and build gene sets dictionary
    267 gmt = self.load_gmt(gene_list=dat.index.values, gmt=self.gene_sets)
Input (running GSEA):

gsea = gseapy.gsea(data=gsea_df_HDM, # the processed data matrix,  
                 gene_sets=gene_sets, #pathways DataFrame converted to a dictionary
                 cls=gsea_df_condition["Condition"], # the metadata column
                 permutation_type='phenotype',
                 min_size=2, # minimum number of compounds in a pathway for it to be tested
                 permutation_num=100, # reduce number to speed up test
                 outdir=None,  # do not write output to disk
                 method='signal_to_noise', # Ranking metric
                 threads=4, seed= 7)

Data info:
gsea_df_HDM: type pandas.core.frame.DataFrame, shape (578, 6) (rows = protein UniProt IDs, columns = samples)
gsea_df_condition["Condition"]: type pandas.core.series.Series, length 6, printed values: HDM_CD101_neg_grouped_1 HDM_CD101_neg_grouped_2 HDM_CD101_neg_grouped_2 HDM_CD101_neg_grouped_3 HDM_CD101_neg_grouped_3 HDM_CD101_neg_grouped_1 HDM_CD101_pos_grouped_1 HDM_CD101_pos_grouped_2 HDM_CD101_pos_grouped_3 HDM_CD101_pos_grouped_3 HDM_CD101_pos_grouped_group Name: Condition, dtype: object
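I tried converting gsea_df_condition["Condition"] to a list, confirmed its shape, tried pulling the metadata from a different table, and so on. I have confirmed that its length is > 1.
I tried looking at the source code to better understand where the error comes from, but I'm not advanced enough to figure it out.
I also ran it with an example dataset, and it worked. My input is in the same format, so I don't know what is wrong.
Help!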
Here is some of the source code (the part where I get the error):

        # Start Analysis
        self._logger.info("Parsing data files for GSEA.............................")
        # phenotype labels parsing
        cls_vector = self.load_classes()
        # select correct expression genes and values.
        dat, cls_dict = self.load_data(cls_vector)
        self.cls_dict = cls_dict
        # data frame must have length > 1
        assert len(dat) > 1
        # filtering out gene sets and build gene sets dictionary
        gmt = self.load_gmt(gene_list=dat.index.values, gmt=self.gene_sets)
        self.gmt = gmt
        self._logger.info(
            "%04d gene_sets used for further statistical testing....." % len(gmt)
        )
        self._logger.info("Start to run GSEA...Might take a while..................")
        # cpu numbers
        # compute ES, NES, pval, FDR, RES
        if self.permutation_type == "gene_set":

#1 bmp9r5qi

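The problem was that my DataFrame had dtype "object" rather than a numeric dtype, so I ended up with an empty DataFrame for the computation. The solution was: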
gsea_df_HDM = gsea_df_HDM.astype(float)
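
To see why the dtype matters, here is a minimal sketch using a made-up toy matrix (the sample names and protein IDs are hypothetical, not from my data). It assumes the library keeps only numeric columns at some point during loading; the exact filtering inside gseapy is not shown in this thread, so treat this as an illustration of the symptom rather than a description of gseapy's internals.

```python
import pandas as pd

# Hypothetical toy matrix: numbers stored as strings, forced to dtype object
# to mimic a frame that was read or assembled without numeric conversion.
df = pd.DataFrame(
    {"sample_1": ["1.5", "2.0", "3.1"], "sample_2": ["0.7", "1.2", "2.4"]},
    index=["P12345", "Q67890", "O11111"],
    dtype=object,
)
print(df.dtypes)  # every column reports dtype object

# Keeping only numeric columns drops everything, leaving nothing to rank.
numeric_only = df.select_dtypes(include="number")
print(numeric_only.shape)

# After casting to float, the same selection keeps all columns.
fixed = df.astype(float)
numeric_fixed = fixed.select_dtypes(include="number")
print(numeric_fixed.shape)
```

Checking `gsea_df_HDM.dtypes` before calling gseapy.gsea is a quick way to spot this. If `astype(float)` raises a ValueError, some cell probably holds a non-numeric string that needs cleaning first.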
