Error in gseapy.gsea (Python): assert len(dat) > 1

Posted by ffscu2ro on 2023-06-20 in Python

I am trying to run GSEA on my samples (n=3), and I keep getting this error. I don't understand why.
Error:
      1 # Run GSEA
----> 2 gsea = gseapy.gsea(data=gsea_df_HDM, # the processed data matrix,
      3                  gene_sets=gene_sets, #pathways DataFrame converted to a dictionary
      4                  cls=gsea_df_condition["Condition"], # the metadata column
      5                  permutation_type='phenotype',
      6                  min_size=2, # minimum number of compounds in a pathway for it to be tested
      7                  permutation_num=100, # reduce number to speed up test
      8                  outdir=None,  # do not write output to disk
      9                  method='signal_to_noise', # Ranking metric
     10                  threads=4, seed= 7)

File ~/opt/anaconda3/envs/Python_3104/lib/python3.10/site-packages/gseapy/__init__.py:150, in gsea(data, gene_sets, cls, outdir, min_size, max_size, permutation_num, weighted_score_type, permutation_type, method, ascending, threads, figsize, format, graph_num, no_plot, seed, verbose, *arg, **kwarg)
    128     threads = kwarg["processes"]
    130 gs = GSEA(
    131     data,
    132     gene_sets,
   (...)
    148     verbose,
    149 )
--> 150 gs.run()
    152 return gs

...
--> 265 assert len(dat) > 1
    266 # filtering out gene sets and build gene sets dictionary
    267 gmt = self.load_gmt(gene_list=dat.index.values, gmt=self.gene_sets)
Input:

# Run GSEA
gsea = gseapy.gsea(data=gsea_df_HDM, # the processed data matrix,  
                 gene_sets=gene_sets, #pathways DataFrame converted to a dictionary
                 cls=gsea_df_condition["Condition"], # the metadata column
                 permutation_type='phenotype',
                 min_size=2, # minimum number of compounds in a pathway for it to be tested
                 permutation_num=100, # reduce number to speed up test
                 outdir=None,  # do not write output to disk
                 method='signal_to_noise', # Ranking metric
                 threads=4, seed= 7)

Data info:
gsea_df_HDM: type: pandas.core.frame.DataFrame, shape: (578, 6) (rows = protein UniProt IDs, columns = samples)
gsea_df_condition["Condition"]: type: pandas.core.series.Series, len: 6, print: HDM_CD101_neg_grouped_1 HDM_CD101_neg_grouped_2 HDM_CD101_neg_grouped_2 HDM_CD101_neg_grouped_3 HDM_CD101_neg_grouped_3 HDM_CD101_neg_grouped_1 HDM_CD101_pos_grouped_1 HDM_CD101_pos_grouped_2 HDM_CD101_pos_grouped_3 HDM_CD101_pos_grouped_3 HDM_CD101_pos_grouped_group Name: Condition, dtype: object
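I have tried converting gsea_df_condition["Condition"] to a list, confirmed its shape, tried pulling the metadata from a different table, and so on. I have confirmed that its length is > 1.
I tried reading the source code to better understand where the error comes from, but I'm not advanced enough to figure it out.
I also ran it with an example dataset, and that works. My input is in the same format, so I don't know what is wrong.
Help!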
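For reference, one way to compare "the same format" more concretely is to summarize the dtypes and label count of the inputs before calling gseapy.gsea. This is only a rough sketch using plain pandas: the helper name describe_gsea_inputs and the toy frame are hypothetical, it does not call gseapy, and it does not claim to reproduce what gseapy checks internally.

```python
import pandas as pd


def describe_gsea_inputs(data, classes):
    """Summarize basic properties of an expression matrix and its class labels.

    Hypothetical helper for illustration; not part of gseapy.
    """
    classes = list(classes)
    numeric = data.select_dtypes(include="number")
    non_numeric = [c for c in data.columns if c not in numeric.columns]
    return {
        "n_rows": len(data),
        "n_numeric_columns": numeric.shape[1],
        "non_numeric_columns": non_numeric,
        "n_class_labels": len(classes),
        "labels_match_numeric_columns": len(classes) == numeric.shape[1],
        "index_is_unique": bool(data.index.is_unique),
    }


# Toy stand-in for the real matrix: 3 proteins x 4 samples,
# with the values accidentally stored as strings.
toy = pd.DataFrame(
    {
        "s1": ["1.0", "2.0", "3.0"],
        "s2": ["1.5", "2.5", "3.5"],
        "s3": ["4.0", "5.0", "6.0"],
        "s4": ["4.5", "5.5", "6.5"],
    },
    index=["P1", "P2", "P3"],
)
labels = ["neg", "neg", "pos", "pos"]

before = describe_gsea_inputs(toy, labels)

# Converting every column to numbers (unparseable entries become NaN).
toy_numeric = toy.apply(pd.to_numeric, errors="coerce")
after = describe_gsea_inputs(toy_numeric, labels)

print(before)
print(after)
```

Running the same summary on the working example dataset and on gsea_df_HDM would show whether the two really agree in dtype, number of numeric sample columns, and number of class labels.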
Here is some of the source code (the part where I get the error):

        # Start Analysis
        self._logger.info("Parsing data files for GSEA.............................")
        # phenotype labels parsing
        cls_vector = self.load_classes()
        # select correct expression genes and values.
        dat, cls_dict = self.load_data(cls_vector)
        self.cls_dict = cls_dict
        # data frame must have length > 1
        assert len(dat) > 1
        # filtering out gene sets and build gene sets dictionary
        gmt = self.load_gmt(gene_list=dat.index.values, gmt=self.gene_sets)
        self.gmt = gmt
        self._logger.info(
            "%04d gene_sets used for further statistical testing....." % len(gmt)
        )
        self._logger.info("Start to run GSEA...Might take a while..................")
        # cpu numbers
        # compute ES, NES, pval, FDR, RES
        if self.permutation_type == "gene_set":
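
Note that the assertion in the excerpt is on dat, the frame returned by self.load_data, not on the class labels, so it fires when fewer than two rows of the data matrix are left at that point. The excerpt does not show what load_data does, so the following is only a sketch under an assumption: that rows may be dropped when they are entirely missing or when they have zero standard deviation within a class. The function count_surviving_rows and the toy data are hypothetical and are not gseapy code.

```python
import pandas as pd


def count_surviving_rows(data, classes):
    """Count rows left after each ASSUMED filtering step (not gseapy's code)."""
    # One class label per sample column, in column order.
    classes = pd.Series(list(classes), index=data.columns)
    numeric = data.select_dtypes(include="number")
    counts = {"input": len(data)}

    # Assumed step 1: drop rows in which every value is missing.
    numeric = numeric.dropna(how="all")
    counts["after_dropping_all_nan_rows"] = len(numeric)

    # Assumed step 2: drop rows whose standard deviation is zero in any class.
    keep = pd.Series(True, index=numeric.index)
    for label in classes.unique():
        cols = [c for c in numeric.columns if classes[c] == label]
        keep &= numeric[cols].std(axis=1) != 0
    counts["after_dropping_zero_std_rows"] = int(keep.sum())
    return counts


# Toy data: P1 has no spread within the "neg" class, P3 is entirely missing.
toy = pd.DataFrame(
    {
        "neg_1": [1.0, 2.0, None, 5.0],
        "neg_2": [1.0, 3.0, None, 6.0],
        "pos_1": [4.0, 7.0, None, 9.0],
        "pos_2": [5.0, 8.0, None, 9.5],
    },
    index=["P1", "P2", "P3", "P4"],
)
labels = ["neg", "neg", "pos", "pos"]

counts = count_surviving_rows(toy, labels)
print(counts)
```

If a count like this drops to 0 or 1 for the real matrix, that would be consistent with the assertion failing even though the class vector itself has the expected length.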

Source: https://stackoverflow.com/questions/76474452/error-in-gseapy-gsea-python-assert-lendat-1