I am trying to run GSEA on my samples (n=3) and I keep getting this error. I don't understand why.
Error:
# Run GSEA
----> 2 gsea = gseapy.gsea(data=gsea_df_HDM, # the processed data matrix,
      3 gene_sets=gene_sets, # pathways DataFrame converted to a dictionary
      4 cls=gsea_df_condition["Condition"], # the metadata column
      5 permutation_type='phenotype',
      6 min_size=2, # minimum number of compounds in a pathway for it to be tested
      7 permutation_num=100, # reduce number to speed up test
      8 outdir=None, # do not write output to disk
      9 method='signal_to_noise', # Ranking metric
     10 threads=4, seed= 7)

File ~/opt/anaconda3/envs/Python_3104/lib/python3.10/site-packages/gseapy/__init__.py:150, in gsea(data, gene_sets, cls, outdir, min_size, max_size, permutation_num, weighted_score_type, permutation_type, method, ascending, threads, figsize, format, graph_num, no_plot, seed, verbose, *arg, **kwarg)
    128 threads = kwarg["processes"]
    130 gs = GSEA(
    131 data,
    132 gene_sets,
    (...)
    148 verbose,
    149 )
--> 150 gs.run()
    152 return gs
...
--> 265 assert len(dat) > 1
    266 # filtering out gene sets and build gene sets dictionary
    267 gmt = self.load_gmt(gene_list=dat.index.values, gmt=self.gene_sets)
Input:
# Run GSEA
gsea = gseapy.gsea(data=gsea_df_HDM, # the processed data matrix,
                   gene_sets=gene_sets, # pathways DataFrame converted to a dictionary
                   cls=gsea_df_condition["Condition"], # the metadata column
                   permutation_type='phenotype',
                   min_size=2, # minimum number of compounds in a pathway for it to be tested
                   permutation_num=100, # reduce number to speed up test
                   outdir=None, # do not write output to disk
                   method='signal_to_noise', # Ranking metric
                   threads=4, seed=7)
Data info:
gsea_df_HDM: type: pandas.core.frame.DataFrame, shape: (578, 6) (rows = protein UniProt IDs, columns = samples)
gsea_df_condition["Condition"]: type: pandas.core.series.Series, len: 6, printout: HDM_CD101_neg_grouped_1 HDM_CD101_neg_grouped_2 HDM_CD101_neg_grouped_2 HDM_CD101_neg_grouped_3 HDM_CD101_neg_grouped_3 HDM_CD101_neg_grouped_1 HDM_CD101_pos_grouped_1 HDM_CD101_pos_grouped_2 HDM_CD101_pos_grouped_3 HDM_CD101_pos_grouped_3 HDM_CD101_pos_grouped_group Name: Condition, dtype: object
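I tried converting gsea_df_condition["Condition"] to a list, confirmed its shape, tried pulling the metadata from another table, and so on. I have confirmed that its length is > 1.
I tried looking at the source code to better understand where the error comes from, but I am not advanced enough to figure it out.
I also ran it with an example dataset and it worked. My input is in the same format, so I don't know what is going wrong.
Please help!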
Here is some of the source code (the part where I get the error):
# Start Analysis
self._logger.info("Parsing data files for GSEA.............................")
# phenotype labels parsing
cls_vector = self.load_classes()
# select correct expression genes and values.
dat, cls_dict = self.load_data(cls_vector)
self.cls_dict = cls_dict
# data frame must have length > 1
assert len(dat) > 1
# filtering out gene sets and build gene sets dictionary
gmt = self.load_gmt(gene_list=dat.index.values, gmt=self.gene_sets)
self.gmt = gmt
self._logger.info(
    "%04d gene_sets used for further statistical testing....." % len(gmt)
)
self._logger.info("Start to run GSEA...Might take a while..................")
# cpu numbers
# compute ES, NES, pval, FDR, RES
if self.permutation_type == "gene_set":
1 Answer
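The problem was that my DataFrame had dtype "object" rather than a numeric dtype, so I ended up with an empty DataFrame for the computation. The solution was: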
gsea_df_HDM = gsea_df_HDM.astype(float)
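For anyone hitting the same assertion, it may help to inspect the dtypes before calling gseapy. Below is a minimal sketch on a small made-up DataFrame; the column names, protein IDs, and variable names are hypothetical stand-ins for gsea_df_HDM. It only illustrates the pandas side of the fix and does not reproduce what gseapy does internally.

```python
import pandas as pd

# Hypothetical stand-in for gsea_df_HDM: numbers stored as strings,
# forced to dtype object to mimic the situation described above.
df = pd.DataFrame(
    {"sample_1": ["1.5", "2.0", "3.25"], "sample_2": ["0.5", "4.0", "2.75"]},
    index=["P1", "P2", "P3"],  # made-up protein IDs
    dtype=object,
)

# Inspect the column dtypes; here every column is object.
print(df.dtypes)

# List the columns that pandas does not treat as numeric.
non_numeric = df.select_dtypes(exclude="number").columns.tolist()
print("Non-numeric columns:", non_numeric)

# The fix from the answer: cast everything to float.
# astype(float) raises ValueError if any cell cannot be parsed.
df_numeric = df.astype(float)

# Alternative if some cells may be unparseable: coerce those to NaN
# and decide afterwards how to handle the missing values.
df_coerced = df.apply(pd.to_numeric, errors="coerce")

print(df_numeric.dtypes)
```

If astype(float) fails, the coerced version can help locate the offending cells, for example with df_coerced.isna().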