我有一个数据框
import pandas as pd
data_as_dict={'CHROM': {2241743: 21, 2241744: 3, 2241745: 5, 2241746: 6, 2241747: 6, 2241748: 11, 2241749: 16, 2241750: 20, 0: 'chr1', 1: 'chr2', 2: 'chr3', 3: 'chr5', 4: 'chr5', 5: 'chr8', 6: 'chr9', 7: 'chr10', 8: 'chr14', 9: 'chr14', 10: 'chr15', 11: 'chr15'}, 'POS_GRCh38': {2241743: 41668857, 2241744: 189487243, 2241745: 33946466, 2241746: 396321, 2241747: 32641081, 2241748: 89284793, 2241749: 89960104, 2241750: 34077942, 0: '233276815', 1: '217427435', 2: '169800667', 3: '1279675', 4: '112150207', 5: '32575278', 6: '97775520', 7: '103934543', 8: '36063370', 9: '36269155', 10: '67165147', 11: '67163292'}, 'REF': {2241743: 'T', 2241744: 'A', 2241745: 'A', 2241746: 'C', 2241747: 'C', 2241748: 'G', 2241749: 'T', 2241750: 'G', 0: 'G', 1: 'G', 2: 'G', 3: 'C', 4: 'T', 5: 'T', 6: 'C', 7: 'C', 8: 'C', 9: 'C', 10: 'G', 11: 'C'}, 'Effect_allele': {2241743: 'A', 2241744: 'T', 2241745: 'G', 2241746: 'T', 2241747: 'T', 2241748: 'A', 2241749: 'C', 2241750: 'A', 0: 'A', 1: 'C', 2: 'T', 3: 'T', 4: 'A', 5: 'G', 6: 'A', 7: 'T', 8: 'G', 9: 'T', 10: 'C', 11: 'T'}, 'Effect_size': {2241743: 0.094310679, 2241744: 0.122217633, 2241745: 0.527632769, 2241746: 0.482426149, 2241747: 0.157003749, 2241748: 0.148420005, 2241749: 0.285178942, 2241750: 0.2390169, 0: 0.2776317365982795, 1: 0.3576744442718159, 2: 0.2070141693843261, 3: 0.1823215567939546, 4: 0.3148107398400336, 5: 0.2776317365982795, 6: 0.5247285289349821, 7: 0.3435897043900768, 8: 0.3293037471426003, 9: 0.5933268452777344, 10: 0.2070141693843261, 11: 0.2151113796169455}, 'TYPE': {2241743: 'Basal_cell_carcinoma_PRSWeb', 2241744: 'Squamous_cell_carcinoma_PRSWeb', 2241745: 'Squamous_cell_carcinoma_PRSWeb', 2241746: 'Squamous_cell_carcinoma_PRSWeb', 2241747: 'Squamous_cell_carcinoma_PRSWeb', 2241748: 'Squamous_cell_carcinoma_PRSWeb', 2241749: 'Squamous_cell_carcinoma_PRSWeb', 2241750: 'Squamous_cell_carcinoma_PRSWeb', 0: 'THYROID_PGS', 1: 'THYROID_PGS', 2: 'THYROID_PGS', 3: 'THYROID_PGS', 4: 'THYROID_PGS', 5: 'THYROID_PGS', 6: 'THYROID_PGS', 7: 'THYROID_PGS', 8: 'THYROID_PGS', 9: 'THYROID_PGS', 10: 'THYROID_PGS', 11: 'THYROID_PGS'}, 'Cancer': {2241743: 'NMSC', 2241744: 'NMSC', 2241745: 'NMSC', 2241746: 'NMSC', 2241747: 'NMSC', 2241748: 'NMSC', 2241749: 'NMSC', 2241750: 'NMSC', 0: 'THYROID', 1: 'THYROID', 2: 'THYROID', 3: 'THYROID', 4: 'THYROID', 5: 'THYROID', 6: 'THYROID', 7: 'THYROID', 8: 'THYROID', 9: 'THYROID', 10: 'THYROID', 11: 'THYROID'}, 'Significant_YN': {2241743: 'Y', 2241744: 'Y', 2241745: 'Y', 2241746: 'Y', 2241747: 'Y', 2241748: 'Y', 2241749: 'Y', 2241750: 'Y', 0: 'Y', 1: 'Y', 2: 'Y', 3: 'Y', 4: 'Y', 5: 'Y', 6: 'Y', 7: 'Y', 8: 'Y', 9: 'Y', 10: 'Y', 11: 'Y'}}
all_cancers = pd.DataFrame.from_dict(data_as_dict)
我想在列CHROM
中缺少的地方追加字符串chr
。我可以在R中用grepl
和paste
来做这件事,但我想在Python中尝试一下。我想出了这两个命令,但不确定如何对列进行索引,因为pd.Series正在生成NAN。
pd.Series(all_cancers['CHROM']).str.contains(pat = 'chr', regex = True)
"chr" + all_cancers['CHROM'].map(str)
2条答案
按热度按时间z8dt9xmd1#
Pandas中的字符串操作没有经过优化,所以做你想做的事情的最好方法是通过列表理解。
同样的Pandas手术可以通过
mask()
方法进行。一个简单的
timeit
测试将显示列表Comp快得多。fykwrbwg2#
另见:
例如,使用NumPy的条件过滤器
np.where
: