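I work with text data. I have a df that I want to save to Excel or CSV so I can look at the overall picture (inspecting it only in pandas is very annoying). The problem is that, apparently because my df contains uncleaned text (I don't want to clean it heavily at this point, because I may still need it), this df cannot be saved to Excel. The df looks like this: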
df = pd.DataFrame({
'journal_lines': ['ASTRONOMY AND ASTROPHYSICS', 'ASTRONOMY AND ASTROPHYSICS', 'ATOMIC AND NUCLEAR PHYSICS'],
'abstract_lines': ['ABSTRACT 1', 'ABSTRACT 2', 'ABSTRACT 3'],
'themes_lines': ['(Classical astronomy; Interplanetary and inter...', '(Classical astronomy; Interplanetary and inter...', '(Radiation; Nuclear reactions; Accelerators; S...'],
'type_document': ['GSFC Document', 'GSFC Document', 'GSFC Document'],
'document_digit': ['X-611-67-8', 'X-611-67-52', 'X-640-66-514'],
'document_number': ['GSFC Document X-611-67-8', 'GSFC Document X-611-67-52', 'GSFC Document X-640-66-514'],
'extracted_text': [
'A STUDY OF LOW ENERGY GALACTIC\nCOSMIC RAYS FROM 1961 TO 1965\nB. Teegarden January 1967\nThe results from a series of balloon flights\nbeginning in 1961 and ending in 1965 are pre-\nsented. Measurements of the cosmic ray in-\ntensity were made using a dE/dx and E detector\nsensitive to energies from 15to 80 MeV "nucleon.\nThe early balloon flights provided design informa-\ntion and also aided in the development of data\nhandling techniques for later satellite versions\nof the detector which have been on IMP’s I, Il,\nand Ill and OGO’s | and Il. Proton and helium\nintensities at 85 MeV/nucleon are presented for\nthe five year period covered by the balloon flight\nseries. The behavior of the proton to helium\nratio as a function of time is discussed within\nthe framework of Parker’s model for the solar\nmodulation of cosmic rays.\n\nIn 1965 a modified version of the dE/dx and\nE detector with an extended energy range was\nflown for the first time. A cosmic ray helium\nspectrum from 60 to 500 MeV/nucleon measured\nby this detector is presented. The change in\nproton and helium intensities in this energy\nregion from 1963 to 1965 is examined and com-\npared with the results predicted by the various\nspecial cases of Parker’s model.\n\nA totally empirical atmospheric secondary\nproton spectrum is derived, based on simulta-\nneous balloon and satellite measurements. This\n\nGo gle\n\nGSFC DOCUMENTS\n\nspectrum is compared with the secondary spec-\ntrum obtained from a nuclear emulsion measure-\nment and the differences are discussed. Using\nour empirical secondary spectrum, we obtain an\nupper limit for the re-entrant albedo at Sioux\nFalls which is significantly less than values\nreported by other observers.\n\nA measurement of the intensities of secondary\ndeuterium and tritium was made in 1965. Using\nthese results, we obtain a value for the global\naverage production of tritium in the earth’s\natmosphere. The implications of this result\nregarding the problem of tritium balance in the\natmosphere are discussed.\n\n2'
],
'date': ['January 1967', 'February 1967', 'November 1966'],
'article_name': [
'A STUDY OF LOW ENERGY GALACTIC\nCOSMIC RAYS FROM 1961 TO 1965\nB. Teegarden January 1967',
'THE FLUX OF HEAVY NUCLEI IN THE PRI-\nMARY COS... February 1967',
'EXCITATION OF HYDROGEN MOLECULE BY\nELECTRON I... November 1966'
]
})
What I tried was using "#" as the separator and replacing the line breaks with a literal "\n":
# replace line breaks with "\n"
df['extracted_text'] = df['extracted_text'].str.replace('\n', '\\n')
# save DataFrame to CSV with '#' as separator
df.to_csv('output.csv', sep='#', index=False)
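The code runs. However, it doesn't help. When I open the CSV in Excel (imported as text data, with # given as the delimiter), it still mixes the abstracts up with the text and dumps everything together instead of giving me a structured table.
How can I do this?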
1 Answer
6mw9ycah #1
In the end I found a way, at least for Excel. I still don't know about CSV, but it somehow works with Excel, which I guess is fine...
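The post does not show what was actually done, so the following is only a sketch of one common approach, not necessarily the method used here. It rests on two assumptions: that the Excel export fails because the text contains control characters that openpyxl refuses to write, and that the CSV import breaks because only `extracted_text` was escaped while other columns (for example `article_name`) still contain raw line breaks. The helper names `make_excel_safe` and `escape_line_breaks` and the sample data are hypothetical.

```python
import re

import pandas as pd

# Control characters assumed to be what blocks the Excel export:
# everything below 0x20 except tab (\x09), LF (\x0a) and CR (\x0d).
ILLEGAL_XLSX_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def make_excel_safe(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with control characters stripped from string cells.

    Line breaks are kept, so the text itself is otherwise untouched.
    """
    def clean(value):
        if isinstance(value, str):
            return ILLEGAL_XLSX_CHARS.sub("", value)
        return value  # leave numbers, NaN, etc. as they are

    return df.apply(lambda column: column.map(clean))


def escape_line_breaks(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy where line breaks in every string cell become a literal \\n."""
    def escape(value):
        if isinstance(value, str):
            return (value.replace("\r\n", "\\n")
                         .replace("\n", "\\n")
                         .replace("\r", "\\n"))
        return value

    return df.apply(lambda column: column.map(escape))


# Hypothetical two-row sample standing in for the real df.
sample = pd.DataFrame({
    "article_name": ["A STUDY OF LOW ENERGY GALACTIC\nCOSMIC RAYS", "TITLE\x0c TWO"],
    "extracted_text": ["line one\nline two\x00", "plain text"],
})

safe = make_excel_safe(sample)   # candidate for to_excel
flat = escape_line_breaks(safe)  # candidate for to_csv, one row per line

# Uncomment to write the files (to_excel needs openpyxl installed):
# safe.to_excel("output.xlsx", index=False)
# flat.to_csv("output.csv", sep="#", index=False)
```

With the real data, `make_excel_safe(df).to_excel(...)` keeps the line breaks inside the cells, while `escape_line_breaks(df).to_csv(..., sep='#')` puts every record on a single physical line, which is what Excel's text import expects.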