Saving a complex text column from pandas to csv/Excel

Posted by djmepvbi on 2023-05-20 in Other

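I work with text data and have a df that I want to save to Excel or csv so I can see the whole picture (inspecting it only in pandas is very tedious). The problem is that, apparently because my df contains uncleaned text (I don't want to clean it substantially at this point, since I may still need it), the df cannot be saved to Excel. The df looks like this: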

import pandas as pd

df = pd.DataFrame({
    'journal_lines': ['ASTRONOMY AND ASTROPHYSICS', 'ASTRONOMY AND ASTROPHYSICS', 'ATOMIC AND NUCLEAR PHYSICS'],
    'abstract_lines': ['ABSTRACT 1', 'ABSTRACT 2', 'ABSTRACT 3'],
    'themes_lines': ['(Classical astronomy; Interplanetary and inter...', '(Classical astronomy; Interplanetary and inter...', '(Radiation; Nuclear reactions; Accelerators; S...'],
    'type_document': ['GSFC Document', 'GSFC Document', 'GSFC Document'],
    'document_digit': ['X-611-67-8', 'X-611-67-52', 'X-640-66-514'],
    'document_number': ['GSFC Document X-611-67-8', 'GSFC Document X-611-67-52', 'GSFC Document X-640-66-514'],
    'extracted_text': [
        'A STUDY OF LOW ENERGY GALACTIC\nCOSMIC RAYS FROM 1961 TO 1965\nB. Teegarden January 1967\nThe results from a series of balloon flights\nbeginning in 1961 and ending in 1965 are pre-\nsented. Measurements of the cosmic ray in-\ntensity were made using a dE/dx and E detector\nsensitive to energies from 15to 80 MeV "nucleon.\nThe early balloon flights provided design informa-\ntion and also aided in the development of data\nhandling techniques for later satellite versions\nof the detector which have been on IMP’s I, Il,\nand Ill and OGO’s | and Il. Proton and helium\nintensities at 85 MeV/nucleon are presented for\nthe five year period covered by the balloon flight\nseries. The behavior of the proton to helium\nratio as a function of time is discussed within\nthe framework of Parker’s model for the solar\nmodulation of cosmic rays.\n\nIn 1965 a modified version of the dE/dx and\nE detector with an extended energy range was\nflown for the first time. A cosmic ray helium\nspectrum from 60 to 500 MeV/nucleon measured\nby this detector is presented. The change in\nproton and helium intensities in this energy\nregion from 1963 to 1965 is examined and com-\npared with the results predicted by the various\nspecial cases of Parker’s model.\n\nA totally empirical atmospheric secondary\nproton spectrum is derived, based on simulta-\nneous balloon and satellite measurements. This\n\nGo gle\n\nGSFC DOCUMENTS\n\nspectrum is compared with the secondary spec-\ntrum obtained from a nuclear emulsion measure-\nment and the differences are discussed. Using\nour empirical secondary spectrum, we obtain an\nupper limit for the re-entrant albedo at Sioux\nFalls which is significantly less than values\nreported by other observers.\n\nA measurement of the intensities of secondary\ndeuterium and tritium was made in 1965. Using\nthese results, we obtain a value for the global\naverage production of tritium in the earth’s\natmosphere. '
        'The implications of this result\nregarding the problem of tritium balance in the\natmosphere are discussed.\n\n2'
    ],
    'date': ['January 1967', 'February 1967', 'November 1966'],
    'article_name': [
        'A STUDY OF LOW ENERGY GALACTIC\nCOSMIC RAYS FROM 1961 TO 1965\nB. Teegarden January 1967',
        'THE FLUX OF HEAVY NUCLEI IN THE PRI-\nMARY COS... February 1967',
        'EXCITATION OF HYDROGEN MOLECULE BY\nELECTRON I... November 1966'
    ]
})

What I tried was using "#" as the separator and replacing the line breaks with a literal "\n":

# replace line breaks with "\n"
df['extracted_text'] = df['extracted_text'].str.replace('\n', '\\n')

# save DataFrame to CSV with '#' as separator
df.to_csv('output.csv', sep='#', index=False)

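The code runs. However, it does not help. When I open the csv in Excel (importing it as text data and specifying # as the delimiter), it still mixes the abstracts up with the text and scrambles everything instead of giving me a structured df.
How can I do this?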

6mw9ycah

In the end I found a way that works, at least for Excel. I still don't know how to do it with csv, but it somehow works with Excel, which I guess is fine...

import xlwt

# Create a new workbook and add a sheet
workbook = xlwt.Workbook()
sheet = workbook.add_sheet('Sheet1')

# Write column headers
headers = list(df.columns)
for col_index, header in enumerate(headers):
    sheet.write(0, col_index, header)

# Write data rows
for row_index, row in enumerate(df.itertuples(), start=1):
    for col_index, value in enumerate(row[1:], start=0):
        sheet.write(row_index, col_index, str(value))

# Save the workbook to an Excel file
workbook.save('output.xls')
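
For the csv side, which the answer above leaves open, here is a minimal sketch rather than a definitive fix. It assumes the trouble comes from real line breaks inside the text cells: the attempt in the question only escapes 'extracted_text', while 'article_name' also contains line breaks. The names sample, escape_newlines, flat and round_trip are made up for illustration, and the small sample frame only stands in for the real df.

```python
import csv
import io

import pandas as pd

# Small stand-in for the df in the question: two text columns contain line breaks.
sample = pd.DataFrame({
    'document_digit': ['X-611-67-8', 'X-611-67-52'],
    'extracted_text': ['A STUDY OF LOW ENERGY GALACTIC\nCOSMIC RAYS',
                       'THE FLUX OF HEAVY NUCLEI\n"quoted" # text'],
    'article_name': ['A STUDY\nB. Teegarden', 'THE FLUX\nFebruary 1967'],
})


def escape_newlines(frame):
    """Return a copy in which line breaks in every string cell become a literal backslash-n."""
    def _escape(value):
        if isinstance(value, str):
            return value.replace('\r\n', '\\n').replace('\n', '\\n')
        return value  # leave numbers, NaN, etc. untouched

    return frame.apply(lambda column: column.map(_escape))


flat = escape_newlines(sample)

# Quote every field so separators and quotes inside the text cannot split a cell.
buffer = io.StringIO()
flat.to_csv(buffer, index=False, quoting=csv.QUOTE_ALL)
csv_text = buffer.getvalue()

# Read the text back to check that rows and columns survived.
round_trip = pd.read_csv(io.StringIO(csv_text), dtype=str)
print(round_trip.shape)
```

After escaping, every record sits on one physical line, so a line-by-line importer has nothing to misread. To produce a file for Excel, the same call can take a path, for example flat.to_csv('output.csv', index=False, quoting=csv.QUOTE_ALL, encoding='utf-8-sig'); the 'utf-8-sig' encoding writes a byte order mark, which Excel generally uses to recognise UTF-8.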
