Python: exporting to .csv without overwriting the column in a for loop

ee7vknir  posted on 2022-11-26 in Python

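In Python 3, I am trying to write data from several documents (processed in a for loop) to a CSV file. However, the column is overwritten on every iteration. How can I have the data from each document written to subsequent rows of the CSV file instead of being overwritten?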

import glob

import pandas as pd
from pdfminer.high_level import extract_text

for selectedfile in glob.glob(r'C:\Users\...\*.pdf'):
    text = extract_text(selectedfile)  # rebound on every iteration

# Everything below runs once, after the loop has finished.
# NOTE: wordlist2 is not defined in this excerpt.
Y = set(text)
Z = []
Znew = []
for val in Y:
    occurrences = wordlist2.count(val)
    if occurrences > 50:  # define min. no. of occurrences
        # print(val, ':', occurrences)
        Z.append(val)
        Znew.append(occurrences)

data = {'Stem': Z, 'Count': Znew}  # renamed from dict to avoid shadowing the builtin
df = pd.DataFrame(data)
df.to_csv('Exported list.csv', header=True, index=True, encoding='utf-8')
1qczuiv0  1#

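The problem is in the first for loop. You keep replacing text with the newly extracted text, so only the last extraction is processed. You can move the processing into the for loop so that every extraction is handled. In this example, I open the file up front and write the header once. The remaining issue is making sure the index is correct on each write.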

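import glob

import numpy as np
import pandas as pd
from pdfminer.high_level import extract_text

# newline='' is what the pandas docs recommend when passing an open file object to to_csv
with open('Exported list.csv', 'w', encoding='utf-8', newline='') as outfile:
    outfile.write(",Stem,Count\n")  # header, written once
    base = 0  # running row index across all documents
    for selectedfile in glob.glob(r'C:\Users\...\*.pdf'):
        text = extract_text(selectedfile)

        # NOTE: wordlist2 is not defined in this excerpt.
        Y = set(text)
        Z = []
        Znew = []
        for val in Y:
            occurrences = wordlist2.count(val)
            if occurrences > 50:  # define min. no. of occurrences
                # print(val, ':', occurrences)
                Z.append(val)
                Znew.append(occurrences)

        data = {'Stem': Z, 'Count': Znew}  # renamed from dict to avoid shadowing the builtin
        df = pd.DataFrame(data, index=np.arange(base, base + len(Z)))
        # header=False: the header was already written above, so do not repeat it per document
        df.to_csv(outfile, header=False, index=True)
        base += len(Z)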
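
The same idea (write the header once, then keep a running row index across documents) can also be sketched without pandas, using only the standard library's csv module and collections.Counter. This is a minimal illustration, not the answerer's method: the function names frequent_counts and write_counts, and the small docs list standing in for the words extracted from each PDF, are hypothetical and not part of the original code.

```python
import csv
import io
from collections import Counter


def frequent_counts(words, min_count):
    """Return (word, count) pairs for words seen more than min_count times."""
    counts = Counter(words)
    return [(w, n) for w, n in sorted(counts.items()) if n > min_count]


def write_counts(docs, out, min_count=50):
    """Write one row per frequent word, with a running index across all documents."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["", "Stem", "Count"])  # header, written once
    row_index = 0
    for words in docs:
        for word, n in frequent_counts(words, min_count):
            writer.writerow([row_index, word, n])
            row_index += 1
    return row_index


# Hypothetical stand-in for the word lists extracted from two PDFs.
docs = [["cat", "cat", "dog"], ["dog", "dog", "bird"]]
buffer = io.StringIO()
rows_written = write_counts(docs, buffer, min_count=1)
csv_text = buffer.getvalue()
```

To write to disk instead of an in-memory buffer, an open file handle (opened with newline='') could be passed as out. Because the rows are numbered by a single counter, no per-document index arithmetic is needed.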
