Python export to .csv without overwriting columns in for loop

Posted by ee7vknir on 2022-11-26 in Python

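In Python 3, I am trying to write data from multiple documents (processed in a for loop) to a csv file. However, the column is overwritten every time. How can I get the data from each document written to subsequent rows of the csv file instead of being overwritten?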

import glob
import pandas as pd
from pdfminer.high_level import extract_text
for selectedfile in glob.glob(r'C:\Users\...\*.pdf'):
    text = extract_text(selectedfile)

Y = set(text)
Z = []
Znew = []
for val in Y:
    occurrences = wordlist2.count(val)
    if occurrences > 50:  # define min. no. of occurrences
        # print(val, ':', occurrences)
        Z.append(val)
        Znew.append(occurrences)

dict = {'Stem': Z, 'Count': Znew}
df = pd.DataFrame(dict)
df.to_csv('Exported list.csv', header=True, index=True, encoding='utf-8')

python-3.x

Source: https://stackoverflow.com/questions/74509940/python-export-to-csv-without-overwriting-columns-in-for-loop

1 Answer

1qczuiv01#

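The problem is in the first for loop. You keep replacing text with the newly extracted text, so only the last extraction is processed. You can move the processing into the for loop so that every extraction is handled. In this example, I open the file beforehand and write the header once. The remaining issue is making sure the index is correct on each write.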

from pdfminer.high_level import extract_text
import glob
import pandas as pd
import numpy as np

# Open the output once so every document's rows land in the same file.
with open('Exported list.csv', 'w', encoding='utf-8') as outfile:
    outfile.write(",Stem,Count\n")  # write the header once
    base = 0  # index of the first row for the next document
    for selectedfile in glob.glob(r'C:\Users\...\*.pdf'):
        text = extract_text(selectedfile)

        Y = set(text)
        Z = []
        Znew = []
        for val in Y:
            # wordlist2 comes from the question and is not defined here
            occurrences = wordlist2.count(val)
            if occurrences > 50:  # define min. no. of occurrences
                # print(val, ':', occurrences)
                Z.append(val)
                Znew.append(occurrences)

        data = {'Stem': Z, 'Count': Znew}  # avoid shadowing the built-in dict
        df = pd.DataFrame(data, index=np.arange(base, base+len(Z)))
        # header=False: the header line was already written above
        df.to_csv(outfile, header=False, index=True)
        base += len(Z)
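
If keeping the index in step by hand feels fragile, another option is to build one DataFrame per document, join them with pd.concat, and write the file once at the end. The following is only a minimal sketch of that idea, not the code from the question: the texts dictionary stands in for the output of extract_text, the count_frequent helper and the extra File column are hypothetical, and it counts whitespace-separated words because wordlist2 is not defined in the question.

```python
from collections import Counter
import io

import pandas as pd


def count_frequent(words, min_count):
    """Return a DataFrame of the words that occur more than min_count times."""
    counts = Counter(words)
    rows = [(word, n) for word, n in counts.items() if n > min_count]
    return pd.DataFrame(rows, columns=["Stem", "Count"])


# Hypothetical stand-ins for the text extracted from each PDF.
texts = {
    "a.pdf": "cat cat cat dog",
    "b.pdf": "dog dog bird bird bird",
}

frames = []
for name, text in texts.items():
    frame = count_frequent(text.split(), min_count=1)
    frame.insert(0, "File", name)  # remember which document each row came from
    frames.append(frame)

# ignore_index=True renumbers the rows 0..n-1 across all documents,
# so no running offset has to be maintained.
result = pd.concat(frames, ignore_index=True)

buffer = io.StringIO()  # stands in for open('Exported list.csv', 'w', ...)
result.to_csv(buffer, index=True)  # the header is written exactly once
csv_text = buffer.getvalue()
```

Because the file is written in a single to_csv call, the header appears only once and the index is continuous. The trade-off is that all rows are held in memory until the end, which should be fine for word counts but may matter for very large outputs.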

Answered 2022-11-26
