Convert all text documents in a directory to HTML documents - Python

Posted by stszievb on 2023-08-01 in Python


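I have close to 10,000 text documents that I need to convert/save as .html and then convert to PDF using Python.
I tried a package called 'TextToHTML',
which can be installed with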

pip install texttohtml

and then run from the terminal:

python -m texttohtml.convert C:\Users\User\Downloads\AAPL -o C:\Users\User\Downloads\AAPL_pdfs

But it did not work. I ran it twice; it gave me no errors, and it did not create any HTML documents either.
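
When a command exits silently like this, it can help to look at its exit code and captured output before assuming it worked. Below is a minimal sketch using only the standard `subprocess` module. `run_and_report` is a hypothetical helper name, and nothing here assumes anything about how `texttohtml` itself behaves.

```python
import subprocess
import sys


def run_and_report(args):
    """Run a command and return (exit_code, stdout, stderr) as text."""
    # capture_output=True collects both streams instead of printing them;
    # text=True decodes them to str
    result = subprocess.run(args, capture_output=True, text=True)
    return result.returncode, result.stdout, result.stderr
```

Calling it with the same invocation as above, passed as a list (`[sys.executable, "-m", "texttohtml.convert", <input path>, "-o", <output path>]`), and printing the three returned values should show whether the tool reported anything at all.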


Source: https://stackoverflow.com/questions/76753573/convert-all-text-document-inside-a-directory-to-html-document-python

2 Answers


mrzz3bfm #1

I take no responsibility for any lost data.
This should walk through your directory and try to convert each file:

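import os
import subprocess
import sys

from_directory = "C:\\Users\\User\\Downloads\\AAPL"
to_directory = "C:\\Users\\User\\Downloads\\AAPL_pdfs"

# Make sure the output directory exists before writing into it
os.makedirs(to_directory, exist_ok=True)

for file in os.listdir(from_directory):
    source = os.path.join(from_directory, file)
    if os.path.isfile(source):
        # splitext strips only the last extension, so "report.v2.txt" becomes "report.v2.html"
        target = os.path.join(to_directory, os.path.splitext(file)[0] + ".html")
        try:
            # An argument list avoids quoting problems with spaces in paths;
            # check=True raises CalledProcessError on a non-zero exit code
            subprocess.run([sys.executable, "-m", "texttohtml.convert", source, "-o", target], check=True)
        except Exception as e:
            print(f"Failed to convert file: {file} Error: {e}")
            # Compare against the message text; an exception object does not support "in"
            if "No data" in str(e): print(f"File: {file} was empty.")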

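The -o option means that, on success, the output is written to a file instead of stdout.
I suggest testing this on a few files before running it on all of them.
Also, this only converts to HTML files, not PDF.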
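
If the external tool keeps producing nothing, a standard-library-only route may be worth trying. The sketch below is not what `texttohtml` does internally. It assumes the inputs are UTF-8 `.txt` files and that wrapping the escaped text in a `<pre>` element is an acceptable HTML form. The function name `convert_directory` and the page template are made up for illustration.

```python
import html
from pathlib import Path


def convert_directory(src_dir, dst_dir, encoding="utf-8"):
    """Write one .html file into dst_dir for every .txt file in src_dir."""
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    written = []
    for txt_path in sorted(src.glob("*.txt")):
        # errors="replace" keeps the loop going on undecodable bytes (assumption)
        text = txt_path.read_text(encoding=encoding, errors="replace")
        # html.escape turns <, > and & into entities so the text displays literally
        page = (
            "<!DOCTYPE html>\n"
            f'<html><head><meta charset="utf-8"><title>{html.escape(txt_path.stem)}</title></head>\n'
            f"<body><pre>{html.escape(text)}</pre></body></html>\n"
        )
        out_path = dst / (txt_path.stem + ".html")
        out_path.write_text(page, encoding="utf-8")
        written.append(out_path)
    return written
```

The function returns the list of files it wrote, so comparing `len(convert_directory(src, dst))` with the number of input files is a quick sanity check. The source files are only read, never modified.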

Answered 2023-08-01

n53p2ov0 #2

import os

def change_file_extension(old_extension, new_extension):
    # Get a list of all files in the current directory
    files = os.listdir()

    # Iterate through the files and rename those with the old extension to the new extension
    for file in files:
        if file.endswith(old_extension):
            # Swap only the trailing extension; str.replace would also change matches elsewhere in the name
            new_name = file[: -len(old_extension)] + new_extension
            os.rename(file, new_name)
            print(f"Renamed {file} to {new_name}")

if __name__ == "__main__":
    old_extension = ".txt"
    new_extension = ".html"
    change_file_extension(old_extension, new_extension)

This worked for me for converting to HTML. I still need further work to convert each HTML document to PDF.
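
Note that renaming only changes the extension: the file contents stay as they were, and the files no longer exist under their old names. As a more cautious variant of the same idea, here is a sketch that copies instead of renames, so the `.txt` files stay in place. `copy_with_new_extension` and its parameters are hypothetical names, not part of the answer above.

```python
import shutil
from pathlib import Path


def copy_with_new_extension(src_dir, dst_dir, old_ext=".txt", new_ext=".html"):
    """Copy files ending in old_ext from src_dir to dst_dir under new_ext."""
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(src.iterdir()):
        if path.is_file() and path.suffix == old_ext:
            # with_suffix swaps only the final extension: "a.txt.txt" -> "a.txt.html"
            target = dst / path.with_suffix(new_ext).name
            shutil.copy2(path, target)  # the original file is left untouched
            copied.append(target)
    return copied
```

Copying roughly doubles the disk space used, which is usually a fair price for being able to rerun the step on about 10,000 files without losing the inputs.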

Answered 2023-08-01
