pandas: exporting a column to .csv without Excel turning the last four digits into zeros in cells with more than 15 digits

Posted by fdbelqdn on 2023-05-27 in Other


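First of all, I am new to Python and to programming in general, and I need your help.

Problem: keeping the original numbers from a specific column of an HTML file intact in a .csv file.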

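I am currently running a script that identifies the game provider and site in HTML files and exports the information to a .csv file. The "ROUND_ID" column must keep exactly the same numbers as the original file, because the data is handed to our specialist team in Excel format.
Unfortunately, when the file is opened in Excel, the numbers end in four zeros: the correct last four digits are replaced with zeros. As a temporary workaround, I managed to add a quotation mark (") to the start of each cell, which preserves the required format. However, I am actively looking for a permanent solution that keeps the original number without any modification.
I tried running a script that converts the column to text, but it does not seem to work. Any suggestions?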
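To make the symptom concrete: Excel keeps only 15 significant digits for numeric cells and changes any digits after the 15th to zeros. The sketch below only mimics that behavior in plain Python. The helper name and the 19-digit sample value are made up for illustration and are not taken from the real data.

```python
def keep_15_significant_digits(number_text: str) -> str:
    """Zero out every digit after the 15th, the way a numeric Excel cell would.

    Hypothetical helper for illustration; non-numeric text is returned as-is.
    """
    digits = number_text.strip()
    if not digits.isdigit() or len(digits) <= 15:
        return digits
    return digits[:15] + "0" * (len(digits) - 15)


round_id = "1660203330123456789"  # made-up 19-digit ROUND_ID
shown = keep_15_significant_digits(round_id)
print(shown)  # 1660203330123450000
```

This suggests the digits are still correct in the .csv file itself and are only lost when Excel interprets the cell as a number.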
My current script is as follows:

import csv
from bs4 import BeautifulSoup
import pandas as pd
import os

# Set the directory path containing the HTML files
directory = 'C:/Users/gusta/Downloads/HTML/'

# Create an empty list to store the extracted data
filtered_data = []

# Step 1: Loop through each HTML file in the directory
for filename in os.listdir(directory):
    if filename.endswith('.html'):
        # Read the HTML file
        file_path = os.path.join(directory, filename)
        with open(file_path, 'r') as file:
            html_content = file.read()

        # Parsing the HTML content
        soup = BeautifulSoup(html_content, 'html.parser')

        # Filtering the data
        data_rows = soup.find_all('tr')  # Find all table rows
        for row in data_rows[1:]:  # Skip the header row
            cells = row.find_all('td')

            # Handle variations in table structure
            if len(cells) >= 13:
                trader_id = cells[3].text.strip()  # Extract trader ID from the fourth column
                name = cells[5].text.strip()  # Extract name from the sixth column

                # Apply desired filters (e.g., trader ID and name)
                if trader_id == '513' and name == 'PG Soft':
                    # Extract other required columns here
                    collected_data = [
                        cells[0].text.strip(),
                        cells[1].text.strip(),
                        cells[2].text.strip(),
                        trader_id,
                        cells[4].text.strip(),
                        name,
                        cells[6].text.strip(),
                        cells[7].text.strip(),
                        cells[8].text.strip(),
                        cells[9].text.strip(),
                        cells[10].text.strip(),
                        cells[11].text.strip(),
                        cells[12].text.strip()
                    ]
                    # Convert ROUND_ID to a string surrounded by quotes
                    collected_data[7] = f'"{collected_data[7]}"'

                    # Append the collected_data list to filtered_data
                    filtered_data.append(collected_data)
            else:
                print("Unexpected table structure. Skipping row...")

# Generating a single DataFrame from all the extracted data
df = pd.DataFrame(filtered_data, columns=[
    'CUSTOMER_BET_ID',
    'CUSTOMER_CODE',
    'USERNAME',
    'TRADER_ID',
    'TRADER_NAME',
    'NAME',
    'UID_',
    'ROUND_ID',
    'PLAYED_DATE',
    'S',
    'CSN_GAME_ID',
    'PLAYED_AMOUNT_FROM_BALANCE',
    'GAME_NAME'
])

# Save the DataFrame to a single CSV file with a semicolon (;) as the delimiter
output_file = 'output.csv'
df.to_csv(output_file, index=False, quoting=csv.QUOTE_NONNUMERIC, quotechar='"', sep=';')
print(f'Data extraction and spreadsheet generation completed. Output saved to {output_file}.')
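One commonly used workaround, sketched here under assumptions rather than as a confirmed fix for this script, is to write each ROUND_ID as the formula-style text `="..."`, which Excel evaluates to a text string when it opens the .csv. The sketch uses only the standard `csv` module so that it is self-contained. The helper name `as_excel_text` and the sample row are hypothetical.

```python
import csv
import io


def as_excel_text(value: str) -> str:
    """Wrap a value as ="..." so Excel shows it as text (hypothetical helper)."""
    return f'="{value}"'


rows = [
    ["CUSTOMER_BET_ID", "ROUND_ID"],
    ["42", as_excel_text("1660203330123456789")],  # made-up sample values
]

# Write to an in-memory buffer with the same ';' delimiter as the script above.
buffer = io.StringIO()
writer = csv.writer(
    buffer, delimiter=";", quoting=csv.QUOTE_MINIMAL, lineterminator="\n"
)
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

In the pandas script, the same wrapping could presumably be applied to the `ROUND_ID` values in place of the manual double quotes before calling `to_csv`. The trade-off is that the .csv then contains `="..."` rather than the bare number, so any other program that reads the file would have to strip the wrapper again.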