Extracting a table from a CSV file using Python

jfgube3f posted on 2024-01-03 in Python


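I am trying to use Python to extract specific tables from a report and compile the data into a single table. However, once the code finds the row of column names it is looking for, it seems to copy everything below that row, including data that is not machine-readable and that I do not need. The output should be in table format. I am running the following.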

Code:

import csv
import pandas as pd

# Function to extract relevant data from the CSV file
def extract_data_from_csv(file_path):
    data_start = False
    headers = []
    rows = []

    with open(file_path, newline='') as csvfile:
        csv_reader = csv.reader(csvfile)
        
        for row in csv_reader:
            if not data_start:
                if row and row[0] == "SUBSCRIBER NAME":
                    headers = row
                    data_start = True
            elif data_start and row:
                if row[0].strip().upper() == "PRODUCT TOTAL":
                    break
                rows.append(row)

    return headers, rows

# Specify the path to your CSV file
csv_file_path = 'Testing/sample report.csv'

# Extract data
headers, rows = extract_data_from_csv(csv_file_path)

# Create a DataFrame using pandas
df = pd.DataFrame(rows, columns=headers)

# Display the DataFrame as a table
display(df)

# Save the DataFrame to a new CSV file
output_csv_file_path = 'mark_table_1.csv'
df.to_csv(output_csv_file_path, index=False)

See the file at the Google link: Link

csv

Source: https://stackoverflow.com/questions/77496010/extracting-table-from-csv-file-using-python

3 Answers


k10s72fa #1

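You might want to consider a custom generator function that loops over the file, starts capturing when a certain condition is met, then consumes and yields the following rows until another condition is met (or the file ends), and then breaks out to continue looking for the first condition again. This is not massively different from what you have done, but for example: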

import csv
import pandas as pd

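def extract(filename):
    # newline='' lets the csv module handle line endings itself
    with open(filename, newline='') as fin:
        csvin = csv.reader(fin)
        for row in csvin:
            # Look for the header row that starts a table
            if row and row[0] == 'SUBSCRIBER NAME':
                # Keep reading from the same reader, so the outer loop
                # resumes after this table ends
                for subrow in csvin:
                    if subrow and subrow[0]: # change me perhaps?
                        yield dict(zip(row, subrow))
                    else:
                        break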

Then get the DataFrame with:

df = pd.DataFrame(extract('your_file_name'))

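The generator idea can be tried without the real report by feeding it an in-memory sample. The following is only a sketch: the sample text, the column names other than SUBSCRIBER NAME, and the function name extract_records are made up for illustration, and it assumes each table ends at a blank first cell or at a PRODUCT TOTAL row.

```python
import csv
import io

# Hypothetical sample that mimics a report containing two table sections.
SAMPLE = """\
Report title,,
,,
SUBSCRIBER NAME,ACCOUNT,AMOUNT
Alice,001,10.00
Bob,002,20.00
PRODUCT TOTAL,,30.00
,,
Some other text,,
SUBSCRIBER NAME,ACCOUNT,AMOUNT
Carol,003,5.00
PRODUCT TOTAL,,5.00
"""

def extract_records(lines, header_key="SUBSCRIBER NAME", end_key="PRODUCT TOTAL"):
    """Yield one dict per data row found between a header row and an end row."""
    reader = csv.reader(lines)
    for row in reader:
        if row and row[0].strip() == header_key:
            # Consume the same reader so the outer loop resumes after this table.
            for subrow in reader:
                first = subrow[0].strip() if subrow else ""
                # Assumption: a blank first cell or the total row ends the table.
                if not first or first.upper() == end_key:
                    break
                yield dict(zip(row, subrow))

records = list(extract_records(io.StringIO(SAMPLE)))
for record in records:
    print(record)
```

Because the function accepts any iterable of lines, an open file object can be passed in the same way, and the resulting list of dicts can be handed to pd.DataFrame.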

2024-01-03

btqmn9zl #2

I have not tested this, but you could try replacing extract_data_from_csv with the following:

# Function to extract relevant data from the CSV file
def extract_data_from_csv(file_path):
    data_start = False
    headers = []
    rows = []

    with open(file_path, newline='') as csvfile:
        csv_reader = csv.reader(csvfile)
        
        for row in csv_reader:
            if not data_start:
                if row and row[0] == "SUBSCRIBER NAME":
                    headers = row
                    data_start = True
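            elif data_start and row:
                # Guard against rows with fewer than six columns
                if len(row) > 5 and row[5].strip().upper() == "PRODUCT TOTAL": #changed row index to 5 
                    data_start = False #in place of break
                    continue # do not store the total row itself
                rows.append(row)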

    return headers, rows

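Two changes were made:
1. The index was changed from row[0].strip().upper() == "PRODUCT TOTAL" to row[5].strip().upper() == "PRODUCT TOTAL".
2. break was replaced with data_start = False, to pause row collection until the next "SUBSCRIBER NAME" table appears. Note that headers is reassigned inside the if statement each time, but this should not be a problem because the headers are always the same. You could also modify the function further to avoid this.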
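One way the function might be modified so that headers is stored only once is sketched below. This is an illustration rather than the answerer's code: the name extract_tables, the end_col parameter, and the sample data are hypothetical, and the sample assumes the PRODUCT TOTAL label sits in column index 5 as this answer suggests.

```python
import csv
import io

def extract_tables(lines, header_key="SUBSCRIBER NAME",
                   end_key="PRODUCT TOTAL", end_col=5):
    """Collect data rows from every table section, keeping the first header row."""
    headers = None
    rows = []
    in_table = False
    for row in csv.reader(lines):
        if not row:
            continue  # skip completely empty lines
        if row[0].strip() == header_key:
            if headers is None:
                headers = row  # store the header row only once
            in_table = True
            continue
        if not in_table:
            continue  # ignore text between tables
        # Guard the index so short rows do not raise IndexError.
        if len(row) > end_col and row[end_col].strip().upper() == end_key:
            in_table = False  # pause until the next header row
            continue
        rows.append(row)
    return headers, rows

# Hypothetical sample with the total label in the sixth column.
SAMPLE = """\
Report header
SUBSCRIBER NAME,ACCOUNT,REF,DATE,BRANCH,AMOUNT
Alice,001,R1,2023-11-14,Main,10.00
,,,,,PRODUCT TOTAL
Page 2
SUBSCRIBER NAME,ACCOUNT,REF,DATE,BRANCH,AMOUNT
Bob,002,R2,2023-11-14,Main,20.00
,,,,,PRODUCT TOTAL
"""

headers, rows = extract_tables(io.StringIO(SAMPLE))
print(headers)
print(rows)
```

Repeated header rows and total rows are both skipped here, so only data rows end up in rows.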

2024-01-03

qaxu7uf2 #3

I figured it out.
Sharing the full code for future reference.

import csv
import pandas as pd

# Function to extract relevant data from the CSV file
def extract_data_from_csv(file_path):
    headers = None
    rows = []

    with open(file_path, newline='') as fin:
        csvin = csv.reader(fin)
        for row in csvin:
            if row and row[0] == 'SUBSCRIBER NAME':
                headers = row
                for subrow in csvin:
                    if subrow and subrow[0].strip().upper() == 'PRODUCT TOTAL':
                        break
                    elif subrow and subrow[0]:
                        rows.append(subrow)
                    else:
                        break

    return headers, rows

# Specify the path to your CSV file
csv_file_path = 'Testing/0976_Bills_Payment_Abstract_of_Collection_11142023_115828.csv'

# Extract data using the new extract function
headers, rows = extract_data_from_csv(csv_file_path)

# Create a DataFrame using pandas
df = pd.DataFrame(rows, columns=headers)

# Display the DataFrame as a table
display(df)

# Save the DataFrame to a new CSV file
output_csv_file_path = 'mark_table_2.csv'
df.to_csv(output_csv_file_path, index=False)

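Rows taken from a report-style CSV do not always have the same number of cells as the header row, and pd.DataFrame(rows, columns=headers) may reject or misalign such rows. A small helper like the one below could be applied before building the DataFrame. It is only a sketch: normalize_rows and the sample values are hypothetical, and truncating long rows discards the extra cells, which may or may not be acceptable for a given report.

```python
def normalize_rows(headers, rows, fill=""):
    """Pad short rows with `fill` and truncate long rows to the header width."""
    width = len(headers)
    return [list(row[:width]) + [fill] * (width - len(row)) for row in rows]

# Hypothetical rows of uneven width.
headers = ["SUBSCRIBER NAME", "ACCOUNT", "AMOUNT"]
rows = [
    ["Alice", "001", "10.00"],
    ["Bob", "002"],                      # one cell short
    ["Carol", "003", "5.00", "extra"],   # one cell too many
]

fixed = normalize_rows(headers, rows)
for row in fixed:
    print(row)
```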

2024-01-03
