我有一个以上格式的文件格式说 "test.fmt" . 我得把第三排和最后一排都去掉。 Id 以及 Flag 文件中的行。同时,还要更新lhs上的列号。在这种情况下,去掉两列后,我将得到剩下的5行。“此值” 5 '还必须在当前为' 7 '. 还要注意输出文件中的最后一行是如何具有“ \r\n ,所以也需要改变。如何使用python来实现这一点(如果有人可以帮助编写示例代码或其他东西,这将非常有用),并且还可以参数化变量,以便对50个具有不同行数的文件执行相同的任务。
"test.fmt"
Id
Flag
5
7
\r\n
tcomlyy61#
你们可以用正常的方式读前两行,其余的用 pandas.read_csv(..., sep="\s+") 我用 io.StringIO 仅用于模拟文件,但您应该使用 open() ```data = '''14.071 SQLCHAR 0 12 "," 1 Id ""2 SQLCHAR 0 200 "," 2 value1 SQL_Latin1_General_CP1_CI_AS3 SQLCHAR 0 200 "," 3 value2 SQL_Latin1_General_CP1_CI_AS4 SQLCHAR 0 200 "," 4 value3 SQL_Latin1_General_CP1_CI_AS5 SQLCHAR 0 200 "," 5 value4 SQL_Latin1_General_CP1_CI_AS6 SQLCHAR 0 200 "," 6 value5 SQL_Latin1_General_CP1_CI_AS7 SQLCHAR 0 12 "\r\n" 7 Flag ""'''
pandas.read_csv(..., sep="\s+")
io.StringIO
open()
import pandas as pdimport io
fh = io.StringIO(data)
two_lines = [next(fh), next(fh)]print(two_lines)
df = pd.read_csv(fh, sep="\s+", header=None)print(df)
df = df[ ~df[6].str.contains('Id|Flag') ]
df[0] = range(1, len(df)+1)
fh = open('output.txt', 'w')
fh.write(two_lines[0])fh.write(str(len(df)) + '\n')
for row in df.itertuples():# format line to keep spacesline = '{:<8}{:<20}{:<8}{:<9}{:<8}{:<6}{:<28}{}'.format(*row[1:])
#print(line)# write linefh.write(line + '\n')
#print(line)
# write line
fh.write(line + '\n')
fh.close()
wbgh16ku2#
这应该使用 re.split() 并根据要求跳过/重新编号字段:
re.split()
import reSKIP_WORDS = ['Id', 'Flag']LAST_LINE_CRLF = r'"\r\n"'with open('test.fmt') as input_file: vers = next(input_file).strip() _ = next(input_file) new_lines, new_field_no = [], 0 for line in input_file: if not any(s in line for s in SKIP_WORDS): new_field_no += 1 data = re.split(r'( +(?=.*))', line.strip()) data[0] = str(new_field_no) # renumber field data[10] = str(new_field_no) # renumber field new_lines.append(data)with open('test_new.fmt', 'w') as output_file: output_file.write(vers + '\n') output_file.write(str(new_field_no) + '\n') for idx, line in enumerate(new_lines, start=1): if idx == new_field_no: # adjust last line delimiter line[8] = LAST_LINE_CRLF line[9] = ' ' output_file.write(''.join(line) + '\n')
import re
SKIP_WORDS = ['Id', 'Flag']
LAST_LINE_CRLF = r'"\r\n"'
with open('test.fmt') as input_file:
vers = next(input_file).strip()
_ = next(input_file)
new_lines, new_field_no = [], 0
for line in input_file:
if not any(s in line for s in SKIP_WORDS):
new_field_no += 1
data = re.split(r'( +(?=.*))', line.strip())
data[0] = str(new_field_no) # renumber field
data[10] = str(new_field_no) # renumber field
new_lines.append(data)
with open('test_new.fmt', 'w') as output_file:
output_file.write(vers + '\n')
output_file.write(str(new_field_no) + '\n')
for idx, line in enumerate(new_lines, start=1):
if idx == new_field_no: # adjust last line delimiter
line[8] = LAST_LINE_CRLF
line[9] = ' '
output_file.write(''.join(line) + '\n')
2条答案
按热度按时间tcomlyy61#
你们可以用正常的方式读前两行,其余的用
pandas.read_csv(..., sep="\s+")
我用io.StringIO
仅用于模拟文件,但您应该使用open()
```data = '''14.0
7
1 SQLCHAR 0 12 "," 1 Id ""
2 SQLCHAR 0 200 "," 2 value1 SQL_Latin1_General_CP1_CI_AS
3 SQLCHAR 0 200 "," 3 value2 SQL_Latin1_General_CP1_CI_AS
4 SQLCHAR 0 200 "," 4 value3 SQL_Latin1_General_CP1_CI_AS
5 SQLCHAR 0 200 "," 5 value4 SQL_Latin1_General_CP1_CI_AS
6 SQLCHAR 0 200 "," 6 value5 SQL_Latin1_General_CP1_CI_AS
7 SQLCHAR 0 12 "\r\n" 7 Flag ""'''
import pandas as pd
import io
open file for reading
fh = io.StringIO(data)
fh = open(filename)
read first two line
two_lines = [next(fh), next(fh)]
print(two_lines)
read rest as CSV file without headers
df = pd.read_csv(fh, sep="\s+", header=None)
print(df)
skip rows which have
Id
orFlag
in 7th columndf = df[ ~df[6].str.contains('Id|Flag') ]
assign new values in first column
df[0] = range(1, len(df)+1)
print(df)
open file for writing
fh = open('output.txt', 'w')
write first two lines
fh.write(two_lines[0])
fh.write(str(len(df)) + '\n')
if you don't need spaces then you can use this
df.to_csv(fh, sep=' ', index=False, header=False)
if you need spaces then you can use this
for row in df.itertuples():
# format line to keep spaces
line = '{:<8}{:<20}{:<8}{:<9}{:<8}{:<6}{:<28}{}'.format(*row[1:])
close file
fh.close()
wbgh16ku2#
这应该使用
re.split()
并根据要求跳过/重新编号字段: