- 此问题在此处已有答案**:
How can I parse CSV files with quoted fields containing commas, in awk?(3个答案)
Bash script Find Duplicate Values and modify them(2个答案)
2天前关闭。
我有一个csv文件和bash脚本,但它不能按需工作
剧本
#!/bin/bash
#create path to redirect accounts.csv to same directory as accounts_new.csv
path=$(dirname $1)
awk '
BEGIN { FS="\""; OFS="," } # input is delimited by double qutoes
NR==1 { print; next }
{ line=""
for (i=1;i<NF;i+=2) { # loop through odd numbered fields
gsub(/,/,"|",$(i+1)) # in even numbered double-quote-delimited fields replace commas with pipes
line=line $i FS $(i+1) FS # rebuild the current line
}
line=line $NF # add last field to new line
split(line,a,",") # split new line on commas
split(tolower(a[3]),b,/[[:space:]]+/) # split tolower(name field) on white space
# rebuild name with first characters of first/last names uppercased
name=toupper(substr(b[1],1,1)) substr(b[1],2) " " toupper(substr(b[2],1,1)) substr(b[2],2)
acct=substr(b[1],1,1) b[2] # build email acct name
lines[NR]=a[1] OFS a[2] OFS name OFS a[4] # rebuild current line based on first 4 fields
locid[NR]=a[2] # make note of location_id for current line
email[NR]=acct # make note of email acct for current line
count[acct]++ # keep count of number of times we see this email acct
}
END { for (i=2;i<=NR;i++) { # loop through our lines of output
gsub(/\|/,",",lines[i]) # replace pipes with original commas
# print final line of output; if email acct has been seen more than once then append the location_id to the email acct; add the "@abc.com" domain and the trailing comma
print lines[i] OFS email[i] (count[email[i]] > 1 ? locid[i] : "") "@abc.com" OFS
}
}' $1 > $path"/accounts_new.csv"
CSV文件内容
id,location_id,name,title,email,department
1,1,Susan houston,Director of Services,,
2,1,Christina Gonzalez,Director,,
3,2,Brenda brown,"Director, Second Career Services",,
4,3,Howard Lader,"Manager, Senior Counseling",,
5,4,Kimberly Pesavento,Commercial director,,
6,5,Joe Bloom,Financial Empowerment Programs Program Director,,
7,6,peter Olson,Director,,
8,6,Bart charlow,Executive Director,,
9,7,Bart Charlow,Executive Director,,
10,7,Barbara Kalt,Director,,
11,8,Marilyn Baker-Venturini,Director,,
12,8,Graciela Hernandez,Assistant Manager,,
13,8,Julie avelino,Assessment Specialist,,
14,9,Dave Genesy,Library Director,,
15,9,maria kramer,Library Divisions Manager,,
16,10,Dave Genesy,Tester,,
17,10,Maria kramer,Library Division Manager,,
18,11,Dave Genesy,Head of office,,
19,11,Elizabeth Meeks,Branch Manager,,
20,12,Kathy Endaya,Director,,
21,13,dave genesy,Library Director,,
22,14,Andres Espinoza,"Manager, Commanding Officer",,
23,15,Jack Phillips,Administrator,,
24,16,James Lee,Commanding Officer,,
25,17,Kenneth Gibson,Tester,,
26,18,Sharon Petersen,Administrator,,
27,19,Sharon Petersen,Administrator,,
28,21,Moncef Salah,Tester,,Office of Innovation
29,22,Suzanne Badenhoop,Tester,suzanne@example.com,Referrals
30,20,Sean Houston,Director of new Services,,
31,8,David Genesy,Account Manager,,
32,8,Elizabeth Feeney,CEO,e.feeney@foobar.org,Operations
33,8,Erika Meeks,Tester,e.meeks@foobar.org,Operations
我想要的输出如下
id,location_id,name,title,email,department
1,1,Susan Houston,Director of Services,shouston1@abc.com,
2,1,Christina Gonzalez,Director,cgonzalez@abc.com,
3,2,Brenda Brown,"Director, Second Career Services",bbrown@abc.com,
4,3,Howard Lader,"Manager, Senior Counseling",hlader@abc.com,
5,4,Kimberly Pesavento,Commercial director,kpesavento@abc.com,
6,5,Joe Bloom,Financial Empowerment Programs Program Director,jbloom@abc.com,
7,6,Peter Olson,Director,polson@abc.com,
8,6,Bart Charlow,Executive Director,bcharlow6@abc.com,
9,7,Bart Charlow,Executive Director,bcharlow7@abc.com,
10,7,Barbara Kalt,Director,bkalt@abc.com,
11,8,Marilyn Baker-venturini,Director,mbaker-venturini@abc.com,
12,8,Graciela Hernandez,Assistant Manager,ghernandez@abc.com,
13,8,Julie Avelino,Assessment Specialist,javelino@abc.com,
14,9,Dave Genesy,Library Director,dgenesy9@abc.com,
15,9,Maria Kramer,Library Divisions Manager,mkramer9@abc.com,
16,10,Dave Genesy,Tester,dgenesy10@abc.com,
17,10,Maria Kramer,Library Division Manager,mkramer10@abc.com,
18,11,Dave Genesy,Head of office,dgenesy11@abc.com,
19,11,Elizabeth Meeks,Branch Manager,emeeks11@abc.com,
20,12,Kathy Endaya,Director,kendaya@abc.com,
21,13,Dave Genesy,Library Director,dgenesy13@abc.com,
22,14,Andres Espinoza,"Manager, Commanding Officer",aespinoza@abc.com,
23,15,Jack Phillips,Administrator,jphillips@abc.com,
24,16,James Lee,Commanding Officer,jlee@abc.com,
25,17,Kenneth Gibson,Tester,kgibson@abc.com,
26,18,Sharon Petersen,Administrator,spetersen18@abc.com,
27,19,Sharon Petersen,Administrator,spetersen19@abc.com,
28,21,Moncef Salah,Tester,msalah@abc.com,Office of Innovation
29,22,Suzanne Badenhoop,Tester,sbadenhoop@abc.com,Referrals
30,20,Sean Houston,Director of new Services,shouston20@abc.com,
31,8,David Genesy,Account Manager,dgenesy8@abc.com,
32,8,Elizabeth Feeney,CEO,efeeney@abc.com,Operations
33,8,Erika Meeks,Tester,emeeks8@abc.com,Operations
但是我得到的输出忽略了department列的内容
id,location_id,name,title,email,department
1,1,Susan Houston,Director of Services,shouston1@abc.com,
2,1,Christina Gonzalez,Director,cgonzalez@abc.com,
3,2,Brenda Brown,"Director, Second Career Services",bbrown@abc.com,
4,3,Howard Lader,"Manager, Senior Counseling",hlader@abc.com,
5,4,Kimberly Pesavento,Commercial director,kpesavento@abc.com,
6,5,Joe Bloom,Financial Empowerment Programs Program Director,jbloom@abc.com,
7,6,Peter Olson,Director,polson@abc.com,
8,6,Bart Charlow,Executive Director,bcharlow6@abc.com,
9,7,Bart Charlow,Executive Director,bcharlow7@abc.com,
10,7,Barbara Kalt,Director,bkalt@abc.com,
11,8,Marilyn Baker-venturini,Director,mbaker-venturini@abc.com,
12,8,Graciela Hernandez,Assistant Manager,ghernandez@abc.com,
13,8,Julie Avelino,Assessment Specialist,javelino@abc.com,
14,9,Dave Genesy,Library Director,dgenesy9@abc.com,
15,9,Maria Kramer,Library Divisions Manager,mkramer9@abc.com,
16,10,Dave Genesy,Tester,dgenesy10@abc.com,
17,10,Maria Kramer,Library Division Manager,mkramer10@abc.com,
18,11,Dave Genesy,Head of office,dgenesy11@abc.com,
19,11,Elizabeth Meeks,Branch Manager,emeeks11@abc.com,
20,12,Kathy Endaya,Director,kendaya@abc.com,
21,13,Dave Genesy,Library Director,dgenesy13@abc.com,
22,14,Andres Espinoza,"Manager, Commanding Officer",aespinoza@abc.com,
23,15,Jack Phillips,Administrator,jphillips@abc.com,
24,16,James Lee,Commanding Officer,jlee@abc.com,
25,17,Kenneth Gibson,Tester,kgibson@abc.com,
26,18,Sharon Petersen,Administrator,spetersen18@abc.com,
27,19,Sharon Petersen,Administrator,spetersen19@abc.com,
28,21,Moncef Salah,Tester,msalah@abc.com,
29,22,Suzanne Badenhoop,Tester,sbadenhoop@abc.com,
30,20,Sean Houston,Director of new Services,shouston20@abc.com,
31,8,David Genesy,Account Manager,dgenesy8@abc.com,
32,8,Elizabeth Feeney,CEO,efeeney@abc.com,
33,8,Erika Meeks,Tester,emeeks8@abc.com,
你能帮我达到我想要的产出吗?
我认为问题是部门值后面没有逗号,这就是为什么它不能按预期工作,可以在代码中更改什么来实现这一点?
2条答案
按热度按时间dgjrabp21#
正如我在对previous answer的评论中提到的:
locid[NR]=a[2]
之后加上dept[NR]=a[6]
print
中添加dept[i]
修改后的代码(无原始注解):
这将产生:
7fhtutme2#
一旦你开始使用带有引号和标题的CSV,Ruby就容易多了:
根据您的输入,打印: