For my application I have to parse a CSV file using Erlang. The following code parses CSV in Erlang:
```erlang
parse_file(Fn) ->
    {ok, Data} = file:read_file(Fn),
    parse(binary_to_list(Data)).

parse(Data) -> lists:reverse(parse(Data, [])).

parse([], Acc) -> Acc;
parse(Data, Acc) ->
    {Line, Tail} = parse_line(Data),
    parse(Tail, [Line|Acc]).

parse_line(Data) ->
    {Line, Tail} = parse_line(Data, []),
    {lists:reverse(Line), Tail}.

parse_line([13,10|Data], Acc) -> {Acc, Data};
parse_line([10|Data], Acc) -> {Acc, Data};
parse_line([13|Data], Acc) -> {Acc, Data};
parse_line([], Acc) -> {Acc, []};
parse_line([$,,$,|Data], Acc) -> parse_line(Data, [""|Acc]);
parse_line([$,|Data], Acc) -> parse_line(Data, Acc);
parse_line(Data, Acc) ->
    {Fld, Tail} = parse_field(Data),
    parse_line(Tail, [Fld|Acc]).

parse_field([34|Data]) ->
    {Fld, Tail} = parse_fieldq(Data, ""),
    {lists:reverse(Fld), Tail};
parse_field(Data) ->
    {Fld, Tail} = parse_field(Data, ""),
    {lists:reverse(Fld), Tail}.

parse_field([$,|Tail], Acc) -> {Acc, [$,|Tail]};
parse_field([13|Tail], Acc) -> {Acc, [13|Tail]};
parse_field([10|Tail], Acc) -> {Acc, [10|Tail]};
parse_field([], Acc) -> {Acc, []};
parse_field([Ch|Tail], Acc) -> parse_field(Tail, [Ch|Acc]).

parse_fieldq([34,34|Tail], Acc) -> parse_fieldq(Tail, [34|Acc]);
parse_fieldq([34|Tail], Acc) -> {Acc, Tail};
parse_fieldq([Ch|Tail], Acc) -> parse_fieldq(Tail, [Ch|Acc]).
```
This code works fine, but it has two issues:

1. The code splits values on double quotes (") and commas (,). In the following example the First Name contains a double-quoted string, so the parser creates one extra field:
"Type","First Name","Last Name","Email"
"Contact","Ashwani Garg ------"All Pain Will End."","","itisashwani4u@gmail.com"
Result:
[["contact"],["Ashwani Garg ------"],["All Pain Will End."],[],["itisashwani4u@gmail.com"]]
Expected result:
[["contact"],["Ashwani Garg ------All Pain Will End."],[],["itisashwani4u@gmail.com"]]
2. For the following kind of CSV, some values get truncated (the trailing empty fields are missing from the result):
First Name,Last Name,Middle Name,Name,Nickname,E-mail Address,Home Street,Home City,Home Postal Code,Home State,Home Country/Region,Home Phone,Home Fax,Mobile Phone,Personal Web Page,Business Street,Business City,Business Postal Code,Business State,Business Country/Region,Business Web Page,Business Phone,Business Fax,Pager,Company,Job Title,Department,Office Location,Notes
Affection,,,Affection,,,,,,,,+919845141544,,+919845141544,,,,,,,,,,,,,,,
Result:
[["Affection"],[],[],["Affection"],[],[],[],[],[],[],[],["+919845141544"],[],["+919845141544"],[],[],[],[],[],[],[]]
Expected result:
[["Affection"],[],[],["Affection"],[],[],[],[],[],[],[],["+919845141544"],[],["+919845141544"],[],[],[],[],[],[],[],[],[],[],[],[],[],[]]
Please help. For reference, see the following link: http://ppolv.wordpress.com/2008/02/25/parsing-csv-in-erlang/
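A possible fix for issue 2, sketched as a hypothetical module `csv_fields` (the module name and the one-line-at-a-time interface are assumptions, not part of the linked post). As far as I can tell, in the code above the `parse_line([$,,$,|Data], Acc)` clause emits one empty field for two commas and the `parse_line([$,|Data], Acc)` clause emits none, so runs of empty fields, especially trailing ones, come out short. This sketch instead treats every comma as the end of a field, so a trailing comma always yields one more (empty) field. It expects a single line with its terminator already removed; anything other than a comma right after a closing quote raises a `case_clause` error.

```erlang
-module(csv_fields).
-export([parse_line/1]).

%% Split one CSV line (terminator already removed) into fields.
%% Every comma ends a field, so leading, repeated and trailing empty
%% fields are all kept: "a,," -> ["a","",""].
parse_line(Line) ->
    parse_line(Line, []).

parse_line(Line, Acc) ->
    {Field, Rest} = parse_field(Line),
    case Rest of
        [$, | Tail] -> parse_line(Tail, [Field | Acc]);  % a comma always opens another field
        []          -> lists:reverse([Field | Acc])      % end of line: emit the last field too
    end.

%% A field that starts with a double quote is read up to its closing quote.
parse_field([$" | Tail]) -> parse_quoted(Tail, []);
parse_field(Data)        -> parse_plain(Data, []).

%% Unquoted field: everything up to the next comma or the end of the line.
parse_plain([$, | _] = Rest, Acc) -> {lists:reverse(Acc), Rest};
parse_plain([], Acc)              -> {lists:reverse(Acc), []};
parse_plain([Ch | Tail], Acc)     -> parse_plain(Tail, [Ch | Acc]).

%% Quoted field: "" stands for one literal double quote.
parse_quoted([$", $" | Tail], Acc) -> parse_quoted(Tail, [$" | Acc]);
parse_quoted([$" | Tail], Acc)     -> {lists:reverse(Acc), Tail};
parse_quoted([Ch | Tail], Acc)     -> parse_quoted(Tail, [Ch | Acc]);
parse_quoted([], Acc)              -> {lists:reverse(Acc), []}.  % unterminated quote: keep what was read
```

With this, `csv_fields:parse_line("Affection,,,Affection")` returns `["Affection","","","Affection"]`, and a line ending in N commas gets N extra empty fields at the end.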
8 Answers
Answer 1 (czq61nw1)
Without file:read_line:
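A minimal sketch of that approach, under my own assumptions (hypothetical module `csv_whole_file`): read the file in one go with `file:read_file/1` and cut it into lines with `binary:split/3`. It assumes no quoted field contains a line break, because it splits on every `\r\n`, `\n` or `\r`.

```erlang
-module(csv_whole_file).
-export([read_lines/1, split_lines/1]).

%% Read the whole file into memory in one call (no file:read_line/1)
%% and cut it into lines.
read_lines(FileName) ->
    {ok, Bin} = file:read_file(FileName),
    split_lines(Bin).

%% Split a binary on "\r\n", "\n" or "\r" and return the lines as strings.
%% Empty parts are dropped, so a "\r\n" pair can never produce a spurious
%% empty line; note that this also skips genuinely blank lines.
split_lines(Bin) ->
    Parts = binary:split(Bin, [<<"\r\n">>, <<"\n">>, <<"\r">>], [global]),
    [binary_to_list(P) || P <- Parts, P =/= <<>>].
```

Each returned string can then be handed to a per-line field parser.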
Answer 2 (ilmyapht)
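A side question:
How did you create the CSV input? It doesn't look like valid CSV (although CSV doesn't have a particularly strict specification).
Normally, to use a double quote inside a CSV field, you escape it as a pair of double quotes, so your example would look like this:
That imports fine into an Open Office spreadsheet, whereas your original example does not.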
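For illustration, a small sketch (hypothetical module `csv_quote`, not from this answer) that writes fields in that escaped form: each field is wrapped in double quotes and every embedded double quote is doubled. It uses `lists:join/2`, which assumes OTP 19 or later.

```erlang
-module(csv_quote).
-export([quote_field/1, format_row/1]).

%% Quote one field: wrap it in double quotes and write every embedded
%% double quote as a pair ("").
quote_field(Field) ->
    Escaped = lists:flatmap(fun($") -> [$", $"]; (Ch) -> [Ch] end, Field),
    [$" | Escaped] ++ [$"].

%% Build one CSV line (without line terminator) from a list of string fields.
format_row(Fields) ->
    lists:flatten(lists:join($,, [quote_field(F) || F <- Fields])).
```

For the row from the question, `format_row(["Contact", "Ashwani Garg ------\"All Pain Will End.\"", "", "itisashwani4u@gmail.com"])` returns the line `"Contact","Ashwani Garg ------""All Pain Will End.""","","itisashwani4u@gmail.com"`.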
Answer 3 (slwdgvem)
I came across your implementation the other day and started playing around with it.
I made you a parser as well.
I even wrote a small blog post about this CSV parser.
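A sketch of such a parser, not the one from this answer (hypothetical module `csv_rfc`, working on the binary returned by `file:read_file/1`). It follows the usual quoting rules: quoted fields may contain commas and line breaks, `""` inside quotes is a literal quote, and every comma ends a field, so trailing empty fields are kept. Assumptions: line breaks are `\n` or `\r\n`; a quote in the middle of an unquoted field is kept as an ordinary character rather than rejected; a final row consisting of a single empty field with no line break after it is dropped.

```erlang
-module(csv_rfc).
-export([parse/1]).

%% Parse a whole CSV document (binary) into a list of rows,
%% each row being a list of binaries.
parse(Bin) when is_binary(Bin) ->
    field(Bin, <<>>, [], []).

%% field(Rest, CurrentField, CurrentRowReversed, RowsReversed)
field(<<$", Rest/binary>>, <<>>, Row, Rows) ->
    quoted(Rest, <<>>, Row, Rows);                      % quote at field start: quoted mode
field(<<$,, Rest/binary>>, F, Row, Rows) ->
    field(Rest, <<>>, [F | Row], Rows);                 % comma: close the field
field(<<"\r\n", Rest/binary>>, F, Row, Rows) ->
    field(Rest, <<>>, [], [lists:reverse([F | Row]) | Rows]);
field(<<$\n, Rest/binary>>, F, Row, Rows) ->
    field(Rest, <<>>, [], [lists:reverse([F | Row]) | Rows]);
field(<<C, Rest/binary>>, F, Row, Rows) ->
    field(Rest, <<F/binary, C>>, Row, Rows);            % ordinary character
field(<<>>, <<>>, [], Rows) ->
    lists:reverse(Rows);                                % input ended right after a line break
field(<<>>, F, Row, Rows) ->
    lists:reverse([lists:reverse([F | Row]) | Rows]).   % last row had no line break

%% Inside quotes everything is literal except "" and the closing quote.
quoted(<<$", $", Rest/binary>>, F, Row, Rows) ->
    quoted(Rest, <<F/binary, $">>, Row, Rows);
quoted(<<$", Rest/binary>>, F, Row, Rows) ->
    field(Rest, F, Row, Rows);
quoted(<<C, Rest/binary>>, F, Row, Rows) ->
    quoted(Rest, <<F/binary, C>>, Row, Rows);
quoted(<<>>, F, Row, Rows) ->
    field(<<>>, F, Row, Rows).                          % unterminated quote: keep what was read
```

Because line breaks are handled inside the same state machine, a quoted field spanning several lines stays one field, which a split-into-lines-first approach cannot do.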
Answer 4 (44u64gxh)
Reading lines from a file was also discussed on Trapexit. Adapting it to your needs should be simple:
http://www.trapexit.org/Reading_Lines_from_a_File
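A minimal sketch of line-by-line reading with `io:get_line/2` (hypothetical module `csv_lines`; this is not the Trapexit code itself), stripping the `\n` or `\r\n` terminator from each line so trailing commas stay intact for the field parser:

```erlang
-module(csv_lines).
-export([read_lines/1]).

%% Read a text file line by line with io:get_line/2 and return the lines
%% with their "\n" or "\r\n" terminator removed.
read_lines(FileName) ->
    {ok, Fd} = file:open(FileName, [read]),
    try
        read_lines(Fd, [])
    after
        file:close(Fd)                     % close the file even if reading fails
    end.

read_lines(Fd, Acc) ->
    case io:get_line(Fd, "") of
        eof             -> lists:reverse(Acc);
        {error, Reason} -> erlang:error(Reason);
        Line            -> read_lines(Fd, [chomp(Line) | Acc])
    end.

%% Drop a trailing "\r\n" or "\n", if present.
chomp(Line) ->
    case lists:reverse(Line) of
        [$\n, $\r | Rev] -> lists:reverse(Rev);
        [$\n | Rev]      -> lists:reverse(Rev);
        _                -> Line
    end.
```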
Answer 5 (qjp7pelc)
My implementation:
However, this solution can't handle nested quotes...
For example:
1,"Hello, "World"","She said: "This is the "solution", isn't it?"",2000\r\n
Answer 6 (ijnw1ujt)
Another possible solution. It can easily be made lazy, so that the whole file does not have to be read at once.
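A sketch of the lazy variant (hypothetical module and function `csv_fold:fold/3`, my own naming, not this answer's code): `file:read_line/1` reads one line at a time and a caller-supplied fun folds over the rows, so the file is never loaded in full. For brevity it splits on every comma and does not handle quoted fields; `binary:split/3` without the `trim` option keeps empty fields, including trailing ones.

```erlang
-module(csv_fold).
-export([fold/3]).

%% Lazily fold Fun(Row, Acc) over the rows of a file, reading one line at
%% a time with file:read_line/1 so the whole file is never in memory.
%% Rows are split naively on commas (quoted fields are NOT handled).
fold(FileName, Fun, Acc0) ->
    {ok, Fd} = file:open(FileName, [read, binary, read_ahead]),
    try
        loop(Fd, Fun, Acc0)
    after
        file:close(Fd)
    end.

loop(Fd, Fun, Acc) ->
    case file:read_line(Fd) of
        {ok, Line} ->
            %% Cut the line terminator off, then split on every comma.
            [Body | _] = binary:split(Line, [<<"\r\n">>, <<"\n">>]),
            Row = binary:split(Body, <<",">>, [global]),
            loop(Fd, Fun, Fun(Row, Acc));
        eof ->
            Acc;
        {error, Reason} ->
            erlang:error(Reason)
    end.
```

For example, `csv_fold:fold(File, fun(_Row, N) -> N + 1 end, 0)` counts the rows without keeping any of them in memory.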
Answer 7 (2uluyalo)
I added some enhancements to zed's answer.
Answer 8 (ycggw6v2)
Wisher's answer isn't bad, except that it drops the last element of every CSV line. Here is a fix, although it still doesn't handle embedded quotes.
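A minimal sketch of one way to keep the last element, assuming unquoted input and OTP 20 or later (hypothetical module `csv_split`, not Wisher's code or the fix above): `string:split/3` with `all` returns every piece, including an empty last one.

```erlang
-module(csv_split).
-export([split_line/1, split_text/1]).

%% Split one unquoted CSV line (terminator already removed) on every comma.
%% string:split/3 with 'all' keeps empty pieces, so the last field survives
%% even when it is empty. Quoted or embedded commas are NOT handled.
split_line(Line) ->
    string:split(Line, ",", all).

%% Split a whole "\n"-separated text into rows of fields,
%% ignoring the empty piece after a final newline.
split_text(Text) ->
    [split_line(L) || L <- string:split(Text, "\n", all), L =/= ""].
```

`string:tokens/2` would be the wrong tool here: it treats consecutive separators as one and drops empty fields, so `string:tokens("a,,b,", ",")` gives only `["a","b"]`.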