Hadoop still treats commas as a delimiter

vsnjm48y  posted on 2021-05-30  in Hadoop

I am currently importing data into a Hive table. When we created the table, we used:

CREATE EXTERNAL TABLE Customers
(
Code      string,
Company      string,
FirstName     string,
LastName     string,
DateOfBirth string,
PhoneNo     string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';

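We chose | as the field delimiter because our data contains commas. However, we now find that commas are still being treated as field delimiters, in addition to the | we use to separate fields. Is there a way to fix this? Do we have to escape every comma in the data, or is there a simpler way to set this up?
Sample data: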

1|2|3|4
a|b|c|d
John|Joe|Bob, Jr|Alex

When loaded into the table, it looks like this:

1 2 3 4
a b c d
John Joe Bob Jr

"Jr" ends up in its own column and pushes "Alex" out of the table.

mwg9r5ms  #1

Your data works fine for me. My Hive version is 0.13.

hive> create external table foo(
    > first string,
    > second string,
    > third string,
    > forth string)
    > row format delimited fields terminated by '|' lines terminated by '\n';
OK
Time taken: 3.222 seconds
hive> load data inpath '/user/xilan/data.txt' overwrite into table foo;

hive> select third from foo;
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1422157058628_0001, Tracking URL =    http://host:8088/proxy/application_1422157058628_0001/
Kill Command = /scratch/xilan/hadoop/bin/hadoop job  -kill job_1422157058628_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2015-03-27 07:05:41,901 Stage-1 map = 0%,  reduce = 0%
2015-03-27 07:05:50,190 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.24 sec
MapReduce Total cumulative CPU time: 1 seconds 240 msec
Ended Job = job_1422157058628_0001
MapReduce Jobs Launched:
Job 0: Map: 1   Cumulative CPU: 1.24 sec   HDFS Read: 245 HDFS Write: 12     SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 240 msec
OK
3
c
Bob, Jr
Time taken: 18.853 seconds, Fetched: 3 row(s)
hive>
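
To make the difference concrete, here is a minimal Python sketch (not Hive's actual parser) that splits the sample rows in two ways. The helper names `split_fields` and `split_on_pipe_and_comma` are made up for illustration, and the second one is only a guess at what a reader that also splits on commas would do. Splitting on | alone reproduces the result above, while splitting on both characters reproduces the symptom from the question.

```python
import re

# Sample rows from the question, pipe-delimited.
rows = ["1|2|3|4", "a|b|c|d", "John|Joe|Bob, Jr|Alex"]


def split_fields(line, delimiter="|"):
    """Split one line on a single delimiter; commas stay inside the field."""
    return line.split(delimiter)


def split_on_pipe_and_comma(line, width=4):
    """Hypothetical faulty reader: splits on both '|' and ',' and keeps
    only the first `width` fields, as a 4-column table would."""
    fields = [f.strip() for f in re.split(r"[|,]", line)]
    return fields[:width]


# Equivalent of "select third from foo" with '|' as the only delimiter.
third_column = [split_fields(r)[2] for r in rows]
print(third_column)  # ['3', 'c', 'Bob, Jr']

# The behaviour described in the question: "Jr" takes a column, "Alex" is lost.
print(split_on_pipe_and_comma(rows[2]))  # ['John', 'Joe', 'Bob', 'Jr']
```

If your table behaves like the second case, it may be worth checking which delimiter the table was actually created with, for example by running DESCRIBE FORMATTED Customers; and looking at the field.delim value it reports.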
