I have a CSV file with contents like this:
"DepartmentID","Name","GroupName","ModifiedDate"
"1","Engineering","Research and Development","2008-04-30 00:00:00"
I have this table definition:
create external table if not exists AdventureWorks2014.Department
(
DepartmentID smallint ,
Name string ,
GroupName string,
rate_code string,
ModifiedDate timestamp
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '","' lines terminated by '\n'
STORED AS TEXTFILE LOCATION 'wasb:///ds/Department' TBLPROPERTIES('skip.header.line.count'='1');
Then I load the data:
LOAD DATA INPATH 'wasb:///ds/Department.csv' INTO TABLE AdventureWorks2014.Department;
The data does not get loaded.
select * from AdventureWorks2014.Department;
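The select above returns nothing.
I think the double quotes around each field are the problem. Is there a way to load data from a file like this into a Hive table without having to strip the double quotes first?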
3 answers
lxkprmvk1#
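Try this (typing on my phone...)
Limitations
This SerDe treats all columns as type string. Even if you create a table with non-string column types using this SerDe, the DESCRIBE TABLE output will show string column types. The type information is retrieved from the SerDe. To convert columns to the desired type, you can create a view over the table that casts them to the desired type.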
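A minimal sketch of the kind of DDL this answer points toward, assuming the CSV SerDe documented at the link below (org.apache.hadoop.hive.serde2.OpenCSVSerde, available in Hive 0.14 and later). Two things here are assumptions made for illustration and are not part of the question: rate_code is dropped because the CSV header has no such column, and every column is declared as string because this SerDe reports all columns as strings anyway. The three SerDe properties are spelled out only for clarity; they match the documented defaults.

```sql
-- Sketch only: the table from the question, parsed by OpenCSVSerde so that
-- the enclosing double quotes are removed from each field.
-- Assumption: rate_code is omitted because the CSV header has four columns.
CREATE EXTERNAL TABLE IF NOT EXISTS AdventureWorks2014.Department
(
  DepartmentID string,
  Name         string,
  GroupName    string,
  ModifiedDate string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  "separatorChar" = ",",   -- fields are separated by a plain comma
  "quoteChar"     = "\"",  -- fields are enclosed in double quotes
  "escapeChar"    = "\\"   -- backslash escapes a quote inside a field
)
STORED AS TEXTFILE
LOCATION 'wasb:///ds/Department'
TBLPROPERTIES ('skip.header.line.count'='1');
```

If the table from the question already exists, it would have to be dropped first, because IF NOT EXISTS otherwise leaves the old definition in place.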
https://cwiki.apache.org/confluence/display/hive/csv+serde
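The view mentioned in the limitation could look roughly like this. It is a sketch under assumptions: the view name Department_typed is hypothetical, the base table is assumed to be read through the CSV SerDe (so every column arrives as an unquoted string), and only the four columns present in the CSV header are selected.

```sql
-- Sketch only: Department_typed is a hypothetical view name.
-- Assumes AdventureWorks2014.Department is read through the CSV SerDe,
-- so every column arrives as an unquoted string.
CREATE VIEW IF NOT EXISTS AdventureWorks2014.Department_typed AS
SELECT
  CAST(DepartmentID AS smallint)  AS DepartmentID,
  Name,
  GroupName,
  CAST(ModifiedDate AS timestamp) AS ModifiedDate
FROM AdventureWorks2014.Department;
```

Queries that need typed columns would then read from the view, while the underlying table keeps its string columns.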
krcsximq2#
LOAD DATA LOCAL INPATH '/home/hadoop/hive/log_2013805_16210.log' INTO TABLE table_name;
1cosmwyk3#
FIELDS TERMINATED BY '","'
is incorrect. Your fields are terminated by , and not by ",". Change the DDL to FIELDS TERMINATED BY ','.
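A minimal sketch of that change, with one caveat: the default delimited text format does not remove the enclosing double quotes, so with ',' as the delimiter the values still carry them, and a quoted "1" cast to smallint would come back as NULL. The sketch therefore makes several assumptions that are not in the answer: all columns are declared as string, rate_code is omitted because the CSV header has only four columns, the quotes are stripped at query time with regexp_replace, and the table name Department_raw is hypothetical. It also only works as long as no field contains a comma inside its quotes.

```sql
-- Sketch only: Department_raw is a hypothetical table name.
CREATE EXTERNAL TABLE IF NOT EXISTS AdventureWorks2014.Department_raw
(
  DepartmentID string,
  Name         string,
  GroupName    string,
  ModifiedDate string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION 'wasb:///ds/Department'
TBLPROPERTIES ('skip.header.line.count'='1');

-- Remove a leading and a trailing double quote from each value, then cast
-- the numeric and timestamp columns to their intended types.
SELECT
  CAST(regexp_replace(DepartmentID, '^"|"$', '') AS smallint)  AS DepartmentID,
  regexp_replace(Name, '^"|"$', '')                            AS Name,
  regexp_replace(GroupName, '^"|"$', '')                       AS GroupName,
  CAST(regexp_replace(ModifiedDate, '^"|"$', '') AS timestamp) AS ModifiedDate
FROM AdventureWorks2014.Department_raw;
```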