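I'm writing a custom Hive SerDe to parse logs. The goal is to parse the user agent into a complex struct in the Hive table, but that part isn't in the code yet.
However, when I try to load data into a column whose type is not STRING, I get a ClassCastException.
My Hive version is 0.9.0.
Here is my custom SerDe: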
@Override
public void initialize(Configuration conf, Properties tbl)
        throws SerDeException {
    // Column names and types come from the table definition (DDL).
    String colNamesStr = tbl.getProperty(serdeConstants.LIST_COLUMNS);
    colNames = Arrays.asList(colNamesStr.split(","));
    String colTypesStr = tbl.getProperty(serdeConstants.LIST_COLUMN_TYPES);
    List<TypeInfo> colTypes = TypeInfoUtils.getTypeInfosFromTypeString(colTypesStr);
    rowTypeInfo = (StructTypeInfo) TypeInfoFactory.getStructTypeInfo(colNames, colTypes);
    // Standard Java object inspector: a BIGINT field is read as java.lang.Long.
    rowOI = TypeInfoUtils.getStandardJavaObjectInspectorFromTypeInfo(rowTypeInfo);
}

@Override
public Object deserialize(Writable blob) throws SerDeException {
    // row is a field (declared elsewhere, not shown) that is reused across calls.
    row.clear();
    String[] line = blob.toString().split("\t");
    row.add(line[0]);                  // token
    row.add(Long.parseLong(line[1]));  // tmstmp
    row.add(line[2]);                  // user_agent
    return row;
}
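To rule out the parsing step itself, the field handling in deserialize() can be exercised outside Hive. This is a minimal sketch assuming the tab-separated layout implied by the DDL (token, tmstmp, user_agent); LogLineParser is a hypothetical name, and the field-count check and the split limit of -1 are additions that are not in the original code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hive: mirrors the parsing done in deserialize()
// so it can be run without a Hive installation.
final class LogLineParser {

    // Expected layout: token <TAB> tmstmp <TAB> user_agent
    private static final int EXPECTED_FIELDS = 3;

    static List<Object> parse(String line) {
        // A limit of -1 keeps trailing empty fields, so an empty user_agent
        // still yields three elements instead of two.
        String[] fields = line.split("\t", -1);
        if (fields.length != EXPECTED_FIELDS) {
            throw new IllegalArgumentException(
                "expected " + EXPECTED_FIELDS + " tab-separated fields but got " + fields.length);
        }
        List<Object> row = new ArrayList<>(EXPECTED_FIELDS);
        row.add(fields[0]);                       // token      -> java.lang.String
        // Long.valueOf rejects surrounding whitespace, so trim first;
        // a non-numeric value still throws NumberFormatException.
        row.add(Long.valueOf(fields[1].trim()));  // tmstmp     -> java.lang.Long (BIGINT)
        row.add(fields[2]);                       // user_agent -> java.lang.String
        return row;
    }

    public static void main(String[] args) {
        List<Object> row = parse("abc123\t1357000000000\tMozilla/5.0");
        // Print the runtime class of every value to confirm tmstmp is a Long.
        for (Object value : row) {
            System.out.println(value.getClass().getSimpleName() + ": " + value);
        }
    }
}
```

Running main prints "String: abc123", "Long: 1357000000000" and "String: Mozilla/5.0", which would suggest the parsing produces the intended Java types and the problem lies elsewhere.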
Here is the table definition:
CREATE EXTERNAL TABLE logs (
token STRING,
tmstmp BIGINT,
user_agent STRING )
ROW FORMAT SERDE 'com.hive.serde.LogsSerDe'
LOCATION '/user/Input/logs';
Here is the error:
java.io.IOException: java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Long
at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:173)
at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1382)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:270)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:412)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:699)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:563)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Long
at org.apache.hadoop.hive.serde2.objectinspector.primitive.JavaLongObjectInspector.get(JavaLongObjectInspector.java:39)
at org.apache.hadoop.hive.serde2.lazy.LazyUtils.writePrimitiveUTF8(LazyUtils.java:203)
at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:483)
at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:436)
at org.apache.hadoop.hive.serde2.DelimitedJSONSerDe.serializeField(DelimitedJSONSerDe.java:69)
at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:420)
at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:163)
... 11 more
It seems that every value returned by the deserialize function is treated as a String.
Thanks in advance for your help.
1 answer
The tmstmp column in your DDL is a BIGINT. You are returning a Long, but Hive expects a LongWritable. Try:
row.add(new LongWritable(Long.valueOf(line[1])));
Likewise, you may need to convert the Strings to Text, using:
new Text(javaStringObject);
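One thing worth checking before switching to LongWritable and Text: the values returned by deserialize() have to match the ObjectInspector built in initialize(). The question builds it with TypeInfoUtils.getStandardJavaObjectInspectorFromTypeInfo, whose BIGINT inspector (JavaLongObjectInspector in the trace) casts the field to java.lang.Long. As far as I understand the serde2 API, returning writables would call for TypeInfoUtils.getStandardWritableObjectInspectorFromTypeInfo instead, and mixing the two leads to the same kind of ClassCastException. The sketch below is plain Java with no Hive dependency; RowTypeCheck, firstMismatch and readBigint are hypothetical names, and readBigint only approximates the cast seen in the trace.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical debugging aid, not a Hive API: before returning a row from
// deserialize(), compare each value's runtime class with the class the
// chosen ObjectInspector is expected to cast it to.
final class RowTypeCheck {

    // Returns null if every value matches, otherwise a description of the first mismatch.
    static String firstMismatch(List<String> names, List<Class<?>> expected, List<Object> row) {
        if (row.size() != expected.size()) {
            return "row has " + row.size() + " values but " + expected.size() + " columns are declared";
        }
        for (int i = 0; i < row.size(); i++) {
            Object value = row.get(i);
            // NULL values are allowed in any column, so they are skipped.
            if (value != null && !expected.get(i).isInstance(value)) {
                return names.get(i) + ": expected " + expected.get(i).getName()
                    + " but got " + value.getClass().getName();
            }
        }
        return null;
    }

    // Approximates the failing cast in the trace: a Java-object BIGINT inspector
    // treats the field as java.lang.Long.
    static long readBigint(Object field) {
        return ((Long) field).longValue();
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("token", "tmstmp", "user_agent");
        // Classes assumed for STRING, BIGINT, STRING under a standard Java inspector.
        List<Class<?>> javaClasses = Arrays.asList(String.class, Long.class, String.class);

        List<Object> good = Arrays.asList("abc123", 1357000000000L, "Mozilla/5.0");
        List<Object> bad = Arrays.asList("abc123", "1357000000000", "Mozilla/5.0");

        System.out.println(firstMismatch(names, javaClasses, good)); // prints null
        System.out.println(firstMismatch(names, javaClasses, bad));  // reports tmstmp
        try {
            readBigint(bad.get(1));
        } catch (ClassCastException e) {
            System.out.println("ClassCastException: a String reached the BIGINT slot");
        }
    }
}
```

If you adopt the writable approach, the expected classes in such a check would become LongWritable and Text, which keeps the check aligned with whichever inspector initialize() actually returns.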