I am currently using the Hive JDBC storage handler to create a table in Hive:
Reference: https://github.com/qubole/hive-jdbc-storage-handler
Hadoop info:
- Hadoop 2.7.3
- Hive 1.2.1
Schema:
CREATE EXTERNAL TABLE tbl_google_stats_adgroups_summary
(
campaignid BIGINT COMMENT 'from deserializer',
adgroupid BIGINT COMMENT 'from deserializer',
localadgroupid BIGINT COMMENT 'from deserializer',
position FLOAT COMMENT 'from deserializer',
cost FLOAT COMMENT 'from deserializer',
impression INT COMMENT 'from deserializer',
clicks INT COMMENT 'from deserializer',
conversions INT COMMENT 'from deserializer',
conversionsbydate INT COMMENT 'from deserializer',
uniqueconversions INT COMMENT 'from deserializer',
uniqueconversionsbydate INT COMMENT 'from deserializer',
datestats TIMESTAMP COMMENT 'from deserializer',
quantity INT COMMENT 'from deserializer',
quantitybydate INT COMMENT 'from deserializer',
revenue FLOAT COMMENT 'from deserializer',
revenuebydate FLOAT COMMENT 'from deserializer',
uniquerevenue FLOAT COMMENT 'from deserializer',
uniquerevenuebydate FLOAT COMMENT 'from deserializer',
deviceid INT COMMENT 'from deserializer',
conv1perclick INT COMMENT 'from deserializer',
adwordstype INT COMMENT 'from deserializer'
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.jdbc.storagehandler.JdbcSerDe'
STORED BY 'org.apache.hadoop.hive.jdbc.storagehandler.JdbcStorageHandler'
WITH SERDEPROPERTIES ( 'serialization.format' = '1' )
TBLPROPERTIES (
'mapred.jdbc.driver.class' = 'com.mysql.jdbc.Driver',
'mapred.jdbc.hive.lazy.split' = 'false',
'mapred.jdbc.input.table.name' = 'tbl_adgroup',
'mapred.jdbc.password' = '',
'mapred.jdbc.url' = 'jdbc:mysql://localhost:3306/<database_name>?useUnicode=true&useJDBCCompliantTimezoneShift=true&useLegacyDatetimeCode=false&serverTimezone=UTC',
'mapred.jdbc.username' = 'root'
)
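The data syncs from MySQL without problems. I then used Apache Kylin to build a demo cube on top of this table. However, the first step of the cube build, "Create Intermediate Flat Hive Table", fails. The failing query is: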
INSERT OVERWRITE TABLE default.kylin_intermediate_test_cube_adgroup_mysql_164b0ca3_6050_49bb_838b_49ee49f6d1e5 SELECT
TBL_GOOGLE_STATS_ADGROUPS_SUMMARY.CAMPAIGNID
,TBL_GOOGLE_STATS_ADGROUPS_SUMMARY.ADGROUPID
,TBL_GOOGLE_STATS_ADGROUPS_SUMMARY.POSITION
,TBL_GOOGLE_STATS_ADGROUPS_SUMMARY.DATESTATS
,TBL_GOOGLE_STATS_ADGROUPS_SUMMARY.DEVICEID
,TBL_GOOGLE_STATS_ADGROUPS_SUMMARY.ADWORDSTYPE
,TBL_GOOGLE_STATS_ADGROUPS_SUMMARY.COST
FROM <database>.TBL_GOOGLE_STATS_ADGROUPS_SUMMARY as TBL_GOOGLE_STATS_ADGROUPS_SUMMARY ;
Status: Failed
Vertex failed, vertexName=Map 1, vertexId=vertex_1478248621961_0005_1_00, diagnostics=[Task failed, taskId=task_1478248621961_0005_1_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.io.IOException: InputFormatWrapper can not support RecordReaders that don't return same key & value objects. current reader class : class org.apache.hadoop.mapreduce.lib.db.MySQLDBRecordReader
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:194)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:185)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:185)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:181)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.io.IOException: InputFormatWrapper can not support RecordReaders that don't return same key & value objects. current reader class : class org.apache.hadoop.mapreduce.lib.db.MySQLDBRecordReader
at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:71)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:325)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:150)
... 14 more
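To make clear what I think the message is complaining about, here is a minimal sketch of an old-API style wrapper around a new-API style reader. It only models the identity check the error text describes ("don't return same key & value objects"). Everything in it (`NewApiReader`, `ReusingReader`, `AllocatingReader`, `OldApiWrapper`, `countRows`) is a hypothetical stand-in written for this post, not the storage handler's or Hadoop's actual code, and it does not explain why the check trips under Tez in my setup.

```java
import java.io.IOException;

public class ReaderWrapperSketch {

    /** Hypothetical stand-in for a new-API style record reader (not a Hadoop class). */
    interface NewApiReader<K, V> {
        boolean nextKeyValue();
        K getCurrentKey();
        V getCurrentValue();
    }

    /** Fills the same key/value holders for every row. */
    static final class ReusingReader implements NewApiReader<long[], String[]> {
        private final long[] key = new long[1];
        private final String[] value = new String[1];
        private final int rows;
        private int next = 0;

        ReusingReader(int rows) { this.rows = rows; }

        public boolean nextKeyValue() {
            if (next >= rows) return false;
            key[0] = next;
            value[0] = "row-" + next;
            next++;
            return true;
        }
        public long[] getCurrentKey() { return key; }
        public String[] getCurrentValue() { return value; }
    }

    /** Allocates fresh key/value holders for every row. */
    static final class AllocatingReader implements NewApiReader<long[], String[]> {
        private final int rows;
        private int next = 0;
        private long[] key;
        private String[] value;

        AllocatingReader(int rows) { this.rows = rows; }

        public boolean nextKeyValue() {
            if (next >= rows) return false;
            key = new long[] { next };
            value = new String[] { "row-" + next };
            next++;
            return true;
        }
        public long[] getCurrentKey() { return key; }
        public String[] getCurrentValue() { return value; }
    }

    /** Old-API style wrapper: the caller keeps one key/value pair and expects next() to refill it. */
    static final class OldApiWrapper<K, V> {
        private final NewApiReader<K, V> real;
        private final K keyObj;
        private final V valueObj;
        private boolean firstRecord;
        private boolean eof;

        OldApiWrapper(NewApiReader<K, V> real) {
            this.real = real;
            // Read one record up front so createKey/createValue can hand out the reader's own objects.
            if (real.nextKeyValue()) {
                firstRecord = true;
                keyObj = real.getCurrentKey();
                valueObj = real.getCurrentValue();
            } else {
                eof = true;
                keyObj = null;
                valueObj = null;
            }
        }

        K createKey() { return keyObj; }
        V createValue() { return valueObj; }

        boolean next(K key, V value) throws IOException {
            if (eof) return false;
            if (firstRecord) { firstRecord = false; return true; }
            if (!real.nextKeyValue()) { eof = true; return false; }
            // The wrapper cannot copy data into the caller's objects, so it insists on identity.
            if (key != real.getCurrentKey() || value != real.getCurrentValue()) {
                throw new IOException("reader did not return the same key & value objects: "
                        + real.getClass().getSimpleName());
            }
            return true;
        }
    }

    /** Drains a reader through the wrapper and returns the number of rows read. */
    static int countRows(NewApiReader<long[], String[]> reader) throws IOException {
        OldApiWrapper<long[], String[]> wrapper = new OldApiWrapper<>(reader);
        long[] key = wrapper.createKey();
        String[] value = wrapper.createValue();
        int n = 0;
        while (wrapper.next(key, value)) n++;
        return n;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("reusing reader rows: " + countRows(new ReusingReader(3)));
        try {
            countRows(new AllocatingReader(3));
        } catch (IOException e) {
            System.out.println("allocating reader failed: " + e.getMessage());
        }
    }
}
```

In this toy model the reusing reader is drained normally, while the allocating reader fails on its second row because the wrapper sees objects other than the ones it handed out. If the real wrapper works along these lines, the question becomes which side ends up holding different key/value objects when the query runs on Tez.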
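Can anyone help me? Beyond this error, I would also like advice on the architecture of our Hadoop system:
We currently need a transactional SQL engine such as MySQL, because some of our report data has to be updated in place. Hive provides ACID tables, but they do not support UPDATE with an INNER JOIN, among other things, and this is a real problem for our business. That is why I set up the JdbcStorageHandler. Can this architecture handle a billion rows of data? Thanks!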