We execute SELECT queries on Hive over MapR with PAM authentication. The SELECT query uses a myudfs.jar in which we have defined a custom UDF.
I have tried many links but cannot figure out why this happens. From the stack trace, it appears that Hadoop is unable to copy the jar into the libjars directory under /.staging. The UDF is on the classpath, though.
Any help would be greatly appreciated.
java.sql.SQLException: org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask. 2063.142.526190 /var/mapr/cluster/yarn/rm/staging/username/.staging/job_112233333_0002/libjars/myudfs.jar (Invalid argument)
at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:380)
at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:257)
at org.apache.hive.service.cli.operation.SQLOperation.access$800(SQLOperation.java:91)
at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:348)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669)
at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:362)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: 2063.142.526190 /var/mapr/cluster/yarn/rm/staging/username/.staging/job_112233333_0002/libjars/myudfs.jar (Invalid argument)
at com.mapr.fs.Inode.throwIfFailed(Inode.java:390)
at com.mapr.fs.Inode.flushPages(Inode.java:505)
at com.mapr.fs.Inode.releaseDirty(Inode.java:583)
at com.mapr.fs.MapRFsOutStream.dropCurrentPage(MapRFsOutStream.java:73)
at com.mapr.fs.MapRFsOutStream.write(MapRFsOutStream.java:85)
at com.mapr.fs.MapRFsDataOutputStream.write(MapRFsDataOutputStream.java:39)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:87)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:59)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:119)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:376)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:346)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:297)
at org.apache.hadoop.mapreduce.JobResourceUploader.copyRemoteFiles(JobResourceUploader.java:203)
at org.apache.hadoop.mapreduce.JobResourceUploader.uploadFiles(JobResourceUploader.java:128)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:98)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:193)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:575)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:570)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:570)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:561)
at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:414)
at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:151)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:201)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:79)
at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:283)
Here is how we execute the query, and what we do before executing it.
First, we create a function as follows: CREATE FUNCTION func_name AS 'com.test.classcontainingmapper';
The classcontainingmapper class is packaged in a UDF jar named myudf.jar. We add this jar to the classpath using the hive.aux.jars.path property.
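For reference, the classpath wiring described above is typically done in hive-site.xml; a minimal sketch (the jar location /opt/hive/auxlib/ is illustrative, not taken from the question):

```xml
<!-- hive-site.xml: expose the UDF jar to HiveServer2 and to the jobs
     it submits; the value is a comma-separated list of jar URIs. -->
<property>
  <name>hive.aux.jars.path</name>
  <value>file:///opt/hive/auxlib/myudf.jar</value>
</property>
```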
Now, our query looks like this: SELECT func_name(col1, col2) FROM dbname.tablename;
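The steps above can be sketched as a HiveQL session (func_name, col1/col2, and dbname.tablename are the placeholder names from the question):

```sql
-- Register the UDF class (packaged in myudf.jar, which is on
-- hive.aux.jars.path) under the name func_name.
CREATE FUNCTION func_name AS 'com.test.classcontainingmapper';

-- Invoke the UDF. Submitting this query is what makes Hadoop copy the
-- auxiliary jars into .staging/<job_id>/libjars, the step that fails here.
SELECT func_name(col1, col2) FROM dbname.tablename;
```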
When this query is executed, Hadoop tries to upload the jars on the classpath to the libjars folder under the staging directory. That is where it fails.
Interestingly, this query executes successfully on a similar cluster, but on the other cluster it fails with the exception shown in the stack trace.
Update:
Actually, there is another query that is executed before the SELECT query. It is:
ADD JAR '/path/to/jar/file/myudf.jar';
When this query is executed the jar is uploaded to the cluster, which makes more sense. It is during this upload that the query's operation fails.
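So, per the update, the statement that actually triggers the failing upload is the session-level jar registration (path as given in the question):

```sql
-- Ship the jar from the given path to the cluster for this session;
-- according to the update, the (Invalid argument) error surfaces
-- during this upload rather than during the SELECT itself.
ADD JAR '/path/to/jar/file/myudf.jar';
```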