我想使用pigserverjava类在远程hadoop集群上运行pig脚本。目前,我正在使用clouderacdh4.4.0发行版来测试pusposes。我正在clouderavm中运行java程序。
包含hadoop配置文件的目录包含在类路径中。
当我在shell中以map reduce模式运行同一个脚本时,它工作得很好。提前谢谢你的回答。
java代码:
Properties props = new Properties();
props.setProperty("fs.default.name", "hdfs://localhost.localdomain:8020");
props.setProperty("mapred.job.tracker", "localhost.localdomain:8021");
PigServer pigServer = new PigServer(ExecType.MAPREDUCE, props);
pigServer.registerQuery("batting = load './Batting.csv' using PigStorage(',');");
pigServer.registerQuery("runs_raw = FOREACH batting GENERATE $0 as playerID, $1 as year, $8 as runs;");
pigServer.registerQuery("runs = FILTER runs_raw BY runs > 0;");
pigServer.registerQuery("grp_data = GROUP runs by (year);");
pigServer.registerQuery("max_runs = FOREACH grp_data GENERATE group as grp, MAX(runs.runs) as max_runs;");
pigServer.registerQuery("join_max_run = JOIN max_runs by ($0, max_runs), runs by (year,runs);");
pigServer.registerQuery("join_data = FOREACH join_max_run GENERATE $0 as year, $2 as playerID, $1 as runs;");
pigServer.store("join_data", "join_data");
我还添加了以下maven依赖项 <dependency> <groupId>org.apache.hadoop</groupId> <artifactId>hadoop-client</artifactId> <version>2.2.0</version> </dependency>
在运行java程序时,我得到以下stacktrace:
log4j:WARN No appenders could be found for logger (org.apache.hadoop.conf.Configuration.deprecation).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
*Exception in thread "main" org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during parsing. Unable to check name hdfs://localhost.localdomain:8020/user/cloudera*
at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1608)
at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1547)
at org.apache.pig.PigServer.registerQuery(PigServer.java:518)
at org.apache.pig.PigServer.registerQuery(PigServer.java:531)
at MainTestPigServer.main(MainTestPigServer.java:24)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120)
Caused by: Failed to parse: Pig script failed to parse:
<line 1, column 10> pig script failed to validate: org.apache.pig.backend.datastorage.DataStorageException: ERROR 6007: Unable to check name hdfs://localhost.localdomain:8020/user/cloudera
at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:191)
at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1600)
... 9 more
Caused by:
<line 1, column 10> pig script failed to validate: org.apache.pig.backend.datastorage.DataStorageException: ERROR 6007: Unable to check name hdfs://localhost.localdomain:8020/user/cloudera
at org.apache.pig.parser.LogicalPlanBuilder.buildLoadOp(LogicalPlanBuilder.java:835)
at org.apache.pig.parser.LogicalPlanGenerator.load_clause(LogicalPlanGenerator.java:3236)
at org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1315)
at org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:799)
at org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:517)
at org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:392)
at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:184)
... 10 more
Caused by: org.apache.pig.backend.datastorage.DataStorageException: ERROR 6007: Unable to check name hdfs://localhost.localdomain:8020/user/cloudera
at org.apache.pig.backend.hadoop.datastorage.HDataStorage.isContainer(HDataStorage.java:207)
at org.apache.pig.backend.hadoop.datastorage.HDataStorage.asElement(HDataStorage.java:128)
at org.apache.pig.backend.hadoop.datastorage.HDataStorage.asElement(HDataStorage.java:138)
at org.apache.pig.parser.QueryParserUtils.getCurrentDir(QueryParserUtils.java:91)
at org.apache.pig.parser.LogicalPlanBuilder.buildLoadOp(LogicalPlanBuilder.java:827)
... 16 more
Caused by: java.io.IOException: Failed on local exception: com.google.protobuf.InvalidProtocolBufferException: Protocol message contained an invalid tag (zero).; Host Details : local host is: "localhost.localdomain/127.0.0.1"; destination host is: "localhost.localdomain":8020;
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
at org.apache.hadoop.ipc.Client.call(Client.java:1351)
at org.apache.hadoop.ipc.Client.call(Client.java:1300)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at $Proxy9.getFileInfo(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at $Proxy9.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:651)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1679)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1106)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1102)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1102)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1397)
at org.apache.pig.backend.hadoop.datastorage.HDataStorage.isContainer(HDataStorage.java:200)
... 20 more
Caused by: com.google.protobuf.InvalidProtocolBufferException: Protocol message contained an invalid tag (zero).
at com.google.protobuf.InvalidProtocolBufferException.invalidTag(InvalidProtocolBufferException.java:89)
at com.google.protobuf.CodedInputStream.readTag(CodedInputStream.java:108)
at org.apache.hadoop.ipc.protobuf.RpcHeaderProtos$RpcResponseHeaderProto.<init>(RpcHeaderProtos.java:1398)
at org.apache.hadoop.ipc.protobuf.RpcHeaderProtos$RpcResponseHeaderProto.<init>(RpcHeaderProtos.java:1362)
at org.apache.hadoop.ipc.protobuf.RpcHeaderProtos$RpcResponseHeaderProto$1.parsePartialFrom(RpcHeaderProtos.java:1492)
at org.apache.hadoop.ipc.protobuf.RpcHeaderProtos$RpcResponseHeaderProto$1.parsePartialFrom(RpcHeaderProtos.java:1487)
at com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:200)
at com.google.protobuf.AbstractParser.parsePartialDelimitedFrom(AbstractParser.java:241)
at com.google.protobuf.AbstractParser.parseDelimitedFrom(AbstractParser.java:253)
at com.google.protobuf.AbstractParser.parseDelimitedFrom(AbstractParser.java:259)
at com.google.protobuf.AbstractParser.parseDelimitedFrom(AbstractParser.java:49)
at org.apache.hadoop.ipc.protobuf.RpcHeaderProtos$RpcResponseHeaderProto.parseDelimitedFrom(RpcHeaderProtos.java:2364)
at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:996)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:891)
Process finished with exit code 1
暂无答案!
目前还没有任何答案,快来回答吧!