Cannot run embedded Pig in MapReduce mode from Java

d5vmydt9 · posted 2021-06-03 in Hadoop

I am using Pig 0.12.0 and Hadoop 2.2.0. I have successfully run Pig from the Grunt shell and from Pig batch scripts, in both local and MapReduce mode. Now I am trying to run Pig embedded in Java.
That said, I have also successfully run embedded Pig in local mode. The trouble is running embedded Pig in MapReduce mode.
The problem: the class compiles successfully, but the run fails when I launch it with

  java -cp <classpath> PigMapRedMode

Later I saw suggestions that I should include a pig.properties file on the classpath, for example:

  fs.default.name=hdfs://<namenode-hostname>:<port>
  mapred.job.tracker=<jobtracker-hostname>:<port>

However, in Hadoop 2.2.0 the JobTracker no longer exists. Any ideas?
I am attaching the Java code for PigMapRedMode in case something is wrong there.

  import java.io.IOException;
  import org.apache.pig.PigServer;

  public class PigMapRedMode {
      public static void main(String[] arg) {
          try {
              PigServer pigServer = new PigServer("map reduce, (need to add properties file)");
              runIdQuery(pigServer, "5pts.txt");
          } catch (Exception e) {
          }
      }

      public static void runIdQuery(PigServer pigServer, String inputFile) throws IOException {
          pigServer.registerQuery("A = load '" + inputFile + "' using PigStorage(',');");
          pigServer.registerQuery("B = foreach A generate $0 as id;");
          pigServer.store("B", "id.out");
      }
  }

Update:
Found the solution! You don't actually need to supply a Properties object or a pig.properties file at all. All you have to do is include the Hadoop configuration directory on the classpath (for my Hadoop 2.2.0 it is /etc/hadoop); fs.default.name and yarn.resourcemanager.address are then picked up from that location.
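As an aside, if you would rather not rely on the classpath, the same settings can apparently be passed to PigServer programmatically through its (ExecType, Properties) constructor. A minimal sketch, assuming Hadoop 2 style property names (the YARN ResourceManager takes over the JobTracker's role); the class name and host/port placeholders are mine, and whether mapreduce.framework.name is needed depends on your setup:

  import java.util.Properties;
  import org.apache.pig.ExecType;
  import org.apache.pig.PigServer;

  public class PigMapRedModeProps {
      public static void main(String[] args) throws Exception {
          Properties props = new Properties();
          // Hadoop 2 counterparts of the old fs.default.name / mapred.job.tracker pair:
          props.setProperty("fs.default.name", "hdfs://<namenode-hostname>:<port>");
          props.setProperty("yarn.resourcemanager.address", "<resourcemanager-hostname>:<port>");
          // Tell the MapReduce client to submit through YARN (assumed necessary on Hadoop 2.x).
          props.setProperty("mapreduce.framework.name", "yarn");

          // The properties are handed straight to PigServer, so no pig.properties file is read.
          PigServer pigServer = new PigServer(ExecType.MAPREDUCE, props);
          pigServer.registerQuery("A = load '<hdfs input address>' using PigStorage(',');");
          pigServer.store("A", "<hdfs output address>");
      }
  }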
My modified Java code is attached below:

  /**
   * Created by allenlin on 2/19/14.
   */
  import java.io.IOException;
  import org.apache.pig.ExecType;
  import org.apache.pig.PigServer;

  public class PigMapRedMode {
      public static void main(String[] arg) {
          try {
              PigServer pigServer = new PigServer(ExecType.MAPREDUCE);
              runIdQuery(pigServer, "<hdfs input address>");
          } catch (Exception e) {
          }
      }

      public static void runIdQuery(PigServer pigServer, String inputFile) throws IOException {
          pigServer.registerQuery("A = load '" + inputFile + "' using PigStorage(',');");
          pigServer.registerQuery("B = foreach A generate $0 as id;");
          pigServer.store("B", "<hdfs output address>");
      }
  }

The Unix command I used to run the Java class. Note the dependencies you need to include:

  java -cp ".:$PIG_HOME/build/pig-0.12.1-SNAPSHOT.jar:$HADOOP_HOME/share/hadoop/common/lib/*:$HADOOP_HOME/share/hadoop/common/*:$HADOOP_HOME/share/hadoop/mapreduce/*:antlr-runtime-3.4.jar:$HADOOP_HOME/share/hadoop/yarn/*:$HADOOP_HOME/share/hadoop/hdfs/*:$PIG_HOME/build/ivy/lib/Pig/*:$HADOOP_CONF_DIR" PigMapRedMode

Thanks @zsxwing for the help!

a9wyjsp7

This is how I ran it:

  import java.util.Properties;
  import org.apache.pig.ExecType;
  import org.apache.pig.PigServer;

  public class test1 {
      public static void main(String[] args) {
          try {
              // Build the properties before creating the PigServer and pass them to the
              // constructor, so the NameNode address is actually used.
              Properties props = new Properties();
              props.setProperty("fs.default.name", "hdfs://localhost:9000");
              PigServer pigServer = new PigServer(ExecType.MAPREDUCE, props);
              runQuery(pigServer);
          } catch (Exception e) {
              e.printStackTrace();
          }
      }

      public static void runQuery(PigServer pigServer) {
          try {
              pigServer.registerQuery("input1 = LOAD '/input.data' as (line:chararray);");
              pigServer.registerQuery("words = foreach input1 generate FLATTEN(TOKENIZE(line)) as word;");
              pigServer.registerQuery("word_groups = group words by word;");
              pigServer.registerQuery("word_count = foreach word_groups generate group, COUNT(words);");
              pigServer.registerQuery("ordered_word_count = order word_count by group desc;");
              pigServer.registerQuery("store ordered_word_count into '/wct';");
          } catch (Exception e) {
              e.printStackTrace();
          }
      }
  }

Set HADOOP_HOME in Eclipse:

  Run Configurations --> ClassPath --> User Entries --> Advanced --> Add ClassPath Variables --> New --> Name (HADOOP_HOME) --> Path (your Hadoop directory path)

I added these Maven dependencies:

  <dependencies>
      <dependency>
          <groupId>org.apache.hadoop</groupId>
          <artifactId>hadoop-hdfs</artifactId>
          <version>2.7.1</version>
      </dependency>
      <dependency>
          <groupId>org.apache.hadoop</groupId>
          <artifactId>hadoop-client</artifactId>
          <version>2.7.1</version>
      </dependency>
      <dependency>
          <groupId>commons-io</groupId>
          <artifactId>commons-io</artifactId>
          <version>2.4</version>
      </dependency>
      <dependency>
          <groupId>log4j</groupId>
          <artifactId>log4j</artifactId>
          <version>1.2.16</version>
      </dependency>
      <dependency>
          <groupId>org.apache.pig</groupId>
          <artifactId>pig</artifactId>
          <version>0.15.0</version>
      </dependency>
      <dependency>
          <groupId>org.antlr</groupId>
          <artifactId>antlr-runtime</artifactId>
          <version>3.4</version>
      </dependency>
  </dependencies>

If you don't set HADOOP_HOME correctly, you will get the following error:

  hadoop20.PigJobControl: falling back to default JobControl (not using hadoop 0.20 ?)
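
If you are unsure whether the Hadoop configuration is actually visible to your program, a quick check is to load a plain Hadoop Configuration from the same classpath and print a key or two; a minimal sketch (the class name is mine):

  import org.apache.hadoop.conf.Configuration;

  public class ConfCheck {
      public static void main(String[] args) {
          // new Configuration() reads core-default.xml and core-site.xml from the classpath,
          // so a null value here usually means the conf directory is not on it.
          Configuration conf = new Configuration();
          System.out.println("fs.defaultFS = " + conf.get("fs.defaultFS"));
          System.out.println("mapreduce.framework.name = " + conf.get("mapreduce.framework.name"));
      }
  }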
