Unable to run embedded Pig in MapReduce mode from Java

d5vmydt9 posted 2021-06-03 in Hadoop

I am using Pig 0.12.0 and Hadoop 2.2.0. I have successfully run Pig from the Grunt shell and from Pig batch scripts, in both local and MapReduce mode. Now I am trying to run Pig embedded in Java.
That said, I have also gotten embedded Pig working in local mode. However, I am running into problems running embedded Pig in MapReduce mode.
Here is the problem. After compiling the class successfully, I run

java -cp <classpath> PigMapRedMode

Later I saw someone say that I should include a pig.properties file on the classpath, for example:

fs.default.name=hdfs://<namenode-hostname>:<port>
mapred.job.tracker=<jobtracker-hostname>:<port>

However, in Hadoop 2.2.0 the JobTracker no longer exists. Any ideas?
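
For reference, the Hadoop 2 (YARN) equivalents of those keys would look something like the sketch below. This is only an illustration: the hostnames and ports are placeholders, fs.defaultFS is the newer name for the deprecated fs.default.name, and mapreduce.framework.name=yarn is what takes the place of the old JobTracker entry.

fs.defaultFS=hdfs://<namenode-hostname>:<port>
mapreduce.framework.name=yarn
yarn.resourcemanager.address=<resourcemanager-hostname>:<port>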
I am attaching the Java code for PigMapRedMode in case something is wrong there.

import java.io.IOException;
import org.apache.pig.PigServer;

public class PigMapRedMode {
    public static void main(String[] arg){
        try {
            // Start Pig in MapReduce mode; the cluster properties still
            // need to come from somewhere (a properties file?).
            PigServer pigServer = new PigServer("mapreduce");
            runIdQuery(pigServer, "5pts.txt");
        } catch (Exception e){
            e.printStackTrace();
        }
    }

    public static void runIdQuery(PigServer pigServer, String inputFile) throws IOException {
        // Load the comma-separated input and keep only the first column.
        pigServer.registerQuery("A = load '" + inputFile + "' using PigStorage(',');");
        pigServer.registerQuery("B = foreach A generate $0 as id;");
        pigServer.store("B", "id.out");
    }
}

Update:
Found the solution! It turns out you don't need to supply a Properties object or put pig.properties on the classpath at all. All you have to do is include the Hadoop configuration directory on the classpath (for my Hadoop 2.2.0 it is /etc/hadoop), and fs.default.name and yarn.resourcemanager.address are picked up from that location.
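
Concretely, Pig reads the standard Hadoop client configuration from that directory. The two entries it needs live in core-site.xml and yarn-site.xml and would look roughly like the sketch below (the values are placeholders for your cluster):

<!-- core-site.xml -->
<property>
    <name>fs.default.name</name>
    <value>hdfs://<namenode-hostname>:<port></value>
</property>

<!-- yarn-site.xml -->
<property>
    <name>yarn.resourcemanager.address</name>
    <value><resourcemanager-hostname>:<port></value>
</property>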
The revised Java code is attached below:

/**
 * Created by allenlin on 2/19/14.
 */
import java.io.IOException;

import org.apache.pig.ExecType;
import org.apache.pig.PigServer;

public class PigMapRedMode {
    public static void main(String[] arg){
        try {
            // MapReduce mode; cluster addresses are picked up from the
            // Hadoop configuration directory on the classpath.
            PigServer pigServer = new PigServer(ExecType.MAPREDUCE);
            runIdQuery(pigServer, "<hdfs input address>");
        } catch (Exception e){
            e.printStackTrace();
        }
    }

    public static void runIdQuery(PigServer pigServer, String inputFile) throws IOException {
        pigServer.registerQuery("A = load '" + inputFile + "' using PigStorage(',');");
        pigServer.registerQuery("B = foreach A generate $0 as id;");
        pigServer.store("B", "<hdfs output address>");
    }
}

The Unix command I use to run the Java class. Note the dependencies you need to include (the trailing $HADOOP_CONF_DIR entry is what makes the configuration pickup work):

java -cp ".:$PIG_HOME/build/pig-0.12.1-SNAPSHOT.jar:$HADOOP_HOME/share/hadoop/common/lib/*:$HADOOP_HOME/share/hadoop/common/*:$HADOOP_HOME/share/hadoop/mapreduce/*:antlr-runtime-3.4.jar:$HADOOP_HOME/share/hadoop/yarn/*:$HADOOP_HOME/share/hadoop/hdfs/*:$PIG_HOME/build/ivy/lib/Pig/*:$HADOOP_CONF_DIR" PigMapRedMode

Thanks to @zsxwing for the help!

a9wyjsp7 (answer 1)

Here is how I run it:

import java.util.Properties;

import org.apache.pig.ExecType;
import org.apache.pig.PigServer;

public class test1 {
    public static void main(String[] args) {
        try {
            // Point Pig at the cluster and start it in MapReduce mode.
            Properties props = new Properties();
            props.setProperty("fs.default.name", "hdfs://localhost:9000");
            PigServer pigServer = new PigServer(ExecType.MAPREDUCE, props);
            runQuery(pigServer);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static void runQuery(PigServer pigServer) {
        try {
            // Classic word count: tokenize lines, group by word, count, sort.
            pigServer.registerQuery("input1 = LOAD '/input.data' as (line:chararray);");
            pigServer.registerQuery("words = foreach input1 generate FLATTEN(TOKENIZE(line)) as word;");
            pigServer.registerQuery("word_groups = group words by word;");
            pigServer.registerQuery("word_count = foreach word_groups generate group, COUNT(words);");
            pigServer.registerQuery("ordered_word_count = order word_count by group desc;");
            pigServer.registerQuery("store ordered_word_count into '/wct';");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Set HADOOP_HOME in Eclipse:

Run Configurations-->ClassPath-->User Entries-->Advanced-->Add ClassPath Variables-->New-->Name (HADOOP_HOME)-->Path (your Hadoop directory path)
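
If you launch from a shell rather than Eclipse, the equivalent is to export the variables before running (a sketch; adjust the path to your own installation):

export HADOOP_HOME=/path/to/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop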

I added these Maven dependencies:

<dependencies>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-hdfs</artifactId>
        <version>2.7.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.7.1</version>
    </dependency>
    <dependency>
        <groupId>commons-io</groupId>
        <artifactId>commons-io</artifactId>
        <version>2.4</version>
    </dependency>
    <dependency>
        <groupId>log4j</groupId>
        <artifactId>log4j</artifactId>
        <version>1.2.16</version>
    </dependency>
    <dependency>
        <groupId>org.apache.pig</groupId>
        <artifactId>pig</artifactId>
        <version>0.15.0</version>
    </dependency>
    <dependency>
        <groupId>org.antlr</groupId>
        <artifactId>antlr-runtime</artifactId>
        <version>3.4</version>
    </dependency>
</dependencies>

If you do not set HADOOP_HOME correctly, you will get the following error:

hadoop20.PigJobControl: falling back to default JobControl (not using hadoop 0.20 ?)
