How to pass multiple input format files to a MapReduce job?

Posted by sqserrrh on 2021-06-03 in Hadoop


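I am writing a MapReduce program to query a Cassandra column family. I only need to read a subset of the rows from one column family (selected by row key), and I already have the set of keys of the rows I want to read. How can I pass this "row key set" to the MapReduce job so that it outputs only that subset of rows from the Cassandra column family?
Outline: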

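  import java.util.HashSet;
  import java.util.Set;

  class GetRows
  {
    public Set<String> getRowKeys()
    {
      Set<String> keys = new HashSet<>();
      // logic..... (collect the keys of the rows to read)
      return keys;
    }
  }
  class MapReduceCassandra
  {
    // input format: ColumnFamilyInputFormat
    // ...
    // also need the input key set here. How to get it?
  }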

Can anyone suggest the best way to call MapReduce from a Java application, and how to pass a set of keys to the MapReduce job?

Java hadoop cassandra mapreduce

Source: https://stackoverflow.com/questions/21897568/how-to-pass-multiple-input-format-files-to-map-reduce-job

1 answer


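Answer #1 by yzckvree

Calling MapReduce from Java
To do this, you can use the org.apache.hadoop.mapreduce package from within your Java application (a very similar approach works with the older mapred API; just check the API docs):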

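import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

Job job = Job.getInstance(new Configuration());
// configure job: set input and output types and directories, etc.
job.setJarByClass(MapReduceCassandra.class);
job.submit(); // returns immediately; job.waitForCompletion(true) blocks until the job finishes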

Passing data to the MapReduce job
If the set of row keys is very small, you can serialize it into a string and pass it as a configuration parameter:

job.getConfiguration().set("CassandraRows", getRowsKeysSerialized()); // TODO: implement serializer
//...
job.submit();
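The serializer and deserializer are left as TODOs in the snippet. Below is a minimal sketch of one possible encoding, assuming the keys are Java Strings: each key is Base64-encoded (so a key may itself contain commas) and the encoded keys are joined with commas, a character Base64 output never contains. The class name RowKeyCodec and the method serializeRowKeys are hypothetical; deserializeRows reuses the name from the answer's TODO. For example, getRowsKeysSerialized() could simply return RowKeyCodec.serializeRowKeys(new GetRows().getRowKeys()).

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;
import java.util.Set;

// Hypothetical helper: packs a set of row keys into one Configuration string and back.
final class RowKeyCodec {

    private RowKeyCodec() {}

    // Base64-encodes each key as UTF-8, then joins the encoded keys with commas.
    static String serializeRowKeys(Set<String> keys) {
        List<String> encoded = new ArrayList<>(keys.size());
        for (String key : keys) {
            encoded.add(Base64.getEncoder()
                    .encodeToString(key.getBytes(StandardCharsets.UTF_8)));
        }
        return String.join(",", encoded);
    }

    // Reverses serializeRowKeys; a missing (null) or empty value yields no keys.
    static String[] deserializeRows(String serialized) {
        if (serialized == null || serialized.isEmpty()) {
            return new String[0];
        }
        String[] parts = serialized.split(",");
        String[] rows = new String[parts.length];
        for (int i = 0; i < parts.length; i++) {
            rows[i] = new String(Base64.getDecoder().decode(parts[i]), StandardCharsets.UTF_8);
        }
        return rows;
    }
}
```

Base64 makes the string roughly a third larger than the raw keys, which is one more reason to keep this approach for small key sets only.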

Inside the job, you can access the parameter through the context object:

public void map(
    IntWritable key,  // your key type
    Text value,       // your value type
    Context context
)
{
    // ...
    String rowsSerialized = context.getConfiguration().get("CassandraRows");
    String[] rows = deserializeRows(rowsSerialized);  // TODO: implement deserializer
    //...
}
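Deserializing the parameter inside map() repeats the work for every record. A common alternative is to parse it once in the Mapper's setup(Context) method, keep the resulting set in a field, and in map() call context.write only for rows whose key is in the set. The sketch below shows just the Hadoop-independent part of that pattern; RowKeyFilter is a hypothetical name, and it assumes the row key has already been converted to a String (with ColumnFamilyInputFormat the key typically arrives as a ByteBuffer, and that conversion is not shown).

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: built once (e.g. in Mapper.setup()), queried once per record in map().
final class RowKeyFilter {

    private final Set<String> wanted;

    // rows is the array returned by a deserializer such as the answer's deserializeRows(...).
    RowKeyFilter(String[] rows) {
        this.wanted = Collections.unmodifiableSet(new HashSet<>(Arrays.asList(rows)));
    }

    // Average O(1) hash lookup, so the per-record cost does not grow with the key set.
    boolean accept(String rowKey) {
        return wanted.contains(rowKey);
    }

    // Number of distinct keys (duplicates in the input array collapse).
    int size() {
        return wanted.size();
    }
}
```

Note that this only limits what the mapper emits; the input format still reads every row it is configured to read.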

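However, if your set may be unbounded, passing it as a parameter is a bad idea, because the value is stored in the job configuration, which is shipped to every task. Instead, you should write the keys to a file and use the distributed cache. You can then add this line to the driver code above before submitting the job: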

job.addCacheFile(new Path(pathToCassandraKeySetFile).toUri());
//...
job.submit();
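job.addCacheFile(...) expects the key-set file to already exist at pathToCassandraKeySetFile. A minimal sketch for producing it, assuming one key per line and keys that contain no line breaks, is shown below; KeySetFileWriter is a hypothetical name, and java.nio.file.Path here is the JDK class, not Hadoop's org.apache.hadoop.fs.Path used in the snippet. Copying the local file to HDFS (for example with FileSystem.copyFromLocalFile) is not shown.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collection;

// Hypothetical helper: writes one row key per line, ready to be registered as a cache file.
final class KeySetFileWriter {

    private KeySetFileWriter() {}

    static void writeKeys(Path file, Collection<String> keys) throws IOException {
        // Files.write emits each element followed by the platform line separator (UTF-8 encoded).
        Files.write(file, keys, StandardCharsets.UTF_8);
    }
}
```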

Inside the job, you can access this file through the context object:

public void map(
    IntWritable key,  // your key type
    Text value,       // your value type
    Context context
) throws IOException, InterruptedException  // getCacheFiles() declares IOException
{
    // ...
    URI[] cacheFiles = context.getCacheFiles();
    // find, open and read your file here
    // ...
}
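For the "find, open and read your file here" step, here is a minimal sketch that loads a one-key-per-line file into a HashSet, skipping blank lines. It assumes the task can reach the localized copy of the cache file as a local path; on many Hadoop 2 setups the file is linked into the task's working directory under its file name (or under a URI fragment such as #rowkeys), but verify this for your version. KeySetFileReader is a hypothetical name, and reading the file once in setup() rather than in every map() call avoids rereading it per record.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: reads a one-key-per-line file into a set for fast lookups.
final class KeySetFileReader {

    private KeySetFileReader() {}

    static Set<String> readKeys(Path file) throws IOException {
        Set<String> keys = new HashSet<>();
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (!line.isEmpty()) { // skip blank lines; keys are otherwise kept verbatim
                    keys.add(line);
                }
            }
        }
        return keys;
    }
}
```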

Note: all of this applies to the new API (org.apache.hadoop.mapreduce). If you are using org.apache.hadoop.mapred, the approach is very similar, but some of the relevant methods are called on different objects.


Posted 2021-06-03
