saveAsHadoopDataset never closes its connection to ZooKeeper

Posted by iaqfqrcu on 2021-05-29 in Hadoop

I am using the code below to write to HBase:

jsonDStream.foreachRDD(new Function<JavaRDD<String>, Void>() {

    @Override
    public Void call(JavaRDD<String> rdd) throws Exception {
        // Parse the JSON batch and keep only the fields to be stored
        DataFrame jsonFrame = sqlContext.jsonRDD(rdd);
        DataFrame selectedFieldFrame = jsonFrame.select("id_str", "created_at", "text");

        // HBase / ZooKeeper connection settings, rebuilt for every batch
        Configuration config = HBaseConfiguration.create();
        config.set("hbase.zookeeper.quorum", "d-9543");
        config.set("zookeeper.znode.parent", "/hbase-unsecure");
        config.set("hbase.zookeeper.property.clientPort", "2181");
        final JobConf jobConfig = new JobConf(config, SveAsHadoopDataSetExample.class);

        // JobConf.setOutputFormat takes the old-API (mapred) TableOutputFormat
        jobConfig.setOutputFormat(TableOutputFormat.class);
        jobConfig.set(TableOutputFormat.OUTPUT_TABLE, "tableName");

        selectedFieldFrame.javaRDD().mapToPair(new PairFunction<Row, ImmutableBytesWritable, Put>() {

            @Override
            public Tuple2<ImmutableBytesWritable, Put> call(Row row) throws Exception {
                return convertToPut(row);
            }
        }).saveAsHadoopDataset(jobConfig);

        return null;
    }
});

But when I look at zkdump in ZooKeeper, the number of connections keeps increasing.
Any suggestions would be a great help!
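
One way to watch the growth described above is to ask the ZooKeeper server for its connection list between streaming batches. The following is only a sketch, not part of the original question: it assumes the server accepts the `cons` four-letter command (newer ZooKeeper releases require it to be enabled through `4lw.commands.whitelist`) and that each connection is reported on its own line starting with the client address, such as `/10.0.0.1:5000[1](...)`. The class and method names are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class ZkConnectionCount {

    // Sends a four-letter command (for example "cons") and returns the response lines.
    // The server closes the socket after replying, which ends the read loop.
    static List<String> fourLetterWord(String host, int port, String cmd) throws IOException {
        try (Socket socket = new Socket(host, port)) {
            OutputStream out = socket.getOutputStream();
            out.write(cmd.getBytes(StandardCharsets.US_ASCII));
            out.flush();
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(socket.getInputStream(), StandardCharsets.US_ASCII));
            List<String> lines = new ArrayList<>();
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
            return lines;
        }
    }

    // Counts lines that look like a client connection entry (assumed to start with "/ip:port").
    static int countConnections(List<String> consLines) {
        int count = 0;
        for (String line : consLines) {
            if (line.trim().startsWith("/")) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.out.println("usage: ZkConnectionCount <host> [port]");
            return;
        }
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 2181;
        // The reported count includes this probe's own connection.
        System.out.println(countConnections(fourLetterWord(args[0], port, "cons")));
    }
}
```

Running it against the quorum host from the question (`d-9543`, port `2181`) every few batches should show whether the count climbs steadily or levels off.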

hadoop hbase apache-spark apache-spark-sql spark-streaming

Source: https://stackoverflow.com/questions/37435163/saveashadoopdataset-never-closes-connection-to-zookeeper

1 Answer

8oomwypt (#1)

I had the same problem. It is an HBase bug, and I worked around it as follows:
switch from org.apache.hadoop.hbase.mapred.TableOutputFormat to org.apache.hadoop.hbase.mapreduce.TableOutputFormat, and use org.apache.hadoop.mapreduce.Job instead of org.apache.hadoop.mapred.JobConf.
Here is an example:

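import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.{Durability, Put}
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.mapreduce.Job

// zk_hosts and zk_port hold the ZooKeeper quorum and client port as strings
val conf = HBaseConfiguration.create()
conf.set("hbase.zookeeper.quorum", zk_hosts)
conf.set("hbase.zookeeper.property.clientPort", zk_port)

// Use the new-API (mapreduce) output format through a Job instead of a JobConf
conf.set(TableOutputFormat.OUTPUT_TABLE, "TABLE_NAME")
val job = Job.getInstance(conf)
job.setOutputFormatClass(classOf[TableOutputFormat[String]])

formatedLines.map {
  case (a, b, c) =>
    val row = Bytes.toBytes(a)

    val put = new Put(row)
    // SKIP_WAL bypasses the write-ahead log: faster, but writes can be lost if a region server fails
    put.setDurability(Durability.SKIP_WAL)

    put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("node"), Bytes.toBytes(b))
    put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("topic"), Bytes.toBytes(c))

    (new ImmutableBytesWritable(row), put)
}.saveAsNewAPIHadoopDataset(job.getConfiguration)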

This may also help:
https://github.com/hortonworks-spark/shc/pull/20/commits/2074067c42c5a454fa4cdeec18c462b5367f23b9
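
Since the question's code is Java, the same change might look roughly like the sketch below. It is an illustration and not the answerer's code: `HBaseNewApiWriter` and `save` are made-up names, the connection settings are copied from the question, and the pair RDD passed in is assumed to be the result of the question's `mapToPair(... convertToPut(row))` step. It needs the Spark, Hadoop and HBase client libraries on the classpath.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat; // new API, not ...hbase.mapred
import org.apache.hadoop.mapreduce.Job;
import org.apache.spark.api.java.JavaPairRDD;

// Hypothetical helper: writes a pair RDD of Puts through the new-API output format.
public final class HBaseNewApiWriter {

    private HBaseNewApiWriter() {
    }

    public static void save(JavaPairRDD<ImmutableBytesWritable, Put> puts) throws IOException {
        // Same settings as in the question
        Configuration config = HBaseConfiguration.create();
        config.set("hbase.zookeeper.quorum", "d-9543");
        config.set("zookeeper.znode.parent", "/hbase-unsecure");
        config.set("hbase.zookeeper.property.clientPort", "2181");
        config.set(TableOutputFormat.OUTPUT_TABLE, "tableName");

        // A mapreduce Job replaces the mapred JobConf
        Job job = Job.getInstance(config);
        job.setOutputFormatClass(TableOutputFormat.class);

        // saveAsNewAPIHadoopDataset replaces saveAsHadoopDataset
        puts.saveAsNewAPIHadoopDataset(job.getConfiguration());
    }
}
```

Inside `foreachRDD`, the call `.saveAsHadoopDataset(jobConfig)` would then be replaced by passing the pair RDD to `HBaseNewApiWriter.save(...)`.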

Answered 2021-05-29
