Error when saving data from PySpark to HBase

Asked by 7gcisfzg on 2021-07-13, in Spark

I am trying to write a Spark DataFrame to HBase using PySpark. I added the Spark-HBase connector dependency and ran the code from a Jupyter notebook. I also created a table in HBase's default namespace.
I started PySpark with the command below. My versions: Spark 3.x and HBase 2.2.6.

pyspark --packages com.hortonworks:shc:1.0.0-1.6-s_2.10 --repositories http://repo.hortonworks.com/content/groups/public/ --files /home/vijee/hbase-2.2.6-bin/conf/hbase-site.xml

The dependency was added successfully.

df = sc.parallelize([('a', 'def'), ('b', 'abc')]).toDF(schema=['col0', 'col1'])

catalog = ''.join("""{
     "table":{"namespace":"default", "name":"smTable"},
     "rowkey":"c1",
     "columns":{
    "col0":{"cf":"rowkey", "col":"c1", "type":"string"},
    "col1":{"cf":"t1", "col":"c2", "type":"string"}
   }
      }""".split())

df.write.options(catalog=catalog).format('org.apache.spark.sql.execution.datasources.hbase').save()
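As an aside on the catalog string above: the `''.join(... .split())` idiom strips every whitespace character out of the template, which happens to work here only because none of the JSON string values contain spaces. A less fragile sketch (same table layout as above, built with the standard `json` module) would be:

```python
import json

# Build the SHC catalog from a plain dict instead of whitespace-stripping
# a template string. Table/column names match the question's example.
catalog = json.dumps({
    "table": {"namespace": "default", "name": "smTable"},
    "rowkey": "c1",
    "columns": {
        # SHC convention: the row key column uses the pseudo column family "rowkey"
        "col0": {"cf": "rowkey", "col": "c1", "type": "string"},
        "col1": {"cf": "t1", "col": "c2", "type": "string"},
    },
})
```

`json.dumps` guarantees valid JSON regardless of what the values contain, so renaming a column to something with a space can no longer silently corrupt the catalog.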

When I run the statement above, I get the error below. Since I am new to Spark, I cannot make sense of it.
At first I tried with a CSV file and hit the same problem: `java.lang.AbstractMethodError`. Now, using this sample data, I still get the same error.
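One hint worth noting before reading the traceback: `AbstractMethodError` is typically a binary-compatibility failure, and the artifact suffix in the coordinate used above, `s_2.10`, encodes the Scala version it was built for (Scala 2.10, for Spark 1.6), while Spark 3.x is built against Scala 2.12. A small sketch (with a hypothetical helper, not part of any library) of reading that suffix out of a Spark package coordinate:

```python
import re

def scala_suffix(coordinate):
    """Extract the Scala version suffix (e.g. '2.10') from a Spark
    package coordinate such as 'com.hortonworks:shc:1.0.0-1.6-s_2.10'.
    Returns None if the artifact name carries no Scala suffix."""
    m = re.search(r"_(\d+\.\d+)$", coordinate)
    return m.group(1) if m else None

# The connector from the question targets Scala 2.10,
# but Spark 3.0.1 ships with Scala 2.12:
pkg = "com.hortonworks:shc:1.0.0-1.6-s_2.10"
print(scala_suffix(pkg))  # -> '2.10'
```

When the suffix does not match the Scala version of the running Spark, calls across the DataSource API can resolve to methods that no longer exist in that shape, which is exactly what an `AbstractMethodError` at `createRelation` looks like.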

Py4JJavaError                             Traceback (most recent call last)
<ipython-input-9-cfcf107b1f03> in <module>
----> 1 df.write.options(catalog=catalog,newtable=5).format('org.apache.spark.sql.execution.datasources.hbase').save()

~/spark-3.0.1-bin-hadoop2.7/python/pyspark/sql/readwriter.py in save(self, path, format, mode, partitionBy,**options)
    823             self.format(format)
    824         if path is None:
--> 825             self._jwrite.save()
    826         else:
    827             self._jwrite.save(path)

~/spark-3.0.1-bin-hadoop2.7/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py in __call__(self, *args)
   1302 
   1303         answer = self.gateway_client.send_command(command)
-> 1304         return_value = get_return_value(
   1305             answer, self.gateway_client, self.target_id, self.name)
   1306 

~/spark-3.0.1-bin-hadoop2.7/python/pyspark/sql/utils.py in deco(*a,**kw)
    126     def deco(*a,**kw):
    127         try:
--> 128             return f(*a,**kw)
    129         except py4j.protocol.Py4JJavaError as e:
    130             converted = convert_exception(e.java_exception)

~/spark-3.0.1-bin-hadoop2.7/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    324             value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
    325             if answer[1] == REFERENCE_TYPE:
--> 326                 raise Py4JJavaError(
    327                     "An error occurred while calling {0}{1}{2}.\n".
    328                     format(target_id, ".", name), value)

Py4JJavaError: An error occurred while calling o114.save.
: java.lang.AbstractMethodError: org.apache.spark.sql.execution.datasources.hbase.DefaultSource.createRelation(Lorg/apache/spark/sql/SQLContext;Lorg/apache/spark/sql/SaveMode;Lscala/collection/immutable/Map;Lorg/apache/spark/sql/Dataset;)Lorg/apache/spark/sql/sources/BaseRelation;
    at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
