Spark HBase error: java.lang.IllegalStateException: unread block data

rsaldnfx, posted 2021-06-09 in HBase

I am trying to fetch records from an HBase table through a Java Spark program invoked via a Jersey REST API, and I get the error below. However, when I access the HBase table by submitting the Spark jar directly, the same code executes without any error.
I have 2 worker nodes for HBase and 2 worker nodes for Spark, managed by the same master node.
WARN TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1, 172.31.16.140): java.lang.IllegalStateException: unread block data
    at java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2421)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1382)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:69)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:95)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:194)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
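
For reference, a minimal sketch of the kind of Java Spark read the question describes (not the asker's actual code; the table and app names are placeholders). The error above is typically thrown on the executors while deserializing the task, when the HBase classes involved are not on the executor classpath:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class HBaseReadSketch {
    public static void main(String[] args) {
        // Spark context for the job; the app name is a placeholder
        JavaSparkContext jsc = new JavaSparkContext(new SparkConf().setAppName("hbase-read-sketch"));

        // HBase scan configuration; "my_table" is a placeholder table name
        Configuration hbaseConf = HBaseConfiguration.create();
        hbaseConf.set(TableInputFormat.INPUT_TABLE, "my_table");

        // Each element is a (row key, Result) pair scanned from the table
        JavaPairRDD<ImmutableBytesWritable, Result> rows = jsc.newAPIHadoopRDD(
                hbaseConf, TableInputFormat.class, ImmutableBytesWritable.class, Result.class);

        System.out.println("rows read: " + rows.count());
        jsc.stop();
    }
}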


smtd7mpg1#

OK, I think I know what your problem is, because I just went through it myself.
The cause is most likely that some HBase jars are missing: while the job runs, Spark needs the HBase jars to read the data, and if they are not on the classpath an exception like this is thrown. So what should you do? It is easy.
Before submitting the job, add the --jars parameter and include the jars as follows (a full spark-submit command assembled from this list is sketched after it):
--jars/root/server/hive/lib/hive-hbase-handler-1.2.1.jar,
/root/server/hbase/lib/hbase-client-0.98.12-hadoop2.jar,
/root/server/hbase/lib/hbase-common-0.98.12-hadoop2.jar,
/root/server/hbase/lib/hbase-server-0.98.12-hadoop2.jar,
/root/server/hbase/lib/hbase-hadoop2-compat-0.98.12-hadoop2.jar,
/root/server/hbase/lib/guava-12.0.1.jar,
/root/server/hbase/lib/hbase-protocol-0.98.12-hadoop2.jar,
/root/server/hbase/lib/htrace-core-2.04.jar
If that works for you, enjoy!
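
Assembled into a full command, the submission might look like the sketch below; the master, main class, and application jar are placeholders, and note that the --jars value must be a single comma-separated argument with no spaces:

spark-submit \
  --master yarn \
  --class com.example.HBaseReadJob \
  --jars /root/server/hive/lib/hive-hbase-handler-1.2.1.jar,/root/server/hbase/lib/hbase-client-0.98.12-hadoop2.jar,/root/server/hbase/lib/hbase-common-0.98.12-hadoop2.jar,/root/server/hbase/lib/hbase-server-0.98.12-hadoop2.jar,/root/server/hbase/lib/hbase-hadoop2-compat-0.98.12-hadoop2.jar,/root/server/hbase/lib/guava-12.0.1.jar,/root/server/hbase/lib/hbase-protocol-0.98.12-hadoop2.jar,/root/server/hbase/lib/htrace-core-2.04.jar \
  my-spark-app.jar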


nkkqxpd92#

CDH (Cloudera):
Step 1: Copy the hbase-site.xml file into the /etc/spark/conf/ directory:
cp /opt/cloudera/parcels/CDH/lib/hbase/conf/hbase-site.xml /etc/spark/conf/
Step 2: Add the following libraries to spark-submit / spark-shell.

/opt/cloudera/parcels/CDH/jars/hive-hbase-handler-*.jar
/opt/cloudera/parcels/CDH/lib/hbase/hbase-client-*.jar
/opt/cloudera/parcels/CDH/lib/hbase/hbase-common-*.jar
/opt/cloudera/parcels/CDH/lib/hbase/hbase-server-*.jar
/opt/cloudera/parcels/CDH/lib/hbase/hbase-hadoop2-compat-*.jar
/opt/cloudera/parcels/CDH/lib/hbase/hbase-protocol-*.jar
/opt/cloudera/parcels/CDH/jars/guava-28.1-jre.jar
/opt/cloudera/parcels/CDH/jars/htrace-core-3.2.0-incubating.jar

Spark shell:

sudo -u hive spark-shell --master yarn --jars /opt/cloudera/parcels/CDH/jars/hive-hbase-handler-*.jar,/opt/cloudera/parcels/CDH/lib/hbase/hbase-client-*.jar,/opt/cloudera/parcels/CDH/lib/hbase/hbase-common-*.jar,/opt/cloudera/parcels/CDH/lib/hbase/hbase-server-*.jar,/opt/cloudera/parcels/CDH/lib/hbase/hbase-hadoop2-compat-*.jar,/opt/cloudera/parcels/CDH/lib/hbase/hbase-protocol-*.jar,/opt/cloudera/parcels/CDH/jars/guava-28.1-jre.jar,/opt/cloudera/parcels/CDH/jars/htrace-core-3.2.0-incubating.jar --files /etc/spark/conf/hbase-site.xml

anauzrmj3#

I ran into the same problem on CDH 5.4.0 when submitting a Spark job implemented with the Java API. Here are my solutions:
Solution 1: with spark-submit:

--jars zookeeper-3.4.5-cdh5.4.0.jar, 
hbase-client-1.0.0-cdh5.4.0.jar, 
hbase-common-1.0.0-cdh5.4.0.jar,
hbase-server-1.0.0-cdh5.4.0.jar,
hbase-protocol-1.0.0-cdh5.4.0.jar,
htrace-core-3.1.0-incubating.jar,
// custom jars which are needed in the spark executors

Solution 2: set the jars on the SparkConf in code (a short usage sketch follows the snippet):

SparkConf sparkConf = new SparkConf();
sparkConf.setJars(new String[]{"zookeeper-3.4.5-cdh5.4.0.jar",
        "hbase-client-1.0.0-cdh5.4.0.jar",
        "hbase-common-1.0.0-cdh5.4.0.jar",
        "hbase-server-1.0.0-cdh5.4.0.jar",
        "hbase-protocol-1.0.0-cdh5.4.0.jar",
        "htrace-core-3.1.0-incubating.jar",
        // custom jars which are needed in the spark executors
});
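
As a usage sketch, assuming sparkConf above is the job's configuration (the app name below is a placeholder): Spark ships the jars listed via setJars to every executor when the job starts, so the HBase classes are available when tasks are deserialized:

import org.apache.spark.api.java.JavaSparkContext;

// Build the context from the SparkConf configured above;
// the jars passed to setJars are distributed to all executors.
JavaSparkContext jsc = new JavaSparkContext(sparkConf.setAppName("hbase-read-job"));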

Summary
This problem is caused by jars missing from the Spark application's classpath. You need to add these jars to your project classpath, and then use one of the two solutions above to distribute them to your Spark cluster.
