写入数据后使用.saveastable写入hdfs时出现超时错误

tsm1rwdh 于 2021-05-29 发布在 Hadoop

关注(0)|答案(1)|浏览(577)

我正在emr上运行spark 2.3，并尝试使用scala将数据写入hdfs，如下所示：

dataframe.write.
  partitionBy("column1").
  bucketBy(1,"column2").
  sortBy("column2").
  mode("overwrite").
  format("parquet").
  option("path","hdfs:///destination/").
  saveAsTable("result")

一旦数据被写入并且任务完成，我就会得到一个超时错误。错误发生后，我可以看到hdfs中的数据，完全经过处理。
为什么会出现这种错误？有什么意义吗？
看起来主节点正在尝试与另一个ip通信（与任何节点ip都不匹配），但数据已经在hdfs中。
请注意，使用时不会发生这种情况 .save("hdfs:///location/") 或者 .save("s3://bucket/folder/") ，只有 saveAsTable 方法。我需要使用 saveAsTable 以便进行铲斗和分拣。
下面的错误日志片段

18/07/23 16:33:31 WARN HiveExternalCatalog: Persisting bucketed data source table `default`.`result` into Hive metastore in Spark SQL specific format, which is NOT compatible with Hive.
18/07/23 16:35:32 ERROR log: Got exception: org.apache.hadoop.net.ConnectTimeoutException Call From ip-master_node_ip/master.node.ip to other_ip.ec2.internal:8020 failed on socket timeout exception: org.apache.hadoop.net.ConnectTimeoutException: 20000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=other_ip.ec2.internal/other_ip:8020]; For more details see:  http://wiki.apache.org/hadoop/SocketTimeout
org.apache.hadoop.net.ConnectTimeoutException: Call From ip-master_node_ip/master.node.ip to other_ip.ec2.internal:8020 failed on socket timeout exception: org.apache.hadoop.net.ConnectTimeoutException: 20000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=other_ip.ec2.internal/other_ip:8020]; For more details see:  http://wiki.apache.org/hadoop/SocketTimeout

at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.apache.hadoop.net.ConnectTimeoutException: 20000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=other_ip.ec2.internal/other_ip:8020]
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:534)
    at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:685)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:788)
    at org.apache.hadoop.ipc.Client$Connection.access$3500(Client.java:410)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1550)
    at org.apache.hadoop.ipc.Client.call(Client.java:1381)
    ... 110 more
    18/07/23 16:35:32 ERROR log: Converting exception to MetaException
    org.apache.hadoop.net.ConnectTimeoutException: Call From ip-master_node_ip/master.node.ip to other_ip.ec2.internal:8020 failed on socket timeout exception: org.apache.hadoop.net.ConnectTimeoutException: 20000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=other_ip.ec2.internal/other_ip:8020]; For more details see:  http://wiki.apache.org/hadoop/SocketTimeout

  ... 49 elided
Caused by: org.apache.hadoop.net.ConnectTimeoutException: 20000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=other_ip.ec2.internal/other_ip:8020]
  at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:534)
  at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:685)
  at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:788)

作为参考，我尝试了这里发布的解决方案，但是在路径中指定主节点ip时仍然出现错误 hdfs:///master_node_ip:8020/location/") .

hadoop hdfs apache-spark

来源：https://stackoverflow.com/questions/51484232/timeout-error-when-writing-with-saveastable-to-hdfs-after-data-is-written

1条答案

按热度按时间

fdbelqdn1#

如果您的emr集群在默认情况下使用glue metastore，并且该数据库不存在，那么您将看到超时。您可以删除配置，也可以按照建议创建数据库

Classification: hive-site
Property: hive.metastore.client.factory.class
Value: com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory
Source: Cluster configuration

赞(0）回复(0）举报 2021-05-29

我来回答

写入数据后使用.saveastable写入hdfs时出现超时错误

1条答案

相关问题

热门标签

最新问答