Failed to copy file from FTP to HDFS

Posted by anauzrmj on 2021-06-02 in Hadoop

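I have an FTP server (f [ftp]), a Linux box (s [standalone]) and a Hadoop cluster (c [cluster]). The current file flow is f->s->c. I am trying to improve performance by skipping s.
Currently I do: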

wget ftp://user:password@ftpserver/absolute_path_to_file
hadoop fs -copyFromLocal path_to_file path_in_hdfs
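For reference, the current two-step flow can be sketched as a small Bash function run on the standalone box s. This is a minimal sketch, assuming Bash, GNU wget and the `hadoop` client are available on s; the function name `copy_via_local` is hypothetical, and the URL and HDFS path are the question's placeholders. It stages the download in a temporary file and deletes it afterwards.

```shell
#!/usr/bin/env bash
# Hypothetical helper: the current two-step flow f -> s -> c, run on box s.
copy_via_local() {
  local url="$1" hdfs_dest="$2"
  local tmp status
  # Stage the download in a temporary file on s's local disk.
  tmp="$(mktemp)" || return 1
  # Step 1 (f -> s): -O writes the download to the temp file; -q hides progress.
  if ! wget -q -O "$tmp" "$url"; then
    rm -f "$tmp"
    return 1
  fi
  # Step 2 (s -> c): upload the local copy into HDFS.
  hadoop fs -copyFromLocal "$tmp" "$hdfs_dest"
  status=$?
  # Remove the local copy whether or not the upload succeeded.
  rm -f "$tmp"
  return "$status"
}

# Usage with the question's placeholders:
# copy_via_local "ftp://user:password@ftpserver/absolute_path_to_file" path_in_hdfs
```

Because the local copy has a random temporary name, `hdfs_dest` should be the full target file path rather than a directory; otherwise the file would land in HDFS under that temporary name.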

I tried:

hadoop fs -cp ftp://user:password@ftpserver/absolute_path_to_file path_in_hdfs

and:

hadoop distcp ftp://user:password@ftpserver/absolute_path_to_file path_in_hdfs

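Both hang. The distcp one, which runs as a job, gets killed by a timeout, and the Hadoop job logs only say that it was killed by the timeout. I tried downloading the file from FTP with wget on one of the nodes of c, and that worked. What could be the reason, and are there any hints on how to find out?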

hadoop hdfs ftp DistCp

Source: https://stackoverflow.com/questions/26000217/failed-to-copy-file-from-ftp-to-hdfs

2 answers

wooyq4lh1#

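Via a standard pipe:

wget -q -O - ftp://user:password@ftpserver/absolute_path_to_file | hadoop fs -put - path_in_hdfs

The single `-` tells hdfs put to read from stdin. (wget writes to a file by default, so `-O -` is needed to send the download to stdout.)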
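The streaming one-liner can be wrapped so that a failed download is actually reported. This is a minimal sketch assuming Bash; the function name `stream_ftp_to_hdfs` is hypothetical. Two details matter: wget needs `-O -` to write to stdout, and without `pipefail` a pipeline's exit status is only that of its last command (`hadoop`), so a wget failure would otherwise go unnoticed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: stream one file from FTP into HDFS with no local copy.
stream_ftp_to_hdfs() {
  local url="$1" hdfs_dest="$2"
  (
    # pipefail: the pipeline fails if wget fails, not only if hadoop fails.
    # It is set inside a subshell so the caller's shell options are untouched.
    set -o pipefail
    # -O - sends the download to stdout; -q suppresses the progress output.
    # "-put -" tells hadoop to read the file contents from stdin.
    wget -q -O - "$url" | hadoop fs -put - "$hdfs_dest"
  )
}

# Usage with the question's placeholders:
# stream_ftp_to_hdfs "ftp://user:password@ftpserver/absolute_path_to_file" path_in_hdfs
```

If wget stops partway, hadoop still sees end-of-input and may leave a truncated file at `hdfs_dest`, so a caller may want to remove that file when the function returns non-zero.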

Answered 2021-06-03

bnl4lu3b2#

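hadoop fs -cp ftp://user:password@ftpserver.com/absolute_path_to_file path_in_hdfs
This cannot be used because the source file is a file on the local file system; it does not take into account the scheme you are trying to pass. Refer to the javadoc: FileSystem.
distcp only works for large inter- or intra-cluster copying (read: Hadoop clusters, i.e. HDFS). Again, it cannot fetch data from FTP. The two-step process is still your best bet. Alternatively, write a program that reads the data from FTP and writes it to HDFS.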
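If you go the "write a program" route, it can be as small as a shell loop that streams each file through wget into `hadoop fs -put -`. The following is a hypothetical sketch assuming Bash; `copy_many_ftp_to_hdfs`, the FTP base URL, the HDFS directory and the file names are illustrative placeholders, not something the answer specifies.

```shell
#!/usr/bin/env bash
# Hypothetical batch copier: streams each named file from an FTP directory
# into an HDFS directory, one wget | hadoop pipeline per file.
copy_many_ftp_to_hdfs() {
  local ftp_base="$1" hdfs_dir="$2"
  shift 2
  local name failed=0
  for name in "$@"; do
    # Subshell keeps pipefail local; the pipeline fails if wget or hadoop fails.
    if ! (
      set -o pipefail
      wget -q -O - "$ftp_base/$name" | hadoop fs -put - "$hdfs_dir/$name"
    ); then
      echo "failed: $name" >&2
      failed=1
    fi
  done
  # Non-zero if any file failed; the remaining files are still attempted.
  return "$failed"
}

# Usage (placeholders):
# copy_many_ftp_to_hdfs "ftp://user:password@ftpserver/some_dir" /data/in a.csv b.csv
```

Continuing past a failed file and returning a combined status is one design choice; stopping at the first failure would be equally reasonable for a strict pipeline.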

Answered 2021-06-03