Java Hadoop client cannot list files in HDFS

Asked by 55ooxyrt on 2021-08-20, tagged Java

I am trying to work with HDFS through the Java Hadoop client, but when I call FileSystem::listFiles, the returned iterator gives me no entries at all.
Here is my Java code:

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;
import java.net.URI;
import java.net.URISyntaxException;
import java.io.IOException;

class HadoopTest {
    public static void main(String[] args) throws IOException, URISyntaxException {
        String url = "hdfs://10.2.206.148";
        FileSystem fs = FileSystem.get(new URI(url), new Configuration());
        System.out.println("get fs success!");
        RemoteIterator<LocatedFileStatus> iterator = fs.listFiles(new Path("/"), false);
        while (iterator.hasNext()) {
            LocatedFileStatus lfs = iterator.next();
            System.out.println(lfs.getPath().toString());
        }
        System.out.println("iteration finished");
    }
}

Here is the output:

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/admin/pengduo/hadoop_test/lib/logback-classic-1.2.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/admin/pengduo/hadoop_test/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [ch.qos.logback.classic.util.ContextSelectorStaticBinder]
10:49:26.019 [main] DEBUG org.apache.hadoop.util.Shell - setsid exited with exit code 0
10:49:26.064 [main] DEBUG org.apache.hadoop.metrics2.lib.MutableMetricsFactory - field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginSuccess with annotation @org.apache.hadoop.metrics2.annotation.Metric(always=false, sampleName=Ops, valueName=Time, about=, interval=10, type=DEFAULT, value=[Rate of successful kerberos logins and latency (milliseconds)])
10:49:26.069 [main] DEBUG org.apache.hadoop.metrics2.lib.MutableMetricsFactory - field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginFailure with annotation @org.apache.hadoop.metrics2.annotation.Metric(always=false, sampleName=Ops, valueName=Time, about=, interval=10, type=DEFAULT, value=[Rate of failed kerberos logins and latency (milliseconds)])
10:49:26.069 [main] DEBUG org.apache.hadoop.metrics2.lib.MutableMetricsFactory - field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.getGroups with annotation @org.apache.hadoop.metrics2.annotation.Metric(always=false, sampleName=Ops, valueName=Time, about=, interval=10, type=DEFAULT, value=[GetGroups])
10:49:26.070 [main] DEBUG org.apache.hadoop.metrics2.lib.MutableMetricsFactory - field private org.apache.hadoop.metrics2.lib.MutableGaugeLong org.apache.hadoop.security.UserGroupInformation$UgiMetrics.renewalFailuresTotal with annotation @org.apache.hadoop.metrics2.annotation.Metric(always=false, sampleName=Ops, valueName=Time, about=, interval=10, type=DEFAULT, value=[Renewal failures since startup])
10:49:26.070 [main] DEBUG org.apache.hadoop.metrics2.lib.MutableMetricsFactory - field private org.apache.hadoop.metrics2.lib.MutableGaugeInt org.apache.hadoop.security.UserGroupInformation$UgiMetrics.renewalFailures with annotation @org.apache.hadoop.metrics2.annotation.Metric(always=false, sampleName=Ops, valueName=Time, about=, interval=10, type=DEFAULT, value=[Renewal failures since last successful login])
10:49:26.071 [main] DEBUG org.apache.hadoop.metrics2.impl.MetricsSystemImpl - UgiMetrics, User and group related metrics
10:49:26.084 [main] DEBUG org.apache.hadoop.security.SecurityUtil - Setting hadoop.security.token.service.use_ip to true
10:49:26.096 [main] DEBUG org.apache.hadoop.security.Groups -  Creating new Groups object
10:49:26.097 [main] DEBUG org.apache.hadoop.util.NativeCodeLoader - Trying to load the custom-built native-hadoop library...
10:49:26.097 [main] DEBUG org.apache.hadoop.util.NativeCodeLoader - Failed to load native-hadoop with error: java.lang.UnsatisfiedLinkError: no hadoop in java.library.path
10:49:26.097 [main] DEBUG org.apache.hadoop.util.NativeCodeLoader - java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
10:49:26.097 [main] WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
10:49:26.098 [main] DEBUG org.apache.hadoop.util.PerformanceAdvisory - Falling back to shell based
10:49:26.098 [main] DEBUG org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback - Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping
10:49:26.153 [main] DEBUG org.apache.hadoop.security.Groups - Group mapping impl=org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback; cacheTimeout=300000; warningDeltaMs=5000
10:49:26.157 [main] DEBUG org.apache.hadoop.security.UserGroupInformation - hadoop login
10:49:26.158 [main] DEBUG org.apache.hadoop.security.UserGroupInformation - hadoop login commit
10:49:26.161 [main] DEBUG org.apache.hadoop.security.UserGroupInformation - using local user:UnixPrincipal: admin
10:49:26.161 [main] DEBUG org.apache.hadoop.security.UserGroupInformation - Using user: "UnixPrincipal: admin" with name admin
10:49:26.161 [main] DEBUG org.apache.hadoop.security.UserGroupInformation - User entry: "admin"
10:49:26.161 [main] DEBUG org.apache.hadoop.security.UserGroupInformation - UGI loginUser:admin (auth:SIMPLE)
log4j:WARN No appenders could be found for logger (org.apache.htrace.core.Tracer).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
10:49:26.201 [main] DEBUG org.apache.hadoop.fs.FileSystem - Loading filesystems
10:49:26.211 [main] DEBUG org.apache.hadoop.fs.FileSystem - file:// = class org.apache.hadoop.fs.LocalFileSystem from /home/admin/pengduo/hadoop_test/lib/hadoop-common-3.2.1.jar
10:49:26.216 [main] DEBUG org.apache.hadoop.fs.FileSystem - viewfs:// = class org.apache.hadoop.fs.viewfs.ViewFileSystem from /home/admin/pengduo/hadoop_test/lib/hadoop-common-3.2.1.jar
10:49:26.218 [main] DEBUG org.apache.hadoop.fs.FileSystem - har:// = class org.apache.hadoop.fs.HarFileSystem from /home/admin/pengduo/hadoop_test/lib/hadoop-common-3.2.1.jar
10:49:26.219 [main] DEBUG org.apache.hadoop.fs.FileSystem - http:// = class org.apache.hadoop.fs.http.HttpFileSystem from /home/admin/pengduo/hadoop_test/lib/hadoop-common-3.2.1.jar
10:49:26.219 [main] DEBUG org.apache.hadoop.fs.FileSystem - https:// = class org.apache.hadoop.fs.http.HttpsFileSystem from /home/admin/pengduo/hadoop_test/lib/hadoop-common-3.2.1.jar
10:49:26.226 [main] DEBUG org.apache.hadoop.fs.FileSystem - hdfs:// = class org.apache.hadoop.hdfs.DistributedFileSystem from /home/admin/pengduo/hadoop_test/lib/hadoop-hdfs-client-3.2.1.jar
10:49:26.233 [main] DEBUG org.apache.hadoop.fs.FileSystem - webhdfs:// = class org.apache.hadoop.hdfs.web.WebHdfsFileSystem from /home/admin/pengduo/hadoop_test/lib/hadoop-hdfs-client-3.2.1.jar
10:49:26.234 [main] DEBUG org.apache.hadoop.fs.FileSystem - swebhdfs:// = class org.apache.hadoop.hdfs.web.SWebHdfsFileSystem from /home/admin/pengduo/hadoop_test/lib/hadoop-hdfs-client-3.2.1.jar
10:49:26.234 [main] DEBUG org.apache.hadoop.fs.FileSystem - Looking for FS supporting hdfs
10:49:26.234 [main] DEBUG org.apache.hadoop.fs.FileSystem - looking for configuration option fs.hdfs.impl
10:49:26.251 [main] DEBUG org.apache.hadoop.fs.FileSystem - Looking in service filesystems for implementation class
10:49:26.251 [main] DEBUG org.apache.hadoop.fs.FileSystem - FS for hdfs is class org.apache.hadoop.hdfs.DistributedFileSystem
10:49:26.282 [main] DEBUG org.apache.hadoop.hdfs.client.impl.DfsClientConf - dfs.client.use.legacy.blockreader.local = false
10:49:26.282 [main] DEBUG org.apache.hadoop.hdfs.client.impl.DfsClientConf - dfs.client.read.shortcircuit = false
10:49:26.282 [main] DEBUG org.apache.hadoop.hdfs.client.impl.DfsClientConf - dfs.client.domain.socket.data.traffic = false
10:49:26.282 [main] DEBUG org.apache.hadoop.hdfs.client.impl.DfsClientConf - dfs.domain.socket.path =
10:49:26.291 [main] DEBUG org.apache.hadoop.hdfs.DFSClient - Sets dfs.client.block.write.replace-datanode-on-failure.min-replication to 0
10:49:26.297 [main] DEBUG org.apache.hadoop.io.retry.RetryUtils - multipleLinearRandomRetry = null
10:49:26.312 [main] DEBUG org.apache.hadoop.ipc.Server - rpcKind=RPC_PROTOCOL_BUFFER, rpcRequestWrapperClass=class org.apache.hadoop.ipc.ProtobufRpcEngine$RpcProtobufRequest, rpcInvoker=org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker@7c729a55
10:49:26.322 [main] DEBUG org.apache.hadoop.ipc.Client - getting client out of cache: org.apache.hadoop.ipc.Client@222545dc
10:49:26.587 [main] DEBUG org.apache.hadoop.util.PerformanceAdvisory - Both short-circuit local reads and UNIX domain socket are disabled.
10:49:26.593 [main] DEBUG org.apache.hadoop.hdfs.protocol.datatransfer.sasl.DataTransferSaslUtil - DataTransferProtocol not using SaslPropertiesResolver, no QOP found in configuration for dfs.data.transfer.protection
get fs success!
10:49:26.629 [main] DEBUG org.apache.hadoop.ipc.Client - The ping interval is 60000 ms.
10:49:26.631 [main] DEBUG org.apache.hadoop.ipc.Client - Connecting to /10.2.206.148:8020
10:49:26.658 [IPC Client (1923598304) connection to /10.2.206.148:8020 from admin] DEBUG org.apache.hadoop.ipc.Client - IPC Client (1923598304) connection to /10.2.206.148:8020 from admin: starting, having connections 1
10:49:26.660 [IPC Parameter Sending Thread #0] DEBUG org.apache.hadoop.ipc.Client - IPC Client (1923598304) connection to /10.2.206.148:8020 from admin sending #0 org.apache.hadoop.hdfs.protocol.ClientProtocol.getListing
10:49:26.666 [IPC Client (1923598304) connection to /10.2.206.148:8020 from admin] DEBUG org.apache.hadoop.ipc.Client - IPC Client (1923598304) connection to /10.2.206.148:8020 from admin got value #0
10:49:26.666 [main] DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine - Call: getListing took 52ms
iteration finished
10:49:26.695 [shutdown-hook-0] DEBUG org.apache.hadoop.ipc.Client - stopping client from cache: org.apache.hadoop.ipc.Client@222545dc
10:49:26.695 [shutdown-hook-0] DEBUG org.apache.hadoop.ipc.Client - removing client from cache: org.apache.hadoop.ipc.Client@222545dc
10:49:26.695 [shutdown-hook-0] DEBUG org.apache.hadoop.ipc.Client - stopping actual client because no more references remain: org.apache.hadoop.ipc.Client@222545dc
10:49:26.695 [shutdown-hook-0] DEBUG org.apache.hadoop.ipc.Client - Stopping client
10:49:26.696 [IPC Client (1923598304) connection to /10.2.206.148:8020 from admin] DEBUG org.apache.hadoop.ipc.Client - IPC Client (1923598304) connection to /10.2.206.148:8020 from admin: closed
10:49:26.696 [IPC Client (1923598304) connection to /10.2.206.148:8020 from admin] DEBUG org.apache.hadoop.ipc.Client - IPC Client (1923598304) connection to /10.2.206.148:8020 from admin: stopped, remaining connections 0
10:49:26.797 [Thread-4] DEBUG org.apache.hadoop.util.ShutdownHookManager - Completed shutdown in 0.102 seconds; Timeouts: 0
10:49:26.808 [Thread-4] DEBUG org.apache.hadoop.util.ShutdownHookManager - ShutdownHookManger completed shutdown.

Note that the FileSystem is obtained successfully, and the iteration completes without any error — it just yields nothing.
However, listing the same path with the hadoop fs command works fine:

$ $HADOOP_HOME/bin/hadoop fs -ls hdfs://10.2.206.148/
Warning: fs.defaultFs is not set when running "ls" command.
Found 4 items
drwxr-x--x   - hadoop    hadoop          0 2020-09-21 20:29 hdfs://10.2.206.148/apps
drwxr-x--x   - hadoop    hadoop          0 2021-07-08 10:44 hdfs://10.2.206.148/spark-history
drwxrwxrwt   - root      hadoop          0 2021-07-08 10:43 hdfs://10.2.206.148/tmp
drwxr-x--t   - hadoop    hadoop          0 2020-11-20 11:31 hdfs://10.2.206.148/user

I have set HADOOP_HOME properly.
My Hadoop library versions are all 3.2.1:

$ ll hadoop-*
-rw-r--r-- 1 admin admin   60258 Jul  8 10:42 hadoop-annotations-3.2.1.jar
-rw-r--r-- 1 admin admin  139109 Jul  8 10:42 hadoop-auth-3.2.1.jar
-rw-r--r-- 1 admin admin   44163 Jul  8 10:42 hadoop-client-3.2.1.jar
-rw-r--r-- 1 admin admin 4137520 Jul  8 10:42 hadoop-common-3.2.1.jar
-rw-r--r-- 1 admin admin 5959246 Jul  8 10:42 hadoop-hdfs-3.2.1.jar
-rw-r--r-- 1 admin admin 5094412 Jul  8 10:42 hadoop-hdfs-client-3.2.1.jar
-rw-r--r-- 1 admin admin  805845 Jul  8 10:42 hadoop-mapreduce-client-common-3.2.1.jar
-rw-r--r-- 1 admin admin 1657002 Jul  8 10:42 hadoop-mapreduce-client-core-3.2.1.jar
-rw-r--r-- 1 admin admin   85900 Jul  8 10:42 hadoop-mapreduce-client-jobclient-3.2.1.jar
-rw-r--r-- 1 admin admin 3287723 Jul  8 10:42 hadoop-yarn-api-3.2.1.jar
-rw-r--r-- 1 admin admin  322882 Jul  8 10:42 hadoop-yarn-client-3.2.1.jar
-rw-r--r-- 1 admin admin 2919779 Jul  8 10:42 hadoop-yarn-common-3.2.1.jar

I don't understand why the Java Hadoop client behaves differently from the Hadoop CLI, or how to make my Java program list these entries. Can anyone help me? Many thanks!


a0x5cqrl1#

I figured this out myself. The problem is my use of FileSystem::listFiles. This method lists only the files (never the directories) under the given path — its boolean argument merely controls whether to recurse into subdirectories — and the root of my cluster contains only 4 directories, so the iterator is empty. To list all entries under a path, both files and directories, I should use FileSystem::listLocatedStatus instead of FileSystem::listFiles.

// this will list only the files but not the directories under "/"
// RemoteIterator<LocatedFileStatus> iterator = fs.listFiles(new Path("/"), false);

// this will list all entries including the files and the directories
RemoteIterator<LocatedFileStatus> iterator = fs.listLocatedStatus(new Path("/"));
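The difference between the two calls can be demonstrated without a cluster by running them against a local FileSystem. This is a sketch; the temp-directory and file names are arbitrary, and the same two calls behave identically on DistributedFileSystem:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

class ListDemo {
    public static void main(String[] args) throws Exception {
        // Local filesystem stands in for HDFS so the sketch runs anywhere.
        FileSystem fs = FileSystem.getLocal(new Configuration());
        Path root = new Path(System.getProperty("java.io.tmpdir"), "listdemo");
        fs.mkdirs(new Path(root, "subdir"));           // a directory entry
        fs.create(new Path(root, "file.txt")).close(); // a file entry

        // listFiles yields only files; "subdir" is skipped entirely.
        System.out.println("listFiles:");
        RemoteIterator<LocatedFileStatus> files = fs.listFiles(root, false);
        while (files.hasNext()) {
            System.out.println("  " + files.next().getPath().getName());
        }

        // listLocatedStatus yields every entry, files and directories alike.
        System.out.println("listLocatedStatus:");
        RemoteIterator<LocatedFileStatus> all = fs.listLocatedStatus(root);
        while (all.hasNext()) {
            LocatedFileStatus s = all.next();
            System.out.println("  " + s.getPath().getName()
                    + (s.isDirectory() ? " [dir]" : " [file]"));
        }
        fs.delete(root, true); // clean up the temp directory
    }
}
```

With only directories under the listed path — as in the question, where / held 4 directories and no files — the listFiles section prints nothing, while listLocatedStatus prints all four entries.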
