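I want the hadoop fsck command to skip checking the files under a specific path. Is that possible? I am using the following command: hadoop fsck>/output.txt. I also checked the HDFS guide, but found no option to exclude a path from this command. Please help.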
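As of Hadoop 2.9.0, the hadoop fsck command has no option for excluding a path. However, you can use the WebHDFS REST API to get file status information for only the paths you care about: the LISTSTATUS operation returns information about every file in a directory, and the GETFILESTATUS operation returns information about a single file. Note that these calls report file metadata (length, block size, replication factor, permissions, and so on), not the block-level health report (missing, corrupt, or under-replicated blocks) that fsck produces. For a directory: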
curl -i "http://<HOST>:<PORT>/webhdfs/v1/<DIRECTORY_PATH>?op=LISTSTATUS"
For a file:
curl -i "http://<HOST>:<PORT>/webhdfs/v1/<FILE_PATH>?op=GETFILESTATUS"
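If you want to script these two calls instead of running curl by hand, here is a minimal Python sketch using only the standard library. The helper names webhdfs_url and webhdfs_get are hypothetical, and the sketch assumes WebHDFS is enabled and reachable over plain HTTP without authentication (a Kerberos-secured cluster would need SPNEGO, and some clusters expect a user.name query parameter). The host name in the example is a placeholder.

```python
import json
import urllib.parse
import urllib.request


def webhdfs_url(host: str, port: int, path: str, op: str) -> str:
    """Build a WebHDFS v1 URL such as .../webhdfs/v1/<path>?op=LISTSTATUS."""
    if not path.startswith("/"):
        raise ValueError("HDFS path must be absolute, e.g. /user/data")
    # Percent-encode the path (spaces etc.) but keep the '/' separators.
    quoted_path = urllib.parse.quote(path, safe="/")
    query = urllib.parse.urlencode({"op": op})
    return f"http://{host}:{port}/webhdfs/v1{quoted_path}?{query}"


def webhdfs_get(url: str, timeout: float = 30.0) -> dict:
    """Issue a plain HTTP GET and decode the JSON body.

    Assumes an unsecured cluster: no Kerberos/SPNEGO and no user.name parameter.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Hypothetical NameNode host; 50070 is the Hadoop 2.x default NameNode HTTP port.
    print(webhdfs_url("namenode.example.com", 50070, "/user/data", "LISTSTATUS"))
    print(webhdfs_url("namenode.example.com", 50070, "/user/data/_SUCCESS", "GETFILESTATUS"))
```

webhdfs_get(webhdfs_url(...)) returns the same JSON document that the curl commands print, minus the HTTP headers that curl -i adds.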
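The responses are JSON: LISTSTATUS returns a FileStatuses object, and GETFILESTATUS returns a single FileStatus object. Below is a sample response the NameNode (NN) returns for a directory: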
curl -i "http://<NN_HOST>:<HTTP_PORT>/webhdfs/v1/<DIRECTORY_PATH>?op=LISTSTATUS"
HTTP/1.1 200 OK
Cache-Control: no-cache
Content-Type: application/json
Transfer-Encoding: chunked
Server: Jetty(6.1.26.hwx)

{"FileStatuses":{"FileStatus":[
  {"accessTime":1489059994224,"blockSize":134217728,"childrenNum":0,"fileId":209158298,"group":"hdfs","length":0,"modificationTime":1489059994227,"owner":"XXX","pathSuffix":"_SUCCESS","permission":"644","replication":3,"storagePolicy":0,"type":"FILE"},
  {"accessTime":1489059969939,"blockSize":134217728,"childrenNum":0,"fileId":209158053,"group":"hdfs","length":0,"modificationTime":1489059986846,"owner":"XXX","pathSuffix":"part-m-00000","permission":"644","replication":3,"storagePolicy":0,"type":"FILE"},
  {"accessTime":1489059982614,"blockSize":134217728,"childrenNum":0,"fileId":209158225,"group":"hdfs","length":0,"modificationTime":1489059993497,"owner":"XXX","pathSuffix":"part-m-00001","permission":"644","replication":3,"storagePolicy":0,"type":"FILE"},
  {"accessTime":1489059977524,"blockSize":134217728,"childrenNum":0,"fileId":209158188,"group":"hdfs","length":0,"modificationTime":1489059983034,"owner":"XXX","pathSuffix":"part-m-00002","permission":"644","replication":3,"storagePolicy":0,"type":"FILE"}
]}}
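Because fsck itself cannot exclude a path, one possible workaround (an addition here, not part of the answer above) is to list a directory with LISTSTATUS, drop the entries you want to skip, and run a separate fsck on each remaining path. The sketch below assumes the response body has already been decoded into a dict (for example with json.loads). It only builds the command strings and does not run them. The function names, the glob-based exclusion, and the /user/data path are hypothetical choices. It emits hdfs fsck rather than hadoop fsck because in Hadoop 2.x the hadoop fsck form prints a deprecation warning and delegates to hdfs.

```python
import fnmatch
import json
import posixpath
import shlex
from typing import List


def included_children(liststatus: dict, parent: str, exclude_globs: List[str]) -> List[str]:
    """Absolute paths of the entries in a LISTSTATUS response, minus excluded ones."""
    paths = []
    for status in liststatus["FileStatuses"]["FileStatus"]:
        # pathSuffix is the entry's name relative to the listed directory.
        full_path = posixpath.join(parent, status["pathSuffix"])
        # fnmatch's '*' also matches '/', so "*/_SUCCESS" matches any path ending in /_SUCCESS.
        if any(fnmatch.fnmatchcase(full_path, pattern) for pattern in exclude_globs):
            continue
        paths.append(full_path)
    return paths


def fsck_commands(paths: List[str]) -> List[str]:
    """One 'hdfs fsck <path>' command line per path, shell-quoted."""
    return [f"hdfs fsck {shlex.quote(p)}" for p in paths]


if __name__ == "__main__":
    # Trimmed-down version of the sample LISTSTATUS response (only pathSuffix kept).
    body = '{"FileStatuses":{"FileStatus":[{"pathSuffix":"_SUCCESS"},{"pathSuffix":"part-m-00000"},{"pathSuffix":"part-m-00001"}]}}'
    listing = json.loads(body)
    # Hypothetical parent directory; excludes the _SUCCESS marker file.
    for cmd in fsck_commands(included_children(listing, "/user/data", ["*/_SUCCESS"])):
        print(cmd)
```

Run as a script, this prints hdfs fsck /user/data/part-m-00000 and hdfs fsck /user/data/part-m-00001. Since fsck on a directory is recursive, in practice you would list the top of the tree you normally check (for example /), exclude whole directories rather than individual files, and append each command's output to a single report, e.g. with >> /output.txt.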