How do I read and transfer file chunks with Hadoop WebHDFS?

xpszyzbs posted on 2021-06-02 in Hadoop

I need to transfer large files (at least 14 MB) from a Cosmos instance at FIWARE Lab to my backend.
I am using Spring RestTemplate as the client for the Hadoop WebHDFS REST API, but I get an I/O exception:

Exception in thread "main" org.springframework.web.client.ResourceAccessException: I/O error on GET request for "http://cosmos.lab.fiware.org:14000/webhdfs/v1/user/<user.name>/<path>?op=open&user.name=<user.name>":Truncated chunk ( expected size: 14744230; actual size: 11285103); nested exception is org.apache.http.TruncatedChunkException: Truncated chunk ( expected size: 14744230; actual size: 11285103)
    at org.springframework.web.client.RestTemplate.doExecute(RestTemplate.java:580)
    at org.springframework.web.client.RestTemplate.execute(RestTemplate.java:545)
    at org.springframework.web.client.RestTemplate.exchange(RestTemplate.java:466)

This is the code that raises the exception:

RestTemplate restTemplate = new RestTemplate();
restTemplate.setRequestFactory(new HttpComponentsClientHttpRequestFactory());
restTemplate.getMessageConverters().add(new ByteArrayHttpMessageConverter()); 
HttpEntity<?> entity = new HttpEntity<>(headers);

UriComponentsBuilder builder = 
    UriComponentsBuilder.fromHttpUrl(hdfs_path)
        .queryParam("op", "OPEN")
        .queryParam("user.name", user_name);

ResponseEntity<byte[]> response =
    restTemplate
        .exchange(builder.build().encode().toUri(), HttpMethod.GET, entity, byte[].class);

// try-with-resources closes the file even if the write fails
try (FileOutputStream output = new FileOutputStream(new File(local_path))) {
    IOUtils.write(response.getBody(), output);
}
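
The code above reads the whole response into a `byte[]` before writing it to disk. As a minimal sketch of an alternative, the response body can be copied to the file in small buffers so the file never has to fit in memory. The class and method names here (`StreamingDownload`, `copy`, `download`) are hypothetical, and the sketch uses the JDK's `HttpURLConnection` rather than RestTemplate. It does not by itself prevent a server-side truncation; it only removes the in-memory buffering.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Spring or Hadoop.
final class StreamingDownload {

    // Copies everything from in to out using a fixed-size buffer; returns the byte count.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
            total += read;
        }
        return total;
    }

    // Streams the body of a GET request to target; url is assumed to already
    // carry the op=OPEN and user.name query parameters.
    static long download(String url, String token, Path target) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("X-Auth-Token", token);
        try (InputStream in = conn.getInputStream();
             OutputStream out = Files.newOutputStream(target)) {
            return copy(in, out);
        } finally {
            conn.disconnect();
        }
    }
}
```

The value returned by `download` can be compared with the `Content-Length` header to detect a transfer that was cut short.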

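I suspected a transfer timeout on the Cosmos instance, so I tried sending the request with curl and specifying the offset, buffer and length parameters, but they seem to be ignored: I still received the whole file.
Thanks in advance.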
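
For reference, here is a minimal sketch of how chunked reads would be requested on a server that honors the WebHDFS `offset` and `length` query parameters of `op=OPEN`. The class and method names (`WebHdfsChunkUrls`, `chunkUrls`) are hypothetical, and the sketch assumes the file size is already known and that `userName` needs no URL encoding. It only builds the URLs; whether the Cosmos instance respects them is exactly what is in doubt here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any Hadoop client library.
final class WebHdfsChunkUrls {

    // Builds one OPEN URL per chunk, covering [0, fileSize) in steps of chunkSize bytes.
    static List<String> chunkUrls(String fileUrl, String userName, long fileSize, long chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<String> urls = new ArrayList<>();
        for (long offset = 0; offset < fileSize; offset += chunkSize) {
            // The last chunk may be shorter than chunkSize.
            long length = Math.min(chunkSize, fileSize - offset);
            urls.add(fileUrl + "?op=OPEN&user.name=" + userName
                    + "&offset=" + offset + "&length=" + length);
        }
        return urls;
    }
}
```

Each URL can then be fetched separately and the bodies appended to the local file in order.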

cnh2zyt3 #1

OK, I found a solution. I don't understand why, but the transfer succeeds if I use the Jetty HttpClient instead of RestTemplate (and the Apache HttpClient). This now works:

ContentExchange exchange = new ContentExchange(true) {
    // Accumulates the response body in memory as it arrives.
    ByteArrayOutputStream bos = new ByteArrayOutputStream();

    @Override
    protected void onResponseContent(Buffer content) throws IOException {
        bos.write(content.asArray(), 0, content.length());
    }

    @Override
    protected void onResponseComplete() throws IOException {
        // Write the file only if the server answered 200 OK.
        if (getResponseStatus() == HttpStatus.OK_200) {
            try (FileOutputStream output = new FileOutputStream(new File(<local_path>))) {
                IOUtils.write(bos.toByteArray(), output);
            }
        }
    }
};

UriComponentsBuilder builder = UriComponentsBuilder.fromHttpUrl(<hdfs_path>)
                .queryParam("op", "OPEN")
                .queryParam("user.name", <user_name>);

exchange.setURL(builder.build().encode().toUriString());
exchange.setMethod("GET");
exchange.setRequestHeader("X-Auth-Token", <token>);

HttpClient client = new HttpClient();
client.setConnectorType(HttpClient.CONNECTOR_SELECT_CHANNEL);
client.setMaxConnectionsPerAddress(200);
client.setThreadPool(new QueuedThreadPool(250)); 
client.start();
client.send(exchange);
exchange.waitForDone();

Are there any known bugs in the Apache HTTP client with chunked file transfers?
Am I doing something wrong in my request?

Update: I still don't have a solution

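After several tests I found that my problem is not solved after all. I discovered that the Hadoop version installed on the Cosmos instance is the very old Hadoop 0.20.2-cdh3u6, and I read that WebHDFS in that version does not support partial file transfers with the length parameter (it was introduced in v0.23.3). These are the headers I receive from the server when I send the GET request with curl: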

Access-Control-Allow-Origin: *
Access-Control-Allow-Methods: HEAD, POST, GET, OPTIONS, DELETE
Access-Control-Allow-Headers: origin, content-type, X-Auth-Token, Tenant-ID, Authorization
server: Apache-Coyote/1.1
set-cookie: hadoop.auth="u=<user>&p=<user>&t=simple&e=1448999699735&s=rhxMPyR1teP/bIJLfjOLWvW2pIQ="; Version=1; Path=/
Content-Type: application/octet-stream; charset=utf-8
content-length: 172934567
date: Tue, 01 Dec 2015 09:54:59 GMT
connection: close

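As you can see, the Connection header is set to close. In fact, the connection is usually closed whenever a GET request lasts longer than 120 seconds, even if the file transfer has not finished.
In conclusion, I would say Cosmos is completely useless if it does not support large file transfers.
Please correct me if I'm wrong, or if you know a workaround.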
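
To put the 120-second limit in perspective, here is a small sketch that computes the sustained throughput a single GET would need in order to finish before the connection is closed. It assumes the cutoff is a hard 120 seconds and uses the `content-length` value from the headers above. The class and method names (`TransferBudget`, `minBytesPerSecond`) are hypothetical.

```java
// Hypothetical helper for a back-of-the-envelope estimate.
final class TransferBudget {

    // Minimum average rate, in bytes per second, needed to move contentLength
    // bytes before the server closes the connection after timeoutSeconds.
    static double minBytesPerSecond(long contentLength, long timeoutSeconds) {
        if (timeoutSeconds <= 0) {
            throw new IllegalArgumentException("timeoutSeconds must be positive");
        }
        return (double) contentLength / timeoutSeconds;
    }

    public static void main(String[] args) {
        // content-length from the response headers, 120 s cutoff as observed.
        double rate = minBytesPerSecond(172_934_567L, 120);
        System.out.println("Required rate in bytes/s: " + rate);
    }
}
```

For the 172934567-byte file this comes to roughly 1.44 MB/s, so any link slower than that would hit the cutoff before the transfer completes.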
