Sqoop fails to import from Postgres to S3

Posted by lf5gs5x2 on 2021-06-03 in Sqoop


As part of my daily routine, I import data from PostgreSQL into HDFS and then copy it from HDFS to S3 (sqoop import [Postgres to HDFS] followed by distcp [HDFS to S3]).
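
Roughly, that two-step workflow looks like the sketch below. This is only an illustration, not the actual job: the HDFS staging path, the connection defaults, the password file location, and the `run` and `two_step_import` helpers are all hypothetical names introduced here. By default it only prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Sketch of the two-step workflow: Sqoop import into HDFS, then DistCp to S3.
# All paths and defaults below are placeholders (assumptions), not real values.

HDFS_STAGING="${HDFS_STAGING:-hdfs:///tmp/staging/table1}"
S3_DEST="${S3_DEST:-s3a://path/to/destination}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

run() {
  # Print the command; execute it only when dry-run mode is disabled.
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

two_step_import() {
  # Step 1: import from PostgreSQL into an HDFS staging directory.
  # $CONDITIONS stays in single quotes so Sqoop receives it literally.
  run sqoop import \
    --connect "${CONN_STRING:-jdbc:postgresql://host/db}" \
    --username "${USER_NAME:-user}" \
    --password-file "${PASSWORD_FILE:-/user/me/.pgpass}" \
    --query 'SELECT * FROM table1 WHERE $CONDITIONS' \
    --split-by table1.id \
    --target-dir "$HDFS_STAGING" || return 1

  # Step 2: copy the staged files from HDFS to the S3 bucket.
  run hadoop distcp "$HDFS_STAGING" "$S3_DEST"
}

two_step_import
```

Setting `DRY_RUN=0` would execute both commands; the second step runs only if the import succeeds.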
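I would like to remove the intermediate step (HDFS) and have Sqoop import the data directly into the S3 bucket.
However, the same sqoop command fails at the end of the import operation.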

sqoop import \
-Dmapreduce.map.memory.mb="8192" \
-Dmapreduce.map.java.opts="-Xmx7200m" \
-Dmapreduce.task.timeout=0 \
-Dmapreduce.task.io.sort.mb="2400" \
--connect $conn_string$ \
--fetch-size=20000 \
--username $user_name$ \
--p $password$ \
--num-mappers 20 \
--query "SELECT * FROM table1 WHERE table1.id > 10000000 and table1.id < 20000000 and \$CONDITIONS" \
--hcatalog-database $schema_name$ \
--hcatalog-table $table_name$ \
--hcatalog-storage-stanza "STORED AS PARQUET LOCATION s3a://path/to/destination" \
--split-by table1.id

I have also tried --target-dir s3a://path/to/destination instead of ....... LOCATION s3a://path/to/destination. After "Map: 100% completed", it throws the following error message:

Error: java.io.IOException: Could not clean up TaskAttemptID:attempt_1571557098082_15536_m_000004_0@s3a://path/to/destination_DYN0.6894861001907555/ingest_day=__HIVE_DEFAULT_PARTITION__
        at org.apache.hive.hcatalog.mapreduce.TaskCommitContextRegistry.commitTask(TaskCommitContextRegistry.java:83)
        at org.apache.hive.hcatalog.mapreduce.FileOutputCommitterContainer.commitTask(FileOutputCommitterContainer.java:145)
        at org.apache.hadoop.mapred.Task.commit(Task.java:1200)
        at org.apache.hadoop.mapred.Task.done(Task.java:1062)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:345)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:170)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:164)
Caused by: java.io.IOException: Could not rename 
s3a://path/to/destination/_DYN0.6894861001907555/ingest_day=20180522/_temporary/1/_temporary/attempt_1571557098082_15536_m_000004_0 
to 
s3a://path/to/destination/_DYN0.6894861001907555/ingest_day=20180522/_temporary/1/task_1571557098082_15536_m_000004
        at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:579)
        at org.apache.hadoop.mapred.FileOutputCommitter.commitTask(FileOutputCommitter.java:172)
        at org.apache.hadoop.mapred.OutputCommitter.commitTask(OutputCommitter.java:343)
        at org.apache.hive.hcatalog.mapreduce.DynamicPartitionFileRecordWriterContainer$1.commitTask(DynamicPartitionFileRecordWriterContainer.java:125)
        at org.apache.hive.hcatalog.mapreduce.TaskCommitContextRegistry.commitTask(TaskCommitContextRegistry.java:80)
        ... 9 more

sqoop amazon-s3 data-ingestion

Source: https://stackoverflow.com/questions/58802225/sqoop-fails-to-import-from-postgres-to-s3