Google Cloud Connector for Hadoop does not work with Pig

l2osamch posted on 2021-05-29 in Hadoop

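I am using Hadoop with HDFS 2.7.1.2.4 and Pig 0.15.0.2.4 (Hortonworks HDP 2.4), and I am trying to use the Google Cloud Storage Connector for Spark and Hadoop (bigdata-interop on GitHub). The connector works as expected when I run, for example: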

    hadoop fs -ls gs://bucket-name

But when I try the following in Pig (in MapReduce mode):

    data = LOAD 'gs://softline/o365.avro' USING AvroStorage();
    STORE data INTO 'gs://softline/o366.avro' USING AvroStorage();

Pig fails with the following error:

    org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Wrong FS scheme: hdfs, in path: hdfs://hdp.slweb.ru:8020/user/root, expected scheme: gs
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:279)
        at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:301)
        at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:318)
        at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:196)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
        at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:335)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.pig.backend.hadoop23.PigJobControl.submit(PigJobControl.java:128)
        at org.apache.pig.backend.hadoop23.PigJobControl.run(PigJobControl.java:194)
        at java.lang.Thread.run(Thread.java:745)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:276)
    Caused by: java.lang.IllegalArgumentException: Wrong FS scheme: hdfs, in path: hdfs://hdp.slweb.ru:8020/user/root, expected scheme: gs
        at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.checkPath(GoogleHadoopFileSystemBase.java:741)
        at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem.checkPath(GoogleHadoopFileSystem.java:90)
        at org.apache.hadoop.fs.FileSystem.makeQualified(FileSystem.java:466)
        at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.makeQualified(GoogleHadoopFileSystemBase.java:701)
        at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem.getGcsPath(GoogleHadoopFileSystem.java:163)
        at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.setWorkingDirectory(GoogleHadoopFileSystemBase.java:1094)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:235)
        ... 18 more

I can post the GCS connector logs if needed.
Has anyone used this connector with Pig? Any help would be appreciated.

ig9co6j1 (answer #1)

TL;DR: explicitly set mapreduce.job.working.dir=/user/root/ when launching the Pig job.
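If the working directory is not set explicitly during job submission, Hadoop sets it to the working directory of the default file system. When HDFS is the default FS, the working directory usually looks like 'hdfs://namenode:port/user/<your username>'.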
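When PigInputFormat#getSplits is called, it obtains the FileSystem associated with the input path it is operating on. In this case, that FileSystem is an instance of GoogleHadoopFileSystem. Pig then checks the input path and, if the path is non-local, calls FileSystem#setWorkingDirectory(job.getWorkingDirectory()). The problem is that the job's working directory is 'hdfs://namenode:port/user/<your username>', and GoogleHadoopFileSystem refuses to set that path as its own working directory (because it only supports 'gs://' paths).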
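
To make the failure mode concrete, here is a minimal, self-contained sketch of the scheme check described above. It is a simplified model built on java.net.URI, not the connector's actual code: the class names WorkingDirSchemeCheck and SchemeBoundFs are hypothetical, and the assumption that a scheme-less path is resolved against the file system's own root is mine. It only illustrates why a working directory inherited from HDFS is rejected while a scheme-less one such as /user/root/ is accepted.

```java
import java.net.URI;

/** Simplified model of the scheme check; NOT the GCS connector's real code. */
public class WorkingDirSchemeCheck {

    /** Hypothetical stand-in for a FileSystem that is bound to a single URI scheme. */
    public static final class SchemeBoundFs {
        private final URI root;   // for example gs://softline/
        private URI workingDir;

        public SchemeBoundFs(String rootUri) {
            this.root = URI.create(rootUri);
            this.workingDir = this.root;
        }

        /** Qualifies a path against this file system and rejects foreign schemes. */
        public URI makeQualified(String path) {
            URI uri = URI.create(path);
            String scheme = uri.getScheme();
            if (scheme == null) {
                // Assumption: a scheme-less path is resolved against this file system's root.
                return root.resolve(uri);
            }
            if (!scheme.equals(root.getScheme())) {
                throw new IllegalArgumentException(
                        "Wrong FS scheme: " + scheme + ", in path: " + path
                                + ", expected scheme: " + root.getScheme());
            }
            return uri;
        }

        public void setWorkingDirectory(String path) {
            workingDir = makeQualified(path);
        }

        public URI getWorkingDirectory() {
            return workingDir;
        }
    }

    public static void main(String[] args) {
        SchemeBoundFs gcs = new SchemeBoundFs("gs://softline/");

        // Explicit, scheme-less working directory (as in the TL;DR): accepted.
        gcs.setWorkingDirectory("/user/root/");
        System.out.println("working dir: " + gcs.getWorkingDirectory());

        // Working directory inherited from the default HDFS file system: rejected.
        try {
            gcs.setWorkingDirectory("hdfs://hdp.slweb.ru:8020/user/root");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

In this model the second call fails with a message of the same shape as the one in the stack trace, which is why setting the job's working directory to a path without an hdfs:// scheme avoids the error.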
