Amazon Elastic MapReduce fails to launch on a subnet

qpgpyjmq  posted on 2021-06-03 in Hadoop

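I am trying to launch an Elastic MapReduce cluster on our own VPC. I can launch it in AWS with the command below, but if I specify our own VPC/subnet, the cluster fails to launch (so we are not talking about jobs running on it; we are talking about launching the default cluster itself).
Obviously this must be related to the subnet and AWS's Hadoop (although it is not the common "Cannot find route to InternetGateway in main RouteTable" error).
I cannot determine the cause from the logs. It happens both from the command line and from the AWS web console.
We are not running any custom actions or custom environment on the cluster.
Here are the details of the subnet's route table: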

Destination    Target
10.0.0.0/16    local
0.0.0.0/0      igw-2235d249
10.3.0.0/16    eni-b989b091

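Here is the command line used to launch (removing --subnet allows the command to succeed, but we need the cluster on this VPC to access some specific resources):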

elastic-mapreduce --create \
                  --alive \
                  --name                 "BMVE on Subnet 0BF3BB23" \
                  --instance-type        m1.medium \
                  --num-instances        3 \
                  --key-pair             hadoop \
                  --subnet               subnet-0bf3bb23 \
                  --visible-to-all-users true

master.log file:

2014-03-31 18:24:48,848 INFO i-3e4ce71d: new instance started
2014-03-31 18:24:49,920 INFO i-3e4ce71d: bootstrap action 1 completed
2014-03-31 18:35:40,352 ERROR i-3e4ce71d: failed to start. hadoop JobTracker/NameNode process failed to launch.

1/controller log:

2014-03-31T18:24:48.849Z INFO Fetching file 's3://elasticmapreduce/bootstrap-actions/configure-hadoop'
2014-03-31T18:24:49.408Z INFO Working dir /mnt/var/lib/bootstrap-actions/1
2014-03-31T18:24:49.408Z INFO Executing /mnt/var/lib/bootstrap-actions/1/configure-hadoop --site-key-value io.file.buffer.size=65536
2014-03-31T18:24:49.917Z INFO Execution ended with ret val 0
2014-03-31T18:24:49.918Z INFO Execution succeeded

1/standard log:
1/system log:

Processing default file /home/hadoop/conf/hadoop-site.xml with overwrite io.file.buffer.size=65536
/home/hadoop/conf/hadoop-site.xml does not exist, assuming empty configuration
'io.file.buffer.size': default does not have key, appending value '65536'
Saved /home/hadoop/conf/hadoop-site.xml with overwrites. Original saved to /home/hadoop/conf/hadoop-site.xml.old

daemons jobtracker log (filtered for WARN|ERROR):

2014-03-31 18:25:00,906 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl (main): Source name ugi already exists!
. . . 
2014-03-31 18:25:08,059 WARN org.apache.hadoop.hdfs.DFSClient (Thread-18): DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /mnt/var/lib/hadoop/tmp/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1569)
. . . 
2014-03-31 18:25:08,059 WARN org.apache.hadoop.hdfs.DFSClient (Thread-18): Error Recovery for block null bad datanode[0] nodes == null
2014-03-31 18:25:08,060 WARN org.apache.hadoop.hdfs.DFSClient (Thread-18): Could not get block locations. Source file "/mnt/var/lib/hadoop/tmp/mapred/system/jobtracker.info" - Aborting...
2014-03-31 18:25:08,060 WARN org.apache.hadoop.mapred.JobTracker (main): Writing to file hdfs://10.0.7.65:9000/mnt/var/lib/hadoop/tmp/mapred/system/jobtracker.info failed!
2014-03-31 18:25:08,060 WARN org.apache.hadoop.mapred.JobTracker (main): FileSystem is not ready yet!
2014-03-31 18:25:08,084 WARN org.apache.hadoop.mapred.JobTracker (main): Failed to initialize recovery manager. 
. . .
2014-03-31 18:35:32,239 WARN org.apache.hadoop.hdfs.DFSClient (Thread-125): Error Recovery for block null bad datanode[0] nodes == null
2014-03-31 18:35:32,239 WARN org.apache.hadoop.hdfs.DFSClient (Thread-125): Could not get block locations. Source file "/mnt/var/lib/hadoop/tmp/mapred/system/jobtracker.info" - Aborting...
2014-03-31 18:35:32,239 WARN org.apache.hadoop.mapred.JobTracker (main): Writing to file hdfs://10.0.7.65:9000/mnt/var/lib/hadoop/tmp/mapred/system/jobtracker.info failed!
2014-03-31 18:35:32,239 WARN org.apache.hadoop.mapred.JobTracker (main): FileSystem is not ready yet!
2014-03-31 18:35:32,244 WARN org.apache.hadoop.mapred.JobTracker (main): Failed to initialize recovery manager. 
    org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /mnt/var/lib/hadoop/tmp/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1569)

daemons namenode log (again filtered):

2014-03-31 18:25:07,693 INFO org.apache.hadoop.security.ShellBasedUnixGroupsMapping (IPC Server handler 1 on 9000): add hadoop to shell userGroupsCache
2014-03-31 18:25:08,042 ERROR org.apache.hadoop.security.UserGroupInformation (IPC Server handler 11 on 9000): PriviledgedActionException as:hadoop cause:java.io.IOException: File /mnt/var/lib/hadoop/tmp/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1
2014-03-31 18:25:08,043 INFO org.apache.hadoop.ipc.Server (IPC Server handler 11 on 9000): IPC Server handler 11 on 9000, call addBlock(/mnt/var/lib/hadoop/tmp/mapred/system/jobtracker.info, DFSClient_678715989, null) from 10.0.7.65:36607: error: java.io.IOException: File /mnt/var/lib/hadoop/tmp/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1
    java.io.IOException: File /mnt/var/lib/hadoop/tmp/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1

Any assistance would be greatly appreciated.

Answer 1 (ldxq2e6h)

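This appears to be related to the DNS setup of our corporate VPC. We had to create an additional VPC and then clone the database resources into it (I do not know why; my access to VPC administration is limited, so I am taking the administrator's word for it).
The errors above are fairly obtuse, so hopefully knowing that "these errors == DNS problem" will help others.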
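Since the symptom is easy to miss among the other warnings, a small filter can help spot it. The following is a minimal sketch, not part of any EMR tooling: it scans log lines for the "could only be replicated to N nodes" message quoted in the question, which typically means the NameNode had no DataNode available to take the block. The function name `find_replication_failures` is made up for illustration, and the sample lines are copied from the namenode log above.

```python
import re

# Message text taken from the jobtracker/namenode logs quoted in the question.
REPLICATION_FAILURE = re.compile(
    r"could only be replicated to (\d+) nodes, instead of (\d+)"
)
# Log lines in the question start with 'YYYY-MM-DD HH:MM:SS,mmm' (23 characters).
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}")


def find_replication_failures(log_lines):
    """Return (timestamp, nodes_found, nodes_wanted) for each matching line.

    timestamp is None for lines that do not start with one, for example
    indented continuation lines of a stack trace.
    """
    hits = []
    for line in log_lines:
        match = REPLICATION_FAILURE.search(line)
        if not match:
            continue
        stamp = TIMESTAMP.match(line)
        hits.append(
            (stamp.group(0) if stamp else None,
             int(match.group(1)),
             int(match.group(2)))
        )
    return hits


sample = [
    "2014-03-31 18:25:07,693 INFO org.apache.hadoop.security.ShellBasedUnixGroupsMapping "
    "(IPC Server handler 1 on 9000): add hadoop to shell userGroupsCache",
    "2014-03-31 18:25:08,042 ERROR org.apache.hadoop.security.UserGroupInformation "
    "(IPC Server handler 11 on 9000): PriviledgedActionException as:hadoop "
    "cause:java.io.IOException: File /mnt/var/lib/hadoop/tmp/mapred/system/jobtracker.info "
    "could only be replicated to 0 nodes, instead of 1",
]
failures = find_replication_failures(sample)
for stamp, found, wanted in failures:
    print(f"{stamp}: replicated to {found} of {wanted} nodes")
```

A hit with 0 nodes found only says that HDFS had nowhere to write; it does not by itself prove a DNS problem, so treat it as a prompt to check the VPC's DNS and DHCP settings.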
Some references:
http://docs.aws.amazon.com/elasticmapreduce/latest/developerguide/emr-troubleshoot-error-vpc.html
http://docs.aws.amazon.com/amazonvpc/latest/userguide/vpc-dns.html
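Hadoop on a VPC requires the VPC's DHCP options to be configured with the default EC2 settings, such as "use the Amazon DNS server" and "register hosts in DNS". If the Amazon DNS server is not used, the nodes of the Hadoop cluster cannot contact each other and launching the cluster fails. This is incompatible with our VPC setup, in which we push custom DNS server information through the DHCP options.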
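The DNS part of that requirement can be checked before launching. The following is a minimal sketch, assuming the DHCP options set has already been fetched and has the shape that the EC2 DescribeDhcpOptions call returns (a `DhcpConfigurations` list of `Key`/`Values` entries). The function name `uses_amazon_dns` and both sample dictionaries are made up for illustration; the custom server address is a placeholder, not a value from our VPC.

```python
def uses_amazon_dns(dhcp_options):
    """Return True if the options set hands out only the Amazon DNS server.

    dhcp_options is one DHCP options set, shaped like an entry of the
    'DhcpOptions' list in an EC2 DescribeDhcpOptions response.
    """
    for config in dhcp_options.get("DhcpConfigurations", []):
        if config.get("Key") == "domain-name-servers":
            servers = [value.get("Value") for value in config.get("Values", [])]
            return servers == ["AmazonProvidedDNS"]
    # No domain-name-servers entry at all, so nothing points at Amazon DNS.
    return False


# Hypothetical options set that keeps the EC2 default DNS server.
default_like = {
    "DhcpConfigurations": [
        {"Key": "domain-name-servers", "Values": [{"Value": "AmazonProvidedDNS"}]},
    ]
}

# Hypothetical options set that pushes a custom DNS server (placeholder address).
custom_dns = {
    "DhcpConfigurations": [
        {"Key": "domain-name", "Values": [{"Value": "corp.example"}]},
        {"Key": "domain-name-servers", "Values": [{"Value": "10.3.0.10"}]},
    ]
}

print(uses_amazon_dns(default_like))
print(uses_amazon_dns(custom_dns))
```

The check is deliberately strict: any options set that lists something other than `AmazonProvidedDNS` alone is reported as not using Amazon DNS, which matches the situation described above.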

Answer 2 (ykejflvf)

It works fine for me. In the VPC, you can try associating the Route Table with the subnet in the VPC.
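Whether the subnet ends up with a route table that has an internet gateway route can be checked along these lines. This is a minimal sketch, assuming the route tables of a single VPC have already been fetched and have the shape that the EC2 DescribeRouteTables call returns (`Associations` and `Routes` lists). The function names and the route table IDs are made up for illustration; the subnet ID and the routes of the second table are the ones quoted in the question.

```python
def route_table_for_subnet(route_tables, subnet_id):
    """Return the table explicitly associated with the subnet, else the main table.

    route_tables must all belong to the same VPC. Returns None if there is
    neither an explicit association nor a main table in the list.
    """
    main_table = None
    for table in route_tables:
        for association in table.get("Associations", []):
            if association.get("SubnetId") == subnet_id:
                return table
            if association.get("Main"):
                main_table = table
    return main_table


def has_internet_gateway_route(table):
    """Return True if the table sends 0.0.0.0/0 to an internet gateway (igw-...)."""
    return any(
        route.get("DestinationCidrBlock") == "0.0.0.0/0"
        and route.get("GatewayId", "").startswith("igw-")
        for route in table.get("Routes", [])
    )


tables = [
    {
        "RouteTableId": "rtb-main-example",  # hypothetical ID
        "Associations": [{"Main": True}],
        "Routes": [{"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local"}],
    },
    {
        "RouteTableId": "rtb-custom-example",  # hypothetical ID
        "Associations": [{"Main": False, "SubnetId": "subnet-0bf3bb23"}],
        "Routes": [
            {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local"},
            {"DestinationCidrBlock": "0.0.0.0/0", "GatewayId": "igw-2235d249"},
            {"DestinationCidrBlock": "10.3.0.0/16", "NetworkInterfaceId": "eni-b989b091"},
        ],
    },
]

chosen = route_table_for_subnet(tables, "subnet-0bf3bb23")
print(chosen["RouteTableId"], has_internet_gateway_route(chosen))
```

A subnet without an explicit association falls back to the VPC's main route table, which is why the lookup keeps track of the main table while it searches.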
