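I have a 4-node Hadoop/HBase cluster. One node acts as both the Hadoop master and the HBase master, and each of the other three servers runs a DataNode, a RegionServer and ZooKeeper. Recently one of these machines (DataNode, RegionServer and ZooKeeper) crashed and has not restarted. My Hadoop cluster still runs fine, but the HBase cluster has problems: no regions are online and I cannot view my HBase tables. Note: the crashed node is hadoopslave3.
In the hbase shell, when I try to create a table, I get an error: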
hbase(main):001:0> create 'USERS','COL-FAMILY1'
ERROR: org.apache.hadoop.hbase.NotAllMetaRegionsOnlineException: org.apache.hadoop.hbase.NotAllMetaRegionsOnlineException: Timed out (10000ms)
For the command:
$ hbase hbck
it shows:
12/08/27 11:10:20 INFO zookeeper.ZooKeeper: Client environment:java.library.path=/usr/lib/hbase/lib/native/Linux-amd64-64
12/08/27 11:10:20 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
12/08/27 11:10:20 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
12/08/27 11:10:20 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
12/08/27 11:10:20 INFO zookeeper.ZooKeeper: Client environment:os.arch=amd64
12/08/27 11:10:20 INFO zookeeper.ZooKeeper: Client environment:os.version=2.6.32-33-server
12/08/27 11:10:20 INFO zookeeper.ZooKeeper: Client environment:user.name=ocpe
12/08/27 11:10:20 INFO zookeeper.ZooKeeper: Client environment:user.home=/home/ocpe
12/08/27 11:10:20 INFO zookeeper.ZooKeeper: Client environment:user.dir=/var/log/hbase
12/08/27 11:10:20 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=hadoopslave2:2181,hbasemaster:2181,hadoopslave3:2181 sessionTimeout=180000 watcher=hconnection
12/08/27 11:10:20 INFO zookeeper.ClientCnxn: Opening socket connection to server hbasemaster/10.68.210.71:2181
12/08/27 11:10:20 INFO zookeeper.ClientCnxn: Socket connection established to hbasemaster/10.68.210.71:2181, initiating session
12/08/27 11:10:20 INFO zookeeper.ClientCnxn: Session establishment complete on server hbasemaster/10.68.210.71:2181, sessionid = 0x39668f686e0006, negotiated timeout = 180000
Version: 0.90.4-cdh3u3
12/08/27 11:11:21 DEBUG client.HConnectionManager$HConnectionImplementation: Lookedup root region location, connection=org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@69267649; hsa=null
ERROR: Root Region or some of its attributes are null.
ERROR: Encountered fatal error. Exiting...
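The hbck client above connects with connectString=hadoopslave2:2181,hbasemaster:2181,hadoopslave3:2181 and gets its session from hbasemaster. A ZooKeeper ensemble only serves requests while a strict majority of its members is up, so counting members is a quick way to reason about whether losing hadoopslave3 alone should take ZooKeeper down. The Python sketch below assumes the ensemble consists of exactly the three hosts in that connectString (the server-side ZooKeeper configuration is not shown here); the helper names are hypothetical.

```python
# Sketch: parse a ZooKeeper connectString (as printed in the hbck log) and
# compute how many ensemble members must be up for a majority quorum.
# Assumption: the ensemble is exactly the set of hosts in the connectString.


def parse_connect_string(connect_string):
    """Split 'host1:port1,host2:port2,...' into (host, port) tuples."""
    members = []
    for entry in connect_string.split(","):
        host, _, port = entry.strip().rpartition(":")
        members.append((host, int(port)))
    return members


def quorum_size(ensemble_size):
    """Smallest strict majority of an ensemble of the given size."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many members can be down while a majority is still available."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    cs = "hadoopslave2:2181,hbasemaster:2181,hadoopslave3:2181"
    members = parse_connect_string(cs)
    down = {"hadoopslave3"}  # the crashed node
    alive = [host for host, _ in members if host not in down]
    n = len(members)
    print(f"ensemble={n} quorum={quorum_size(n)} tolerated={tolerated_failures(n)}")
    print(f"alive={alive} has_quorum={len(alive) >= quorum_size(n)}")
```

With three members the majority is two, so one failed member is tolerated; note that a fourth member would not raise that tolerance.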
The master log shows:
2012-08-27 11:00:41,329 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server hadoopslave3/10.68.210.58:2181
2012-08-27 11:00:42,832 WARN org.apache.zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.NoRouteToHostException: No route to host
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1146)
...
2012-08-27 11:00:43,610 DEBUG org.apache.hadoop.hbase.master.HMaster: Started service threads
2012-08-27 11:00:44,357 INFO org.apache.hadoop.hbase.master.ServerManager: Registering server=hadoopslave2,60020,1346045456923, regionCount=0, userLoad=false
2012-08-27 11:00:45,110 INFO org.apache.hadoop.hbase.master.ServerManager: Waiting on regionserver(s) count to settle; currently=1
2012-08-27 11:00:45,703 INFO org.apache.hadoop.hbase.master.ServerManager: Registering server=hadoopslave1,60020,1346065230110, regionCount=0, userLoad=false
2012-08-27 11:00:46,611 INFO org.apache.hadoop.hbase.master.ServerManager: Waiting on regionserver(s) count to settle; currently=2
2012-08-27 11:00:48,111 INFO org.apache.hadoop.hbase.master.ServerManager: Exiting wait on regionserver(s) to checkin; count=2, stopped=false, count of regions out on cluster=0
2012-08-27 11:00:48,118 INFO org.apache.hadoop.hbase.master.MasterFileSystem: Log folder hdfs://hadoopmaster:54310/hbase2/.logs/hadoopslave1,60020,1346065230110 belongs to an existing region server
2012-08-27 11:00:48,118 INFO org.apache.hadoop.hbase.master.MasterFileSystem: Log folder hdfs://hadoopmaster:54310/hbase2/.logs/hadoopslave2,60020,1346045456923 belongs to an existing region server
2012-08-27 11:00:48,118 INFO org.apache.hadoop.hbase.master.MasterFileSystem: No logs to split
2012-08-27 11:00:48,125 INFO org.apache.hadoop.hbase.master.AssignmentManager: Processing region -ROOT-,,0.70236052 in state RS_ZK_REGION_OPENED
2012-08-27 11:00:48,128 INFO org.apache.hadoop.hbase.master.AssignmentManager: Failed to find hadoopslave2,60020,1345789680868 in list of online servers; skipping registration of open of -ROOT-,,0.70236052
2012-08-27 11:00:48,128 INFO org.apache.hadoop.hbase.master.AssignmentManager: Waiting on 70236052/-ROOT-
2012-08-27 11:00:53,213 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: -ROOT-,,0.70236052 state=OPEN, ts=1345790880487, server=hadoopslave2,60020,1345789680868
2012-08-27 11:00:53,213 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
2012-08-27 11:01:43,242 WARN org.apache.hadoop.hbase.master.TimeToLiveLogCleaner: Found a log newer than current time, probably a clock skew
2012-08-27 11:02:43,234 WARN org.apache.hadoop.hbase.master.TimeToLiveLogCleaner: Found a log newer than current time, probably a clock skew
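The master log says -ROOT- was opened on hadoopslave2,60020,1345789680868, while the hadoopslave2 instance that actually registered is hadoopslave2,60020,1346045456923. An HBase server name has the form host,port,startcode, where the startcode is the region server's start time in milliseconds since the Unix epoch, so these two names differ only in which run of the process they refer to. The Python sketch below decodes and compares such names; it is an illustration only, and the function names are hypothetical, not part of the HBase API. Decoding startcodes to UTC can also help when lining up log timestamps around the clock-skew warnings.

```python
# Sketch: compare the server name recorded for a region with the servers
# that registered with the master. A server name is "host,port,startcode",
# where startcode is the region server's start time in ms since the epoch.
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_server_name(name):
    """Split 'host,port,startcode' into (host, port, startcode)."""
    host, port, startcode = name.split(",")
    return host, int(port), int(startcode)


def startcode_to_utc(startcode):
    """Convert a millisecond startcode into an aware UTC datetime (exact)."""
    return EPOCH + timedelta(milliseconds=startcode)


def is_stale(recorded, online):
    """True unless `recorded` exactly matches an online server
    (same host, port and startcode)."""
    r_host, r_port, r_code = parse_server_name(recorded)
    for name in online:
        host, port, code = parse_server_name(name)
        if (host, port) == (r_host, r_port):
            # Same host:port but a different startcode means an older run.
            return code != r_code
    return True  # host:port is not online at all


if __name__ == "__main__":
    online = ["hadoopslave2,60020,1346045456923", "hadoopslave1,60020,1346065230110"]
    root_location = "hadoopslave2,60020,1345789680868"  # from the master log
    print("stale:", is_stale(root_location, online))
    for name in [root_location] + online:
        print(name, "started", startcode_to_utc(parse_server_name(name)[2]).isoformat())
```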
The log of one of the region servers (hadoopslave2) shows:
2012-08-27 16:30:30,388 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=hadoopslave2:2181,hbasemaster:2181,hadoopslave3:2181 sessionTimeout=180000 watcher=regionserver:60020
2012-08-27 16:30:30,418 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server hadoopslave3/10.68.210.58:2181
2012-08-27 16:30:30,533 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Installed shutdown hook thread: Shutdownhook:regionserver60020
2012-08-27 16:30:33,426 WARN org.apache.zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.NoRouteToHostException: No route to host
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1146)
...
2012-08-27 16:30:37,649 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server handler 6 on 60020: starting
2012-08-27 16:30:37,649 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server handler 7 on 60020: starting
2012-08-27 16:30:37,649 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server handler 8 on 60020: starting
2012-08-27 16:30:37,653 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server handler 9 on 60020: starting
2012-08-27 16:30:37,653 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Serving as hadoopslave1,60020,1346065230110, RPC listening on /10.68.210.54:60020, sessionid=0x139668faf560000
2012-08-27 16:30:37,655 INFO org.apache.hadoop.hbase.regionserver.SplitLogWorker: SplitLogWorker hadoopslave1,60020,1346065230110 starting
2012-08-27 16:30:37,665 INFO org.apache.hadoop.hbase.regionserver.StoreFile: Allocating LruBlockCache with maximum size 199.2m
2012-08-27 16:30:38,211 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: HDFS pipeline error detected. Found 1 replicas but expecting 3 replicas. Requesting close of hlog.
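Both the master and the region server log java.net.NoRouteToHostException when their ZooKeeper client tries hadoopslave3:2181. To check from a given machine which quorum addresses are actually reachable, a plain TCP connect test is enough. The Python sketch below is an assumption-level diagnostic, not an HBase or ZooKeeper tool: it only verifies that something accepts TCP connections on each port and does not speak the ZooKeeper protocol; the function names are hypothetical.

```python
# Sketch: try a plain TCP connect to each address in a ZooKeeper-style
# connectString, to separate unreachable hosts from reachable ones.
# This does NOT speak the ZooKeeper protocol; it only tests the TCP port.
import socket


def probe(host, port, timeout=3.0):
    """Return None if a TCP connection succeeds, else a short error string."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return None
    except OSError as exc:  # covers timeouts, refused, no route, DNS errors
        return f"{type(exc).__name__}: {exc}"


def probe_connect_string(connect_string, timeout=3.0):
    """Probe every host:port in a connectString; map address -> error/None."""
    results = {}
    for entry in connect_string.split(","):
        entry = entry.strip()
        host, _, port = entry.rpartition(":")
        results[entry] = probe(host, int(port), timeout)
    return results
```

For example, calling probe_connect_string("hadoopslave2:2181,hbasemaster:2181,hadoopslave3:2181") from the master would be expected to return an error string for hadoopslave3:2181 while that node is down, and None for members that accept connections.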