Master / RegionServer related issues
Master log:
2013-01-21 15:17:40,661 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 2 unassigned = 2
2013-01-21 15:17:44,669 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 2 unassigned = 2
2013-01-21 15:17:45,671 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 2 unassigned = 2
2013-01-21 15:17:46,446 INFO org.apache.hadoop.hbase.master.ServerManager: Registering server=hadoop1,60020,1358752666030
2013-01-21 15:17:46,673 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 2 unassigned = 2
2013-01-21 15:17:47,675 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 2 unassigned = 2
2013-01-21 15:17:47,990 WARN org.apache.hadoop.conf.Configuration: fs.default.name is deprecated. Instead, use fs.defaultFS
2013-01-21 15:17:48,677 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 2 unassigned = 2
2013-01-21 15:17:52,685 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 2 unassigned = 2
    at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2859)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.getClosestRowBefore(HRegionServer.java:1764)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1345)
2013-01-21 15:17:53,687 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 2 unassigned = 2
2013-01-21 15:17:54,690 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 2 unassigned = 2
2013-01-21 15:17:55,692 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 2 unassigned = 2
2013-01-21 15:17:56,531 WARN org.apache.hadoop.conf.Configuration: fs.default.name is deprecated. Instead, use fs.defaultFS
2013-01-21 15:17:56,694 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 2 unassigned = 2
2013-01-21 15:17:57,696 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 2 unassigned = 2
2013-01-21 15:17:58,698 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 2 unassigned = 2
2013-01-21 15:17:59,637 WARN org.apache.hadoop.conf.Configuration: fs.default.name is deprecated. Instead, use fs.defaultFS
2013-01-21 15:17:59,700 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 2 unassigned = 2
2013-01-21 15:18:00,610 WARN org.apache.hadoop.conf.Configuration: fs.default.name is deprecated. Instead, use fs.defaultFS
2013-01-21 15:18:00,702 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 2 unassigned = 2
2013-01-21 15:18:01,704 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 2 unassigned = 2
2013-01-21 15:18:02,706 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 2 unassigned = 2
2013-01-21 15:18:05,712 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 2 unassigned = 2
2013-01-21 15:18:20,742 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 2 unassigned = 2
2013-01-21 15:18:21,744 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 2 unassigned = 2
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:374)
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:271)
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:521)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:497)
2013-01-21 15:18:21,922 INFO org.apache.hadoop.hbase.master.HMaster: Aborting
2013-01-21 15:18:22,714 DEBUG org.apache.hadoop.hbase.master.HMaster: Stopping service threads
2013-01-21 15:18:22,714 INFO org.apache.hadoop.hbase.master.HMaster$2: hadoop1,60000,1358570634492-BalancerChore exiting
2013-01-21 15:18:22,714 INFO org.apache.hadoop.ipc.HBaseServer: Stopping server on 60000
2013-01-21 15:18:22,714 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 0 on 60000: exiting
2013-01-21 15:18:22,716 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 13 on 60000: exiting
2013-01-21 15:18:22,715 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 8 on 60000: exiting
2013-01-21 15:18:22,716 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 20 on 60000: exiting
2013-01-21 15:18:22,716 WARN org.apache.hadoop.hbase.master.CatalogJanitor: Failed scan of catalog table
java.io.IOException: Giving up after tries=1
    at org.apache.hadoop.hbase.client.HTable$ClientScanner.nextScanner(HTable.java:1252)
    at org.apache.hadoop.hbase.client.HTable$ClientScanner.initialize(HTable.java:1175)
    at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:676)
    at org.apache.hadoop.hbase.catalog.MetaReader.fullScan(MetaReader.java:702)
    at org.apache.hadoop.hbase.catalog.MetaReader.fullScan(MetaReader.java:183)
    at org.apache.hadoop.hbase.catalog.MetaReader.fullScan(MetaReader.java:168)
    at org.apache.hadoop.hbase.master.CatalogJanitor.getSplitParents(CatalogJanitor.java:125)
    at org.apache.hadoop.hbase.master.CatalogJanitor.scan(CatalogJanitor.java:136)
    at org.apache.hadoop.hbase.master.CatalogJanitor.chore(CatalogJanitor.java:94)
    at org.apache.hadoop.hbase.Chore.run(Chore.java:67)
    at java.lang.Thread.run(Thread.java:722)
Caused by: java.lang.InterruptedException: sleep interrupted
    at java.lang.Thread.sleep(Native Method)
    ... 11 more
2013-01-21 15:18:22,716 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 21 on 60000: exiting
2013-01-21 15:18:22,716 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 17 on 60000: exiting
2013-01-21 15:18:22,716 INFO org.apache.hadoop.ipc.HBaseServer: REPL IPC Server handler 2 on 60000: exiting
2013-01-21 15:18:22,715 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 15 on 60000: exiting
2013-01-21 15:18:22,715 INFO org.apache.hadoop.ipc.HBaseServer: REPL IPC Server handler 1 on 60000: exiting
2013-01-21 15:18:22,715 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC Server listener on 60000
2013-01-21 15:18:22,715 INFO org.apache.hadoop.ipc.HBaseServer: REPL IPC Server handler 0 on 60000: exiting
2013-01-21 15:18:22,722 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 60 on 60000: exiting
2013-01-21 15:18:22,717 INFO org.apache.hadoop.hbase.master.HMaster: Stopping infoServer
2013-01-21 15:18:22,717 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 25 on 60000: exiting
2013-01-21 15:18:22,717 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 23 on 60000: exiting
2013-01-21 15:18:22,717 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC Server Responder
2013-01-21 15:18:22,716 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 22 on 60000: exiting
2013-01-21 15:18:22,719 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 51 on 60000: exiting
java.io.IOException: failed log splitting for hadoop1,60020,1358570636640, will retry
    at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:180)
    at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:169)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:722)
Caused by: java.io.IOException: duplicate log split scheduled for hdfs://mycluster/hbase/.logs/hadoop1,60020,1358570636640-splitting/hadoop1%2C60020%2C1358570636640.1358749721768
    at org.apache.hadoop.hbase.master.SplitLogManager.splitLogDistributed(SplitLogManager.java:259)
    at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:277)
    at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:245)
    at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:176)
    ... 4 more
2013-01-21 15:18:22,847 DEBUG org.apache.hadoop.hbase.catalog.CatalogTracker: Stopping catalog tracker org.apache.hadoop.hbase.catalog.CatalogTracker@611d341a
2013-01-21 15:18:22,853 INFO org.apache.zookeeper.ZooKeeper: Session: 0x23c5bf22ef10005 closed
2013-01-21 15:18:22,853 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2013-01-21 15:18:22,853 INFO org.apache.hadoop.hbase.master.HMaster: HMaster main thread exiting
2013-01-21 15:18:22,854 ERROR org.apache.hadoop.hbase.master.HMasterCommandLine: Failed to start master
java.lang.RuntimeException: HMaster Aborted
    at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:154)
    at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:103)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:76)
    at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:1684)
RegionServer log:
java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:218)
    at sun.nio.ch.IOUtil.read(IOUtil.java:191)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:359)
2013-01-21 15:42:09,881 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server listener on 60020: readAndProcess threw exception java.io.IOException: Connection reset by peer. Count of bytes read: 0
java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:218)
    at sun.nio.ch.IOUtil.read(IOUtil.java:191)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:359)
    at org.apache.hadoop.hbase.ipc.HBaseServer.channelRead(HBaseServer.java:1698)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:1136)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:719)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.doRunLoop(HBaseServer.java:511)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java:486)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:722)
2013-01-21 15:42:13,690 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server listener on 60020: readAndProcess threw exception java.io.IOException: Connection reset by peer. Count of bytes read: 0
java.io.IOException: Connection reset by peer
Every time an HBase client interacts with a RegionServer, a lease (Lease) is created on the server side; how long that lease stays valid is controlled by the hbase.regionserver.lease.period parameter.
When a client fetches data from a RegionServer, and HBase holds a large amount of data spread across many regions, the requested region may not be in memory or in the cache and has to be loaded from disk. If that load takes longer than the time configured by hbase.regionserver.lease.period, and the client has not reported back to the RegionServer that it is still alive, the RegionServer treats the lease as expired and removes it from the lease queue. When the load finally completes and the client comes back with the already-removed lease to fetch its data, you get the errors shown above.
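For illustration, here is a minimal client-side sketch of the situation described above (the table name and the per-row processing are placeholders, not from the original post). It uses the HTable API from the HBase 0.9x line these logs come from: the RegionServer keeps a lease per open scanner, and if more than hbase.regionserver.lease.period passes between calls that touch the scanner, the lease is dropped and the next call on that scanner fails.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

public class SlowScanExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "my_table");    // hypothetical table name
        Scan scan = new Scan();
        scan.setCaching(100);                           // rows fetched per next() RPC
        ResultScanner scanner = table.getScanner(scan); // server opens a scanner and starts a lease
        try {
            for (Result row : scanner) {
                // If serving the next batch (e.g. reading blocks from disk) or the work
                // done here takes longer than hbase.regionserver.lease.period, the
                // RegionServer expires the lease and the next call on this scanner fails.
                process(row);
            }
        } finally {
            scanner.close();
            table.close();
        }
    }

    private static void process(Result row) {
        // application logic (placeholder)
    }
}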
Solutions:
1. Increase the value of the hbase.regionserver.lease.period parameter appropriately; the default is 1 minute.
2. Increase the RegionServer cache size (a sample hbase-site.xml covering both settings follows this list).
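A minimal hbase-site.xml sketch for the two adjustments above; the concrete values are illustrative, not from the original post. Both are RegionServer-side settings, so they belong in the RegionServers' hbase-site.xml and take effect only after a restart.

<property>
  <name>hbase.regionserver.lease.period</name>
  <!-- lease period in milliseconds; default 60000 (1 minute) -->
  <value>120000</value>
</property>
<property>
  <name>hfile.block.cache.size</name>
  <!-- fraction of the heap used for the block cache; default 0.2 -->
  <value>0.4</value>
</property>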
hbase.regionserver.lease.period
The RegionServer lease period. The default is 60 s, which is on the small side: if in your production environment you see lease-timeout errors while running jobs such as MapReduce, this is the value to increase.
hfile.block.cache.size
The RegionServer cache size. The default is 0.2, i.e. that fraction of the whole heap is used as the RegionServer block cache. Increasing it improves read performance, but don't overdo it; if your HBase workload is mostly reads with relatively few writes, raising it to 0.5 is usually enough. One more note about this setting: if production MapReduce jobs scan HBase, make sure to call scan.setCacheBlocks(false) on the Scan used by the MapReduce job, otherwise the MapReduce scan will evict everything in the RegionServer cache and HBase query performance will drop noticeably (see the sketch below).
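Below is a minimal sketch of wiring setCacheBlocks(false) into a MapReduce scan over HBase via TableMapReduceUtil (class, job, and table names are placeholders, not from the original post). The key line for the cache problem described above is scan.setCacheBlocks(false); the caching value is only there to cut down RPC round trips.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;

public class ScanTableJob {
    static class RowMapper extends TableMapper<NullWritable, NullWritable> {
        @Override
        protected void map(ImmutableBytesWritable rowKey, Result row, Context context) {
            // per-row processing (placeholder)
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "scan-hbase-table");   // hypothetical job name
        job.setJarByClass(ScanTableJob.class);

        Scan scan = new Scan();
        scan.setCaching(500);        // fetch rows in larger batches to reduce RPCs
        scan.setCacheBlocks(false);  // keep the full scan out of the RegionServer block cache

        TableMapReduceUtil.initTableMapperJob(
                "my_table",          // hypothetical table name
                scan, RowMapper.class,
                NullWritable.class, NullWritable.class, job);
        job.setNumReduceTasks(0);    // map-only scan job
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}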