Problems related to the HBase master and regionserver
 
Master log:
2013-01-21 15:17:40,661 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 2 unassigned = 2
2013-01-21 15:17:44,669 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 2 unassigned = 2
2013-01-21 15:17:45,671 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 2 unassigned = 2
2013-01-21 15:17:46,446 INFO org.apache.hadoop.hbase.master.ServerManager: Registering server=hadoop1,60020,1358752666030
2013-01-21 15:17:46,673 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 2 unassigned = 2
2013-01-21 15:17:47,675 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 2 unassigned = 2
2013-01-21 15:17:47,990 WARN org.apache.hadoop.conf.Configuration: fs.default.name is deprecated. Instead, use fs.defaultFS
2013-01-21 15:17:48,677 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 2 unassigned = 2
2013-01-21 15:17:52,685 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 2 unassigned = 2
        at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2859)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.getClosestRowBefore(HRegionServer.java:1764)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1345)
2013-01-21 15:17:53,687 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 2 unassigned = 2
2013-01-21 15:17:54,690 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 2 unassigned = 2
2013-01-21 15:17:55,692 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 2 unassigned = 2
2013-01-21 15:17:56,531 WARN org.apache.hadoop.conf.Configuration: fs.default.name is deprecated. Instead, use fs.defaultFS
2013-01-21 15:17:56,694 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 2 unassigned = 2
2013-01-21 15:17:57,696 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 2 unassigned = 2
2013-01-21 15:17:58,698 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 2 unassigned = 2
2013-01-21 15:17:59,637 WARN org.apache.hadoop.conf.Configuration: fs.default.name is deprecated. Instead, use fs.defaultFS
2013-01-21 15:17:59,700 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 2 unassigned = 2
2013-01-21 15:18:00,610 WARN org.apache.hadoop.conf.Configuration: fs.default.name is deprecated. Instead, use fs.defaultFS
2013-01-21 15:18:00,702 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 2 unassigned = 2
2013-01-21 15:18:01,704 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 2 unassigned = 2
2013-01-21 15:18:02,706 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 2 unassigned = 2
2013-01-21 15:18:05,712 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 2 unassigned = 2
2013-01-21 15:18:20,742 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 2 unassigned = 2
2013-01-21 15:18:21,744 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 2 unassigned = 2
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:374)
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:271)
        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:521)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:497)
2013-01-21 15:18:21,922 INFO org.apache.hadoop.hbase.master.HMaster: Aborting
2013-01-21 15:18:22,714 DEBUG org.apache.hadoop.hbase.master.HMaster: Stopping service threads
2013-01-21 15:18:22,714 INFO org.apache.hadoop.hbase.master.HMaster$2: hadoop1,60000,1358570634492-BalancerChore exiting
2013-01-21 15:18:22,714 INFO org.apache.hadoop.ipc.HBaseServer: Stopping server on 60000
2013-01-21 15:18:22,714 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 0 on 60000: exiting
2013-01-21 15:18:22,716 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 13 on 60000: exiting
2013-01-21 15:18:22,715 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 8 on 60000: exiting
2013-01-21 15:18:22,716 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 20 on 60000: exiting
2013-01-21 15:18:22,716 WARN org.apache.hadoop.hbase.master.CatalogJanitor: Failed scan of catalog table
java.io.IOException: Giving up after tries=1
        at org.apache.hadoop.hbase.client.HTable$ClientScanner.nextScanner(HTable.java:1252)
        at org.apache.hadoop.hbase.client.HTable$ClientScanner.initialize(HTable.java:1175)
        at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:676)
        at org.apache.hadoop.hbase.catalog.MetaReader.fullScan(MetaReader.java:702)
        at org.apache.hadoop.hbase.catalog.MetaReader.fullScan(MetaReader.java:183)
        at org.apache.hadoop.hbase.catalog.MetaReader.fullScan(MetaReader.java:168)
        at org.apache.hadoop.hbase.master.CatalogJanitor.getSplitParents(CatalogJanitor.java:125)
        at org.apache.hadoop.hbase.master.CatalogJanitor.scan(CatalogJanitor.java:136)
        at org.apache.hadoop.hbase.master.CatalogJanitor.chore(CatalogJanitor.java:94)
        at org.apache.hadoop.hbase.Chore.run(Chore.java:67)
        at java.lang.Thread.run(Thread.java:722)
Caused by: java.lang.InterruptedException: sleep interrupted
        at java.lang.Thread.sleep(Native Method)
        ... 11 more
2013-01-21 15:18:22,716 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 21 on 60000: exiting
2013-01-21 15:18:22,716 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 17 on 60000: exiting
2013-01-21 15:18:22,716 INFO org.apache.hadoop.ipc.HBaseServer: REPL IPC Server handler 2 on 60000: exiting
2013-01-21 15:18:22,715 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 15 on 60000: exiting
2013-01-21 15:18:22,715 INFO org.apache.hadoop.ipc.HBaseServer: REPL IPC Server handler 1 on 60000: exiting
2013-01-21 15:18:22,715 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC Server listener on 60000
2013-01-21 15:18:22,715 INFO org.apache.hadoop.ipc.HBaseServer: REPL IPC Server handler 0 on 60000: exiting
2013-01-21 15:18:22,722 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 60 on 60000: exiting
2013-01-21 15:18:22,717 INFO org.apache.hadoop.hbase.master.HMaster: Stopping infoServer
2013-01-21 15:18:22,717 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 25 on 60000: exiting
2013-01-21 15:18:22,717 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 23 on 60000: exiting
2013-01-21 15:18:22,717 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC Server Responder
2013-01-21 15:18:22,716 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 22 on 60000: exiting
2013-01-21 15:18:22,719 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 51 on 60000: exiting
java.io.IOException: failed log splitting for hadoop1,60020,1358570636640, will retry
        at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:180)
        at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:169)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:722)
Caused by: java.io.IOException: duplicate log split scheduled for hdfs://mycluster/hbase/.logs/hadoop1,60020,1358570636640-splitting/hadoop1%2C60020%2C1358570636640.1358749721768
        at org.apache.hadoop.hbase.master.SplitLogManager.splitLogDistributed(SplitLogManager.java:259)
        at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:277)
        at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:245)
        at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:176)
        ... 4 more
2013-01-21 15:18:22,847 DEBUG org.apache.hadoop.hbase.catalog.CatalogTracker: Stopping catalog tracker org.apache.hadoop.hbase.catalog.CatalogTracker@611d341a
2013-01-21 15:18:22,853 INFO org.apache.zookeeper.ZooKeeper: Session: 0x23c5bf22ef10005 closed
2013-01-21 15:18:22,853 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2013-01-21 15:18:22,853 INFO org.apache.hadoop.hbase.master.HMaster: HMaster main thread exiting
2013-01-21 15:18:22,854 ERROR org.apache.hadoop.hbase.master.HMasterCommandLine: Failed to start master
java.lang.RuntimeException: HMaster Aborted
        at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:154)
        at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:103)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:76)
        at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:1684)
 
 
Regionserver log:
java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:218)
        at sun.nio.ch.IOUtil.read(IOUtil.java:191)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:359)
2013-01-21 15:42:09,881 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server listener on 60020: readAndProcess threw exception java.io.IOException: Connection reset by peer. Count of bytes read: 0
java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:218)
        at sun.nio.ch.IOUtil.read(IOUtil.java:191)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:359)
        at org.apache.hadoop.hbase.ipc.HBaseServer.channelRead(HBaseServer.java:1698)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:1136)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:719)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.doRunLoop(HBaseServer.java:511)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java:486)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:722)
2013-01-21 15:42:13,690 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server listener on 60020: readAndProcess threw exception java.io.IOException: Connection reset by peer. Count of bytes read: 0
java.io.IOException: Connection reset by peer
 
 
 
 
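Every time an HBase client interacts with a regionserver, the server creates a lease for it. The lease's validity period is set by the parameter hbase.regionserver.lease.period.
When HBase stores a large amount of data across many regions, the region a client asks for may not be in memory or in the cache, so the regionserver has to load it from disk. If that load takes longer than hbase.regionserver.lease.period, and the client has not reported to the regionserver in the meantime that it is still alive, the regionserver treats the lease as expired and removes it from the LeaseQueue. When the regionserver finishes loading and then tries to fetch data with the lease that has already been deleted, the error above occurs.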
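To make the expiry sequence concrete, here is a minimal, self-contained Java model of a lease table. It is a hypothetical sketch, not HBase's own lease code: the names LeaseTable and LeaseExpiredException are invented for illustration, and time is passed in explicitly as milliseconds instead of being read from a clock. Each use renews the lease; a lease left unrenewed for longer than the period is dropped, and a later use of its id fails.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of a server-side lease table (not HBase's actual implementation).
public class LeaseDemo {
    static class LeaseExpiredException extends RuntimeException {
        LeaseExpiredException(String msg) { super(msg); }
    }

    static class LeaseTable {
        private final long periodMs;
        private final Map<Long, Long> lastRenewal = new HashMap<>(); // lease id -> last renewal time
        private long nextId = 1;

        LeaseTable(long periodMs) { this.periodMs = periodMs; }

        // Grant a new lease at time nowMs and return its id.
        long grant(long nowMs) {
            long id = nextId++;
            lastRenewal.put(id, nowMs);
            return id;
        }

        // Drop every lease whose last renewal is older than the lease period.
        void expire(long nowMs) {
            lastRenewal.values().removeIf(t -> nowMs - t > periodMs);
        }

        // Use (and thereby renew) a lease; fails if it has already been removed.
        void use(long id, long nowMs) {
            expire(nowMs);
            if (!lastRenewal.containsKey(id)) {
                throw new LeaseExpiredException("lease " + id + " does not exist");
            }
            lastRenewal.put(id, nowMs);
        }
    }

    public static void main(String[] args) {
        LeaseTable leases = new LeaseTable(60_000); // 60 s, the default mentioned in this post
        long id = leases.grant(0);
        leases.use(id, 30_000);                 // renewed at 30 s: still valid
        try {
            leases.use(id, 30_000 + 75_000);    // 75 s of silence (e.g. a slow disk load)
        } catch (LeaseExpiredException e) {
            System.out.println(e.getMessage()); // prints "lease 1 does not exist"
        }
    }
}
```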
 

Solutions:

1. Increase hbase.regionserver.lease.period appropriately; the default is 1 minute.
2. Increase the regionserver cache size.

hbase.regionserver.lease.period

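The regionserver lease period. The default is 60 s, which is somewhat small. If your production environment reports lease-timeout errors while running jobs such as MapReduce, you need to increase this value.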
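For scans such as MapReduce jobs, here is a rough way to judge whether the lease period is too small. Assuming the scanner lease is renewed each time the client asks the regionserver for its next batch of rows, the client is silent for roughly rows-per-batch × time-per-row while it processes a batch. The Java sketch below compares that estimate with the lease period. The helper names are hypothetical, the batch size is assumed to correspond to the client's scanner caching setting (hbase.client.scanner.caching), and the per-row cost is an invented workload figure.

```java
// Hypothetical back-of-the-envelope check; not part of any HBase API.
public class LeaseBudget {
    // Rough time (ms) the client stays silent while processing one batch of rows.
    static long silentGapMs(int rowsPerBatch, long msPerRow) {
        return (long) rowsPerBatch * msPerRow;
    }

    // True if the estimated silent gap exceeds the lease period.
    static boolean risksLeaseExpiry(int rowsPerBatch, long msPerRow, long leasePeriodMs) {
        return silentGapMs(rowsPerBatch, msPerRow) > leasePeriodMs;
    }

    public static void main(String[] args) {
        long leasePeriodMs = 60_000; // the 60 s default mentioned above
        // Hypothetical workload: 1000 rows per batch, 100 ms of map work per row.
        System.out.println(silentGapMs(1000, 100));                     // 100000
        System.out.println(risksLeaseExpiry(1000, 100, leasePeriodMs)); // true
        System.out.println(risksLeaseExpiry(100, 100, leasePeriodMs));  // false
    }
}
```

With these numbers the estimated 100 s gap exceeds the 60 s default, so either hbase.regionserver.lease.period has to go up or each batch has to be smaller.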

hfile.block.cache.size

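The size of the regionserver cache. The default is 0.2, i.e. the fraction of the whole heap given to the regionserver cache. Raising it improves read performance, but it should not be set too high; if your HBase workload is mostly queries with relatively few writes, 0.5 is enough. One more point about this setting: if production MapReduce jobs scan HBase, be sure to call scan.setCacheBlocks(false) on the Scan used by the MapReduce job. Otherwise the blocks read by the MapReduce scan replace everything in the regionserver's cache, and HBase query performance drops noticeably.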
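Since hfile.block.cache.size is a fraction of the heap, the absolute cache size is simple arithmetic. The Java sketch below assumes a hypothetical 8 GiB regionserver heap; the method name blockCacheBytes is illustrative only, and the sketch does not model any extra checks or rounding HBase itself may apply when sizing the cache.

```java
import java.util.Locale;

// Hypothetical helper: heap bytes given to the block cache for a given fraction.
public class BlockCacheSize {
    static long blockCacheBytes(long heapBytes, double fraction) {
        if (fraction <= 0 || fraction >= 1) {
            throw new IllegalArgumentException("fraction must be in (0, 1): " + fraction);
        }
        return (long) (heapBytes * fraction);
    }

    public static void main(String[] args) {
        long gib = 1L << 30;
        long heap = 8 * gib; // hypothetical 8 GiB regionserver heap
        // Default fraction from the text vs. the read-heavy value suggested above.
        System.out.println(String.format(Locale.ROOT, "0.2 -> %.1f GiB",
                blockCacheBytes(heap, 0.2) / (double) gib)); // 0.2 -> 1.6 GiB
        System.out.println(String.format(Locale.ROOT, "0.5 -> %.1f GiB",
                blockCacheBytes(heap, 0.5) / (double) gib)); // 0.5 -> 4.0 GiB
    }
}
```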