A new stage for Apache HBase high availability
Apache HBase is a database built for online services. Its native integration with Hadoop makes it the obvious choice for applications that rely on Hadoop's scalability and flexibility for data processing.
In Hortonworks Data Platform (HDP) 2.2 (http://zh.hortonworks.com/hdp/), HBase high availability has advanced rapidly, allowing applications to reach 99.99% uptime.
This article reviews the development work of the past 12 months, shows how developers have improved HBase high availability, and discusses the improvements planned for the future.
A historical perspective on HBase high availability
High availability (HA) is a key feature of any database and a prerequisite for any business-critical application.
Historically, HBase has used two strategies to keep data available:
First, HBase automatically partitions data and distributes the partitions across different nodes. When a node is taken offline or fails, only the data on that node is affected; data on the other nodes is not.
Second, all data in HBase is actually stored in HDFS, where it is replicated three times across different nodes, so any node in the cluster can access the data.
This allows HBase to automatically reassign the data managed by a failed node to healthy nodes, which keeps the data highly available.
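The reassignment step can be sketched with a toy model. This is only an illustration of the idea, not HBase's actual assignment logic: the function name reassign_regions, the node and region names, and the least-loaded placement rule are all assumptions made for the example.

```python
def reassign_regions(assignment, failed_node):
    """Return a new assignment with the failed node's regions moved to survivors.

    Toy placement rule (an assumption, not HBase's balancer): each orphaned
    region goes to the surviving node that currently holds the fewest regions,
    with ties broken by node name.
    """
    survivors = {
        node: list(regions)
        for node, regions in assignment.items()
        if node != failed_node
    }
    if not survivors:
        raise RuntimeError("no surviving node can take over the regions")
    for region in assignment.get(failed_node, []):
        target = min(survivors, key=lambda node: (len(survivors[node]), node))
        survivors[target].append(region)
    return survivors


# Hypothetical three-node cluster; node-a fails.
cluster = {"node-a": ["r1", "r2"], "node-b": ["r3"], "node-c": ["r4"]}
after_failure = reassign_regions(cluster, "node-a")
print(after_failure)
```

Because the underlying files live in HDFS rather than on the failed node's local disk, a surviving node can take over a region without copying data first, which is what makes this kind of reassignment possible.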
When these native HA features are used together with Hadoop best practices, HBase-based applications can reach 99.9% availability, which means less than nine hours of downtime per year.
That is sufficient for most applications, but core business systems need a stronger availability guarantee.
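The relationship between an availability percentage and yearly downtime is simple arithmetic. The short sketch below works it out for the availability levels discussed in this article; the function name and the 365-day year are assumptions made for the example.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def annual_downtime_hours(availability_percent):
    """Return the maximum downtime per year, in hours, for an availability level."""
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return HOURS_PER_YEAR * unavailable_fraction


for level in (99.9, 99.99, 99.999):
    hours = annual_downtime_hours(level)
    print(f"{level}% availability: {hours:.2f} hours "
          f"({hours * 60:.1f} minutes) of downtime per year")
```

At 99.9% this gives about 8.76 hours per year, consistent with the figure of less than nine hours; each additional nine cuts the allowed downtime by a factor of ten.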
The need for better high availability
We are in the early stages of re-engineering big data applications onto the Hadoop platform. As its adoption and influence grow, Hadoop has become the platform of choice for applications that emphasize scalability or flexible data processing.
For online applications that want to benefit from Hadoop's pervasive, rapid innovation, HBase, as a member of the Hadoop ecosystem, is the natural choice of database.
When we talk with customers who hope to migrate critical business workloads to HBase, we often hear the same feedback: they need the data consistency that HBase provides, but they cannot tolerate even a short outage during recovery. For Hadoop to support business-critical online applications, HBase high availability needed to improve substantially.
Hortonworks, working with the HBase community, has greatly improved HBase high availability by introducing timeline-consistent region replicas (also known as HBase read high availability; see HBASE-10070, https://issues.apache.org/jira/browse/HBASE-10070).
At a high level, the new HA feature keeps multiple copies of the same data in the HBase cluster, in a primary Region and in replica Regions. With HBase read high availability, if a RegionServer fails, users can still read the failed node's data from other RegionServers.
In other words, while the system recovers automatically, users lose the failed node but can still read its data. For applications that need to keep reading while preserving consistency, HBase read high availability is an ideal choice.
Combined with best practices such as keeping two replicas and using rack awareness, HBase read high availability makes it easier for business-critical applications that depend on HBase to reach 99.99% availability.
What is timeline consistency?
HBase has always ensured data consistency in a very simple way: each key range has exactly one owner. This single-owner strategy means there is no split brain and no last-write-wins conflict resolution, and it makes important features such as counters fast and simple to implement.
On the downside, if a RegionServer goes down, all the key ranges held by that RegionServer are offline until data recovery completes.
In HBase 0.96 the recovery process was optimized to complete within a minute, but we still sacrifice some availability to guarantee strong data consistency. According to the CAP theorem, consistency and availability must be traded off against each other; no system can fully provide both.
Many modern database systems implement a pure AP model, optimizing availability by giving up consistency. Giving up consistency forces the users of such databases to deal with some of the hard problems of distributed systems. Much of the time, users of an eventually consistent database end up acting more like database developers than database users.
In practice, network partitions do not occur all the time, so there is no need to sacrifice consistency at all times just to guard against occasional failures. Readers interested in this discussion and in timeline consistency can read Daniel Abadi's blog.
HBase read high availability is a timeline-consistent system that lets developers choose, at query time, between a strict and a relaxed consistency strategy.
With HBase read high availability:
; Data is held by one primary Region and by one or more replica Regions.
; Any Region, whether the primary Region or a replica Region, can respond to read requests for this data.
; Only the primary Region can handle write requests.
; Data in a replica may be inconsistent with the data in the primary Region.
; All replicas receive update requests in the same order.
From the client's point of view:
; The client can specify in each request which Consistency strategy to use: strict (Consistency.STRONG) or relaxed (Consistency.TIMELINE).
; The returned result explicitly indicates whether the data is up to date (that is, it came from the primary Region) or possibly stale (that is, it came from a replica Region). The client can act on this flag.
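The server-side rules and the client's view described above can be illustrated with a small simulation. This is a toy model under stated assumptions, not the HBase client API: the Region, Replica and ReadResult classes and the put, sync and get methods are hypothetical, and the relaxed level is named TIMELINE here after timeline consistency. One simplification: a read is answered by the primary whenever the primary is up, and falls back to a replica only when it is down.

```python
from dataclasses import dataclass, field
from enum import Enum


class Consistency(Enum):
    STRONG = "strong"      # read must be answered by the primary
    TIMELINE = "timeline"  # read may be answered by a replica, flagged as stale


@dataclass
class ReadResult:
    value: object
    stale: bool  # True when the answer came from a replica


@dataclass
class Replica:
    data: dict = field(default_factory=dict)
    applied: int = 0  # how many log entries this replica has applied


class Region:
    """Hypothetical model of one key range with a primary and replicas."""

    def __init__(self, replica_count):
        self.log = []       # ordered updates; every replica replays this order
        self.primary = {}
        self.replicas = [Replica() for _ in range(replica_count)]
        self.primary_up = True

    def put(self, key, value):
        # Only the primary accepts writes.
        if not self.primary_up:
            raise RuntimeError("primary is offline; writes are unavailable")
        self.log.append((key, value))
        self.primary[key] = value

    def sync(self, replica_index, max_entries=None):
        # Apply pending updates to one replica, strictly in log order.
        replica = self.replicas[replica_index]
        end = len(self.log)
        if max_entries is not None:
            end = min(end, replica.applied + max_entries)
        for key, value in self.log[replica.applied:end]:
            replica.data[key] = value
        replica.applied = end

    def get(self, key, consistency=Consistency.STRONG, replica_index=0):
        if self.primary_up:
            return ReadResult(self.primary.get(key), stale=False)
        if consistency is Consistency.STRONG:
            raise RuntimeError("primary is offline; strong read unavailable")
        replica = self.replicas[replica_index]
        return ReadResult(replica.data.get(key), stale=True)


region = Region(replica_count=2)
region.put("row1", "v1")
region.sync(0)              # replica 0 catches up to v1
region.put("row1", "v2")    # replica 0 has not applied this update yet
region.primary_up = False   # simulate a RegionServer failure

result = region.get("row1", Consistency.TIMELINE)
print(result)  # ReadResult(value='v1', stale=True)
```

The read still succeeds during the failure, but the stale flag tells the client that the value may lag behind the last write, so the client can decide whether that is acceptable for the request at hand.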
This model has the following advantages:
; Write consistency is guaranteed.
; During system failures, data remains readable. With two replicas configured and appropriate rack placement, HBase can keep data readable without downtime even when an entire rack fails.
; Latency: consistent reads still require only a single round trip.
; Latency: the client can read from all replicas and use the first response that returns.
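The last point, reading from every replica and keeping whichever answer arrives first, can be sketched with standard Python concurrency primitives. The helper names make_reader and read_first_response, the replica names and the simulated delays are assumptions made for the example; a real client would also need to handle errors from the fastest replica, which this sketch does not.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def make_reader(name, delay_seconds, value):
    """Build a fake replica read that answers after a fixed delay."""
    def read():
        time.sleep(delay_seconds)
        return name, value
    return read


def read_first_response(readers):
    """Send the read to every replica and return the first answer to arrive."""
    pool = ThreadPoolExecutor(max_workers=len(readers))
    try:
        futures = [pool.submit(reader) for reader in readers]
        done, _pending = wait(futures, return_when=FIRST_COMPLETED)
        return next(iter(done)).result()
    finally:
        # Do not block on the slower replicas; their answers are discarded.
        pool.shutdown(wait=False)


readers = [
    make_reader("primary", 0.5, "v2"),     # slow, for example under load
    make_reader("replica-1", 0.05, "v2"),  # fast
]
source, value = read_first_response(readers)
print(f"first response came from {source}: {value}")
```

The trade-off is extra load, since every read is sent to every replica, in exchange for a latency that tracks the fastest replica rather than the slowest.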
Timeline consistency offers both strong consistency and graceful degradation when it is needed, delivering higher availability without exposing developers to the added complexity of an eventually consistent system.
HBase read high availability: phase 1 and phase 2
The development of HBase read availability went through two phases. The first phase was mainly a prototype used to validate the API semantics, while the second phase delivers a production-ready version. If you looked at the read high availability offered by HBase in HDP 2.1 and concluded that it was not usable because it did not support operations such as split and merge, HBase in HDP 2.2 provides high availability across all the operations you would expect.
HBase read availability is a feature of the HDP 2.2 platform. If you want to improve the availability of your applications, we suggest you give it a try.
The future of HBase high availability
HBase high availability has improved greatly over the past year, but there is still much room for improvement. So far, HBase has yet to solve two important problems:
; Write availability during failures
; Consistent reads and writes across data centers
We are very happy to see that the HBase community has already started to address these problems and is working to merge HydraBase, developed by Facebook, into HBase. In the future, HBase will provide up to five nines (99.999%) of availability while still meeting the strong data consistency requirements of critical business systems.