RebornDB: the next generation of distributed databases
Key-Value databases
In the real world there are many Key-Value databases, and they are widely used in many systems. For example, we can use Memcached to cache the result set of a MySQL query so that later identical queries are served from the cache, or use a document store to get better query performance, and so on.
Different scenarios call for different Key-Value databases; no single one fits every use case. But if you just want a Key-Value database that is simple, easy to use, fast, and supports a rich set of data structures, Redis may be a very good place to start.
Redis is an advanced Key-Value cache and database released under the BSD license. It is fast, supports many data types (String, Hash, List, Set, Sorted Set, ...), uses RDB or AOF persistence and replication to keep data safe, and has client libraries for many languages.
Most importantly, the market has chosen Redis: many companies use it, and it has proven its value.
Although Redis is very good, it still has some disadvantages. The biggest one is the memory limit: Redis keeps all data in memory, which caps the size of the whole data set, so we cannot store more data than memory allows.
The official Redis Cluster solves this problem by spreading the data across multiple Redis servers, but this approach has not yet been proven in many real-world environments. It also requires us to change our client libraries to support "MOVED" redirection and other special commands, which is unacceptable in a running production environment. For now, then, Redis Cluster is not a good solution.
We like Redis and want to go beyond its limitations, so we created a service called QDB. It is compatible with Redis, stores data on disk to get past the memory size limit, and keeps hot data in memory for performance.
QDB is a fast, Redis-like Key-Value database. It has the following advantages:
- Compatible with Redis: if you are familiar with Redis, you can easily use QDB. It supports most
Redis commands and data structures (String, Hash, List, Set, Sorted Set, etc.);
- Stores data on disk (going beyond the memory size limit) while keeping hot data in memory,
thanks to the back-end storage;
- Supports multiple back-end storage engines: you can choose RocksDB, LevelDB, or GoLevelDB
(later on, we will use RocksDB as the example);
- Two-way synchronization with Redis: QDB can act as a slave node and synchronize data from
Redis, and it can also act as a master node and replicate data to Redis.
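The "data on disk, hot data in memory" idea can be illustrated with a small sketch. This is not QDB's implementation: the class name HotColdStore is hypothetical, a plain dict stands in for the on-disk engine, and an LRU policy is assumed for deciding which keys count as hot.

```python
from collections import OrderedDict


class HotColdStore:
    """Toy store: every key lives in `disk`; recently used keys are also kept in memory."""

    def __init__(self, hot_capacity):
        self.hot_capacity = hot_capacity
        self.disk = {}            # stand-in for an on-disk engine such as RocksDB
        self.hot = OrderedDict()  # LRU cache, least recently used entry first

    def _touch(self, key, value):
        # Mark the key as most recently used and evict the coldest entry if needed.
        self.hot[key] = value
        self.hot.move_to_end(key)
        if len(self.hot) > self.hot_capacity:
            self.hot.popitem(last=False)

    def set(self, key, value):
        self.disk[key] = value    # always write through to the backing store
        self._touch(key, value)

    def get(self, key):
        if key in self.hot:       # hot hit: served from memory
            self.hot.move_to_end(key)
            return self.hot[key]
        value = self.disk.get(key)  # cold read: fall back to the backing store
        if value is not None:
            self._touch(key, value)
        return value
```

With a hot capacity of 2, writing three keys leaves only the two most recent ones in memory, while all three remain readable through the backing store.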
The back-end storage
QDB uses LevelDB, RocksDB, and GoLevelDB as its back-end storage. These stores are all based on the log-structured merge tree (LSM tree), which gives very fast writes and good read performance. They also all use Bloom filters and an LRU (least recently used) cache to improve read performance.
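To show why a Bloom filter helps reads, here is a minimal sketch. It is not the filter used by LevelDB or RocksDB; the class name, the sizes, and the use of SHA-256 to derive bit positions are all assumptions made for illustration. The point is that a "no" answer is always correct, so the store can skip a disk lookup for keys that were never written.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, key: bytes):
        # Derive num_hashes bit positions by salting the key with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key: bytes):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: bytes) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))
```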
LevelDB is the earliest version, developed by Google; RocksDB is an optimized version maintained by Facebook; GoLevelDB is a pure Go implementation of LevelDB. If you just want a quick test and don't want to build and install RocksDB or LevelDB, you can use GoLevelDB directly, but we do not recommend it for production because its performance is poor.
Both LevelDB and RocksDB work well in production, but given RocksDB's excellent performance we prefer it. Later we will support only RocksDB and GoLevelDB: one for production, the other for development and testing.
QDB is great: we can store a huge amount of data on one machine and still get good read and write performance. But as the data set grows we still face a problem: we cannot store all the data on a single machine. At the same time, the QDB server becomes both a bottleneck and a single point of failure.
So now we have to think about a clustering solution.
RebornDB is a proxy-based distributed Redis cluster solution. It is a bit like twemproxy, which is almost the earliest and the most famous proxy-based Redis cluster solution.
But twemproxy has its own problems. It only supports a static cluster topology, so we cannot dynamically add or remove Redis nodes and reshard the data. If we run many twemproxy instances and want to add a Redis node behind them, another problem is how to update the configuration of every twemproxy safely, which adds operational complexity. In addition, Twitter, the company that developed twemproxy, has already given it up and no longer uses it in production.
Unlike twemproxy, RebornDB has a killer feature: it reshards the data set dynamically. This is very useful, especially when your data set is growing quickly and you have to add more storage nodes to expand the cluster. In short, RebornDB reshards data transparently, without affecting running services.
We can think of RebornDB as a black box: any existing Redis client can talk to it as if it were a single-node Redis server. The image below shows the RebornDB architecture.
RebornDB has the following components: reborn-proxy, backend store, coordinator, reborn-config, and reborn-agent.
Reborn-proxy provides a single, unified service to clients. Any Redis client can connect to any reborn-proxy and run commands.
Reborn-proxy parses the commands from the client using RESP, dispatches them to the corresponding back-end store, receives the reply from the back-end store, and returns it to the client.
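As a rough illustration of the parsing step, the sketch below decodes one client command from RESP, where a command arrives as an array ("*<count>") of bulk strings ("$<length>"). The function name parse_resp_command is hypothetical and this is not reborn-proxy's parser; it only handles the array-of-bulk-strings form that clients use to send commands.

```python
def parse_resp_command(data: bytes):
    """Parse one RESP array of bulk strings. Returns (arguments, remaining_bytes)."""

    def read_line(buf):
        head, sep, rest = buf.partition(b"\r\n")
        if not sep:
            raise ValueError("incomplete RESP line")
        return head, rest

    head, rest = read_line(data)
    if not head.startswith(b"*"):
        raise ValueError("expected a RESP array")
    count = int(head[1:])

    args = []
    for _ in range(count):
        head, rest = read_line(rest)
        if not head.startswith(b"$"):
            raise ValueError("expected a RESP bulk string")
        length = int(head[1:])
        # The payload is followed by a terminating CRLF.
        if len(rest) < length + 2 or rest[length:length + 2] != b"\r\n":
            raise ValueError("incomplete or malformed bulk string")
        args.append(rest[:length])
        rest = rest[length + 2:]
    return args, rest
```

The first argument is the command name; a proxy would typically use the key argument that follows it to decide which back-end store receives the command. Any bytes left over belong to the next pipelined command.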
Reborn-proxy is stateless, meaning that you can easily scale out reborn-proxy to handle more requests.
We can run many reborn-proxy instances. How clients discover them is a separate topic in distributed system design that we will not go into here; some practical approaches are DNS, LVS, HAProxy, and so on.
Backend store
The back-end store is reborn-server (a modified version of Redis) or QDB. We introduce a concept called a group to manage one or more back-end stores. A group must have one master node and zero or more slave nodes, which together form a replication topology.
We divide the whole data set into 1024 slots (we use hash(key) to determine which slot a key belongs to) and store different slots in different groups. If you want to reshard the data, you can add a new group and let RebornDB migrate all the data of a slot from another group to the new group.
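The key-to-slot-to-group mapping can be sketched as follows. The text only says hash(key), so the use of CRC32 here is an assumption, and the Router class and its method names are hypothetical.

```python
import zlib

SLOT_COUNT = 1024


def slot_for_key(key: bytes) -> int:
    # Assumption: CRC32 stands in for the unspecified hash(key).
    return zlib.crc32(key) % SLOT_COUNT


class Router:
    """Maps slots to groups, as the routing rules in the coordinator would."""

    def __init__(self):
        self.slot_to_group = {}

    def assign(self, slots, group_id):
        for slot in slots:
            self.slot_to_group[slot] = group_id

    def group_for_key(self, key: bytes) -> int:
        return self.slot_to_group[slot_for_key(key)]


router = Router()
router.assign(range(0, 512), 1)     # slots 0..511 live in group 1
router.assign(range(512, 1024), 2)  # slots 512..1023 live in group 2
```

Because keys map to slots and slots map to groups, moving a slot to a new group only changes one entry in the slot table; the key-to-slot hash never changes.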
We can also use different back-end stores for different groups. For example, suppose we want group1 to hold hot data and group2 to hold a large amount of cold data. We can then build group1 from reborn-server and group2 from QDB. Reborn-server is much faster than QDB, so this guarantees read and write performance for hot data.
Coordinator
We use ZooKeeper or etcd as the coordination service. When we need to perform write operations such as resharding or failover, it coordinates all of the services.
All RebornDB information is stored in the coordinator, for example the key routing rules, which reborn-proxy uses to dispatch commands to the correct back-end store.
Reborn-config is the management tool. We can use it to add or remove groups, add or remove stores in a group, migrate data from one group to another, and so on.
If we want to change RebornDB cluster information, we must use reborn-config. For example, we cannot promote a back-end store to master by sending it the "SLAVEOF NO ONE" command directly; we must use "reborn-config server promote groupid server". We have to change not only the replication topology inside the group but also the information in the coordinator, and only reborn-config can do both.
Reborn-config also provides a web service so that you can manage RebornDB easily. If you need more control, you can use its HTTP RESTful API.
Reborn-agent is the high-availability component. You can use it to start and stop applications (reborn-config, qdb-server, reborn-server, reborn-proxy). We discuss it in more detail in the high availability section below.
Resharding
RebornDB supports resharding the data set dynamically. How do we do it?
As mentioned above, we divide the whole data set into 1024 slots and store different slots in different groups. When we add a group, we migrate some slots from the old groups to the new one. We call this process migration, and the smallest unit of migration in RebornDB is the slot.
Let's start with a simple example:
We have two groups: group1 holds two slots, 1 and 2, and group2 holds three slots, 3, 4, and 5. Now group2's workload has grown, so we add group3 and migrate slot5 into it.
We can use the following command to migrate slot5 from group2 to group3:
reborn-config slot migrate 5 2 3
(Note: the original article gives this command as reborn-config slot migrate 5 5 3.)
This command looks simple, but internally we have to do a lot of work to keep the migration safe. We use a two-phase commit (2PC) protocol to tell every reborn-proxy that slot5 is about to be migrated from group2 to group3. Only after all reborn-proxy instances have confirmed and replied do we start the migration.
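The two-phase announcement can be sketched like this. It is a simplified illustration, not RebornDB's code: the Proxy class, its fields, and announce_migration are hypothetical names, and a real system would exchange these messages through the coordinator rather than through direct method calls.

```python
class Proxy:
    """Stand-in for one reborn-proxy taking part in the two-phase commit."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.pending = None   # migration announced but not yet confirmed
        self.migrating = {}   # slot -> (from_group, to_group)

    def prepare(self, slot, src, dst):
        # Phase 1: acknowledge only if this proxy can honor the migration.
        if not self.healthy:
            return False
        self.pending = (slot, src, dst)
        return True

    def commit(self):
        # Phase 2: start routing the slot with the migration in mind.
        slot, src, dst = self.pending
        self.migrating[slot] = (src, dst)
        self.pending = None

    def abort(self):
        self.pending = None


def announce_migration(proxies, slot, src, dst):
    """Return True only if every proxy confirmed; otherwise roll back."""
    acks = [p.prepare(slot, src, dst) for p in proxies]
    if all(acks):
        for p in proxies:
            p.commit()
        return True
    for p in proxies:
        p.abort()
    return False
```

If any proxy fails to confirm, no proxy changes its routing, so data transfer never starts while some proxy still believes the slot lives only in the old group.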
The migration process itself is simple: take a key from slot5, transfer its data from group2 to group3, delete the key from group2, and repeat. In the end group2 holds no slot5 data, and all slot5 data is in group3.
Migrating a key is atomic, so no matter whether the key was in group2 or group3 beforehand, after running the migration command for it we can be sure it is in group3.
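The loop described above, and the way atomic per-key migration makes reads safe while a slot is moving, can be sketched with plain dicts standing in for the two groups. All function names here are hypothetical, and the single-threaded dict operations only imitate the atomicity that the real migration command provides.

```python
def migrate_key(key, source, target):
    """Move one key from source to target; a no-op if it has already moved."""
    if key in source:
        target[key] = source.pop(key)


def migrate_slot(slot, source, target, slot_of):
    """Move every key of `slot` from source to target, one key at a time."""
    moved = 0
    for key in list(source):  # copy the key list because source shrinks as we go
        if slot_of(key) == slot:
            migrate_key(key, source, target)
            moved += 1
    return moved


def read_during_migration(key, source, target):
    # Migrate the key first; afterwards it is in target wherever it was before.
    migrate_key(key, source, target)
    return target.get(key)
```

A request for a key in a migrating slot can therefore force that key across first and then read it from the new group, without waiting for the rest of the slot.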
When no data belonging to slot5 is left in group2, we stop the migration. The topology then looks like the following:
High Availability
RebornDB uses reborn-agent to provide its HA solution.
Reborn-agent constantly checks whether the applications it started are alive. If reborn-agent finds that an application has died, it restarts it.
Reborn-agent is a bit like a process supervisor, but it has more features.
Reborn-agent provides a convenient HTTP RESTful API for dynamically adding or removing the applications that need to be monitored. For example, we can use the "/api/start_redis" API to start a new reborn-server, or the "/api/start_proxy" API to start a new reborn-proxy. We can also use "/api/stop" to stop a running application and remove it from the current monitoring list.
Reborn-agent is not only for monitoring local applications; it also provides HA for the back-end stores. Multiple reborn-agent instances first elect a master reborn-agent through the coordinator. The master then keeps checking whether the back-end stores are alive and performs a failover when it finds one down. If the failed back-end store is a slave node, reborn-agent only marks it offline in the coordinator. If it is the master node, reborn-agent selects a new master from the existing slave nodes and performs the failover.
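The two failover cases can be sketched as below. The group is modeled as a plain dict, the function name handle_server_down is hypothetical, and choosing the first slave as the new master is an assumption; the text does not say how reborn-agent picks among the slaves.

```python
def handle_server_down(group, down_server):
    """Update a group after one of its back-end stores is found down.

    `group` looks like {"master": "s1", "slaves": ["s2", "s3"], "offline": []}.
    """
    if down_server in group["slaves"]:
        # A failed slave is only marked offline.
        group["slaves"].remove(down_server)
        group["offline"].append(down_server)
    elif down_server == group["master"]:
        if not group["slaves"]:
            raise RuntimeError("no slave available to promote")
        # Assumption: promote the first slave in the list.
        group["master"] = group["slaves"].pop(0)
        group["offline"].append(down_server)
    return group
```

In the real system these changes would be written to the coordinator so that every reborn-proxy sees the new topology.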
To do
Although RebornDB has many great features, it still needs more work. These are the things we may do next:
- A better user experience: running RebornDB is not so easy right now. We have to do a series of
steps such as initializing slots, adding servers to groups, and assigning slots to groups. Lowering
the barrier to entry is something we have to think about in future work;
- Replication migration: today we migrate a slot one key at a time, which is slow if the slot
contains a lot of data. Migrating by replication may be much better. In the example above, group2
would first create a snapshot, from which group3 can obtain all slot5 data as of that point in time;
group3 would then synchronize the incremental changes from group2. Once we find that group3
has caught up with all of group2's changes to slot5, we switch over and delete slot5 from group2;
- A pretty dashboard: to provide a better user experience, we want to control and monitor
everything through a dashboard;
- P2P-based cluster: RebornDB is currently a proxy-based cluster solution. We may redesign the
whole architecture and use P2P, like the official Redis Cluster.