Distributed Data Storage for Ordinary People
It's the question that drives us, Neo
Why are there so many AWS data storage options? Which one should I use? These are common questions from customers. In this three-part blog series, I will try to provide some clarity. In Part One, I lay the groundwork for high availability and explain why redundancy is the commonly used method for achieving it. I also briefly mention that redundancy in the data layer brings new problems. In Part Two, I discuss some of these problems and what you need to consider when overcoming them. Part Three builds on this information to discuss the specific AWS data storage options and the workloads each option is optimized for. After reading all three parts of this series, you will appreciate the rich set of data storage products that AWS provides, and learn to choose the right option for the right workload.
What's the problem with a relational database?
As many of you may already know, relational database (RDB) technology has existed since the 1970s, and it was the de facto standard for structured storage until the late 1990s. For decades, RDBs have done an excellent job of supporting highly consistent transactional workloads, and they remain strong. Over time, this venerable technology gained new capabilities in response to customer demand, such as BLOB storage, XML/document storage, full-text search, executing code in the database, data warehousing with star-schema data structures, and geospatial extensions. As long as everything could be squeezed into a relational data structure definition and fit on a single machine, it could be implemented in a relational database.
Then the commercialization of the Internet happened. It changed everything, and relational databases could no longer meet every storage need. Availability, performance, and scalability became as important as consistency, and sometimes even more important.
Performance has always been important, but what changed with the commercialization of the Internet was the scale. It turned out that achieving performance at scale required skills and technologies different from those that sufficed in the pre-Internet era. Relational databases are built around the concept of ACID (Atomicity, Consistency, Isolation, and Durability), and the simplest way to implement ACID is to keep everything on a single machine. Therefore, the traditional way to scale an RDB is vertical scaling (scale up), which in plain terms means using a bigger machine.
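To make the single-machine ACID idea concrete, here is a minimal sketch of atomicity using Python's built-in sqlite3 module. The accounts table, the account names, and the transfer helper are invented for illustration and are not taken from any AWS service. Because all of the data lives in one local database, the engine can simply roll back a half-finished transfer.

```python
import sqlite3

# Hypothetical schema: the "accounts" table and its rows exist only for this example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; either both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            # Credit first, so a failing debit shows the credit being undone.
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance; nothing was applied.
        return False


def balances(conn):
    """Return the current balances as a {name: balance} dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))
```

A transfer that would overdraw an account leaves both balances untouched, even though the credit statement had already run inside the transaction. Giving the same guarantee when the two accounts live on different machines is much harder.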
Uh-oh, I think I need a bigger machine
The bigger-machine solution worked well until the Internet brought loads that a single machine could not handle. This forced engineers to come up with clever techniques to overcome the limits of a single machine. There are many different approaches, each with its own advantages and disadvantages: master-slave, clustering, table federation and partitioning, and sharding (horizontal partitioning, which can be thought of as a special case of partitioning).
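As a rough illustration of sharding, the sketch below spreads keys across several in-memory dictionaries, each standing in for a separate database machine. The ShardedStore class and its method names are hypothetical, and hash-modulo placement is only one possible scheme, chosen here because it is short.

```python
import hashlib


class ShardedStore:
    """Toy key-value store that spreads keys over several in-memory 'machines'."""

    def __init__(self, num_shards):
        # Each dict stands in for a separate database server.
        self.shards = [{} for _ in range(num_shards)]

    def shard_for(self, key):
        """Return the index of the shard responsible for `key`."""
        # Use a stable hash; Python's built-in hash() of str varies per process.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def put(self, key, value):
        self.shards[self.shard_for(key)][key] = value

    def get(self, key):
        return self.shards[self.shard_for(key)].get(key)


store = ShardedStore(num_shards=4)
for i in range(100):
    store.put(f"user-{i}", {"id": i})
```

Every key lands on exactly one shard, so no single machine has to hold or serve the whole data set. The cost is that any operation touching keys on different shards can no longer rely on one machine's local transaction.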
Another factor in the proliferation of data storage options is availability. In pre-Internet systems, users usually came from inside the organization, so planned downtime could be scheduled outside working hours, and even an unplanned outage had only limited impact. The commercialization of the Internet changed this as well: now everyone with Internet access is a potential user, so unplanned downtime can have a much greater impact, and the global reach of the Internet makes it difficult to identify non-working hours in which to schedule planned downtime.
In Part One of this blog series, I discussed the role of redundancy in achieving high availability. However, when applied to the data storage layer, redundancy brings a new set of interesting challenges. The most common way to apply redundancy at the database layer is a master-slave configuration.
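A master-slave configuration can be sketched in a few lines. The Master and Slave classes below are hypothetical, and the master copies each write to its slaves synchronously purely to keep the example short; this is an assumption of the sketch, not a description of how any particular database replicates.

```python
class Slave:
    """Holds a redundant copy of the data and serves reads."""

    def __init__(self):
        self.data = {}

    def apply(self, key, value):
        # Called by the master to replay a write on this copy.
        self.data[key] = value

    def read(self, key):
        return self.data.get(key)


class Master:
    """Accepts all writes and forwards each one to every slave."""

    def __init__(self, slaves):
        self.data = {}
        self.slaves = slaves

    def write(self, key, value):
        self.data[key] = value
        for slave in self.slaves:
            slave.apply(key, value)  # synchronous copy, assumed for simplicity

    def read(self, key):
        return self.data.get(key)


slave = Slave()
master = Master([slave])
master.write("order-1", "paid")
```

After the write, the slave holds the same value as the master, so the data survives the loss of either machine.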