How to Use Amazon EC2: Service Availability and Efficient Data Storage
Since its launch, the AWS EC2 service (which at the time offered only a single instance type, m1.small) has grown from providing basic compute capacity into a very complete elastic computing infrastructure service: not only have the instance types multiplied, but a family of related services (EBS, Elastic IP, Auto Scaling, ELB, etc.) has joined them. EC2 has become the cornerstone of AWS; many later AWS services, such as the Relational Database Service (RDS) and Elastic MapReduce (EMR), are built on top of EC2. EC2 now covers so much ground that using it efficiently and economically has become a key concern for many AWS developers.

For the past four years we have been using AWS to build our entire online service. Because the project is compute-intensive, EC2 is the service we use most (it accounts for more than 90% of the project's total AWS cost). Drawing on that project experience, this article shares lessons learned in the following areas.
- Service availability: EC2 itself provides a highly available service, and AWS also offers a number of related services that help improve availability further. Used correctly, these services help build highly available systems on AWS.
- Data storage: AWS provides a variety of storage services; the ones most closely tied to EC2 are EBS and Instance Storage. Choosing and using these storage services wisely is a key consideration when working with EC2.
- Security: in public cloud services, security is always a key subject. AWS protects EC2 at every layer, and we need to use these services and policies sensibly to protect our own services as well.
- Cost control: compared with storage, network, and other services, compute remains expensive, and EC2 is no exception. Efficient and economical use of EC2 is therefore the most important part of controlling the overall service cost.
- Automatic deployment: with today's emphasis on agile development, support for rapid deployment is indispensable. AWS offers a variety of solutions for rapid service deployment and exposes enough APIs to support custom deployment strategies. And because AWS operates multiple data centers around the world, rapid global deployment also becomes possible.
Any one of these aspects could fill an article on its own, so the discussion below stays focused on EC2 and presents personal experience and corresponding suggestions in each area.
Service availability

Informally, service availability is the probability that the service is available to its users, usually expressed as a percentage (for example, AWS promises 99.95% availability for EC2). Ensuring high availability is a very complex challenge, especially under high concurrency and with a complex technology stack, yet the metric is a basic indicator for every online service. Accordingly, besides guaranteeing the high availability of EC2 itself, AWS provides the services deployed on it with a foundation for building highly available systems. To open this subject, let us first look at the typical Web service architecture recommended by AWS (Figure 1), from which several recommended practices can be summarized.
Figure 1: AWS-recommended Web application architecture
1. Deploy the service in multiple Availability Zones (AZs). Within a Region, each AWS AZ is physically isolated from the others: they have separate power supplies and network access, and may even occupy entirely separate buildings. When a service is deployed across multiple AZs, it is interrupted only if every one of those AZs fails. In theory, if a service runs on EC2 in N AZs simultaneously, the probability that EC2 problems make the whole service unavailable drops to (0.05%)^N.
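Under the simplifying assumption that AZ failures are independent, the effect of adding zones can be sketched in a few lines of Python (the 99.95% figure is EC2's stated availability; the function itself is just the arithmetic from the paragraph above):

```python
def combined_availability(per_az_availability: float, n_zones: int) -> float:
    """Availability of a service deployed across n_zones independent AZs.

    The service is down only when every AZ is down at the same time,
    so the combined failure probability is the per-AZ failure
    probability raised to the power n_zones.
    """
    failure = 1.0 - per_az_availability
    return 1.0 - failure ** n_zones

# EC2's stated 99.95% availability per AZ:
for n in (1, 2, 3):
    print(f"{n} AZ(s): {combined_availability(0.9995, n):.10%}")
```

Even two AZs push the theoretical EC2-related downtime from minutes per week to fractions of a second, which is why multi-AZ deployment is the first recommendation.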
2. Use Auto Scaling (AS) to schedule EC2 instances automatically. Generally speaking, when a service meets unexpected (or even expected) high traffic, or some of its components misbehave, its availability comes under great strain; the AS mechanism exists to help you cope with exactly these situations. By configuring scale-up and scale-down conditions in advance, AS automatically starts or shuts down EC2 instances in response to load changes and service exceptions. In general, any online service should be able to scale out automatically; AS supplies that capability at the infrastructure layer, so you only need to make sure the application layer supports it. Of course, if the AS mechanism cannot meet your needs, you can use the EC2 API to implement your own scaling algorithm (as we did).
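If the built-in AS conditions are too coarse, a custom scaler can be as simple as a pure decision function driven by your own metrics, with the result applied through the EC2 API (e.g., by starting or terminating instances). The thresholds and doubling policy below are purely illustrative, not what our production algorithm used:

```python
def desired_capacity(current: int, cpu_avg: float,
                     scale_up_at: float = 70.0, scale_down_at: float = 25.0,
                     minimum: int = 2, maximum: int = 20) -> int:
    """Return the instance count a custom scaler would ask EC2 for.

    All thresholds are hypothetical; a real scaler would feed in
    whatever metric (CPU, queue depth, latency) matters to the service.
    """
    if cpu_avg > scale_up_at:
        target = current * 2          # scale up aggressively under load
    elif cpu_avg < scale_down_at:
        target = current - 1          # scale down one instance at a time
    else:
        target = current              # inside the comfort band: no change
    return max(minimum, min(maximum, target))
```

Separating the decision from the EC2 API calls also makes the scaling policy trivially unit-testable, which matters when a bug can terminate production instances.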
3. Use Elastic Load Balancing (ELB) to balance load across multiple EC2 instances. Load balancing is, of course, a long-standing concept, and traditional data centers have their own software and hardware implementations. But ELB has some obvious advantages over a traditional load balancer.
- ELB itself scales automatically and elastically. Under heavy traffic (especially during network attacks or sudden traffic spikes), the load on the load balancer itself becomes a serious challenge. Because ELB is implemented on top of EC2, it can scale its own capacity automatically to cope with such conditions (note that ELB's own scaling is completely transparent to developers, which is presumably why it is called an elastic load balancer).
- ELB integrates easily with other AWS services (such as Auto Scaling, CloudWatch, and Route 53). Although ELB can work independently, distributing traffic directly to registered EC2 instances, registering an Auto Scaling Group with the ELB is the more common choice. ELB also reports metrics to CloudWatch, which can be used to judge the health of service instances. In addition, the ELB endpoint often serves as a service's entry point within an AWS Region and is configured as a DNS record in Route 53.
4. Deploy EC2 instances in multiple data centers (Regions). Although AWS runs multiple data centers mainly so that customers around the world can use them (better network proximity, compliance with local legal requirements, and so on), a multi-Region deployment can also improve availability: when one data center is completely paralyzed, users can be directed to another. When taking this approach, however, pay attention to network performance, legal risk, and related issues.
Having introduced the AWS services that help ensure availability, here are some suggestions for making better use of them.
1. Deploy roughly the same service capacity (e.g., number of EC2 instances) in each AZ whenever possible. Although EC2 availability is reliable and ELB balances load well, an entire AZ can still fail. When that emergency occurs, a design with capacity balanced across more AZs better preserves service availability.
2. Prepare simple multi-data-center switching logic. Situations that require switching data centers are rare, but they are real. A month ago, the EC2 service in AWS's US West data center had problems for more than four hours. During that time, all of our services in the US West region were unavailable, and, worse, the vast majority of our users are on the West Coast. Fortunately, most of our services support dynamic data-center switching (a feature originally designed to help users dynamically select the best data center), so a single configuration change routed all traffic to the US East data center (of course, user experience may suffer from the increased network latency). Although we wrote our own dynamic switching logic, Route 53 would be a better choice today: simply configuring failover logic between each Region's endpoint in Route 53 achieves a similar result. For globally deployed services on AWS, I suggest using Route 53 to increase the flexibility of the overall architecture.
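As a sketch of the Route 53 approach: failover is expressed as a pair of record sets, a PRIMARY guarded by a health check and a SECONDARY that takes over when the check fails. The dictionary below follows the field names of Route 53's ChangeResourceRecordSets API (as used from boto3); the host names and health-check ID are placeholders:

```python
def failover_change(name, primary, secondary, health_check_id):
    """Build a Route 53 ChangeBatch pointing `name` at the primary
    endpoint, failing over to the secondary when the health check fails.

    Field names follow the ChangeResourceRecordSets API; the values
    passed in by the caller are illustrative placeholders.
    """
    def record(set_id, role, value, health=None):
        rrs = {
            "Name": name, "Type": "CNAME", "TTL": 60,
            "SetIdentifier": set_id, "Failover": role,
            "ResourceRecords": [{"Value": value}],
        }
        if health:
            rrs["HealthCheckId"] = health  # only the PRIMARY is health-checked
        return {"Action": "UPSERT", "ResourceRecordSet": rrs}

    return {"Changes": [
        record("us-west", "PRIMARY", primary, health_check_id),
        record("us-east", "SECONDARY", secondary),
    ]}
```

The batch would then be submitted via boto3's `route53.change_resource_record_sets`, replacing the hand-rolled switching logic described above.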
3. Enable the ELB Cross-Zone Load Balancing feature. When an ELB is started, it creates a load balancer instance in each of the requested AZs and distributes incoming user traffic in turn across those load balancer instances. By default, each load balancer instance forwards traffic only to EC2 instances in its own AZ, as shown in Figure 2.
Figure 2: Default ELB traffic distribution model
The architecture in Figure 2 works most of the time, but in some cases it leads to load imbalance across AZs, or even to service unavailability. For example, if all service instances in one AZ develop problems, every user request routed to that AZ will fail. Moreover, because DNS returns the IP of one AZ's load balancer to each user (with a short DNS TTL, 60 seconds by default), a client that issues a large number of requests in a short period can leave the AZs carrying very different loads (we ran into exactly this during internal QA testing). And some businesses require sticky connections, which makes load imbalance even more likely. To address this, ELB added the Cross-Zone Load Balancing feature: once it is enabled, every load balancer instance distributes its traffic evenly across all registered instances, not just those in its own AZ. The resulting structure is shown in Figure 3.
Figure 3: ELB traffic distribution model with Cross-Zone Load Balancing enabled
This feature can, of course, have a side effect: communication between a load balancer and an EC2 instance may now cross AZ boundaries, adding network latency. However, AWS designs its data centers with strict limits on network latency between the AZs of a Region, so this side effect generally does not affect the end user's experience unless the service is latency-sensitive.
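The imbalance Figure 2 can produce, and how cross-zone balancing removes it, is easy to see in a toy round-robin simulation (AZ names, backend counts, and request volumes here are made up for illustration):

```python
import itertools

def distribute(requests_per_az, backends_per_az, cross_zone):
    """Count requests per backend under simple round-robin.

    requests_per_az maps AZ -> number of requests DNS sent to that
    AZ's load balancer; backends_per_az maps AZ -> list of backend ids.
    Without cross_zone, each balancer only uses its own AZ's backends.
    """
    counts = {b: 0 for bs in backends_per_az.values() for b in bs}
    for az, n in requests_per_az.items():
        targets = list(counts) if cross_zone else backends_per_az[az]
        rr = itertools.cycle(targets)          # round-robin over targets
        for _ in range(n):
            counts[next(rr)] += 1
    return counts
```

With two backends in one AZ and a single backend in another, equal DNS traffic per AZ leaves the lone backend carrying twice the load of the others unless cross-zone distribution is enabled.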
4. Enable the ELB Connection Draining feature. One of the routine challenges for operations staff is handling service upgrades and assorted anomalies, and Connection Draining helps here. The basic idea: when some EC2 instances need to leave the current service pool, the ELB stops sending new requests to them, and a timeout is set so that requests already in flight on those instances can still finish normally. The idea itself is certainly not new to operations staff; the traditional approach is to switch traffic at the load balancer or proxy (which ELB also supports directly). The main benefit of Connection Draining is that it makes the whole process fully automatic. In practice, combining ELB traffic switching with Connection Draining allows efficient, smooth online upgrades and clean handling of abnormal instances.
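The draining idea reduces to a small loop: stop routing new traffic, then wait, bounded by a timeout, for in-flight requests to reach zero. A minimal sketch, where the `in_flight` callable stands in for whatever active-request metric the service exposes:

```python
import time

def drain(instance, in_flight, timeout_s, now=time.monotonic):
    """Wait (up to timeout_s seconds) for an instance's in-flight
    requests to finish after it stops receiving new traffic.

    `in_flight` is a callable returning the current number of active
    requests on the instance; it is an assumption of this sketch.
    """
    deadline = now() + timeout_s
    while now() < deadline:
        if in_flight(instance) == 0:
            return True           # drained cleanly, safe to terminate
        time.sleep(0.1)           # poll until requests finish
    return False                  # timeout hit: remaining requests cut off
```

ELB implements this on your behalf; the sketch only makes explicit the trade-off behind the timeout setting, between a fast rollout and never cutting off a slow request.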
Data storage
To meet different needs, AWS offers a variety of storage services. Among them, EBS and the Instance Store are the two directly tied to EC2 (EC2 also exchanges data with others, such as S3, DynamoDB, and RDS, but those are beyond the scope of this discussion). EBS is AWS's persistent storage service; it is easily mounted on an EC2 instance and accessed as a block device. Instance Storage, by contrast, is a block device physically attached to the EC2 instance, and its data vanishes when the instance dies, so it is usually described as non-persistent storage. When using these EC2-related block storage services, the following suggestions may help.
1. Make full use of Instance Storage. Much of the AWS documentation introduces and recommends EBS, and EBS-backed AMIs do have many obvious advantages over the original Instance Storage-backed AMIs (such as faster launch speed). But that does not make Instance Storage useless; on the contrary, it is sometimes the better choice, for several reasons.
- Instance Storage is physically attached to the EC2 instance, whereas an EBS volume is connected to the instance over the network. As a result, Instance Storage access is faster and more stable. The gap is becoming even more obvious now that newer EC2 instance types (e.g., the M3 series) ship with SSD-backed Instance Storage (the AWS official documentation lists the Instance Storage media type, capacity, and device count for each instance type).
- The cost of Instance Storage is included in the price of the EC2 instance, while EBS is billed separately.
- Instance Storage's access characteristics make it better suited to RAID for higher throughput. RAID is a common storage-system technique for improving access performance and reliability. On AWS you can build RAID 0 from EBS volumes as well as from Instance Storage devices, but because EBS volumes reach the EC2 instance over the network, even with RAID the storage throughput remains capped by network bandwidth (our tests showed this cap to be quite pronounced). In addition, most AWS instance types provide two or more Instance Storage devices, a design that also makes RAID convenient.
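The bandwidth argument can be made concrete with a back-of-the-envelope model: RAID 0 read throughput scales with the number of stripes until it hits the instance's network link, which only EBS traffic has to traverse. All numbers below are hypothetical, for illustration only:

```python
def raid0_throughput_mbps(n_devices, per_device_mbps, network_cap_mbps=None):
    """Idealized RAID 0 read throughput in MB/s.

    EBS volumes sit behind the instance's network link, so pass its
    capacity as network_cap_mbps; Instance Storage is locally attached,
    so leave network_cap_mbps as None. Numbers are illustrative only.
    """
    raw = n_devices * per_device_mbps          # striping adds up linearly...
    if network_cap_mbps is None:
        return raw                             # ...for locally attached disks
    return min(raw, network_cap_mbps)          # ...but EBS hits the network cap

# Hypothetical: four 120 MB/s EBS volumes behind a ~125 MB/s (1 Gb/s) link,
# versus four 120 MB/s local instance-store SSDs.
print(raid0_throughput_mbps(4, 120, network_cap_mbps=125))  # capped at 125
print(raid0_throughput_mbps(4, 120))                         # scales to 480
```

The model ignores protocol overhead and contention, but it captures why striping instance-store devices pays off in a way striping EBS volumes often cannot.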
2. The biggest worry about Instance Storage is its lack of persistence. But non-persistent does not mean unusable (after all, a computer's CPU cache and memory are non-persistent storage devices too; the operating system and hardware simply manage them appropriately). I think Instance Storage is a good fit for at least the following two kinds of data.
- Large volumes of intermediate data. Scientific computing services, for example, often produce huge amounts of intermediate data that cannot fit entirely in memory, and writing it back to a persistent storage service is slow and of little value. Such data is very well suited to Instance Storage: even if some of it is lost it does not matter, since it can be recomputed from the same inputs, and Instance Storage lets the next processing step read it back very quickly. A scientific computing service therefore typically writes only the final results (usually far smaller in volume than the intermediates) back to persistent storage services such as EBS or S3.
- A local cache for data files. Our service, for example, has to process many large files, all of which are persisted in Autodesk's cloud storage. Because access-speed limits make it impractical to write every change straight back to cloud storage, we use Instance Storage as a local cache. Used this way, of course, we must ensure that the data on Instance Storage is synchronized back to the cloud promptly and automatically. The overall structure is shown in Figure 4.
Figure 4: Instance Storage as a data cache, with synchronization back to cloud storage
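The Figure 4 pattern is essentially a write-back cache: writes land on the fast local (instance-store) disk first, and a background worker uploads them to durable cloud storage. A minimal sketch, where `upload` stands in for the real cloud-storage client and is an assumption of this sketch:

```python
import queue
import threading

class WriteBackCache:
    """Write files to fast local disk first; upload them to durable
    cloud storage in the background.  `upload` is a placeholder for the
    real cloud client (e.g., an S3 put)."""

    def __init__(self, upload):
        self._upload = upload
        self._pending = queue.Queue()
        worker = threading.Thread(target=self._sync, daemon=True)
        worker.start()

    def write(self, path, data):
        with open(path, "wb") as f:      # fast local (instance-store) write
            f.write(data)
        self._pending.put(path)          # schedule the durable upload

    def _sync(self):
        while True:
            path = self._pending.get()
            self._upload(path)           # push to cloud storage
            self._pending.task_done()

    def flush(self):
        self._pending.join()             # block until everything is durable
```

`flush()` blocking until every scheduled upload has completed is exactly the property that makes instance-store loss tolerable in this scenario: nothing is considered done until it has reached the cloud.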
3. Use Provisioned IOPS EBS volumes and EBS-optimized EC2 instances to guarantee EBS access performance. When persistence is required, EBS is still the best choice. But because EBS and EC2 communicate over the network, access speed is often affected by network congestion; if a service is sensitive to disk access speed (a database under highly concurrent access, for example), that performance fluctuation can be fatal. AWS therefore introduced Provisioned IOPS EBS volumes: you pay an additional cost for the reserved IOPS, but the volume guarantees millisecond-level latency and delivers at least 90% of the provisioned IOPS 99.9% of the time. To also secure the network bandwidth between EC2 and EBS, launch an EBS-optimized EC2 instance: a single AWS physical machine runs multiple EC2 instances, each of which needs I/O bandwidth for EBS, Internet access, and so on, and an EBS-optimized instance gets a dedicated channel to its EBS volumes, reducing contention with other instances and other network traffic. EBS-optimized instances, of course, also cost extra. In general, combining Provisioned IOPS EBS volumes with EBS-optimized EC2 instances gives reasonable assurance of EBS access performance.
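For reference, here is how the two knobs pair up when expressed as API parameters. The field names follow the boto3 EC2 API (`create_volume` and `run_instances`); the AMI ID and instance type are placeholders, and actual size and IOPS limits should be checked against current AWS documentation:

```python
def piops_setup(size_gib, iops, az):
    """Keyword arguments for pairing a Provisioned IOPS volume with an
    EBS-optimized instance via boto3's ec2.create_volume and
    ec2.run_instances.  Names follow the boto3 EC2 API; the AMI ID and
    instance type below are placeholders."""
    volume = {"VolumeType": "io1",            # Provisioned IOPS volume type
              "Size": size_gib,
              "Iops": iops,                   # the reserved IOPS you pay for
              "AvailabilityZone": az}
    instance = {"ImageId": "ami-xxxxxxxx",    # placeholder AMI
                "InstanceType": "m3.xlarge",  # placeholder instance type
                "EbsOptimized": True,         # dedicated EBS bandwidth
                "MinCount": 1, "MaxCount": 1}
    return volume, instance
```

The two settings are independent purchases, and buying only one of them leaves the other link in the EC2-to-EBS path as the bottleneck.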