Deploying 300 million Docker containers: challenges and how we responded
IronWorker is a task queue service that lets developers schedule large-scale jobs without setting up or managing any infrastructure. Iron.io began trying Docker a few months ago and has since deployed more than 300 million Docker containers internally. This article shares the challenges, solutions, and lessons from running IronWorker on Docker-based infrastructure. The original article follows:
IronWorker is a task queue service that lets developers schedule and process jobs at scale without having to set up or manage any infrastructure. When we launched the service more than three years ago, we ran tasks in an LXC container that held all the languages and code packages. Docker lets us easily upgrade and manage a set of containers, so we can offer customers a much wider range of language environments and installed packages.
We started with Docker v0.7.4 and ran into some difficulties along the way (containers not shutting down properly was a big problem, though it has since been fixed). We overcame all of them and found that Docker not only meets our needs but exceeds our expectations. So we adopted Docker across our infrastructure, and our experience shows it was the right decision.
Here are the advantages we found in Docker:
Updating and maintaining images is easy
Docker manages images with a very powerful, git-like approach, which makes it easy to manage a large set of constantly changing environments. Its layered image system both saves disk space and lets us define images at a much finer granularity.
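To make the layering idea concrete, here is a toy model in Python, not Docker's actual storage format: each layer is identified by a hash of its content, an image is an ordered list of layer digests, and two stacks built on the same base layer store that base only once. `LayerStore` and `build_image` are illustrative names, not part of any Docker API.

```python
import hashlib


class LayerStore:
    """Toy content-addressed layer store: identical layers are kept only once."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def add_layer(self, content: bytes) -> str:
        # A layer's identity is the hash of its content, so adding the same
        # layer again (e.g. a shared base OS layer) costs no extra space.
        digest = "sha256:" + hashlib.sha256(content).hexdigest()
        self._blobs.setdefault(digest, content)
        return digest

    def disk_usage(self) -> int:
        return sum(len(blob) for blob in self._blobs.values())


def build_image(store: LayerStore, layers: list[bytes]) -> list[str]:
    """An image is just an ordered list of layer digests, base layer first."""
    return [store.add_layer(layer) for layer in layers]


if __name__ == "__main__":
    store = LayerStore()
    base = b"\0" * 1000  # stand-in for a shared base OS layer
    ruby = build_image(store, [base, b"ruby runtime"])
    ffmpeg = build_image(store, [base, b"ffmpeg binaries"])
    print("base layer shared:", ruby[0] == ffmpeg[0])
    print("bytes stored:", store.disk_usage())  # base counted once, not twice
```

Adding a new stack then costs only the size of its own top layers, which is what makes offering many specialized stacks cheap.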
We can now keep pace with rapidly updating languages and offer special stacks, such as a new ffmpeg stack designed specifically for media processing. We now have 15 different stacks, and the number is growing quickly.
Resource allocation
LXC-based containers are an operating-system-level virtualization method: all containers share the host kernel, but each container can be constrained to a specified amount of resources such as CPU, memory, and I/O. Docker provides these capabilities and more, including a REST API, version control of environments, pushing and pulling images, and easy access to statistics. Docker also supports a copy-on-write (CoW) file system, which isolates data more securely: all changes a task makes to files are stored separately and can be cleaned out with a single command. LXC cannot track these changes.
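The copy-on-write behavior can be sketched as a toy overlay: reads fall through to the read-only image layer, writes and deletions land in a per-task upper layer, and discarding that upper layer restores the original image in one step. This is an illustrative model only, and the `CowFilesystem` class is hypothetical; in Docker itself, `docker diff` lists a container's file changes and removing the container discards them.

```python
from typing import Optional


class CowFilesystem:
    """Toy copy-on-write view: a shared read-only image layer plus a private upper layer."""

    def __init__(self, image_files: dict[str, bytes]) -> None:
        self._lower = dict(image_files)  # image contents; never modified
        # Per-task changes; None marks a file deleted by the task ("whiteout").
        self._upper: dict[str, Optional[bytes]] = {}

    def read(self, path: str) -> Optional[bytes]:
        if path in self._upper:
            return self._upper[path]  # the task's own version (or None if deleted)
        return self._lower.get(path)  # fall through to the image layer

    def write(self, path: str, data: bytes) -> None:
        self._upper[path] = data  # the image layer is left untouched

    def delete(self, path: str) -> None:
        self._upper[path] = None

    def changes(self) -> list[str]:
        """Every change is tracked separately, so it can be listed..."""
        return sorted(self._upper)

    def discard(self) -> None:
        """...and thrown away in one step, restoring the original image view."""
        self._upper.clear()


if __name__ == "__main__":
    fs = CowFilesystem({"/etc/app.conf": b"v1"})
    fs.write("/etc/app.conf", b"v2")
    fs.write("/tmp/output.txt", b"result")
    print(fs.changes())
    fs.discard()
    print(fs.read("/etc/app.conf"), fs.read("/tmp/output.txt"))
```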
Easy integration with Dockerfiles
Our team is spread around the world. Being able to publish a simple Dockerfile, go off work, and rest assured that a colleague elsewhere will build exactly the same image as you makes it much easier for people in different places to work on different schedules. Clean images also make deployment and testing faster. Our iteration cycles are faster, and everyone on the team is happier.
A growing community
Docker is updated very quickly, even faster than Chrome. More importantly, the number of community members contributing new features and bug fixes is growing rapidly. Whether they contribute images, work on Docker itself, or build tools around Docker, many smart people are working on these problems, and we want to be part of that effort. We have found the Docker community to be very active and welcoming, and we are happy to be part of it.
Docker + CoreOS
We are still in the exploratory stage, but the combination of Docker and CoreOS looks like a better choice for us. Docker provides stable image management and containerization. CoreOS provides a stripped-down cloud operating system along with machine-level orchestration and distributed state management. Because each focuses on a different part of the problem, together they form a more logical infrastructure stack.
Every server-side technology needs fine-tuning and customization, especially when running at scale, and Docker is no exception. (For perspective, we run just under 50 million tasks and 500,000 compute hours a month, and we update our images constantly.) Here are some challenges we have encountered running Docker containers at high volume:
Limited backward compatibility
Rapid innovation in this space is an advantage, but it has drawbacks too. One of them is limited backward compatibility. In most cases the problems are changes in command-line syntax and API methods, which are not serious from a production standpoint.
In some cases, however, it affects operations. For example, whenever Docker reports an error after launching a container, we parse STDERR and respond according to the type of error (for example, by retrying the task). Unfortunately, the error output format changed from version to version, and constantly having to debug as a result wore us out.
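A minimal sketch of this kind of STDERR handling, assuming a small table of regular expressions mapped to actions. The patterns are hypothetical examples, not an authoritative list of Docker error messages; because the wording changes between releases, each rule accepts several phrasings, and unrecognized output falls back to a default action rather than crashing the dispatcher.

```python
import re
from enum import Enum


class Action(Enum):
    RETRY = "retry"
    FAIL = "fail"


# Hypothetical rules: Docker's error wording has changed between releases,
# so each pattern accepts several phrasings and ignores case.
RULES = [
    (re.compile(r"cannot connect to the docker daemon|connection refused", re.I), Action.RETRY),
    (re.compile(r"device or resource busy|no space left on device", re.I), Action.RETRY),
    (re.compile(r"no such image|pull access denied|manifest unknown", re.I), Action.FAIL),
]


def classify_docker_error(stderr: str, default: Action = Action.RETRY) -> Action:
    """Map Docker's STDERR text to an action; the first matching rule wins."""
    for pattern, action in RULES:
        if pattern.search(stderr):
            return action
    # Unrecognized output (e.g. a new format after an upgrade) gets the
    # default action instead of crashing the dispatcher.
    return default


if __name__ == "__main__":
    print(classify_docker_error("Cannot connect to the Docker daemon at unix:///var/run/docker.sock"))
    print(classify_docker_error("Error: No such image: example/stack"))
```

Defaulting unknown errors to a retry is a design choice that suits a platform where retries are invisible to the user; a stricter system might default to failing and alerting instead.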
Docker error rates
These problems are relatively easy to solve, but they mean every update has to be validated several times over, and some issues still surface only once the new version is running in production. We started with v0.7.4 a few months ago, our system is now on v1.2.0, and we have seen great progress in this area.
Limited tools and libraries
Although Docker's first stable release came out only four months ago, some of the tools built around it are still not stable. Adopting much of the Docker ecosystem means taking on extra overhead: to use new features and bug fixes, someone on the team has to keep up with frequent changes, sometimes working late to do so. That said, we are excited about the tools being developed around Docker and look forward to seeing which ones stand out. etcd, fleet, and Kubernetes look promising to us.
Below we describe these problems and our solutions in more detail, based on our experience. The list comes mainly from Roman Kononov, our lead IronWorker developer and director of engineering operations, and Sam Ward, who has been debugging and standardizing our Docker operations.
Debugging errors
Note: when we run into problems related to Docker or other system components, we can automatically re-run the task with no impact on the user (retries are a built-in feature of the platform).
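Built-in retries can be sketched as a wrapper that re-runs a task only when the failure comes from the infrastructure, with exponential backoff between attempts. `InfrastructureError` and `run_with_retries` are hypothetical names and this is not IronWorker's implementation; errors raised by the user's own code are deliberately passed through without a retry.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class InfrastructureError(Exception):
    """Hypothetical: a Docker or host failure that is not the user's fault."""


def run_with_retries(task: Callable[[], T], attempts: int = 3, backoff_s: float = 0.5) -> T:
    """Run task, retrying only on InfrastructureError, with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return task()
        except InfrastructureError:
            if attempt == attempts:
                raise  # out of attempts: surface the last infrastructure error
            time.sleep(backoff_s * 2 ** (attempt - 1))  # 0.5 s, 1 s, 2 s, ...
    raise RuntimeError("unreachable")  # the loop always returns or raises
```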
Long deletion times
Deleting containers initially took a long time and required a lot of disk I/O. This slowed our system significantly and created a bottleneck. We had to increase the number of available cores far beyond what we should have needed.
Solution: removing Docker containers quickly
After studying devicemapper (a Docker storage driver), we found an option that did the trick: `--storage-opt dm.blkdiscard=false`. This option tells Docker to skip a slow disk operation when deleting containers, which greatly speeds up container shutdown. Once we modified our deletion script, the problem went away.
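For reference, a small helper that assembles a daemon command line with this option. It assumes a current `dockerd` binary by default; Docker 1.x-era releases started the daemon as `docker -d` instead, which the `binary` parameter allows. `daemon_command` is a hypothetical name.

```python
import shlex


def daemon_command(binary: tuple[str, ...] = ("dockerd",), skip_blkdiscard: bool = True) -> list[str]:
    """Build a Docker daemon command line that uses the devicemapper driver."""
    args = [*binary, "--storage-driver=devicemapper"]
    if skip_blkdiscard:
        # Skip the discard pass over a container's device when it is removed.
        args += ["--storage-opt", "dm.blkdiscard=false"]
    return args


if __name__ == "__main__":
    print(shlex.join(daemon_command()))
    # Docker 1.x-era releases started the daemon with `docker -d`.
    print(shlex.join(daemon_command(binary=("docker", "-d"))))
```

Per Docker's devicemapper documentation, `dm.blkdiscard` matters mainly for loopback-backed storage, where discarding is what returns freed space to the host; disabling it speeds up removal but leaves that space in `/var/lib/docker`. The devicemapper driver itself was deprecated in later Docker releases.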
Volumes not unmounting
Because Docker did not reliably unmount volumes, containers could not stop correctly, so they kept running forever even after their tasks had completed. The workaround was a set of custom scripts that explicitly unmounted the volumes and deleted the folders. Fortunately, this happened back when we were using Docker v0.7.6; once Docker v0.9.0 fixed the problem, we deleted those lengthy scripts.
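The original scripts are not shown, so the following is only a minimal sketch of such a workaround, assuming the task's volume mount points are known: unmount anything still mounted, then delete the folder. It defaults to a dry run that only reports the planned actions. `cleanup_task_volumes` is a hypothetical name, and Docker v0.9.0 and later should not need it.

```python
import os
import shutil
import subprocess


def cleanup_task_volumes(mount_points: list[str], dry_run: bool = True) -> list[str]:
    """Unmount task volumes that are still mounted, then delete their folders.

    Returns the actions taken, or only planned when dry_run is True.
    """
    actions = []
    for path in mount_points:
        if os.path.ismount(path):
            actions.append(f"umount {path}")
            if not dry_run:
                # check=True raises on failure, so we never rmtree a live mount.
                subprocess.run(["umount", path], check=True)
        if os.path.isdir(path):
            actions.append(f"rm -rf {path}")
            if not dry_run:
                shutil.rmtree(path)
    return actions
```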
Memory limit switch
One Docker release suddenly added memory limit options and dropped the LXC options. As a result, some worker processes hit memory limits, which then made the whole system unresponsive. This took us by surprise, because Docker did not raise any error even though the options it was given were no longer supported. The fix was simple: set the memory limits through Docker itself.
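Because Docker accepted the unsupported options silently, one defensive habit is to set the limit through Docker (`docker run --memory`) and then confirm it took effect by reading it back from `docker inspect`. The sketch assumes the limit is reported in bytes under `HostConfig.Memory`, as in current Docker, and falls back to `Config.Memory` on the assumption that some older 1.x releases reported it there. The helper names are hypothetical.

```python
import json


def run_command(image: str, memory: str = "512m") -> list[str]:
    """Build a `docker run` command that sets the memory limit through Docker."""
    return ["docker", "run", "--detach", "--memory", memory, image]


def applied_memory_limit(inspect_output: str) -> int:
    """Return the memory limit in bytes from `docker inspect` JSON (0 = unlimited)."""
    info = json.loads(inspect_output)[0]  # docker inspect prints a JSON array
    # Assumption: current Docker reports HostConfig.Memory; some older 1.x
    # releases reported Config.Memory, so fall back to that.
    for section in ("HostConfig", "Config"):
        memory = (info.get(section) or {}).get("Memory")
        if memory:
            return int(memory)
    return 0


def check_memory_limit(inspect_output: str, expected_bytes: int) -> None:
    """Fail loudly instead of trusting that the option was honored."""
    actual = applied_memory_limit(inspect_output)
    if actual != expected_bytes:
        raise RuntimeError(f"memory limit not applied: expected {expected_bytes} bytes, got {actual}")
```

In practice, `inspect_output` would be the stdout of `docker inspect <container-id>`, captured with `subprocess.run(..., capture_output=True, text=True)`.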
As you can see, we have invested heavily in Docker, and we will keep doing so. Besides using it to isolate the user code running in IronWorker, we are preparing to use it in several other areas.
These areas include:
IronWorker backend
In addition to using Docker for task containers, we are using it to manage the processes on each server that manage and launch tasks. The master process on each server picks a task from the queue, loads it into the right Docker container, runs it, monitors it, and tears down the environment after it finishes. Interestingly, this means we have containerized code managing other containers on the same machine. Putting our entire infrastructure environment in Docker containers also makes it quite easy to run on CoreOS.
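The master process's loop can be sketched as follows. The `ContainerRuntime` protocol is a hypothetical wrapper around Docker (for example over its CLI or Engine API), and the stack-to-image mapping and image names are placeholders. The `finally` clause reflects the last step described above: the container is removed even if the task fails or times out.

```python
import queue
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Task:
    task_id: str
    stack: str  # which language/tool stack the task needs, e.g. "ffmpeg"


class ContainerRuntime(Protocol):
    """Hypothetical wrapper around Docker (for example its CLI or Engine API)."""

    def create(self, image: str, task: Task) -> str: ...
    def start(self, container_id: str) -> None: ...
    def wait(self, container_id: str, timeout_s: float) -> int: ...
    def remove(self, container_id: str) -> None: ...


# Hypothetical stack-to-image mapping; the image names are placeholders.
STACK_IMAGES = {"ruby": "example/ruby-stack", "ffmpeg": "example/ffmpeg-stack"}


def run_next_task(tasks: "queue.Queue[Task]", runtime: ContainerRuntime, timeout_s: float = 3600) -> int:
    """Run one task end to end and return its exit code."""
    task = tasks.get()  # 1. pick a task from the queue
    image = STACK_IMAGES[task.stack]  # 2. choose the right container image
    container_id = runtime.create(image, task)
    try:
        runtime.start(container_id)  # 3. run it
        return runtime.wait(container_id, timeout_s)  # 4. monitor until exit or timeout
    finally:
        runtime.remove(container_id)  # 5. always tear down the environment
```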
IronWorker, IronMQ and IronCache APIs
Like other ops teams, nobody here likes doing deployments, so we are very excited to package all our services in Docker containers for simple, deterministic deployment. No more configuring servers: all we need are servers that can run Docker containers. We are also replacing our build servers, which build our product releases for specific environments, with Docker containers. The result is a more flexible, simpler, and more reliable stack.
Building and loading workers
We are also using Docker containers to build and load workers in IronWorker. A significant advantage is a streamlined process for users to create task-specific workloads and workflows, upload them, and run them at scale. Another advantage is that users can test their workers locally in the same environment as our production service.
Enterprise on-premises deployment
Using Docker as the main distribution method for the on-premises version of IronMQ simplifies our distribution work and provides a simple, universal way to deploy in almost any cloud environment. Just as with the services we run on public clouds, all customers need are servers that can run Docker containers; they can then get multi-server cloud services running in a test or production environment with relative ease.