From Virtual Machines to Containers (part 1)

This is a series of blog posts about X-Road® and containers. The first part provides an introduction to containers and container technologies in general. The second part concentrates on the challenges in containerizing the Security Server. The Security Server Sidecar – a containerized version of the Security Server – is discussed in the third part.

Nowadays, it’s hard to avoid hearing about Docker and containers if you work in the field of IT. This applies to X-Road, too, since questions regarding X-Road and support for containers have come up regularly in recent years. But what are containers, and how do they differ from virtual machines?

What are containers?

Containers package an application and all its dependencies, libraries, configuration files, etc., into a single package that contains the entire runtime environment needed to run the application. The package can then be deployed to different computing environments without having to worry about the differences between operating system distributions, versions of available libraries, etc. The differences are abstracted away by the containerization.

The difference between virtual machines and containers is that a virtual machine includes an entire operating system and the application. In contrast, a container only contains the application and its runtime environment. Therefore, containers are more lightweight and use fewer resources than virtual machines. The size of a container may be only tens of megabytes, and it can be started in seconds. A virtual machine with an entire operating system, on the other hand, may be several gigabytes in size, and booting it up may take several minutes.

Image 1. A physical server that runs multiple containers compared to a physical server that runs multiple virtual machines.

A physical server that runs multiple virtual machines has a separate guest operating system running for each virtual machine on top of it. In contrast, a server running multiple containers only runs a single operating system whose resources are shared between the containers. However, each container runs in a separate, isolated process that has its own namespace and filesystem. The number of containers that can be hosted by a single server is far higher than the number of virtual machines that the server can host.

Container technologies

Docker is commonly considered a synonym for containers, even though it’s not the only container technology out there. Docker was not the first container technology either, since several other technologies existed before its launch in 2013. However, Docker was the first container technology that became hugely popular, which is why the name Docker is often mistakenly used when referring to container technologies in general.

Nowadays, there are multiple container technologies available, and the fundamental building blocks of the technology have been standardized. The Open Container Initiative (OCI) is a project facilitated by the Linux Foundation that creates open industry standards around container formats and runtimes for all platforms. The standardization enables portability between infrastructures, cloud providers, etc., and helps prevent lock-in to a specific technology vendor. All the leading players in the container industry follow the specifications.

Images and containers

Images and containers are the two main concepts of container technologies. Therefore, understanding their difference, at least at a high level, is essential.

A container image can be compared to a virtual machine image, except that it’s smaller and does not contain the whole operating system. A container image is an immutable, read-only file that contains the executable code, libraries, dependencies, tools, etc., that are needed for an application to run. An image represents an application and its virtual environment at a specific point in time, and it can be considered a template of an application. An image is composed of layers built on top of a parent or base image, which enables image reuse.
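The layering idea can be illustrated with a small toy model in Python. This is only a conceptual sketch and not how any real container runtime stores images: the file paths, contents, and the build_image helper are made up for illustration, and each layer is simply a dictionary that maps a path to its content.

```python
from collections import ChainMap
from types import MappingProxyType

# Hypothetical layers: each one maps a file path to its content.
base_layer = {"/etc/os-release": "example-os 1.0", "/usr/lib/libexample.so": "v1"}
runtime_layer = {"/usr/bin/runtime": "runtime 11"}
app_layer = {"/app/app.jar": "my-app 1.0", "/usr/lib/libexample.so": "v2"}


def build_image(*layers):
    """Stack layers (base first) into a read-only view.

    ChainMap searches its first mapping first, so the layers are reversed
    to make upper layers shadow lower ones.
    """
    return MappingProxyType(ChainMap(*reversed(layers)))


# Two images that reuse the same base and runtime layers.
base_image = build_image(base_layer, runtime_layer)
app_image = build_image(base_layer, runtime_layer, app_layer)

print(base_image["/usr/lib/libexample.so"])  # v1
print(app_image["/usr/lib/libexample.so"])   # v2 (shadowed by the app layer)
```

Both images share the same lower layers, which is the kind of reuse that building on a parent or base image makes possible.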

Containers are running images. When a new container is started, the container is created from a source image. In other words, the container is an instance of the source image, just like a process is an instance of an executable. Unlike images, containers are not immutable, and therefore, they can be modified. However, the image from which the container was created remains unchanged. Consequently, it’s possible to create multiple containers from the same source image, and all the created containers have the same initial setup that can be altered during their lifecycle.
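The relationship between an immutable image and its modifiable containers can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the implementation of any container runtime: the IMAGE contents and the Container class are hypothetical, and the filesystem is modeled as a plain mapping of paths to contents.

```python
from collections import ChainMap
from types import MappingProxyType

# A hypothetical image: a read-only mapping of file paths to contents.
IMAGE = MappingProxyType({"/app/config": "default", "/app/version": "1.0"})


class Container:
    """Toy container: a private writable layer on top of a shared image."""

    def __init__(self, image):
        self.image = image
        # A ChainMap writes only to its first mapping, so changes land in
        # the container's own layer and never reach the image.
        self.fs = ChainMap({}, image)

    def write(self, path, content):
        self.fs[path] = content

    def read(self, path):
        return self.fs[path]


# Two containers created from the same source image.
a = Container(IMAGE)
b = Container(IMAGE)
a.write("/app/config", "customized")

print(a.read("/app/config"))  # customized
print(b.read("/app/config"))  # default
print(IMAGE["/app/config"])   # default
```

Container a sees its own change, while container b and the image itself are unaffected.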

Images can exist independently without containers, but a container always requires an image to exist. Images are published and shared in image registries that may be public or private. The best-known image registry is probably Docker Hub. Images are published and maintained by software vendors as well as individual developers.

Stateful and stateless containers

Containers can be stateful or stateless. The main difference is that stateless containers don’t store data across operations, while stateful containers store data from one run to the next. In general, a new container always starts from the state defined by the source image. This means that the data generated by one container is not available to other containers by default. If the data processed by a container must persist beyond the lifecycle of the container, it needs to be stored on persistent storage, e.g., an external volume stored on the host where the container is running. The persistent storage can then be attached to another container regardless of the source image of the other container. In other words, persistent storage can be used to share data between containers.
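The difference can be made concrete with a toy model in Python. The Volume and Container classes and the image names below are hypothetical and exist only for this illustration; real volumes are managed by the container platform, not by application code like this.

```python
class Volume:
    """Toy persistent storage that lives outside any container."""

    def __init__(self):
        self.data = {}


class Container:
    """Toy container with optional persistent storage attached."""

    def __init__(self, image, volume=None):
        self.image = image
        self.local = {}       # discarded together with the container
        self.volume = volume  # outlives the container, if attached

    def _storage(self):
        return self.volume.data if self.volume is not None else self.local

    def store(self, key, value):
        self._storage()[key] = value

    def load(self, key):
        return self._storage().get(key)


# Stateless: data written by one container is gone in the next one.
first = Container("my-app:1.0")
first.store("counter", 1)
second = Container("my-app:1.0")
print(second.load("counter"))  # None

# Stateful: a volume carries the data over, even to a different image.
volume = Volume()
third = Container("my-app:1.0", volume)
third.store("counter", 1)
fourth = Container("other-app:2.0", volume)
print(fourth.load("counter"))  # 1
```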

Handling upgrades

Upgrading an application running in a container also differs from the way applications running on a virtual machine are traditionally upgraded. Applications running on a virtual machine are usually upgraded by installing a new version of the application on the existing virtual machine. In contrast, applications running in a container are upgraded by creating a new image containing the latest version of the application and then recreating all the containers using the new image. In other words, instead of upgrading the application running in the existing containers, the existing containers are replaced with new containers that run the latest version of the application. However, the approach is not container-specific since handling upgrades on virtual machines in cloud environments often follows the same process nowadays.
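As a rough sketch, the replace-instead-of-upgrade approach could be modeled in Python like this. The Container class, the start and upgrade helpers, and the image names are all assumptions made for the illustration; a real upgrade would also stop the old containers and typically replace them gradually.

```python
from dataclasses import dataclass
from itertools import count

_ids = count(1)  # hypothetical source of unique container ids


@dataclass(frozen=True)
class Container:
    id: int
    image: str


def start(image):
    """Create a new container from the given image."""
    return Container(next(_ids), image)


def upgrade(containers, new_image):
    """Replace every container with a fresh one created from new_image.

    The old containers are not modified; they are simply discarded.
    """
    return [start(new_image) for _ in containers]


running = [start("my-app:1.0") for _ in range(3)]
upgraded = upgrade(running, "my-app:2.0")

print([c.image for c in upgraded])  # ['my-app:2.0', 'my-app:2.0', 'my-app:2.0']
```

None of the original containers is changed in place: the upgraded list consists of entirely new containers with new ids.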

Container management systems

Running a single container or an application consisting of a couple of containers on a local machine for testing or development purposes is a simple task. In contrast, running a complex application consisting of tens of containers in a production environment is far from simple. Container management systems are tools that provide capabilities to manage complex setups composed of multiple containers across many servers. In general, container management systems automate the creation, deployment, destruction, and scaling of containers. Available features vary between different solutions and may include, for example, monitoring, orchestration, load balancing, security, and storage. However, running a container management system is not a simple task, and it brings additional complexity to management and operations.

Kubernetes is the best-known open-source container management system. It was originally developed by Google, but nowadays it is widely used across the industry and by different service providers. For example, all the major cloud service providers offer Kubernetes services. When it comes to commercial alternatives, Docker Enterprise Edition is probably the best-known solution, but there are many other solutions available too.
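One way to picture what such automation involves is a loop that compares the desired setup with what is actually running and works out what to create and what to destroy. The Python sketch below shows a single step of that comparison. It is a heavily simplified illustration, not the algorithm of any particular product; the reconcile function, the container ids, and the image names are hypothetical.

```python
def reconcile(desired, running):
    """Work out which containers to start and stop.

    desired: dict mapping an image name to the wanted number of containers
    running: list of (container_id, image) tuples
    Returns (to_start, to_stop): images to start and container ids to stop.
    """
    by_image = {}
    for container_id, image in running:
        by_image.setdefault(image, []).append(container_id)

    to_start, to_stop = [], []
    for image, wanted in desired.items():
        existing = by_image.get(image, [])
        if len(existing) < wanted:
            # Scale up: start the missing containers.
            to_start.extend([image] * (wanted - len(existing)))
        else:
            # Scale down: stop the surplus containers.
            to_stop.extend(existing[wanted:])

    # Containers whose image is no longer wanted are stopped as well.
    for image, container_ids in by_image.items():
        if image not in desired:
            to_stop.extend(container_ids)
    return to_start, to_stop


desired = {"web:2.0": 3, "db:1.0": 1}
running = [("c1", "web:2.0"), ("c2", "web:1.0"), ("c3", "db:1.0"), ("c4", "db:1.0")]

to_start, to_stop = reconcile(desired, running)
print(to_start)  # ['web:2.0', 'web:2.0']
print(to_stop)   # ['c4', 'c2']
```

A real system would run a comparison like this continuously and across many servers, which gives a sense of where the additional operational complexity comes from.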

Pros and cons

The benefits of containerization vary between applications, and sometimes containerization may not provide any benefits at all. Therefore, instead of containerizing everything by default, only applications that benefit from containers should be containerized.

Containers provide a streamlined way to distribute and deploy applications. Containers are highly portable, and they can be easily deployed to different operating systems and platforms. They also have less overhead compared to virtual machines, which enables more efficient utilization of computing resources. In addition, containers support agile development and DevOps, enabling faster application development cycles and more consistent operations. All in all, containers provide many benefits, but they’re not perfect: they have disadvantages too.

In general, managing containers in a production setup requires a container management system. The system automates many aspects of container management, but implementing and managing the system itself is often complicated and requires special skills. Managing persistent data storage brings additional complexity as well, and incorrect configuration may lead to data loss. Besides, persistent storage configurations may not be fully compatible between different environments and platforms, which means that they may need to be changed when containers are moved between environments. For example, both Docker and Kubernetes have the concept of volume, but they’re not identical and, therefore, behave differently.

All in all, containers offer many benefits, and they provide an excellent alternative to other virtualization options. However, containers cannot fully replace the other options, and therefore, different solutions will be used side by side in the future too.