March 04, 2017

Containers vs. Virtual Machines

Virtualization turns one host computer into one or more pseudo-computers, known as virtual machines (VMs). Inside each VM, you can install an operating system and multiple applications.

A Docker container is similar to a virtual machine in that it runs a pre-packaged application in an isolated environment.

Virtualization decouples the application from the underlying hardware.
Containers, operating at a higher level, decouple the application from the underlying operating system.

Hardware Virtualization
Virtualization abstracts hardware, allowing multiple workloads to share a common set of resources and to co-locate on the virtualized hardware while maintaining full isolation from each other.
The hardware abstraction piece of virtualization is made possible by a portfolio of technologies such as Intel Virtualization Technology (Intel VT) and AMD Virtualization (AMD-V).
These technologies enlist the hardware itself to reduce the overhead (in cache, I/O, and memory), cost, and complexity of virtualization. They enable the hardware to run multiple operating systems and applications in independent partitions, allowing one computer system to function as multiple virtual systems.
Hardware virtualization technologies move work that virtual machine monitors (VMMs) would otherwise do in software into the hardware, by incorporating virtualization extensions into the processor's instruction set. The benefits include increased performance and scalability. (A quick way to check a Linux host for these extensions is sketched after the feature list below.)
Hardware virtualization features include:
  • CPU virtualization - enables software in the virtual machine (VM) to run without any performance or compatibility penalty, as if it were running on a dedicated CPU.
    • VT-x - CPU virtualization extensions; FlexMigration - migration between physical hosts
    • VT-d - directed I/O - enables data movement without processor involvement
    • VT-c - connectivity - packet sorting/routing without processor involvement
  • Memory virtualization - allows abstraction, isolation, and monitoring of memory on a per VM basis.
  • I/O virtualization - facilitates offloading of packet processing to network adapters as well as direct assignment of virtual machines to virtual functions, including disk I/O.
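
On a Linux host, one rough way to check whether the processor advertises these extensions is to look for the vmx (Intel VT-x) or svm (AMD-V) flag in /proc/cpuinfo. The following is a minimal Go sketch, assuming a Linux host with /proc mounted; note that firmware settings can still disable the feature even when the flag is present:

package main

// Scans /proc/cpuinfo for the CPU feature flags that advertise hardware
// virtualization support: "vmx" for Intel VT-x, "svm" for AMD-V.

import (
    "bufio"
    "fmt"
    "os"
    "strings"
)

func main() {
    f, err := os.Open("/proc/cpuinfo")
    if err != nil {
        fmt.Fprintln(os.Stderr, "cannot read /proc/cpuinfo:", err)
        os.Exit(1)
    }
    defer f.Close()

    scanner := bufio.NewScanner(f)
    for scanner.Scan() {
        line := scanner.Text()
        if !strings.HasPrefix(line, "flags") {
            continue
        }
        // Every core reports the same flags, so one "flags" line is enough.
        for _, flag := range strings.Fields(line) {
            switch flag {
            case "vmx":
                fmt.Println("Intel VT-x (vmx) advertised by the CPU")
                return
            case "svm":
                fmt.Println("AMD-V (svm) advertised by the CPU")
                return
            }
        }
        fmt.Println("no hardware virtualization flags found")
        return
    }
    fmt.Println("no flags line found in /proc/cpuinfo")
}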

Containerization: A Timeline

Technologies that populate the containerization timeline include:

  • chroot (1979)
    • UNIX system call that changes the root directory of a process to a new location in the filesystem, so that only that subtree is visible to the process (a minimal sketch follows the timeline below)
  • FreeBSD Jails (2000)
    • compartmentalizes a system's files and resources into isolated environments (jails)
  • Virtuozzo (2001)
  • Linux Vserver (2001)
  • Oracle Solaris Containers (2004)
  • OpenVZ (Open Virtuozzo) (2005)
  • Process Containers (2006)
  • Google Control Groups (2007)
  • HP-UX Containers (2007)
    • used to create an isolated operating environment within a single instance of the HP-UX 11i v3 operating system
  • AIX WPARs - Workload Partition (2007)
    • software-based virtualization that allows multiple AIX environments to run on a single system. WPARs provide isolation from other processes and WPARs.
  • LXC - LinuX Containers (2008)
  • Cloud Foundry Warden (2011)
  • Google LMCTFY, "Let Me Contain That For You" (2013) - replacement for LXC (based on cgroups, does not use namespaces)
  • Docker (2013)
  • Rocket/rkt (2014)
    • "what Docker was supposed to be before it got so big" - CoreOS
    • an alternative to the Docker runtime
    • is designed to be secure, composable, and standards-based
  • Windows Containers (2016)
  • Note: HP-UX hard partitions (nPars), which establish boundaries that completely protect one virtual machine from another, and virtual partitions (vPars), software-only partitions that each run their own instance of HP-UX, are not considered containers because they run independent kernels.
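
To make the oldest entry on this timeline concrete, here is a small, hedged Go sketch of the chroot system call (an added example, not part of the original timeline). It must be run as root, and the directory passed to it becomes the process's entire visible filesystem, containing only whatever has been placed under it:

package main

// Minimal chroot demonstration: confine this process to a subtree of the
// filesystem. Run as root, e.g.: ./chrootdemo /tmp/newroot

import (
    "fmt"
    "os"
    "syscall"
)

func main() {
    if len(os.Args) != 2 {
        fmt.Fprintln(os.Stderr, "usage: chrootdemo <new-root>")
        os.Exit(1)
    }

    // Make the given directory this process's root directory.
    if err := syscall.Chroot(os.Args[1]); err != nil {
        fmt.Fprintln(os.Stderr, "chroot failed (are you root?):", err)
        os.Exit(1)
    }
    // Move into the new root so the working directory is inside it as well.
    if err := os.Chdir("/"); err != nil {
        fmt.Fprintln(os.Stderr, "chdir failed:", err)
        os.Exit(1)
    }

    // From this process's point of view, the filesystem now ends here.
    entries, err := os.ReadDir("/")
    if err != nil {
        fmt.Fprintln(os.Stderr, err)
        os.Exit(1)
    }
    for _, e := range entries {
        fmt.Println("/" + e.Name())
    }
}

Unlike namespaces and cgroups (covered in the next section), chroot only restricts the filesystem view; it does not isolate processes, networking, or resource usage.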

Resource Isolation

Docker Resource Isolation

Cgroups and Namespaces are features of the Linux kernel. They form the basis for lightweight process virtualization. Docker uses them to allow individual "containers" to run in an isolated environment within a single Linux kernel, avoiding the overhead of starting and maintaining virtual machines.
  • CGroups - resource allocation - limits how much of each resource can be used (a combined cgroups/namespaces sketch follows this list)
  • cgroups (control groups) limit an application to a specific set of resources (CPU, memory, disk I/O, network, etc.), facilitating the sharing of available hardware resources among containers.
    • control group
      • A Resource Management and Resource Accounting/Tracking solution
      • Provides a generic process-grouping framework
    • A cgroup limits an application to a specific set of system resources
      • allows Docker to share available system resources among containers and enforce limits and constraints
      • e.g. you can limit the memory available to a specific container.
    • The kernel provides access to multiple controllers (also called subsystems) through the cgroup interface; for example, the "memory" controller limits memory use, "cpuacct" accounts CPU usage, etc.
    • originally developed by Google (in 2006 as 'process containers' and renamed 'Control Groups' in 2007 -- merged into 2.6.24 kernel in 2008)
    • governs the isolation and usage of system resources, such as CPU and memory, for a group of processes
    • can be manipulated by modifying files and directories in the /sys/fs/cgroup directory
  • Namespaces - resource isolation - limit what you can see
  • Namespaces allow an application to have its own view of, and control over, shared system resources such as the network stack, process space, mount points, etc.
    • Provides processes with their own view of the system
    • allows groups to be separated so they cannot “see” each other
    • originally developed by IBM
    • a process's namespaces are represented as symbolic links under the /proc/<PID>/ns directory

    • Six Linux namespaces (name - kernel version introduced - description):
      • UTS (2.6.19) - Unix Timesharing System - isolates kernel and version identifiers, domain name, and hostname; this is the information reported by commands like uname and hostname.
        • The UTS namespace isolates two specific identifiers of the system: nodename and domainname. It allows, for example, changing the hostname inside a container.
      • IPC (2.6.19) - InterProcess Communication - manages access to IPC resources: queues, semaphores, and shared memory; processes/groups can have their own IPC resources.
        • Isolating a process with the IPC namespace gives it its own interprocess communication resources, for example System V IPC objects and POSIX message queues.
      • PID (2.6.24) - Process ID - process isolation.
        • Historically, the Linux kernel maintained a single process tree in which every process had a parent-child relationship. With namespaces, it is possible to have multiple distinct process trees, each with an entirely isolated set of processes.
        • Before namespaces, init was the first process started on a Linux OS: it was the root of the process tree and had process ID 1, and every other process fell somewhere in the hierarchy under it. With namespaces, there can be multiple process hierarchies, each with its own PID 1, each isolated from the others.
        • With the PID namespace, a new tree with its own PID 1 can be spun off the parent tree. The process that creates it remains in the parent namespace (the original tree), and the child becomes the root of its own process tree. Processes in the child namespace have no visibility into the parent namespace; processes in the parent namespace have a complete view of processes in the child namespace.
      • MNT (2.4.19) - Mount - manages filesystem mount points; processes can have their own root filesystem, conceptually close to chroot.
        • Linux maintains a data structure describing all the mount points on the system. With the mount namespace this data structure is cloned, so processes in different namespaces can change mount points without affecting each other. This allows, for example, creating a different filesystem layout or making certain mount points read-only.
      • NET (2.6.24) - Networking - manages network interfaces: IP addresses, routes, devices, etc.; provides a logical copy of the network stack, with its own routing tables, firewall rules, and network devices.
        • The network namespace isolates processes into their own network stacks, with isolated network interface controllers (physical or virtual), iptables firewall rules, routing tables, and so on. Processes see an entirely different set of network interfaces depending on the namespace they are in.
        • Network namespaces can be connected to each other using "veth" virtual Ethernet device pairs.
      • USER (3.8) - user and group IDs (UID, GID).
        • The user namespace isolates user IDs between namespaces.
        • It allows a process to have root privileges within its namespace, without giving it that access to processes outside the namespace.
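
The combined sketch referenced from the cgroups and namespaces bullets above is given below. It is a minimal, assumption-heavy Go illustration of the two features working together, in the spirit of what libcontainer/runC provide for Docker; it is not Docker's actual code. It assumes a Linux host, root privileges, and a cgroup v1 "pids" controller mounted at /sys/fs/cgroup/pids; the "demo" cgroup name and the "container" hostname are invented for the example.

package main

// A minimal sketch of cgroups and namespaces working together (NOT Docker's
// code). Assumes Linux, root privileges, and a cgroup v1 "pids" controller
// mounted at /sys/fs/cgroup/pids.
//
// Usage: go run main.go run /bin/sh

import (
    "fmt"
    "os"
    "os/exec"
    "path/filepath"
    "syscall"
)

func main() {
    if len(os.Args) < 3 {
        fmt.Fprintln(os.Stderr, "usage: run <cmd> [args...]")
        os.Exit(1)
    }
    switch os.Args[1] {
    case "run":
        parent()
    case "child":
        child()
    }
}

// parent re-executes this program as "child" inside new namespaces.
func parent() {
    cmd := exec.Command("/proc/self/exe", append([]string{"child"}, os.Args[2:]...)...)
    cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
    cmd.SysProcAttr = &syscall.SysProcAttr{
        // New UTS (hostname), PID, and mount namespaces for the child.
        Cloneflags: syscall.CLONE_NEWUTS | syscall.CLONE_NEWPID | syscall.CLONE_NEWNS,
    }
    must(cmd.Run())
}

// child runs inside the new namespaces: it confines itself with a cgroup,
// takes its own hostname, sees itself as PID 1, then runs the command.
func child() {
    limitProcesses()                               // cgroups: how much you can use
    must(syscall.Sethostname([]byte("container"))) // UTS namespace: what you can see
    fmt.Println("inside the container, my PID is", os.Getpid()) // prints 1

    // A fuller example would also mount a fresh /proc here so that tools
    // like ps only see processes inside the new PID namespace.
    cmd := exec.Command(os.Args[2], os.Args[3:]...)
    cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
    must(cmd.Run())
}

// limitProcesses creates a "demo" cgroup (a made-up name for this sketch),
// caps it at 20 processes, and places this process (and its children) in it.
func limitProcesses() {
    cg := "/sys/fs/cgroup/pids/demo"
    must(os.MkdirAll(cg, 0755))
    must(os.WriteFile(filepath.Join(cg, "pids.max"), []byte("20"), 0700))
    must(os.WriteFile(filepath.Join(cg, "cgroup.procs"), []byte(fmt.Sprint(os.Getpid())), 0700))
}

func must(err error) {
    if err != nil {
        panic(err)
    }
}

Running "go run main.go run /bin/sh" starts a shell inside the new namespaces: the child process reports PID 1, the hostname is "container", and /sys/fs/cgroup/pids/demo/pids.max caps how many processes the shell can spawn. From the host, "ls -l /proc/<child pid>/ns" shows the child's uts, pid, and mnt links pointing at namespaces different from the host's.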

Other Isolation Technologies:
  • Hardware virtualization technologies such as Intel VT and AMD-V give the processor hardware the ability to divide and isolate its computing capacity for multiple hosted virtual machines and their operating systems.
  • runC

Docker Characteristics

Four characteristics of Docker:

  • single-process
    • Preferred run-state is as a single application per container
    • Runs the application as PID 1 (a small sketch follows this list)
    • Multi-tier components implemented in separate containers, by default
  • stateless
    • changes/state kept in a writable layer which exists only until the container is deleted
    • Container persistence initiated by committing changes to a new image
    • Persistent data implemented by writing to host mounts or data containers
  • scalable
    • image/container consists of a set of layers
    • layer downloaded only if it doesn't already exist on the host... saving space
    • container builds on instead of duplicating existing resources... saving space and time
    • layering improves resource efficiency: improves performance and scale
  • portable
    • containers are portable across Docker platforms
    • self-contained, i.e. each container is packaged with its required configuration and environment
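
The "single-process, PID 1" characteristic referenced in the first bullet above is easy to observe. The hypothetical Go program below simply reports the PID and hostname it sees; run directly on a host it prints an ordinary PID and the host's name, but run as the lone entrypoint of a container (image and Dockerfile details omitted here) it prints PID 1 and, by Docker's default, the container ID as its hostname:

package main

// whoami: prints the process ID and hostname this process sees.
// As a container's single entrypoint process it reports PID 1; on the
// host it reports an ordinary PID and the host's own hostname.

import (
    "fmt"
    "os"
)

func main() {
    host, err := os.Hostname()
    if err != nil {
        host = "unknown"
    }
    fmt.Printf("pid=%d hostname=%s\n", os.Getpid(), host)
}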

Docker Runtime Environment

Containerization, the ability to run multiple isolated compute environments on a single kernel, was not introduced by Docker. Docker's contribution includes a user-friendly management model.

Two features, cgroups and namespaces, introduced into the Linux kernel around 2008, make it possible to track and partition system resources within a single kernel. These and other capabilities are packaged by runtime environment technologies such as LXC, libcontainer, and runC. The runtime environment forms the foundation of Docker's ability to host multiple isolated containers under a single kernel.

Docker facilitates building an application image, packaging it with all its dependencies, and running it in a software container (an isolated user-space process). The container runs the same in any Docker-supported environment: a physical server, a virtual machine, or a cloud platform. The mantra is: "build once, run anywhere".

Docker combines:
  • kernel features (such as cgroups, namespaces, etc.)
  • a Union File System
  • a unified, low-level container format (runC)
  • a management framework
and leverages them to build, ship, and run portable and efficient computing environments, called containers, on physical, virtual, and cloud platforms.

Note: The entry point to the container is an executable, specifically the default executable. It is the process running with PID 1 in the container. The entry point to a virtual machine is the kernel or the init program. In a VM (and standalone Linux server), the init process has PID 1 and it is the parent of all other processes on the system.

Operating System vs. Kernel
The operating system and the kernel are commonly conflated. With Docker, and with containerization in general, the difference is key.

A kernel is an essential subset of the operating system. It provides a low-level interface to system resources. The operating system includes the kernel plus the other resources a computing platform needs: libraries, binaries, configuration files, etc.

With Containerization, only the kernel is shared. All other resources can be abstracted out for each container.
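
One way to convince yourself that only the kernel is shared is to ask for the kernel release from inside different containers on the same host. The hedged Go sketch below (assuming Linux, where /proc/sys/kernel/osrelease is available) prints the same value in every container on a given host, even when the containers' distributions, libraries, and binaries differ:

package main

// Prints the running kernel's release string. Every container on a given
// host prints the same value, because containers share the host's kernel;
// only the user-space pieces of the OS (libraries, binaries, configuration
// files) differ from container to container.

import (
    "fmt"
    "os"
    "strings"
)

func main() {
    data, err := os.ReadFile("/proc/sys/kernel/osrelease")
    if err != nil {
        panic(err)
    }
    fmt.Println("kernel release:", strings.TrimSpace(string(data)))
}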

Before settling on runC, Docker used the following container formats in turn:
  • Linux Container (LXC)
    • LXC is an operating system-level virtualization solution for running multiple isolated Linux systems (containers) on top of a single kernel
    • Used in Docker up until Docker v1.8
    • This is where containers got their name
  • Libcontainer
    • provides a standard interface for creating sandboxes or containers inside an OS
    • a cross-system abstraction layer that attempts to standardize how applications are built, delivered, and run in isolation
    • Introduced as the default at Docker 0.9 (LXC was made optional)
    • Provides the ability to manipulate OS containers or "lightweight virtualization" features in a consistent and predictable manner, without depending on LXC or any other isolation package.
      • Protects against instability or changes across distributions or installation
    Image: blog.docker.com/2014/03/docker-0-9-introducing-execution-drivers-and-libcontainer/
  • runC
    • the latest Universal Runtime
    • is built on libcontainer
    • v0.0.1 was introduced in July 2015
    • a lightweight, portable container runtime

Containerization Is The New Virtualization

What is Docker?

Docker is an open-source project that automates the deployment of applications inside software containers.

At a high level, Docker is a utility that efficiently builds, ships, and runs containers. It is a container management tool.

Software containers are self-contained, immutable execution environments. They do not change as they are promoted through the pipeline or development cycle. Each container has its own compute resources and shares the kernel of the host operating system.

Containers offer an environment as close as possible to that of a virtual machine (VM) without the overhead that comes with running a separate kernel and simulating the hardware. A Container could be correctly described as "operating system virtualization", which facilitates running multiple isolated user-space operating environments (containers) on top of a single kernel.


User-space is that portion of system memory in which user processes (i.e., everything other than the kernel) run. This contrasts with kernel-space, which is that portion of memory in which the kernel executes and provides its services. User processes can access kernel-space via the use of system calls.
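
As a tiny illustration of that boundary (an added sketch, not from the original text), the Go snippet below uses the raw syscall package on Linux to invoke write(2) directly. Everything up to the call happens in user-space; the actual I/O is performed by the kernel, in kernel-space, on the process's behalf:

package main

import "syscall"

// A user-space process asking the kernel to do work for it: the write(2)
// system call crosses from user-space into kernel-space, where the kernel
// performs the I/O on file descriptor 1 (stdout).
func main() {
    msg := []byte("hello from user-space via a system call\n")
    syscall.Write(1, msg) // direct system call, bypassing the fmt/os wrappers
}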

Docker containers wrap, by default, a single application in an environment that contains everything it needs to run: code, runtime, system tools, system libraries. It does this with minimal duplication of resources in a maximally isolated environment.

Containers Transform Applications, Infrastructure and Processes:


Applications: decomposing development into services that can be developed independently, improving efficiency, agility and innovation
Infrastructure: unlocking a move away from the traditional data center, toward cloud and a more flexible hybrid model
Processes: enabling easy adoption of Agile and DevOps practices over the traditional Waterfall model, the goal being improved flexibility, innovation, and go-to-market speed

From: Why containers - Beginning of the buyer’s journey -- IT Leader audience by Red Hat

Containers provide functionality to both the infrastructure and application:
  • Infrastructure
    • Isolate application processes on a shared OS kernel
    • Create light, dense execution environments
    • Enable portability across platforms
  • Application
    • Create portable, immutable environments packaged with the application and its dependencies
    • Facilitate continuous integration and continuous delivery (CI/CD)
    • Easy access and sharing of containerized components

The goal of the container is to guarantee that the application will run the same regardless of the environment. It does this by defining an abstraction of the required machine-specific settings. With containers, "it works on my laptop" is no longer an excuse for delays in moving to production: if it works on the developer's laptop, it works in production.

The Docker container can be executed in any Docker-supported platform with the guarantee that the execution environment exposed to the application will be the same in development, testing, and production.

References:
  • User Space: http://www.linfo.org/user_space.html
  • Why containers - Beginning of the buyer’s journey -- IT Leader audience by Red Hat
  • Containers for the Enterprise: A Red Hat Virtual Event