March 25, 2017

Data-only Containers

Containers can store and share data, either exclusively (data-only containers) or as a side benefit of regular containers with mounted volumes. Data-only containers take up no more resources than needed to provide storage services. They are created via docker create together with a volume defined on the command line or through the VOLUME instruction in a Dockerfile.
  • Use docker volume create to create a volume at the command line:
    • $ docker volume create --name vol44
     
    The volume can then be attached to a container at run time:
    • $ docker run --rm -it -v vol44:/cvol44 alpine sh
     
  • Volumes can also be created via the VOLUME instruction in a Dockerfile:
    • FROM alpine
      VOLUME /data55
    Describe the intent of the following commands:
    • $ docker build -t vol45:01 .
    • $ docker images
     
    • $ docker inspect b159452444cb
    • $ docker run -d b159452444cb /bin/true
     
    • $ docker run --rm -i -t --volumes-from a010 alpine sh
     
  • Miscellaneous
    • Use docker volume rm <volume name | ID> to fully delete a volume from the file system:
      • $ docker volume rm 89f07c5cddc935717d75d1232fc1bf13f77a1f6fe26f0e13e9afdd3ca02fb053
    • Dangling volumes are those no longer referenced by any container on the system. You can find all dangling volumes using:
      • $ docker volume ls -f dangling=true

 

Container Data Volumes

Docker data volumes allow data to:
  • persist after the container is exited, removed or deleted
  • be shared between the host and the Docker container
  • be shared with other Docker containers
A data volume is a directory of the host system, managed by Docker, that can be mounted by one or more containers. Setup is simple because you do not need to pick a specific directory on the host system.
  • A common use case for volumes is sharing directories across containers
  • Simple example:
    • $ docker run --rm -i -t -v /data/vol01 debian bash
      • This creates a volume /data/vol01 and makes it available to the container
      • The container volume, /data/vol01, maps to a directory on the host file system. You can find that location via docker inspect; look in the Mounts section for the Source name/value pair:

      "Mounts": [
          {
              "Type": "volume",
              "Name": "dd517d905c98c74dc0c10370a46dd8445d67dbf84162dc0d9076b4040c395134",
              "Source": "/var/lib/docker/volumes/dd517d905c98c74dc0c10370a46dd8445d67dbf84162dc0d9076b4040c395134/_data",
              "Destination": "/data/vol01",
              "Driver": "local",
              "Mode": "",
              "RW": true,
              "Propagation": ""
          }
      ],
    • File(s) generated in the container (e.g. fileFromContainer.txt) are accessible from the "Source" directory on the host
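To script the lookup of the host-side Source path, docker inspect accepts a Go template via --format. The container name voltest below is made up for this sketch, and a running Docker daemon is assumed:

```shell
# Start a container with an anonymous volume at /data/vol01:
docker run -d --name voltest -v /data/vol01 debian sleep 300

# Extract just the Source path of the container's first mount:
docker inspect --format '{{ (index .Mounts 0).Source }}' voltest

# Clean up; -v also removes the anonymous volume:
docker rm -fv voltest
```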
  • A container with a volume mount can be run in "detached" mode and its volume(s) can be used by other containers as their data volume(s), e.g.: $ docker run -d -v /var/vol02 debian bash
    • Another container can attach to this volume, /var/vol02, using the --volumes-from option with the name or ID of the first container: $ docker run -it --volumes-from <container name | ID> debian bash
    • Every container that mounts the volume from this container will have the same view of any files in the target directory.
    • Data volumes provide better and more predictable performance than the Union File System as they:
      • exist outside of the Union File System
      • bypass the storage driver and operate at native host speeds
      • do not incur potential overheads introduced by copy-on-write
  • Another example of creating data volumes:
    • Use docker run with one or more -v flags to add data volume(s) to a container:
      $ docker run -it --name testvol -h testhost -v /data debian /bin/bash
      This makes the directory /data inside the container live outside the Union File System and directly accessible on the host. Any files that the image held inside the /data directory are copied into the volume.
      • --rm - for short-lived containers; the container is automatically removed when it exits
      • -t - allocate a pseudo-TTY
      • -i - keep STDIN open, e.g. for keyboard interaction with the container
      • -h - set the container host name
      • --name - assign a name to the container; by default Docker assigns a pseudo-random friendly name, e.g. "serene_galileo"
      • -v - mount a volume at the given container path

Mount Host Directory As Data Volume

Docker allows you to mount a directory from the Docker host into a container. Using the -v option, host directories can be mounted in two ways: an existing host directory, e.g. /home/john/app01, or a new auto-generated volume on the host, e.g. /var/lib/docker/volumes/53404f432f0….
  • Mount an existing host volume in the container:
    • $ docker run -v /home/john/app01:/app01 -i -t busybox
      In this example, the -v parameters are:
      • /home/john/app01 -- Docker host directory
      • : -- a single colon separator
      • /app01 -- container path where the host directory will be mounted
    • Any existing files in the host volume (/home/john/app01) are automatically available in the container mount point, /app01
    • To maintain portability, you cannot map a host directory to a container via the Dockerfile, as this specific directory may not be available on another host where the Dockerfile is applied.
    • More generally:
      • $ docker run -v <host_dir>:<container_dir>:ro -i -t <image> <default executable>
        • <host_dir> is the source directory on the host
        • <container_dir> is the directory in the container where it is mounted
        • Add :ro to make the mount read-only
    • In addition to directories, the -v option can be used to mount single files between the host and container
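A sketch of the single-file case; the file name /tmp/app.conf is illustrative, and :ro makes the mount read-only inside the container:

```shell
# Create a small config file on the host:
echo "greeting=hello" > /tmp/app.conf

# Mount just that one file, read-only, into the container:
docker run --rm -v /tmp/app.conf:/etc/app.conf:ro alpine cat /etc/app.conf
```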
  • Mount a Docker created host volume in the container:
      • $ docker run -v /app02 -i -t busybox
    • Docker creates a new volume on the host, e.g. /var/lib/docker/volumes/14d8613f61f3a977c0b71e585d72e1099234084ee683259157f576b51baa4f64/_data, and maps it to a path, e.g. /app02, in the container.
    • In this example, the -v parameter is:
      • /app02 -- container path where the host volume will be mounted
    • Changes made in the container volume are immediately reflected in the host volume and vice versa.
    • You can assign the volume a name by prefixing the container path in the -v option (e.g. -v app02vol:/app02); otherwise Docker assigns it a 64-character volume identifier. Use docker inspect on the container and look in the Mounts section to find it:
    • "Mounts": [
          {
              "Type": "volume",
              "Name": "14d8613f61f3a977c0b71e585d72e1099234084ee683259157f576b51baa4f64",
              "Source": "/var/lib/docker/volumes/14d8613f61f3a977c0b71e585d72e1099234084ee683259157f576b51baa4f64/_data",
              "Destination": "/app02",
              "Driver": "local",
              "Mode": "",
              "RW": true,
              "Propagation": ""
          }
      ],
    • The advantage of Docker-created host volumes is portability between hosts: the container does not depend on a specific directory existing on whatever host makes the mount.
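The round trip for a Docker-created volume can be sketched as follows; it assumes a running daemon and root access to /var/lib/docker, and the actual paths and IDs will differ:

```shell
# Let Docker create and map the volume:
cid=$(docker run -d -v /app02 busybox sleep 300)

# Find the host-side path of the volume:
src=$(docker inspect --format '{{ (index .Mounts 0).Source }}' "$cid")

# A file written on the host side is immediately visible in the container:
echo "from host" | sudo tee "$src/host.txt" > /dev/null
docker exec "$cid" cat /app02/host.txt

# Clean up; -v also removes the anonymous volume:
docker rm -fv "$cid"
```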
  • A container volume is available for other containers to mount using the --volumes-from option. The following commands demonstrate volume sharing among three containers:
    • container frosty_babbage has a volume /app02
    • a new container, alpinecontainer, mounts the volume from frosty_babbage
    • a second new container, alpinecontainer01, mounts the volume from alpinecontainer
  • Create a container with a container volume, /app02: $ docker run -v /app02 -it alpine sh.
  • Create a new container and use --volumes-from to mount volume from the above container:
    $ docker run -i -t --name alpinecontainer --volumes-from frosty_babbage alpine sh
    • alpinecontainer is created
    • volume(s) mounted from container frosty_babbage using the --volumes-from option(s)
    • Volume /app02 is available in the new container, having been mounted from container frosty_babbage (container ID 8864dadc7c68)
  • Start a third container, alpinecontainer01, to mount volumes from alpinecontainer:
  • $ docker run -i -t --name alpinecontainer01 --volumes-from alpinecontainer alpine sh
    • alpinecontainer01 is created
    • volume(s) mounted from alpinecontainer using the --volumes-from option
    • Volume /app02 is available in this container as well, having been mounted from container alpinecontainer (container ID 696646d6f37e)
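The three-container chain can be reproduced as a single script; here the first container is given the explicit name volsource instead of relying on a generated name like frosty_babbage:

```shell
# 1. Container that owns the volume:
docker run -d --name volsource -v /app02 alpine sleep 300

# 2. Second container mounts the volume from the first:
docker run -d --name alpinecontainer --volumes-from volsource alpine sleep 300

# 3. Third container mounts it transitively, two hops from the source:
docker run --rm --name alpinecontainer01 --volumes-from alpinecontainer \
    alpine ls -d /app02

# Clean up:
docker rm -fv alpinecontainer volsource
```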
  • Multiple -v or --volumes-from options can be used to mount multiple data volumes
  • Volume Use Cases:
    • Improved performance, since volumes bypass the storage driver (e.g. AUFS)
    • Sharing data between containers
    • Share data between the host and the container

Named Volumes

Named Volumes: Host and Container Data Volumes

Docker does not support persistent data by default: when a container is removed, any changes to its filesystem are lost. A named volume is a mechanism for decoupling persistent data needed by your container from the image used to create the container and from the host. Named volumes persist even when no container is currently using them. Data in named volumes can be shared between a container and the host machine, as well as between multiple containers.
  • Docker uses a storage driver to create, manage, and mount volumes.
  • Volumes enable persistent data storage in a container environment
  • Volumes are directories that are stored outside of the container’s filesystem and hold reusable and shareable data that persists even after a container is terminated.
  • There are three ways to create volumes with Docker:
    • Map a host directory as a volume to a container directory using the -v option
    • Create a data-only container
    • Explicitly create a Docker volume using the docker volume create command
  • Volumes are not a part of the containers' Union File System
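The three creation styles can be compared side by side; the names (datastore, appdata) and paths are examples, and a running Docker daemon is assumed:

```shell
# 1. Map a host directory into a container:
docker run --rm -v /tmp/hostdir:/data alpine ls /data

# 2. Data-only container, created (not run) just to hold a volume:
docker create -v /data --name datastore alpine

# 3. Explicitly create a named volume, then mount it by name:
docker volume create --name appdata
docker run --rm -v appdata:/data alpine ls /data

# Clean up (-v removes the data-only container's volume with it):
docker rm -v datastore
docker volume rm appdata
```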
References:
  • https://docs.docker.com/docker-cloud/apps/volumes/


  • Storage Driver

    • A storage driver is how Docker implements a Union File System
      • File system choices include: AUFS (default on Ubuntu), Device Mapper (default on Red Hat and CentOS), Btrfs, OverlayFS, VFS, ZFS, …
        • AUFS
          • AUFS is a unification filesystem
            • AUFS stacks multiple directories on a single Linux host and exposes them as a single unified view through a single mount point
            • To achieve this, AUFS uses a union mount.
            • The directories in the stack, and the union mount point, all must exist on the same Linux host
            • AUFS refers to each directory in the stack as a branch.
          • AUFS supports the copy-on-write (CoW) technology
          • the oldest driver and the default on Ubuntu; operates at file level (CoW operations copy entire files)
          • runs at native speeds, leverages memory sharing and scales well
          • not well suited for working with large files (databases, logs, etc.)
          • it needs to be added as part of the installation process, as it's not in the mainline Linux kernel
          • containers with many changes will result in many branches and long traversal times
          • AUFS storage driver deletes a file from a container by placing a whiteout file (.wh.<filename>) in the container’s top layer
            • The whiteout file effectively obscures the existence of the file in the read-only image layers below.
          • "AuFS is a layered file system, so you can have a read only part and a write part which are merged together
            • One could have the common parts of the operating system as read only (and shared amongst all of your containers)
            • and then give each container its own mount for writing."
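The whiteout convention can be modeled with plain directories. This is only a toy sketch of the naming scheme, not a real AUFS mount, and all paths and the visible() helper are made up for the demo:

```shell
# Two "branches": a read-only lower layer and a writable upper layer.
demo=$(mktemp -d)
mkdir -p "$demo/lower" "$demo/upper"
echo hello > "$demo/lower/file.txt"

# "Deleting" file.txt from the top layer places a whiteout marker file:
touch "$demo/upper/.wh.file.txt"

# In the unified view, a file is hidden if a whiteout for it exists above:
visible() {
  [ -e "$demo/lower/$1" ] && [ ! -e "$demo/upper/.wh.$1" ]
}
visible file.txt && echo visible || echo hidden    # prints "hidden"
```

With a real container, `docker diff <container>` shows such deletions as `D /path` entries against the image layers.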
        • Device Mapper
          • contributed by Red Hat
          • works at the block level with thin provisioning
          • features include RAID, disk encryption, and snapshots; drawbacks are added complexity over AUFS and less visibility into diffs between images and containers due to block-level operation
        • btrfs (B-tree file system)
          • is a Linux filesystem that Docker supports as a storage backend
    • Storage driver is also used in the context of the Storage backend for Docker Trusted Registry
      • Choice of where the actual images are to be stored:
        • Local file system
        • AWS S3
        • Azure Blobs
    • Storage driver:
      • Enables operational flexibility to choose a solution that fits particular use cases
      • Helps enable the Docker philosophy of "batteries included, but replaceable"


    Union File System

    Union File System (or UnionFS)

    A file system that amalgamates a collection of different file systems and directories (called branches) into a single logical file system. It allows files and directories of separate file systems, to be transparently overlaid, forming a single coherent file system
    • Docker uses Union File System (UnionFS) to combine multiple layers that make up an image into a single Docker image
    • Enables implementation of a modular image that can be constructed and deconstructed as needed, as opposed to a monolithic image
    • Layers are always read top to bottom. If an object exists in both an upper and a lower layer, only the upper-layer object is used
    • Supports copy-on-write and read-only or read-write branches
    • Union file system implementations include AUFS (Advanced multi-layered Unification File System) and OverlayFS; Docker also supports non-union storage backends such as Btrfs, VFS, and Device Mapper

    Copy On Write (CoW)

      In the copy-on-write (CoW) strategy, processes that need the same data share a single instance rather than each keeping its own copy. If one process needs to modify a file in the shared data, the storage driver makes a copy exclusively for that process; the other processes continue to use the original data. Key benefits of the CoW strategy are reduced image disk usage and faster container start times.

      • Provides a link to the original data (in a read-only layer); on modification the data is first copied to the current (read-write) layer.
      • If a change is made to the file system, the affected file/directory is "copied up" from the first (highest) read-only layer in which it is found into the container's read-write layer.
      • A copy-up operation can incur a noticeable performance overhead: large files, lots of layers, and deep directory trees can make the impact more noticeable.
      • Docker uses a copy-on-write technology with both images and containers.
      • Works in conjunction with the Union File System and its layers

    Docker Tag

    • A tag is simply an alphanumeric identifier attached to the image, and used to distinguish one image from another
    • A tag name must be valid ASCII and may contain lowercase and uppercase letters, digits, underscores, periods, and dashes


    • The nginx repository on the official Docker registry contains multiple images.
      Note that the same image may have multiple tags, e.g. the alpine stable image has three tags (:1.10.3, :stable, :1.10) that all point to the same image.

      One way to verify that two or more image tags refer to the same image is to compare their SHA256 digests. All three tags of the alpine stable image share the same SHA256 digest: f829870f13c0b5471083fb59375fd914cf2597d814175bf1b7e868e191be210b

      Note: if you run $ docker pull nginx you get the “latest” image, which happens to be in the mainline tree. I.e. the above command does the equivalent of $ docker pull nginx:latest
    • The latest tag is simply the default tag: it is applied when an image is pushed to a repository without an explicit tag, and it does not necessarily mark the most recently built image.
    • If you pull an image without specifying a tag, you will get the image tagged latest.
    • The more complete format of an image name is shown here:
    • [REGISTRYHOST[:PORT]/][USERNAME/]NAME[:TAG]
      Here are some examples:
        docker pull localhost:5000/hello-world -- hello-world image on the local registry
        docker pull nginx -- nginx image from the official Docker Hub registry
        docker pull nginx:1.11 -- nginx image with tag 1.11 from the official Docker Hub registry
        docker pull registry.access.redhat.com/rhel-atomic -- rhel-atomic image from the official Red Hat registry
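The tag-defaulting rule in the format above can be illustrated with a small pure-shell helper; ref_tag is a made-up function for this sketch and deliberately ignores digest references:

```shell
# Return the tag portion of an image reference, defaulting to "latest".
# The colon in a registry host:port appears before the first slash, so
# only the component after the last slash is checked for a tag.
ref_tag() {
  last="${1##*/}"                      # component after the final slash
  case "$last" in
    *:*) printf '%s\n' "${last##*:}" ;;
    *)   printf 'latest\n' ;;          # no tag given -> default
  esac
}

ref_tag nginx                       # prints "latest"
ref_tag nginx:1.11                  # prints "1.11"
ref_tag localhost:5000/hello-world  # prints "latest"
```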

    March 19, 2017

    Docker Identifiers

    Identifiers

    A Docker container can be identified in three ways: by a long-form UUID (Universally Unique Identifier), a short-form UUID, or a name. Identifiers help prevent naming conflicts and facilitate automation.
    • UUID
      • Universally Unique Identifier
      • Assigned to a container on creation.
      • UUIDs come in two forms:
        • a 64 character long form, e.g.
          • “f78375b1c487e03c9438c729345e54db9d20cfa2ac1fc3494b6eb60872e74778”
        • an abbreviated 12 character short form, e.g.
          • “f78375b1c487”
      • automatically generated and applied by the Docker daemon
      • Identifiers are commonly displayed in a truncated 12-character form
    • Name
      • A container name is generated and automatically assigned
      • Generated name format: <adjective>_<notable names>
        • the adjective is drawn from a list of approximately 90 strings
        • the surname is drawn from a list of approximately 150 "notable" scientists and hackers
      • Manually assign a container name using the --name option

      • In docker ps output, the NAMES column lists both generated and manually assigned names, e.g. "testvol" is a manually assigned container name.
      • Images can be further identified using image:tag and image@digest
        • image:tag
          • :tag adds a version to an image name, e.g. ubuntu:16.04, where ubuntu is the image and 16.04 is the tag (or version)
        • image@digest
          • a content-addressable identifier, e.g. sha256:58e1a1bb75db1b5a24a462dd5e2915277ea06438c3f105138f97eb53149673c4
          • as long as the makeup of the image is the same, its digest is a predictable and referenceable value, i.e. two images with the same digest can be assumed to be the same image
          • use docker images --digests to display the digest of an image
          • an image can be pulled using its digest value
      • Summary of image and container identifier types:

        Identifier Type          Example Value                                                       Length
        UUID long identifier     f78375b1c887e03c9438c729345e54db9d20cfa2ac1fc3494b6eb60872e74778    64-character
        UUID short identifier    f78375b1c887                                                        12-character
        Name                     pseudo-randomly generated name                                      variable
        Tag                      string identifying a version of an image                            variable
        Digest                   calculated SHA value of an image                                    64-character
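A digest is a SHA-256 hash of image content, so identical content always yields an identical identifier. The idea of content addressing can be seen with plain sha256sum:

```shell
# Hashing the same bytes always produces the same digest:
printf '%s' 'abc' | sha256sum | cut -d' ' -f1
# prints ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad

# Slightly different content produces a completely different digest:
printf '%s' 'abd' | sha256sum | cut -d' ' -f1
```

With images, `docker images --digests` shows the stored digest, and `docker pull <image>@sha256:<digest>` pins a pull to that exact content.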


    Docker Registry

    • A service that hosts repositories and provides an HTTP API for uploading and downloading images
    • Can be public or private, can be Cloud or server based
    • Docker Hub
      • A registry of Docker images
      • A repository of available Docker images
      • API used to upload and download images and implements version control
      • Official site is hub.docker.com
      • Example of downloading (pulling) the alpine image from the Docker Hub at the command line: $ docker pull alpine
    • Docker Store
      • A Registry of official Docker images
      • Self-service portal where Docker partners publish images and users deploy them
      • The next-generation Docker Hub
      • Official site is store.docker.com
    • Private Registry
      • Local repository
    • Third-Party Registry
      • Vendors may run their own registry sites, e.g. Red Hat's registry

    Docker Layers

    Layers

      Docker images are read-only templates from which Docker containers are instantiated. Each image consists of a series of layers; Docker uses a union file system to combine these layers into a single unified image. Layers are discrete entities, promoting modularity and reuse of resources. Each layer results from an instruction in the Dockerfile.
      • Layers represent filesystem differences
      • The Docker storage driver is responsible for stacking these layers and providing a single unified view.

      • Note: Image layer IDs are cryptographic hashes, while the container ID is a randomly generated UUID.
      • Each instruction in the Dockerfile creates a new layer. Note: only non-empty layers that do not already exist on the system are downloaded when docker run pulls the image.
      • Below is repo information for this nginx image on GitHub. There are eight Dockerfile instructions, reflecting the image's eight layers.
        Note: there is no FROM instruction listed; it is likely that the FROM instruction is represented by ADD file: 89ec…, as both would pull in the base image.

        ADD file: 89ecb642d662ee7edbb868340551106d51336c7e589fdaca4111725ec64da957 in /
        CMD ["/bin/bash"]
        MAINTAINER NGINX Docker Maintainers "docker-maint@nginx.com"
        ENV NGINX_VERSION=1.11.10-1~jessie
        RUN apt-key adv --keyserver hkp://pgp.mit.edu:80 --recv-keys 573BFD6B3D8FBC641079A6ABABF…
        RUN ln -sf /dev/stdout /var/log/nginx/access.log    && ln -sf /dev/stderr /var/log/nginx/error.log
        EXPOSE 443/tcp 80/tcp
        CMD ["nginx" "-g" "daemon off;"]
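The layer-per-instruction mapping can be inspected with docker history, which prints one row per layer; this requires a running daemon and, for the pull, network access:

```shell
# Pull the image and list its layers, newest first:
docker pull nginx
docker history nginx

# --no-trunc shows the full instruction that created each layer:
docker history --no-trunc nginx
```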

      • When an image is changed, the new copy of the image stores only the changed layer(s). The new image (e.g. changed-ubuntu below) has layers that are simply pointers to the original image files' layers.

      • Notice the new changed-ubuntu image does not have its own copies of every layer. The new image is sharing its four underlying layers with the ubuntu:15.04 image.