Part 3

This part introduces production-ready practices such as container optimization and deployment pipelines. We’ll also familiarize ourselves with other container orchestration solutions.

Deeper understanding of Docker

So far we’ve focused on using Docker as a tool to solve various types of problems. Along the way we postponed some issues and completely ignored others.

The goal for this part is to look into the best practices and improve our processes.

In part 1 we talked about how alpine can be a lot smaller than Ubuntu, but we didn’t really look at why we’d choose one over the other. On top of that, we have been running the applications as root, which is potentially dangerous. In addition, we’re still restricting ourselves to one physical computer. Unfortunately, that last problem is out of the scope of this course, but we will get to learn about different solutions.

Look into the ubuntu image

Let’s look into the ubuntu image on Docker Hub

The description/readme says:

What’s in this image?

This image is built from official rootfs tarballs provided by Canonical (specifically, https://partner-images.canonical.com/core/).

From the links in the Docker Hub page we can guess (not truly know) that the image is built from https://github.com/tianon/docker-brew-ubuntu-core - So from a repository owned by a person named “Tianon Gravi”.

Step 7 of that git repository’s README says:

Some more Jenkins happens

This step implies that somewhere there is a Jenkins server that runs this script, builds the image, and publishes the image to the registry - we have no way of knowing if this is true or not.

Let’s see the Dockerfile of https://hub.docker.com/r/_/ubuntu/ by clicking the 18.04 Dockerfile link.

The first line states that the image starts FROM a special image “scratch” that is just empty. Then a file ubuntu-bionic-core-cloudimg-amd64-root.tar.gz is added to the root from the same directory.

This file should be one of the “..official rootfs tarballs provided by Canonical” mentioned earlier, but it’s not actually coming from Canonical. It is copied from a repo owned by “tianon”. We could verify the checksums of the file if we were interested.
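Verifying a checksum could look roughly like the sketch below. The file names rootfs-demo.tar.gz and SHA256SUMS.demo are stand-ins created only for this demonstration. With the real tarball you would compare against a checksum list obtained from the publisher, assuming one is provided, rather than one you generated yourself.

```shell
# Work in a throwaway directory so the demo leaves nothing behind in the project.
cd "$(mktemp -d)"

# Stand-in for the downloaded tarball (hypothetical file, created only for this demo).
printf 'example rootfs contents\n' > rootfs-demo.tar.gz

# Stand-in for the publisher's checksum list; normally you would download this.
sha256sum rootfs-demo.tar.gz > SHA256SUMS.demo

# Check the file against the list; sha256sum exits non-zero on a mismatch.
if sha256sum -c SHA256SUMS.demo; then
  verify_status="match"
else
  verify_status="mismatch"
fi
echo "checksum $verify_status"
```

If the file had been altered after the list was created, sha256sum -c would report a failure and the script would print a mismatch instead.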

Notice how the file is not extracted at any point in the Dockerfile. This is because ADD unpacks it for us: the ADD instruction documentation states that “If src is a local tar archive in a recognized compression format (identity, gzip, bzip2 or xz) then it is unpacked as a directory.”

Before getting stressed by the potential security problems with this, we have to remind ourselves:

“You can’t trust code that you did not totally create yourself.” - Ken Thompson (1984, Reflections on Trusting Trust).

However, we will assume that the ubuntu:18.04 that we downloaded is this image. The docker image history command supports this assumption:

$ docker image history --no-trunc ubuntu:18.04 

The output from image history matches with the directives specified in the Dockerfile. In case this isn’t enough, we could also build the image ourselves. The build process is, as we saw, truly open, and there is nothing that makes the “official” image special.

Deployment pipeline with docker-compose

Let’s set up a deployment pipeline from GitHub to a host machine. We will demonstrate this using your local machine, but the same steps can be used for a Raspberry Pi or even a virtual machine in the cloud (such as one provided by Hetzner).

We will use GitHub Actions to build an image, push the image to Docker Hub, and then use a project called “Watchtower” to automatically pull the image from there.

Let’s work with the repository https://github.com/docker-hy/docker-hy.github.io as it already has a Dockerfile and the GitHub Actions config for our convenience.

First either fork the repository or clone it as your own.

Let’s go over the GitHub Actions instructions. We will be using the official actions offered by docker, but we could’ve just installed docker and run docker build. Most of the following is simply copied from the action usage instructions:

name: Release DevOps with Docker # Name of the workflow

# On a push to the branch named master
on:
  push:
    branches: 
      - master

# Job called build runs-on ubuntu-latest
jobs:
  build: 
    runs-on: ubuntu-latest
    steps:
    # Checkout to the repository (the actions don't actually need this since they use the repository context anyway)
    - uses: actions/checkout@v2 

    # We need to login so we can later push the image without issues. 
    - name: Login to DockerHub
      uses: docker/login-action@v1 
      with: 
        username: ${{ secrets.DOCKERHUB_USERNAME }}
        password: ${{ secrets.DOCKERHUB_TOKEN }}
            
    # Builds the image and pushes it as devopsdockeruh/coursepage:latest
    - name: Build and push
      uses: docker/build-push-action@v2
      with:
        push: true
        tags: devopsdockeruh/coursepage:latest

Before this will work we need to add two secrets to the repository: DOCKERHUB_TOKEN and DOCKERHUB_USERNAME. This is done by opening the repository in a browser and pressing first Settings and then Secrets. The DOCKERHUB_TOKEN can be created in Docker Hub: click your username, then Account Settings, and then Security.

Now create a docker-compose.yml. We will use watchtower to automate the updates.

Watchtower is an open source project that automates the task of updating images. It polls the source of the image (in this case Docker Hub) for changes to the images of the running containers. A running container is updated when a new version of its image is pushed to Docker Hub. Watchtower respects tags, e.g. a container using ubuntu:18.04 will not be updated unless a new version of ubuntu:18.04 is released.

version: "3"
services:
  coursematerial:
    image: devopsdockeruh/coursepage
    ports:
      - 4000:80
    container_name: coursematerial
  watchtower:
    image: containrrr/watchtower
    environment:
      - WATCHTOWER_POLL_INTERVAL=60 # Poll every 60 seconds
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    container_name: watchtower

Before running docker-compose up here, beware that by default watchtower will try to update every running container whose image has a new version. Check the documentation if you want to prevent this.
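One option described in the Watchtower documentation is to pass container names as arguments, so that only those containers are watched. The sketch below is an illustration, not part of the course repository: it assumes the compose file above and puts the argument into a docker-compose.override.yml file, which docker-compose merges automatically when it sits in the same directory as docker-compose.yml.

```shell
# Write a compose override next to docker-compose.yml (an assumption of this sketch;
# the same "command" line could also be added directly to the watchtower service).
cat > docker-compose.override.yml <<'EOF'
version: "3"
services:
  watchtower:
    # Arguments passed to watchtower: watch only the container named coursematerial.
    command: coursematerial
EOF

# Show the result for a quick check before starting the services.
cat docker-compose.override.yml
```

With this in place, other containers running on the same machine should be left alone while coursematerial keeps getting updated.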

Run this with docker-compose up and commit something new into the repository. When you do the git push, follow how GitHub Actions pushes a new image to Docker Hub and then watchtower pulls the new image to your machine.

3.1 A deployment pipeline to Heroku

Let’s create our first deployment pipeline!

For this exercise you can select whichever web application you already have containerized.

If you don’t have any web applications available you can use any one from this course and modify it (such as the course material itself).

Use GitHub, GitHub Actions, and Heroku to deploy a container to Heroku. You can also use other CI/CD tools instead of GitHub Actions.

Submit a link to the repository with the config.

3.2 Building images inside of a container

Watchtower uses a volume to the docker.sock socket to access the Docker daemon of the host from the container. By leveraging this ourselves we can create our own simple build service.

Create a project that downloads a repository from GitHub, builds a Dockerfile located in the root and then publishes it into Docker Hub.

You can use any programming language / technology for the project implementation. A simple bash script is viable.

Then create a Dockerfile for it so that it can be run inside a container.

Make sure that it can build at least some of the example projects.

Using a non-root user

Let’s go back to our youtube-dl application. The application could, in theory, escape the container due to a bug in docker or the kernel. To mitigate this security issue we will add a non-root user to our container and run our process with that user. Another option, useful if you must use root within the container, is to map the root user to a high, non-existing user id on the host with https://docs.docker.com/engine/security/userns-remap/.

Our status from the previous part was this:

FROM ubuntu:18.04

WORKDIR /mydir

RUN apt-get update
RUN apt-get install -y curl python
RUN curl -L https://yt-dl.org/downloads/latest/youtube-dl -o /usr/local/bin/youtube-dl
RUN chmod a+x /usr/local/bin/youtube-dl

ENV LC_ALL=C.UTF-8

ENTRYPOINT ["/usr/local/bin/youtube-dl"]

We will add a user “appuser” with

RUN useradd -m appuser 

And then we change user with the directive USER - so all commands after this line will be executed as our new user, including the CMD.

FROM ubuntu:18.04 

WORKDIR /usr/videos

ENV LC_ALL=C.UTF-8 

RUN apt-get update
RUN apt-get install -y curl python
RUN curl -L https://yt-dl.org/downloads/latest/youtube-dl -o /usr/local/bin/youtube-dl
RUN chmod a+x /usr/local/bin/youtube-dl
RUN useradd -m appuser

USER appuser

ENTRYPOINT ["/usr/local/bin/youtube-dl"] 

I also renamed the WORKDIR to /usr/videos since it makes more sense as the videos will be downloaded there. When we run this image without bind mounting our local directory:

$ docker container run youtube-dl https://imgur.com/JY5tHqr

  [Imgur] JY5tHqr: Downloading webpage
  [download] Destination: Imgur-JY5tHqr.mp4
  [download] 100% of 190.20KiB in 00:0044MiB/s ETA 00:000
  ERROR: unable to open for writing: [Errno 13] Permission denied: 'Imgur-JY5tHqr.mp4.part'

We’ll see that our appuser user cannot write to /usr/videos. This can be fixed with chown, or left unfixed if the intended usage is to always have /usr/videos mounted from the host. When the directory is mounted, the application works as intended.
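If the image should also work without a mount, the chown fix could look like the following sketch. It writes a variant of the Dockerfile above to Dockerfile.chown, a file name chosen only for this example, with one extra command that hands /usr/videos over to appuser.

```shell
# Write a variant of the Dockerfile above to Dockerfile.chown (hypothetical file name).
cat > Dockerfile.chown <<'EOF'
FROM ubuntu:18.04

WORKDIR /usr/videos

ENV LC_ALL=C.UTF-8

RUN apt-get update
RUN apt-get install -y curl python
RUN curl -L https://yt-dl.org/downloads/latest/youtube-dl -o /usr/local/bin/youtube-dl
RUN chmod a+x /usr/local/bin/youtube-dl
# Create the user, then give it ownership of the working directory
# so that downloads can be written without a bind mount.
RUN useradd -m appuser && chown appuser /usr/videos

USER appuser

ENTRYPOINT ["/usr/local/bin/youtube-dl"]
EOF

# Build it the usual way once the file exists, for example:
#   docker build -t youtube-dl -f Dockerfile.chown .
```

The chown has to run before the USER line, because after that line the build commands no longer run as root.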

3.3

This exercise is mandatory

In the previous parts we created Dockerfiles for both example frontend and backend.

Running as root is a serious security issue for the example frontend and backend, as containers for web services are supposed to be accessible through the internet.

Make sure the containers start their processes as a non-root user.

TIP man chown may help you if you have access errors

Optimizing the Dockerfile

The bigger your image is, the larger the surface area for an attack is. The “Building Small Containers” video tutorial from Google is an excellent showcase of the importance of optimizing your Dockerfiles.

Let’s start by reducing the number of layers. To keep track of the improvements, we will follow the image size after each new Dockerfile.

FROM ubuntu:18.04 

WORKDIR /usr/videos

ENV LC_ALL=C.UTF-8 

RUN apt-get update
RUN apt-get install -y curl python
RUN curl -L https://yt-dl.org/downloads/latest/youtube-dl -o /usr/local/bin/youtube-dl
RUN chmod a+x /usr/local/bin/youtube-dl
RUN useradd -m appuser

USER appuser

ENTRYPOINT ["/usr/local/bin/youtube-dl"] 

209MB

We will glue all RUN commands together to reduce the number of layers we are making in our image.

FROM ubuntu:18.04 

WORKDIR /usr/videos

ENV LC_ALL=C.UTF-8  

RUN apt-get update && apt-get install -y \
    curl python && \
    curl -L https://yt-dl.org/downloads/latest/youtube-dl -o /usr/local/bin/youtube-dl && \
    chmod a+x /usr/local/bin/youtube-dl && \
    useradd -m appuser

USER appuser

ENTRYPOINT ["/usr/local/bin/youtube-dl"] 

207MB

As a sidenote not directly related to docker: remember that, if needed, it is possible to pin packages to exact versions with the syntax curl=1.2.3. This makes it more likely that the image still works if it is built at a later date, since the versions are exact. On the other hand, the pinned packages will grow old and may have known security issues.

With docker image history we can see that our single RUN layer adds 76.7 megabytes to the image:

$ docker image history youtube-dl 

  IMAGE          CREATED              CREATED BY                                      SIZE      COMMENT
  f221975422c3   About a minute ago   /bin/sh -c #(nop)  ENTRYPOINT ["/usr/local/b…   0B        
  940a7510dc5d   About a minute ago   /bin/sh -c #(nop)  USER appuser                 0B        
  31062eddb851   About a minute ago   /bin/sh -c apt-get update && apt-get install…   76.7MB
  ...

The next step is to remove everything that is not needed in the final image. We don’t need the apt source lists anymore, so we can glue the next line to our single RUN:

.. && \
rm -rf /var/lib/apt/lists/*

Now, after we build, the size of the layer is 45.6 megabytes. We can optimize even further by removing curl, which is only needed during the build. We can remove curl and all the dependencies it installed with

.. && \
apt-get purge -y --auto-remove curl && \
rm -rf /var/lib/apt/lists/*

..which brings us down to 34.9MB.

Now our slimmed down container should work, but:

$ docker container run -v "$(pwd):/usr/videos" youtube-dl https://imgur.com/JY5tHqr

  [Imgur] JY5tHqr: Downloading webpage

  ERROR: Unable to download webpage: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:590)> (caused by URLError(SSLError(1, u'[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:590)'),))

This happens because --auto-remove also removed the dependencies of curl, such as:

  Removing ca-certificates (20170717~18.04.1) ... 

We can now see that our youtube-dl worked previously only because of curl’s dependencies. If youtube-dl had been installed as a package, it would have declared ca-certificates as its dependency.

Now what we could do is either first purge with --auto-remove and then add ca-certificates back with apt-get install, or just install ca-certificates explicitly along with the other packages before removing curl (explicitly installed packages are not auto-removed):

FROM ubuntu:18.04

WORKDIR /usr/videos

ENV LC_ALL=C.UTF-8

RUN apt-get update && apt-get install -y \
    curl python ca-certificates && \
    curl -L https://yt-dl.org/downloads/latest/youtube-dl -o /usr/local/bin/youtube-dl && \
    chmod a+x /usr/local/bin/youtube-dl && \
    apt-get purge -y --auto-remove curl && \
    rm -rf /var/lib/apt/lists/* && \
    useradd -m appuser

USER appuser

ENTRYPOINT ["/usr/local/bin/youtube-dl"] 

168MB

From the build output we can see that ca-certificates also adds openssl:

  The following additional packages will be installed: 
  openssl 

  The following NEW packages will be installed: 
  ca-certificates openssl 

and this brings us to 36.9 megabytes in our RUN layer (from the original 76.7 megabytes).

3.4

Return to our frontend & backend Dockerfiles and you should see some mistakes we now know how to fix.

Document both image sizes at this point, as was done in the material. Optimize the Dockerfiles of both programs, frontend and backend, by joining the RUN commands and removing useless parts.

After your improvements document the image sizes again. The size difference may not be very much yet. The frontend should be around 432MB when using FROM ubuntu:18.04. The backend should be around 351MB. The sizes may vary.

Alpine Linux variant

Our Ubuntu base image adds the most megabytes to our image (approx 113MB). Alpine Linux provides a popular alternative base in https://hub.docker.com/_/alpine/ that is around 4 megabytes. It’s based on musl, an alternative to glibc, and busybox binaries, so not all software runs well (or at all) with it, but our python container should run just fine. We’ll create the following Dockerfile.alpine file:

FROM alpine:3.13

WORKDIR /usr/videos

ENV LC_ALL=C.UTF-8 

RUN apk add --no-cache curl python3 ca-certificates && \
    curl -L https://yt-dl.org/downloads/latest/youtube-dl -o /usr/local/bin/youtube-dl && \
    chmod a+x /usr/local/bin/youtube-dl && \
    apk del curl && \
    adduser -D userapp

USER userapp

ENTRYPOINT ["/usr/local/bin/youtube-dl"] 

45.1MB

Notes:

  • The package manager is apk. With --no-cache it fetches the package index on the fly, so no separate update step or local cache is needed.
  • useradd is missing, but adduser exists.
  • Most of the package names are the same - there’s a good package browser at https://pkgs.alpinelinux.org/packages.

Now when we build this file with :alpine-3.13 as the tag:

$ docker build -t youtube-dl:alpine-3.13 -f Dockerfile.alpine . 

It seems to run fine:

$ docker container run -v "$(pwd):/usr/videos" youtube-dl:alpine-3.13 https://imgur.com/JY5tHqr

From the history we can see that our single RUN layer size is 39.4MB:

$ docker image history youtube-dl:alpine-3.13

  IMAGE... 
  ... 
  14cfb0b531fb        20 seconds ago         /bin/sh -c apk add --no-cache curl python ca…   39.4MB
  ... 
  <missing>           3 weeks ago         /bin/sh -c #(nop) ADD file:093f0723fa46f6cdb…   5.61MB

So in total our Alpine variant is about 45 megabytes, significantly less than our Ubuntu based image.
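The total can be double-checked by adding up the SIZE column. The sketch below uses the two rows with sizes from the history output above, pasted in as sample input. It assumes the elided rows contribute nothing significant.

```shell
# Sample rows copied from the history output above; the size is the last column.
history_rows='14cfb0b531fb  20 seconds ago  /bin/sh -c apk add --no-cache curl python ca…  39.4MB
<missing>  3 weeks ago  /bin/sh -c #(nop) ADD file:093f0723fa46f6cdb…  5.61MB'

# Take the last field of every row, strip the MB suffix, and add the numbers up.
total_mb=$(printf '%s\n' "$history_rows" \
  | awk '{ size = $NF; sub(/MB$/, "", size); sum += size } END { printf "%.2f", sum }')

echo "Total: ${total_mb}MB"
```

The same awk filter could be fed from a real docker image history call, as long as every row reports its size in MB. Rows in other units such as kB or B would need extra handling.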

Back in part 1 we published the ubuntu version of youtube-dl with tag latest.

We can publish both variants without overriding the other by publishing them with a descriptive tag:

$ docker image tag youtube-dl:alpine-3.13 <username>/youtube-dl:alpine-3.13
$ docker image push <username>/youtube-dl:alpine-3.13

OR, if we don’t want to maintain the ubuntu version anymore, we can replace our Ubuntu image by pushing this as the latest. Someone might depend on the image being ubuntu though.

$ docker image tag youtube-dl:alpine-3.13 <username>/youtube-dl 
$ docker image push <username>/youtube-dl 

Also remember that the :latest tag is simply what is used when no tag is specified. It refers to whatever image was last built and pushed with it, and that can basically contain anything.

3.5

Document the image size before the changes.

Let’s test what the image sizes are when using FROM golang and FROM node in the backend and frontend projects respectively.

Return to our frontend & backend Dockerfiles and change the FROM to something more suitable. Both should have at least alpine variants ready in Docker Hub. Make sure the application still works after the changes.

Document the size after your changes.

Multi-stage builds

Multi-stage builds are useful when you need some tools just for the build but not for the execution of the image CMD. This is an easy way to reduce size in some cases.

Let’s create a website with Jekyll, build the site for production and serve the static files with nginx. Start by creating the recipe for Jekyll to build the site.

FROM ruby:3

WORKDIR /usr/app

RUN gem install jekyll
RUN jekyll new .
RUN jekyll build

This creates a new Jekyll application and builds it. We could start thinking about optimizations at this point, but instead we’re going to add a new FROM for nginx, which is what the resulting image will be based on, and copy the built static files from the ruby image to our nginx image.

FROM ruby:3 as build-stage
...
FROM nginx:1.19-alpine

COPY --from=build-stage /usr/app/_site/ /usr/share/nginx/html

This copies the contents of /usr/app/_site/ in the first image to /usr/share/nginx/html. Note that the ruby stage is named build-stage. We could also use an external image as a stage, --from=python:3.7 for example. Let’s build and check the size difference:

$ docker build . -t jekyll
$ docker image ls
  REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
  jekyll              latest              5f8839505f37        37 seconds ago      109MB
  ruby                latest              616c3cf5968b        28 hours ago        870MB

As you can see, even though our jekyll image needed ruby during the build process, it’s considerably smaller since it only has nginx and the static files. docker container run -it -p 8080:80 jekyll also works as expected.
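For reference, the two snippets above can be put together into one file. The sketch below only assembles instructions already shown in this section. The file name Dockerfile.jekyll is made up for the example so that an existing Dockerfile is not overwritten.

```shell
# Write the assembled multi-stage recipe to Dockerfile.jekyll (hypothetical file name).
cat > Dockerfile.jekyll <<'EOF'
# First stage: install Jekyll, generate a new site and build it into /usr/app/_site.
FROM ruby:3 as build-stage

WORKDIR /usr/app

RUN gem install jekyll
RUN jekyll new .
RUN jekyll build

# Second stage: the final image contains only nginx and the built static files.
FROM nginx:1.19-alpine

COPY --from=build-stage /usr/app/_site/ /usr/share/nginx/html
EOF

# Build it with, for example:
#   docker build . -t jekyll -f Dockerfile.jekyll
```

Only the last stage ends up in the tagged image, which is why none of the ruby tooling contributes to its size.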

Often the best choice is to use a FROM scratch image as it doesn’t have anything we don’t explicitly add there, making it the most secure option over time.

3.6: Multi-stage frontend

Let’s do a multi-stage build for the frontend project since we’ve come so far with the application.

Even though multi-stage builds are designed mostly with binaries in mind, we can leverage the benefits with our frontend project, as having the original source code alongside the final assets makes little sense. Build it with the instructions in the README and the built assets should be in the build folder.

You can still use serve to serve the static files or try out something else.

3.6: Multi-stage backend

Let’s do a multi-stage build for the backend project since we’ve come so far with the application.

The project is in golang and building a binary that runs in a container, while straightforward, isn’t exactly trivial. Use resources that you have available (Google, example projects) to build the binary and run it inside a container that uses FROM scratch.

To pass the exercise the image must be smaller than 25MB.

3.7

Do all or most of the optimizations from security to size for any other Dockerfile you have access to, in your own project or for example the ones used in previous “standalone” exercises. Please document Dockerfiles both before and after.

A peek into multi-host environment options

Now that we’ve mastered containers in small systems with docker-compose it’s time to look beyond what the tools we practiced are capable of. In situations where we have more than a single host machine we cannot use docker-compose solely. However, Docker does contain other tools to help us with automatic deployment, scaling and management of dockerized applications.

In the scope of this course, we cannot go into how to use the tools in this section, but leaving them out would be a disservice.

Docker swarm mode is built into docker. It turns a pool of Docker hosts into a single virtual host. You can read the feature highlights here. You can run right away with docker swarm. Docker swarm mode is the lightest way of utilizing multiple hosts.

Docker Swarm (not to be confused with swarm mode) is a separate product for container orchestration on multiple hosts. It and other enterprise features were separated from Docker and sold to Mirantis in late 2019. Initially, Mirantis announced that support for Docker Swarm would stop after two years. However, in the following months they decided to continue supporting and developing Docker Swarm without a definitive end date. Read more here.

Kubernetes is the de facto way of orchestrating your containers in large multi-host environments. The reasons are its customizability, large community and robust features. However, the drawback is the steeper learning curve compared to Docker swarm mode. You can read their introduction here.

The main takeaway is that the tools are at their best in different situations. In a 2-3 host environment for a hobby project the gains from Kubernetes might not be as large as in an environment where you need to orchestrate hundreds of hosts with multiple containers each.

You can get to know Kubernetes with k3s, a lightweight Kubernetes distribution, which you can run inside containers with k3d. This is a great way to get started as you don’t have to worry about any credit limits.

Rather than maintaining a cluster yourself, the most common way to use Kubernetes is through a managed service from a cloud provider, such as Google Kubernetes Engine (GKE) or Amazon Elastic Kubernetes Service (Amazon EKS), both of which offer some credits to get started.

3.8 Kubernetes

Familiarize yourself with Kubernetes terminology and draw a diagram.

Similarly to the networking diagrams in part 2, you will need to draw a diagram of at least three host machines in a Kubernetes cluster. The cluster is running two applications. The applications can be anything you want. An example could be a videogame server and a blog website.

The applications may utilize other machines or APIs that are not part of the cluster. At least three of the machines should be utilized. Include “your own computer” in the diagram as the one sending instructions via kubectl to deploy an application. In addition, include an HTTP message coming from the internet to your Kubernetes cluster and how it may reach an application.

Make sure to label the diagram so that anyone else who has completed this exercise, and read the glossary, would understand it. The diagram should contain at least four of the following labels: Pod, Cluster, Container, Service and Volume.

Glossary. And some helpful diagrams

I prefer to use draw.io but you can use whichever tool you want.

If you’re interested in Kubernetes you should join DevOps with Kubernetes, a free MOOC course just like this one.

Ending

Remember to mark your exercises into the submission application! Instructions on how and what to submit are on the exercises page.

ECTS Credits

Enrolling after each part is required for the ECTS credits. Now that you have completed part 3 use the following link to enroll in this course:

If you wish to end in this part and not do the following parts, follow the instructions at the bottom of the exercises page.

NOTE!

  • Enrollment for the course through the Open University is possible until Dec 12, 2021.

  • Credits for the course are only available to those students who have successfully enrolled on the course through the Open University and have completed the course according to the instructions.

* Electronic enrollment is available if you meet one of the following criteria:

  • you have a Finnish personal identity number,

  • you are a student at the University of Helsinki, or

  • you are a student at a HAKA member institution.

If you are not a student in Finland and want to enroll on the course and receive ECTS credits, read the guide under “Registration without a Finnish personal identity code or online banking ID at the University’s Admissions Services”: https://www.helsinki.fi/en/open-university/studying/beginning-your-studies/registration-and-fees