Build a (Docker) container from scratch in Python

Container technology, and Docker in particular, has become a de facto standard in software development because it makes applications isolated, portable, and easy to deploy across different environments.

This article assumes a basic working knowledge of Docker and explains how it works under the hood by building a container from scratch.

What is a container?

A container is essentially an isolated process bundled with all the files needed to run your program.

Examine what a Docker container does

The docker run command takes an image (the files needed to run the container) as an argument and starts the container.

Steps

In each step, run the container you have built so far with python docker.py run bash and interact with it.

Create a CLI and start a process that runs Bash

Import the libraries needed to read command-line arguments and create processes.

import os
import subprocess
import sys
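A minimal sketch of this step, assuming the script is invoked as python docker.py run <command>. The helper names parse_args and run are illustrative and not taken from the original script.

```python
import subprocess
import sys


def parse_args(argv):
    """Split ['run', 'bash', ...] into the subcommand and the command to execute."""
    if len(argv) < 2 or argv[0] != "run":
        raise SystemExit("usage: python docker.py run <command> [args...]")
    return argv[0], argv[1:]


def run(command):
    """Start the command as a child process, wait for it, and return its exit code."""
    completed = subprocess.run(command)
    return completed.returncode
```

To turn this into the CLI, the bottom of docker.py would call parse_args(sys.argv[1:]) and pass the resulting command to run. At this point the child is an ordinary process with no isolation at all.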

Problem: the host filesystem is visible to our process (which is yet to become a container)

Isolate the filesystem the process can access by "chrooting" (changing the root directory of the process).

os.chroot("./container")
os.chdir("/") # go to root after setting the new root

To change the root, you need to be the superuser (root).

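Problem: changing the hostname inside our process also changes the hostname outside

The hostname lives in the UTS namespace, so we need to unshare that namespace by passing the CLONE_NEWUTS flag to os.unshare (available in Python 3.12 and later) before spawning the child process.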

# inside parent process
os.unshare(os.CLONE_NEWUTS)

Then, we can set the hostname for the container inside the child process (our container).

import socket

def child():
    ...
    socket.sethostname("container")  # only changes the hostname in the new UTS namespace

Problem: Processes running outside our container can be killed from within the container.

Unshare the PID namespace as well by adding the CLONE_NEWPID flag:

os.unshare(os.CLONE_NEWUTS | os.CLONE_NEWPID)
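One detail worth spelling out: unshare with CLONE_NEWPID does not move the calling process into the new namespace. Only children created afterwards land there, and the first of them becomes PID 1. The sketch below checks this from a throwaway helper process so that the caller itself is not modified. It assumes Linux, root privileges, and Python 3.12 or later; pid_seen_by_first_child is a hypothetical name.

```python
import os


def pid_seen_by_first_child():
    """Return the PID the first child reports after unshare(CLONE_NEWPID).

    Returns None when the namespace cannot be created (not root, or
    os.unshare is missing because Python is older than 3.12).
    """
    read_fd, write_fd = os.pipe()
    helper = os.fork()
    if helper == 0:
        # Helper process: unshare here so the caller's namespaces stay untouched.
        os.close(read_fd)
        try:
            os.unshare(os.CLONE_NEWPID)  # applies to children, not to this process
            child = os.fork()
            if child == 0:
                os.write(write_fd, str(os.getpid()).encode())
                os._exit(0)
            os.waitpid(child, 0)
        except (OSError, AttributeError):
            pass  # leave the pipe empty to signal failure
        os._exit(0)
    os.close(write_fd)
    os.waitpid(helper, 0)
    data = os.read(read_fd, 32)
    os.close(read_fd)
    return int(data) if data else None
```

When the namespace is created successfully, the function should return 1, which is why the container's first process plays the role of init inside it.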

Mount a /proc filesystem so that our container can read information about the processes spawned inside it. Tools such as the ps command rely on this.

target_dir = "proc"
os.makedirs(target_dir, exist_ok=True)  # create the mount point if it is missing
# Mount a fresh procfs on the proc directory.
subprocess.run(["mount", "-t", "proc", "/proc", target_dir])

Now we can see the processes running inside the container, with their IDs starting from 1!
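To see what tools like ps get out of /proc, here is a small sketch that lists process IDs and names by reading the directory directly. The function names are illustrative; the proc_dir parameter exists only so the example can be pointed at any directory.

```python
import os


def list_pids(proc_dir="/proc"):
    """Return the PIDs found under proc_dir, i.e. its all-numeric entries, sorted."""
    return sorted(int(name) for name in os.listdir(proc_dir) if name.isdigit())


def process_name(pid, proc_dir="/proc"):
    """Return the command name stored in <proc_dir>/<pid>/comm."""
    with open(os.path.join(proc_dir, str(pid), "comm")) as f:
        return f.read().strip()
```

Inside the container, list_pids() should return only a handful of small numbers, whereas on the host it returns every process on the machine.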

Problem: our soon-to-be container process can freely spawn processes and eat up the compute resources of the host OS.

Register our container process in a Linux control group (cgroup) called "container_cg" and set the group's pids.max to 20. This limits the number of processes that can be spawned from inside the container.

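def cg():
    # Assumes the cgroup v1 layout, where the pids controller is mounted at /sys/fs/cgroup/pids.
    cgroups = "/sys/fs/cgroup/"
    pids = os.path.join(cgroups, "pids")
    os.makedirs(os.path.join(pids, "container_cg"), exist_ok=True)
    # Allow at most 20 processes in the group.
    with open(os.path.join(pids, "container_cg/pids.max"), 'w') as f:
        f.write("20")
    # Ask the kernel to invoke the release agent once the group becomes empty.
    with open(os.path.join(pids, "container_cg/notify_on_release"), 'w') as f:
        f.write("1")
    # Move the current process into the group; its children inherit the membership.
    with open(os.path.join(pids, "container_cg/cgroup.procs"), 'w') as f:
        f.write(str(os.getpid()))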
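The same idea can be sketched with the cgroup directory passed in as a parameter, which makes it easier to adapt to hosts using cgroup v2. There the group would typically live directly under /sys/fs/cgroup, there is no notify_on_release file, and the pids controller has to be enabled for the parent group. The names limit_pids and read_limit are hypothetical.

```python
import os


def limit_pids(group_dir, max_pids, pid=None):
    """Create the cgroup directory, cap its process count, and move a process into it."""
    os.makedirs(group_dir, exist_ok=True)
    with open(os.path.join(group_dir, "pids.max"), "w") as f:
        f.write(str(max_pids))
    with open(os.path.join(group_dir, "cgroup.procs"), "w") as f:
        # Default to the current process, as the function above does.
        f.write(str(os.getpid() if pid is None else pid))


def read_limit(group_dir):
    """Return the configured limit as text (a number, or 'max' for unlimited)."""
    with open(os.path.join(group_dir, "pids.max")) as f:
        return f.read().strip()
```

For the layout used above, the call would be limit_pids("/sys/fs/cgroup/pids/container_cg", 20), run as root before the container process starts spawning children.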

Summary

Containers isolate

  • the filesystem, with chroot.
  • system namespaces such as the hostname and process IDs, with Linux clone flags (CLONE_NEWUTS, CLONE_NEWPID).
  • compute resources, with Linux control groups.