Build and push your Docker images using GitHub Actions

February 20, 2020

I was working on the CI (continuous integration) portion of a project that lives in a mono-repo and hosts multiple, independently deployable services. Each service in the repository is versioned separately: there is a VERSION file with the version number inside each service folder. However, the application itself is currently deployed as a monolith - a single, versioned Helm chart with sub-charts for each of the services. In theory, we could deploy all these services separately by replacing the single helm install/upgrade command with one Helm command per service.

Here's what the project structure looks like:

src
├── service1
│   ├── Dockerfile
│   └── VERSION
└── service2
    ├── Dockerfile
    └── VERSION

My first step was to figure out the easiest way to build Docker images for each service and then push the images to a Docker image repository. GitHub Actions was the logical and simplest choice in this case, since the source code lives on GitHub as well.

I used an existing starter workflow from GitHub that builds the image, logs into the Docker registry and then pushes the image to the repository. The initial version I came up with after removing a couple of steps from the original workflow looked like this:

name: Docker

on:
  push:
    branches:
      - master

env:
  # TODO: Change variable to your image's name.
  IMAGE_NAME: image

jobs:
  push:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v2

      - name: Build image
        run: docker build . --file Dockerfile --tag image

      - name: Log into registry
        run: echo "${{ secrets.DOCKER_PASSWORD }}" | docker login ${{ secrets.DOCKER_REGISTRY_URL }} -u ${{ secrets.DOCKER_USERNAME }} --password-stdin

      - name: Push image
        run: |
          IMAGE_ID=${{ secrets.DOCKER_REGISTRY_URL }}/${{ secrets.DOCKER_REPOSITORY_NAME }}/$IMAGE_NAME
          # Strip git ref prefix from version
          VERSION=$(echo "${{ github.ref }}" | sed -e 's,.*/\(.*\),\1,')
          # Strip "v" prefix from tag name
          [[ "${{ github.ref }}" == "refs/tags/"* ]] && VERSION=$(echo $VERSION | sed -e 's/^v//')
          # Use Docker `latest` tag convention
          [ "$VERSION" == "master" ] && VERSION=latest
          echo IMAGE_ID=$IMAGE_ID
          echo VERSION=$VERSION

          docker tag image $IMAGE_ID:$VERSION
          docker push $IMAGE_ID:$VERSION

I added the following Github Secrets, so I don't need to hardcode them inside the workflow (you probably don't want to have your Docker registry password in there):

GitHub Secret name       Description
DOCKER_PASSWORD          Password for the Docker registry
DOCKER_USERNAME          Username for the Docker registry
DOCKER_REGISTRY_URL      Docker registry URL (for example: docker.pkg.github.com)
DOCKER_REPOSITORY_NAME   Repository name (for example: myrepo)

You could argue that DOCKER_REGISTRY_URL and/or DOCKER_REPOSITORY_NAME don't belong in the secrets, and I agree. However, storing them there makes them much easier to update without changing the code. The downside (at least for non-secret values) is that the values are masked in the logs. For example, when you push the Docker image, $IMAGE_ID:$VERSION shows up like this in the logs:

...
The push refers to repository [***/***/service1:0.0.1]
...

Another downside of storing non-secret values as secrets is that you don't get any audit trail or change tracking. Heck, once you set them you can't even see their values. If auditing is something you'd like to have, or your workflow hits the GitHub Secrets limit (max. 100 secrets and up to 65KB in size), you can also store encrypted secrets in your repository and store the password for decrypting them as a GitHub Secret. Then, whenever your workflow needs access to the secrets, it would decrypt them from the repository. You can read more about using encrypted secrets here.

In the above snippet, the version is determined from the tag - since we have multiple services and multiple different versions, I had to change that. So instead of getting github.ref, I had to read the value(s) from the VERSION file:

---
- name: Push image
  run: |
    IMAGE_ID=${{ secrets.DOCKER_REGISTRY_URL }}/${{ secrets.DOCKER_REPOSITORY_NAME }}/$IMAGE_NAME

    VERSION=$(cat src/service1/VERSION)

    echo IMAGE_ID=$IMAGE_ID
    echo VERSION=$VERSION

    docker tag image $IMAGE_ID:$VERSION
    docker push $IMAGE_ID:$VERSION

Now this works if I want to build/push service1. You could duplicate those version lines and create an IMAGE_NAME variable for each service, but that doesn't look too good and it requires you to update the workflow each time you add or remove a service.

Ideally, I only want to build and push the images if the version file has changed. It's more efficient to build and push only when needed than to build and push on every change, regardless of whether the services or their versions were modified.

The following lines in the workflow already ensure that it only runs on pushes to the master branch:

on:
  push:
    branches:
      - master

By adding the paths key, you can also do matching on files or folders. I added the following match, so the workflow would only trigger on the master branch and if the change contains files that match the following pattern: src/**/VERSION. This pattern matches any VERSION file under the src folder and its subfolders. So this would match src/service1/VERSION, src/something/service-a/VERSION or src/VERSION.

on:
  push:
    branches:
      - master
    paths:
      - 'src/**/VERSION'
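
To get a feel for which paths the filter lets through, here is a rough local approximation written as a shell case statement. This is only a sketch: matches_version_filter is a made-up helper, and it mimics the behavior described above rather than running GitHub's actual matcher.

```shell
# Rough approximation of the 'src/**/VERSION' path filter.
# Assumption: matches_version_filter is a hypothetical helper for illustration;
# it is not how GitHub evaluates the pattern internally.
matches_version_filter() {
  case "$1" in
    # In a case pattern, * also matches "/" characters.
    src/VERSION | src/*/VERSION) return 0 ;;
    *) return 1 ;;
  esac
}

for path in src/service1/VERSION src/something/service-a/VERSION src/VERSION \
            README.md src/service1/Dockerfile; do
  if matches_version_filter "$path"; then
    echo "match:    $path"
  else
    echo "no match: $path"
  fi
done
```

The first three paths are reported as matches; README.md and the Dockerfile are not.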

This change ensures the workflow only runs if any of the VERSION files were updated. However, I still need to know which VERSION file was updated, so I know which service to build.

To figure that out, I used the git diff-tree command that looks like this:

git diff-tree --no-commit-id --name-only -r ${{ github.sha }}

The github.sha is the commit SHA that triggered the workflow run. The --no-commit-id and --name-only flags ensure only the file paths are displayed (no commit IDs), and -r recurses into sub-trees.

Let's look at a couple of examples on how this command works. If I use SHA 28e8761 as an example, here's how different flags control the output of the diff-tree command:

# Shows the full commit-id and folder name only (src)
$ git diff-tree 28e8761
28e8761d1f382d28ed9cfbf55407cfff8c3d0bea
:040000 040000 73b19d6e19192b77df6bbcf9750d19555af2763a 694fcd1447cec1f59fba2d3d21708890e02c03d7 M      src


# Don't show the commit ID
$ git diff-tree --no-commit-id 28e8761
:040000 040000 73b19d6e19192b77df6bbcf9750d19555af2763a 694fcd1447cec1f59fba2d3d21708890e02c03d7 M      src

# Only show the name
$ git diff-tree --no-commit-id --name-only 28e8761
src


# And recurse into the subtree
$ git diff-tree --no-commit-id --name-only -r 28e8761
src/service1/VERSION

This looks great! Or so I thought. The issue is that if a push contains multiple commits (assuming you aren't squashing them), diff-tree with a single SHA only looks at the last one. You need to provide a second parameter to tell diff-tree which commit to compare against - that would be the last merge to the branch. So the first parameter is the last commit and the second one is the last merge to that branch, and the command outputs the files that changed between those two trees.

If we assume the last merge to master is SHA d158d52, the command and its output would be this:

$ git diff-tree --no-commit-id --name-only -r 28e8761 d158d52
README.md
src/service1/VERSION
src/service2/VERSION
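
The difference between passing one SHA and two can be reproduced in a throwaway repository. The sketch below is an illustration only: the repository, the commit messages and the before/after variable names are made up, with before standing in for github.event.before and after for github.sha.

```shell
set -e

# Throwaway repository; everything in here is made up for the demo.
repo=$(mktemp -d)
cd "$repo"
git init -q
commit() { git -c user.name=demo -c user.email=demo@example.com commit -q -m "$1"; }

mkdir -p src/service1 src/service2
echo 0.0.1 > src/service1/VERSION
echo 0.0.1 > src/service2/VERSION
git add . && commit "initial state"
before=$(git rev-parse HEAD)   # plays the role of github.event.before

# Two separate commits pushed together
echo 0.0.2 > src/service1/VERSION
git add . && commit "bump service1"
echo 0.0.2 > src/service2/VERSION
git add . && commit "bump service2"
after=$(git rev-parse HEAD)    # plays the role of github.sha

# One SHA: only the changes made by the last commit
single=$(git diff-tree --no-commit-id --name-only -r "$after")
# Two SHAs: every file that differs between the two trees
range=$(git diff-tree --no-commit-id --name-only -r "$after" "$before")

echo "single commit:"; echo "$single"
echo "two commits:";   echo "$range"
```

With a single SHA only src/service2/VERSION is listed, while the two-SHA form lists both VERSION files.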

The second SHA value can be obtained from the ${{ github.event.before }} value, so at least you don't have to do more git magic. With this you get all files that have changed, and to only get the VERSION files, just use grep:

$ git diff-tree --no-commit-id --name-only -r 28e8761 d158d52 | grep "VERSION"
src/service1/VERSION
src/service2/VERSION

Note that this could be improved, as the grep matches all lines containing the VERSION string and that could be other files as well.
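
One possible way to tighten the match is to anchor the pattern, so that only paths under src/ whose last component is exactly VERSION get through. This is a sketch, not what the workflow in this post uses; the sample input stands in for the diff-tree output, and the last two paths are invented to show what a plain grep "VERSION" would also pick up.

```shell
# Sample input standing in for the diff-tree output. The last two paths are
# made-up examples of files that contain the string VERSION in their name.
changed_files='README.md
src/service1/VERSION
src/service2/VERSION
docs/VERSIONING.md
src/service1/VERSION.bak'

# Keep only paths under src/ whose final path component is exactly VERSION.
version_files=$(printf '%s\n' "$changed_files" | grep -E '^src/(.+/)?VERSION$')
echo "$version_files"
```

Only src/service1/VERSION and src/service2/VERSION survive the filter.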

Putting this together into a for loop, I ended up with this:

for versionFilePath in $(git diff-tree --no-commit-id --name-only -r ${{ github.sha }} ${{ github.event.before }} | grep "VERSION");
do
  echo "$versionFilePath"  # Do the magic here!
done;
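
A for loop over $(...) splits its input on any whitespace, so a path containing a space would be handed to the loop body in pieces. If that is a concern, reading the list line by line is one alternative. The sketch below assumes a hypothetical list_version_files function standing in for the diff-tree and grep pipeline; the second path is invented to show the effect.

```shell
# Hypothetical stand-in for the diff-tree | grep pipeline.
# The second path is a made-up example that contains a space.
list_version_files() {
  printf '%s\n' 'src/service1/VERSION' 'src/my service/VERSION'
}

count=0
# Read one path per line; IFS= and -r keep spaces and backslashes intact.
while IFS= read -r versionFilePath; do
  count=$((count + 1))
  echo "changed: $versionFilePath"
done <<EOF
$(list_version_files)
EOF

echo "count=$count"
```

Here the loop body runs twice, once per line, whereas the for loop form would run three times for the same input.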

Next, I needed the folder name where the service lives (e.g. src/service1) and the service name, which I am using for the image name (service1).

To get the folder, you can use parameter expansion/substitution:

# If versionFilePath is "src/service1/VERSION", folder variable value will be "src/service1"
folder=${versionFilePath%"/VERSION"}

Using the % you can strip the string in quotes "/VERSION" from the original variable (versionFilePath). I did something similar to get the image name (or the folder name):

IMAGE_NAME=${folder##*/}

The above command trims everything from the start of the $folder variable up to and including the last / character, which means I am left with the last folder name in the path.

Note: you can probably use cut, rev, tr and a bunch of other commands as well.
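
As a quick check of both expansions, the sketch below runs them on a nested sample path and compares the results with dirname and basename, which are one of the alternatives. The rootFolder and rootImage variables are names introduced here only to show what happens for a VERSION file sitting directly under src.

```shell
versionFilePath="src/something/service-a/VERSION"

folder=${versionFilePath%"/VERSION"}   # strip the "/VERSION" suffix
IMAGE_NAME=${folder##*/}               # strip the longest prefix ending in "/"
echo "folder=$folder"
echo "IMAGE_NAME=$IMAGE_NAME"

# The same values using dirname/basename
altFolder=$(dirname "$versionFilePath")
altName=$(basename "$altFolder")
echo "altFolder=$altFolder altName=$altName"

# Edge case: a VERSION file directly under src (illustrative variable names)
rootPath="src/VERSION"
rootFolder=${rootPath%"/VERSION"}
rootImage=${rootFolder##*/}
echo "rootFolder=$rootFolder rootImage=$rootImage"
```

For the nested path this gives the folder src/something/service-a and the image name service-a. For src/VERSION both the folder and the image name come out as src, which is worth keeping in mind if such a file ever exists.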

After some testing and trying stuff out, I ended up with the following workflow file:

name: Docker

on:
  push:
    branches:
      - master
    paths:
      - 'src/**/VERSION'
jobs:
  push:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v2
        with:
          fetch-depth: 0

      - name: Log into registry
        run: echo "${{ secrets.DOCKER_PASSWORD }}" | docker login ${{ secrets.DOCKER_REGISTRY_URL }} -u ${{ secrets.DOCKER_USERNAME }} --password-stdin

      - name: Build and push the images
        run: |
          for versionFilePath in $(git diff-tree --no-commit-id --name-only -r ${{ github.sha }} ${{ github.event.before }} | grep "VERSION");
          do
            folder=${versionFilePath%"/VERSION"}
            IMAGE_NAME=${folder##*/}

            tmpName="image-$RANDOM"
            docker build $folder --file $folder/Dockerfile --tag $tmpName
            IMAGE_ID=${{ secrets.DOCKER_REGISTRY_URL }}/${{ secrets.DOCKER_REPOSITORY_NAME }}/$IMAGE_NAME
            VERSION=$(cat $versionFilePath)

            echo IMAGE_ID=$IMAGE_ID
            echo VERSION=$VERSION

            docker tag $tmpName $IMAGE_ID:$VERSION
            docker push $IMAGE_ID:$VERSION
          done;

Once I get the values I need, I use docker build to build the image from the folder and I use the tmpName for the temporary image name. Next, I tag the temporary image name with the 'real' image name and version ($IMAGE_ID:$VERSION) and push that same image to the registry.

One thing I forgot to mention is the fetch-depth setting on the checkout action:

- uses: actions/checkout@v2
  with:
    fetch-depth: 0

For a while, I couldn't get the diff-tree command to work and it took a bit to figure out why. The reason was the fetch-depth setting. By default, fetch-depth is set to 1, which translates to fetching only 1 commit (this is done to improve performance). With only that one commit available, the commit referenced by github.event.before isn't present in the checkout, so diff-tree has nothing to compare against. Changing the value to 0 (fetch all history) fixed the issue with diff-tree.
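
The effect of a depth-1 checkout can be reproduced locally. The following sketch uses a throwaway repository and a shallow clone of it, which is similar in spirit to what fetch-depth: 1 produces. All paths and variable names are made up for the demo, and before again stands in for github.event.before.

```shell
set -e

work=$(mktemp -d)
git init -q "$work/origin"
cd "$work/origin"
commit() { git -c user.name=demo -c user.email=demo@example.com commit -q --allow-empty -m "$1"; }
commit "first"
before=$(git rev-parse HEAD)   # plays the role of github.event.before
commit "second"

# Shallow clone with a single commit of history
git clone -q --depth 1 "file://$work/origin" "$work/shallow"
cd "$work/shallow"

shallow=$(git rev-parse --is-shallow-repository)
if git cat-file -e "$before^{commit}" 2>/dev/null; then
  before_present=yes
else
  before_present=no
fi
echo "shallow=$shallow before_present=$before_present"

# Fetching the full history makes the older commit available
git fetch -q --unshallow
if git cat-file -e "$before^{commit}" 2>/dev/null; then
  after_unshallow=yes
else
  after_unshallow=no
fi
echo "after_unshallow=$after_unshallow"
```

In the shallow clone the older commit is missing, and it becomes available once the full history has been fetched. A check like the cat-file one could also be used at the start of a workflow step to fail early with a clearer message.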
