Building & deploying a versioned documentation site with werf

Flant staff · Published in werf blog · 16 min read · Feb 21, 2020

Please note this post was moved to a new blog: https://blog.werf.io/ — follow it if you want to stay in touch with the project’s news!

You may already know our GitOps tool called werf — we have discussed it in several of our articles. Today, we would like to share our experience in building & deploying the site that hosts our tool’s documentation, werf.io. Even though it is a regular static site, its building process is noteworthy because we use a dynamic number of artifacts.

Intro: How the site works

First of all, the werf documentation is stored along with its code — a crucial point. It imposes certain limitations on development (though those fall outside the scope of this article). At a minimum, we can say that:

  • The release of new werf features must coincide with the corresponding updates of the documentation; and vice versa, an update of the documentation implies that a new version of werf is released;
  • werf is being developed quite intensively: often, new versions emerge several times per day;
  • Any manual activity involved in deploying the site with a new version of the documentation is tedious and unwelcome;
  • The project uses a semantic versioning approach with five stability channels. The release process involves the successful, sequential passage of the code through the channels, increasing its stability from alpha to rock-solid;
  • The site has another language (Russian) version, which is maintained simultaneously with the main one (English).

To improve the user experience and make the whole process convenient, we have developed a dedicated tool for installing and updating werf called multiwerf. You only have to specify the release number and the preferred stability channel — multiwerf will take care of the rest. It will check if the new version is available in the channel and download it if necessary.
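
For example, activating werf in a shell session boils down to the same command our pipeline jobs use later in this article (here, 1.0 is the release and alpha is the channel):

type multiwerf && . $(multiwerf use 1.0 alpha --as-file)
werf version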

The latest version of werf in each channel is available in the version selection menu on the site. By default, the https://werf.io/documentation/ page corresponds to the most stable version of the latest release (so it is the one indexed by search engines). The documentation for a particular channel is available at an individual address (for example, https://werf.io/v1.0-beta/documentation/ for the beta channel of release 1.0).

So, the site has the following versions:

  1. root version (it is displayed by default),
  2. individual version for every active channel of each release (e.g., https://werf.io/v1.0-beta/).

Generally, to generate a specific version of the site, you compile it with Jekyll by running jekyll build in the /docs directory of the werf repository. Of course, before that, you have to switch to the Git tag of the required version.
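
Here is a rough sketch of this manual process (the tag below is just an example):

git checkout v1.0.4-beta.20
cd docs
jekyll build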

Several supplementary notes:

  • We use werf itself for the build process;
  • CI/CD processes are based on GitLab CI;
  • All these operations run in Kubernetes.

Tasks

Now, it is time to define tasks that take into account all the particularities described above:

  1. The documentation on the site must be automatically updated if there are newer versions in any of the werf release channels.
  2. For development, there must be the ability to browse preliminary versions of the site.

The site must be recompiled whenever the corresponding Git tags in any of the channels change. However, building the image involves the following intricacies:

  • Since the range of versions in channels can change, it is only necessary to recompile the documentation for channels where there are newer versions (it doesn’t make sense to recompile everything).
  • The set of channels for releases can change. For example, at some point, there might be no version more stable than the early-access 1.1. But over time, there will be such versions, and we do not want to intervene manually in the building process in this case.

It turns out the build process depends on constantly evolving external factors.

Implementation

Choosing an approach

As an option, we could run every required version as a separate pod in Kubernetes. Such an approach means a larger number of objects in the Kubernetes cluster, and that number would only grow with the number of stable werf releases. This, in turn, implies more troublesome maintenance: each version requires its own HTTP server, each serving a relatively low load. Obviously, it also entails higher resource requirements and costs.

We have chosen another way: all required versions are built into a single image. The compiled static documentation for all versions of the site sits in an NGINX container, and traffic to the corresponding Deployment comes through NGINX Ingress. Such a simple structure (a stateless application) makes it easy to scale the Deployment (depending on the load) by means of Kubernetes itself.

To be exact, we build two images: the first one for the production environment and the second (additional) one for the development environment. The additional image runs alongside the main image, but in the development environment only. It contains a version of the site built from the review commit, and we use Ingress resources to route between them.

Werf vs git clone and artifacts

As we’ve mentioned above, to generate static content for a specific version of the documentation, you have to perform the build after switching to the corresponding repository tag. One approach would be to clone the repository on every build, selecting the relevant tags from the list along the way. However, that is a rather resource-intensive operation, and it requires quite non-trivial build instructions. Another major drawback of such an approach is that there is nothing to cache between builds.
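
For contrast, here is a sketch of that naive approach (assuming every v1.0* tag had to be built from scratch; the output paths are arbitrary):

git clone https://github.com/flant/werf.git werf-src
cd werf-src
for tag in $(git tag -l 'v1.0*'); do
  git checkout "$tag"
  jekyll build -s docs -d "/tmp/site/$tag"
done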

And that is precisely where werf comes to the rescue: it supports smart caching and works with external repositories. Using werf to include code from the repository accelerates the build considerably, since werf clones the repository just once and then merely fetches when necessary. Furthermore, when adding data from the repository, we can select only the required directories (the docs folder in our case), which considerably reduces the amount of data being added.

Since Jekyll is a tool for compiling static content, it makes no sense to include it in the final image. Thus, it is logical to perform the compilation within a werf artifact and then import the results into the final image.

Creating werf.yaml

So, we’ve decided to compile each version in a separate werf artifact. However, we do not know in advance how many artifacts we will need during the build, so we cannot create a single, fixed build configuration (strictly speaking, we could, but it would be inefficient).

werf supports Go templates in the werf.yaml configuration file. They allow us to generate the configuration “on the fly” depending on external data (that’s what we need!). In our case, the external data consists of information about versions and releases. We use it as the basis to build the necessary number of artifacts and get two images for two different environments in the end: werf-doc and werf-dev.

External data is passed through environment variables. Here is the list of environment variables with their description:

  • RELEASES — a string with a set of releases and the corresponding current version of werf in the form of a list of values, separated by a space, according to the following pattern: <RELEASE_NUMBER>%<VERSION_NUMBER>. For example: 1.0%v1.0.4-beta.20
  • CHANNELS — a string with a set of channels and the corresponding current version of werf in the form of a list of values, separated by a space, according to the following pattern: <CHANNEL>%<VERSION_NUMBER>. For example: 1.0-beta%v1.0.4-beta.20 1.0-alpha%v1.0.5-alpha.22
  • ROOT_VERSION — the version of a werf release for displaying by default on the site (sometimes you need to display the documentation for the release other than the latest one). For example: v1.0.4-beta.20
  • REVIEW_SHA — the hash of a review commit that serves as the basis for the testing environment.

These variables will be defined in the GitLab CI pipeline (read on to learn how…).

Firstly, for convenience’s sake, let’s define variables for Go templates by assigning them values from the environment variables:

{{ $_ := set . "WerfVersions" (cat (env "CHANNELS") (env "RELEASES") | splitList " ") }}
{{ $Root := . }}
{{ $_ := set . "WerfRootVersion" (env "ROOT_VERSION") }}
{{ $_ := set . "WerfReviewCommit" (env "REVIEW_SHA") }}

The description of the artifact for compiling static contents of the site is broadly the same for all versions (including root and dev). It makes sense to put it into a separate block by using the define function to be able to call it later with the include directive. The following arguments will be passed to the template:

  • Version — the version we generate (the name of the tag);
  • Channel — the name of the release channel for which we generate the artifact;
  • Commit — the hash of the commit (if we generate the artifact for the review commit);
  • Root — the root context (the template uses it, for example, to access files via $Root.Files.Get).

The description of the artifact’s template:

{{- define "doc_artifact" -}}
{{- $Root := index . "Root" -}}
artifact: doc-{{ .Channel }}
from: jekyll/builder:3
mount:
- from: build_dir
to: /usr/local/bundle
ansible:
install:
- shell: |
export PATH=/usr/jekyll/bin/:$PATH
- name: "Install Dependencies"
shell: bundle install
args:
executable: /bin/bash
chdir: /app/docs
beforeSetup:
{{- if .Commit }}
- shell: echo "Review SHA - {{ .Commit }}."
{{- end }}
{{- if eq .Channel "root" }}
- name: "releases.yml HASH: {{ $Root.Files.Get "releases.yml" | sha256sum }}"
copy:
content: |
{{ $Root.Files.Get "releases.yml" | indent 8 }}
dest: /app/docs/_data/releases.yml
{{- else }}
- file:
path: /app/docs/_data/releases.yml
state: touch
{{- end }}
- file:
path: "{{`{{ item }}`}}"
state: directory
mode: 0777
with_items:
- /app/main_site/
- /app/ru_site/
- file:
dest: /app/docs/pages_ru/cli
state: link
src: /app/docs/pages/cli
- shell: |
echo -e "werfVersion: {{ .Version }}\nwerfChannel: {{ .Channel }}" > /tmp/_config_additional.yml
export PATH=/usr/jekyll/bin/:$PATH
{{- if and (ne .Version "review") (ne .Channel "root") }}
{{- $_ := set . "BaseURL" ( printf "v%s" .Channel ) }}
{{- else if ne .Channel "root" }}
{{- $_ := set . "BaseURL" .Channel }}
{{- end }}
jekyll build -s /app/docs -d /app/_main_site/{{ if .BaseURL }} --baseurl /{{ .BaseURL }}{{ end }} --config /app/docs/_config.yml,/tmp/_config_additional.yml
jekyll build -s /app/docs -d /app/_ru_site/{{ if .BaseURL }} --baseurl /{{ .BaseURL }}{{ end }} --config /app/docs/_config.yml,/app/docs/_config_ru.yml,/tmp/_config_additional.yml
args:
executable: /bin/bash
chdir: /app/docs
git:
- url: https://github.com/flant/werf.git
to: /app/
owner: jekyll
group: jekyll
{{- if .Commit }}
commit: {{ .Commit }}
{{- else }}
tag: {{ .Version }}
{{- end }}
stageDependencies:
install: ['docs/Gemfile','docs/Gemfile.lock']
beforeSetup: '**/*'
includePaths: 'docs'
excludePaths: '**/*.sh'
{{- end }}

The name of an artifact must be unique. We achieve this by adding the name of the channel (the value of the .Channel variable) as a suffix to the artifact’s name: artifact: doc-{{ .Channel }}. However, keep in mind that you will have to refer to those names when importing from the artifacts.

When describing an artifact, we use werf’s mounting feature. Specifying the build_dir service directory in the mount tells werf to keep the Jekyll cache between pipeline runs, which considerably speeds up rebuilds.

As you may have noticed, we use releases.yml in the description above. It is a YAML file with release data extracted from github.com (an artifact generated by a pipeline). This file is required for compiling the site. However, in the context of our article, it is notable for the fact that its state affects the rebuilding of just one of the artifacts — the artifact for the root version of the site (other artifacts do not need it).

We implement this logic using the conditional if operator of Go templates and the {{ $Root.Files.Get "releases.yml" | sha256sum }} expression in one of the steps of the stage. Here is how it works: when building the artifact for the root version (.Channel equals root), the hash of releases.yml affects the signature of the entire stage since it is part of the name of an Ansible task (the name parameter). Thus, when the contents of releases.yml change, the corresponding artifact gets rebuilt.

Please note how we use the external repository: only the /docs directory of the werf repository is added to the artifact image, and, depending on the parameters passed, it contains the data of either the required tag or the review commit.

To use the artifact template to generate artifact descriptions for the various versions and channels, we set up a loop over the .WerfVersions variable in werf.yaml:

{{ range .WerfVersions -}}
{{ $VersionsDict := splitn "%" 2 . -}}
{{ dict "Version" $VersionsDict._1 "Channel" $VersionsDict._0 "Root" $Root | include "doc_artifact" }}
---
{{ end -}}

Since the loop will generate several artifacts (at least, we hope it will), we have to take into account the separator between them — the sequence of three dashes, --- (you can learn more about the configuration file syntax in the documentation). As we have previously stated, we pass the version, channel, and root context parameters when calling the template in the loop.
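
For example, splitn "%" 2 applied to the entry 1.0-beta%v1.0.4-beta.20 yields a dictionary where $VersionsDict._0 equals 1.0-beta (the channel) and $VersionsDict._1 equals v1.0.4-beta.20 (the version) — exactly the values passed to the template as Channel and Version.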

Similarly, let’s call the artifact template (this time outside of the loop) for specific cases: for the root version and the version based on the review commit:

{{ dict "Version" .WerfRootVersion "Channel" "root" "Root" $Root  | include "doc_artifact" }}
---
{{- if .WerfReviewCommit }}
{{ dict "Version" "review" "Channel" "review" "Commit" .WerfReviewCommit "Root" $Root | include "doc_artifact" }}
{{- end }}

Please note that the artifact for the review commit will only be built if the .WerfReviewCommit variable is set.

Okay, artifacts are ready. Time to import them!

The resulting image intended for running in Kubernetes is a regular NGINX image with the nginx.conf server configuration file and static content from the artifacts. In addition to the artifact of the root version, we need to repeat the loop over the .WerfVersions variable to import artifacts for the channel and release versions (adhering to the artifact naming rule we’ve outlined above). Since each artifact contains versions of the site for both languages, we import them to the paths defined in the configuration.

Here is the description for the werf-doc resulting image:

image: werf-doc
from: nginx:stable-alpine
ansible:
  setup:
  - name: "Setup /etc/nginx/nginx.conf"
    copy:
      content: |
{{ .Files.Get ".werf/nginx.conf" | indent 8 }}
      dest: /etc/nginx/nginx.conf
  - file:
      path: "{{`{{ item }}`}}"
      state: directory
      mode: 0777
    with_items:
    - /app/main_site/assets
    - /app/ru_site/assets
import:
- artifact: doc-root
  add: /app/_main_site
  to: /app/main_site
  before: setup
- artifact: doc-root
  add: /app/_ru_site
  to: /app/ru_site
  before: setup
{{ range .WerfVersions -}}
{{ $VersionsDict := splitn "%" 2 . -}}
{{ $Channel := $VersionsDict._0 -}}
{{ $Version := $VersionsDict._1 -}}
- artifact: doc-{{ $Channel }}
  add: /app/_main_site
  to: /app/main_site/v{{ $Channel }}
  before: setup
{{ end -}}
{{ range .WerfVersions -}}
{{ $VersionsDict := splitn "%" 2 . -}}
{{ $Channel := $VersionsDict._0 -}}
{{ $Version := $VersionsDict._1 -}}
- artifact: doc-{{ $Channel }}
  add: /app/_ru_site
  to: /app/ru_site/v{{ $Channel }}
  before: setup
{{ end -}}

The additional image, which we run in the dev environment, contains only two versions of the site: the review-commit version and the root version (if you recall, they share assets and release data). Thus, the additional image differs from the main one only in the import section (and, obviously, in the name):

image: werf-dev
...
import:
- artifact: doc-root
  add: /app/_main_site
  to: /app/main_site
  before: setup
- artifact: doc-root
  add: /app/_ru_site
  to: /app/ru_site
  before: setup
{{- if .WerfReviewCommit }}
- artifact: doc-review
  add: /app/_main_site
  to: /app/main_site/review
  before: setup
- artifact: doc-review
  add: /app/_ru_site
  to: /app/ru_site/review
  before: setup
{{- end }}

As previously stated, the artifact for the review commit is only generated if the REVIEW_SHA environment variable is set. We could avoid building the werf-dev image altogether when REVIEW_SHA is absent, but we leave it as it is to keep the pipeline structure simple and to ensure that the policies for cleaning up Docker images operate consistently for werf-dev. In that case, werf-dev is built with the artifact of the root version only (which has been built already).

The build is ready! Let’s move on to CI/CD and essential points.

Pipeline in GitLab CI and intricacies of dynamic building

Before starting the build, we need to set the environment variables used in werf.yaml. This does not apply to the REVIEW_SHA variable — we will set it when the pipeline is triggered by the GitHub hook.

Let’s put the generation of the necessary external data into the generate_artifacts Bash script. It will generate two GitLab pipeline artifacts:

  • releases.yml file with release data,
  • common_envs.sh file with the environment variables to be exported.

A sample generate_artifacts file is available in our examples repository. Obtaining the data itself is outside the scope of this article, but the common_envs.sh file is essential to us since werf’s operation depends on it. Here is an example of its contents:

export RELEASES='1.0%v1.0.6-4'
export CHANNELS='1.0-alpha%v1.0.7-1 1.0-beta%v1.0.7-1 1.0-ea%v1.0.6-4 1.0-stable%v1.0.6-4 1.0-rock-solid%v1.0.6-4'
export ROOT_VERSION='v1.0.6-4'

As a matter of fact, you can load the output of such a script into the current shell with the Bash source builtin.
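
In our pipeline, that boils down to two commands — the first one snapshots the external data, the second exports it:

bash ./generate_artifacts 1> common_envs.sh
source common_envs.sh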

And now for the most fascinating part. For both the build and deploy processes to run correctly, we have to make sure that werf.yaml stays the same — at least within a single pipeline. If we do not meet this condition, the stage signatures that werf calculates when building and, for example, deploying, will differ. That will lead to an error during deployment, as the required image will be missing.

In other words, if the information about releases and versions changes after the site image has been built (say, a new version is released and the environment variables get new values), the deployment will fail since the artifact for the new version has not been built yet.

If the generation of werf.yaml depends on external data (for example, on the list of current versions, as in our case), then the composition and values of that data must stay unchanged within the pipeline. This is especially important when the external parameters change frequently.

We collect and snapshot the external data during the first stage of the GitLab pipeline (Prebuild) and then pass it on as a GitLab CI artifact. That allows us to start and restart the pipeline’s jobs (build, deploy, cleanup) with an unchanged werf.yaml configuration.

Here is an example of a Prebuild stage in the .gitlab-ci.yml file:

Prebuild:
  stage: prebuild
  script:
  - bash ./generate_artifacts 1> common_envs.sh
  - cat ./common_envs.sh
  artifacts:
    paths:
    - releases.yml
    - common_envs.sh
    expire_in: 2 week

Okay, we have made a snapshot of the data and put it into an artifact. Now we can build and deploy using regular pipeline stages in GitLab CI: Build and Deploy. The pipeline itself is started by hooks from the werf GitHub repository (i.e., in response to changes in the GitHub repository). The relevant data is available in the GitLab project properties in the CI/CD Settings -> Pipeline triggers section. All that remains is to create a webhook on GitHub (Settings -> Webhooks).
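
For reference, the trigger call such a webhook handler might make could look like this (the host, project ID, and trigger token are placeholders; REVIEW_SHA is passed as a pipeline variable):

curl -X POST \
  -F token=$TRIGGER_TOKEN \
  -F ref=master \
  -F "variables[REVIEW_SHA]=<commit hash>" \
  https://gitlab.example.com/api/v4/projects/<project id>/trigger/pipeline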

Here is what the Build stage looks like:

Build:
  stage: build
  script:
  - type multiwerf && . $(multiwerf use 1.0 alpha --as-file)
  - type werf && source <(werf ci-env gitlab --tagging-strategy tag-or-branch --verbose)
  - source common_envs.sh
  - werf build-and-publish --stages-storage :local
  except:
    refs:
    - schedules
  dependencies:
  - Prebuild

GitLab adds the two artifacts from the Prebuild stage to the jobs of the Build stage, which is why we export the variables with the prepared input data using source common_envs.sh. We start the Build stage in all cases except for scheduled pipeline runs: the on-schedule pipeline is reserved for cleanup, so there is nothing to build there.

Let’s define two deploy jobs for the deployment stage: one for the production environment and another for the development environment. We’ll do that with a YAML template:

.base_deploy: &base_deploy
  stage: deploy
  script:
  - type multiwerf && . $(multiwerf use 1.0 alpha --as-file)
  - type werf && source <(werf ci-env gitlab --tagging-strategy tag-or-branch --verbose)
  - source common_envs.sh
  - werf deploy --stages-storage :local
  dependencies:
  - Prebuild
  except:
    refs:
    - schedules

Deploy to Production:
  <<: *base_deploy
  variables:
    WERF_KUBE_CONTEXT: prod
  environment:
    name: production
    url: werf.io
  only:
    refs:
    - master
  except:
    variables:
    - $REVIEW_SHA
    refs:
    - schedules

Deploy to Test:
  <<: *base_deploy
  variables:
    WERF_KUBE_CONTEXT: dev
  environment:
    name: test
    url: werf.test.flant.com
  except:
    refs:
    - schedules
  only:
    variables:
    - $REVIEW_SHA

In essence, the jobs differ only in the cluster context (WERF_KUBE_CONTEXT) to which werf deploys and in the environment settings (environment.name and environment.url) specific to each environment (later, they are used in the Helm chart templates). We skip the template contents since they fall outside the scope of this article (you can find them in the example repository).

Final touch

New versions of werf are released quite often, and the same goes for new images. Therefore, the Docker Registry grows steadily, so it is necessary to set up automated, policy-based cleanup of images. It’s easy to do. We have to:

  • define the cleanup stage in the .gitlab-ci.yml;
  • set up the periodic cleanup job;
  • set the environment variable with the write access token.

Let’s define the Cleanup stage in the .gitlab-ci.yml:

Cleanup:
  stage: cleanup
  script:
  - type multiwerf && . $(multiwerf use 1.0 alpha --as-file)
  - type werf && source <(werf ci-env gitlab --tagging-strategy tag-or-branch --verbose)
  - source common_envs.sh
  - docker login -u nobody -p ${WERF_IMAGES_CLEANUP_PASSWORD} ${WERF_IMAGES_REPO}
  - werf cleanup --stages-storage :local
  only:
    refs:
    - schedules

We have already seen almost all of this before. For the cleanup, you have to log in to the Docker Registry with a token that has the right to delete images (the default GitLab CI job token does not have it). You have to enter the token into GitLab in advance and assign its value to the WERF_IMAGES_CLEANUP_PASSWORD environment variable (CI/CD Settings -> Variables).

You can set up the periodic cleanup job in CI/CD -> Schedules.

That is all! From now on, the Docker Registry won’t be growing because of the accumulation of unused images.

Please note that the full listings are available in our examples Git repository.

The result

  1. In the end, we have created a logical and efficient build process: one artifact per version.
  2. This process is universal and does not require manual interventions when new werf versions are released: the documentation on the site is updated automatically.
  3. We build two images for different environments.
  4. The process is fast because we use caching at full capacity: when a new version of werf is released (or a GitHub hook for the review commit is called), we rebuild only the relevant artifact with the updated version.
  5. There is no need to bear in mind the disposal of unused images: werf’s policy-based cleanup keeps the Docker Registry in order.

Conclusion

  • werf significantly speeds up the building process thanks to the caching of builds as well as the caching of external repositories.
  • Support for external repositories obviates the need to clone the entire repository or to invent some tricky optimization. werf uses cache or performs cloning just once and then uses fetch only when necessary.
  • The support for Go templates in the build configuration file (werf.yaml) allows us to define a building process that is dependent on external data.
  • Mounting in werf significantly speeds up the building of artifacts thanks to a cache shared by all pipelines.
  • werf makes cleaning of artifacts an easy task, which is especially true for dynamic builds.


This article was originally written by our engineer Artem Kladov.
