CI / Jenkins

The main CI infrastructure is maintained at https://jenkins.cilium.io/

Triggering Pull-Request Builds With Jenkins

To ensure that build resources are used judiciously, builds on Jenkins are manually triggered via comments on each pull-request that contain “trigger-phrases”. Only members of the Cilium GitHub organization are allowed to trigger these jobs.

Depending on the PR target branch, a specific set of jobs is marked as required, as per the Cilium CI matrix. They will be automatically featured in PR checks directly on the PR page. The following trigger phrases may be used to trigger them all at once:

PR target branch    Trigger required PR jobs
master              /test
v1.11               /test-backport-1.11
v1.10               /test-backport-1.10
v1.9                /test-backport-1.9
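As an illustration, the branch-to-phrase mapping above can be captured in a small shell helper. This is only a sketch: the function name required_trigger_for_branch is hypothetical, it is not part of the Cilium tooling, and it covers only the branches listed above.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Cilium repository): print the trigger
# phrase that starts all required PR jobs for a given target branch.
required_trigger_for_branch() {
    case "$1" in
        master)
            printf '%s\n' "/test" ;;
        v1.9|v1.10|v1.11)
            # Backport branches use /test-backport-<version without the "v">.
            printf '/test-backport-%s\n' "${1#v}" ;;
        *)
            printf 'unknown target branch: %s\n' "$1" >&2
            return 1 ;;
    esac
}

required_trigger_for_branch master   # prints /test
required_trigger_for_branch v1.10    # prints /test-backport-1.10
```

The phrase still has to be posted as a PR comment by a member of the Cilium GitHub organization; the helper only looks it up.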

For master PRs: on top of /test, one may use /test-missed-k8s to trigger all non-required K8s versions on Kernel 4.9 as per the Cilium CI matrix.

For a full list of Jenkins PR jobs, see Jenkins (PR tab). Trigger phrases are configured within each job’s build triggers advanced options.

Some feature flags are controlled by pull request labels. The available labels are:

  • area/containerd: Enable the containerd runtime on all Kubernetes tests.

  • ci/net-next: Run tests on the net-next kernel. This causes the /test target to run only on the net-next kernel. The label is purely for testing on a different kernel; to be merged, a PR must pass the CI without this flag.

Retrigger specific jobs

For all PRs: one may manually retrigger a specific job (e.g. in case of a flake) with the individual trigger featured directly in the PR check’s name (e.g. for K8s-1.20-kernel-4.9 (test-1.20-4.9), use /test-1.20-4.9).

This works for all displayed Jenkins tests.
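A minimal sketch of how the retrigger comment can be derived from a check name follows. It assumes the check name always ends with the trigger in parentheses, as in the example above; the function name retrigger_phrase is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: given a PR check name such as
# "K8s-1.20-kernel-4.9 (test-1.20-4.9)", print the comment that retriggers it.
retrigger_phrase() {
    name=$1
    case "$name" in
        *"("*")")
            inner=${name##*\(}   # drop everything up to the last "("
            inner=${inner%\)}    # drop the trailing ")"
            printf '/%s\n' "$inner" ;;
        *)
            printf 'no trigger found in check name: %s\n' "$name" >&2
            return 1 ;;
    esac
}

retrigger_phrase "K8s-1.20-kernel-4.9 (test-1.20-4.9)"   # prints /test-1.20-4.9
```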

Testing with race condition detection enabled

Optional, non-required Jenkins jobs are available for running the test suite with race condition detection enabled. They may be triggered using the trigger phrase /test-race.

For a full list of Jenkins PR jobs with race detection enabled, see Jenkins (Race Detection tab). Trigger phrases are configured within each job’s build triggers advanced options.

Using Jenkins for testing

Typically, running Jenkins tests via one of the above trigger phrases runs all of the tests in that particular category. However, there may be cases where you just want to run a single test quickly on Jenkins and observe the result. To do so, give the relevant test a custom name and update the Jenkinsfile to focus on that test. Below is an example patch that shows how this can be achieved.

  diff --git a/ginkgo.Jenkinsfile b/ginkgo.Jenkinsfile
  index ee17808748a6..637f99269a41 100644
  --- a/ginkgo.Jenkinsfile
  +++ b/ginkgo.Jenkinsfile
  @@ -62,10 +62,10 @@ pipeline {
  steps {
  parallel(
  "Runtime":{
  - sh 'cd ${TESTDIR}; ginkgo --focus="RuntimeValidated" --tags=integration_tests'
  + sh 'cd ${TESTDIR}; ginkgo --focus="XFoooo" --tags=integration_tests'
  },
  "K8s-1.9":{
  - sh 'cd ${TESTDIR}; K8S_VERSION=1.9 ginkgo --focus="K8sValidated" --tags=integration_tests ${FAILFAST}'
  + sh 'cd ${TESTDIR}; K8S_VERSION=1.9 ginkgo --focus="K8sFooooo" --tags=integration_tests ${FAILFAST}'
  },
  failFast: true
  )
  diff --git a/test/k8s/nightly.go b/test/k8s/nightly.go
  index 62b324619797..3f955c73a818 100644
  --- a/test/k8s/nightly.go
  +++ b/test/k8s/nightly.go
  @@ -466,7 +466,7 @@ var _ = Describe("NightlyExamples", func() {
  })
  - It("K8sValidated Updating Cilium stable to master", func() {
  + FIt("K8sFooooo K8sValidated Updating Cilium stable to master", func() {
  podFilter := "k8s:zgroup=testapp"
  //This test should run in each PR for now.
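The patch works because the custom name makes the --focus expression select only the renamed test. The sketch below approximates that selection with grep -E over a list of spec names. It is an illustration, not Ginkgo's own matching code: the second spec name is made up, and grep's extended regular expressions differ in detail from Go's regexp syntax.

```shell
#!/bin/sh
# Print the spec names (one per line on stdin) that a focus regex would select.
# grep -E stands in for Ginkgo's matching here; this is only an approximation.
focused_specs() {
    grep -E -- "$1"
}

# The first name is built from the patch above (Describe text + It text);
# the second is a made-up neighbour that keeps only the default prefix.
specs='NightlyExamples K8sFooooo K8sValidated Updating Cilium stable to master
NightlyExamples K8sValidated Some other test'

# Only the renamed test matches the custom focus string.
printf '%s\n' "$specs" | focused_specs 'K8sFooooo'
```

With the original focus string K8sValidated, both names would match, which is why the test needs a unique name before it can be run on its own.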

Jobs Overview

Cilium-PR-Ginkgo-Tests-Validated

Runs validated Ginkgo tests which are confirmed to be stable and have been verified. These tests must always pass.

The configuration for this job is contained within ginkgo.Jenkinsfile.

The job runs the following steps in parallel:

  • Runs the single-node e2e tests using the Docker runtime.

  • Runs the multi-node Kubernetes e2e tests against the latest default version of Kubernetes specified above.

This job can be used to run tests on custom branches. To do so, log into Jenkins and go to https://jenkins.cilium.io/job/cilium-ginkgo/configure. Then add your branch name to the GitHub Organization -> cilium -> Filter by name (with wildcards) -> Include field and save the changes. Once you no longer need to run tests on your branch, please remove the branch from this field.

Note

It is also possible to run specific tests from this suite via /test-only. The comment can contain three arguments: --focus, which specifies which tests should be run; --kernel_version, for the kernel version (net-next, 49, and 419 are the possible values right now); and --k8s_version, for the Kubernetes version. If you want to run only one It block, you need to prepend it with the test suite name and create a regex. For example, /test-only --focus="K8sDatapathConfig.*Check connectivity with automatic direct nodes routes" --k8s_version=1.18 --kernel_version=net-next runs the specified test in a Kubernetes 1.18 cluster on net-next nodes. The Kubernetes version defaults to 1.21 and the kernel version defaults to 4.19.

/test-only --focus="K8s"                      Runs all Kubernetes tests
/test-only --focus="K8sConformance"           Runs all k8s conformance tests
/test-only --focus="K8sChaos"                 Runs all k8s chaos tests
/test-only --focus="K8sDatapathConfig"        Runs all k8s datapath configuration tests
/test-only --focus="K8sDemos"                 Runs all k8s demo tests
/test-only --focus="K8sKubeProxyFreeMatrix"   Runs all k8s kube-proxy free matrix tests
/test-only --focus="K8sFQDNTest"              Runs all k8s fqdn tests
/test-only --focus="K8sHealthTest"            Runs all k8s health tests
/test-only --focus="K8sHubbleTest"            Runs all k8s Hubble tests
/test-only --focus="K8sIdentity"              Runs all k8s identity tests
/test-only --focus="K8sIstioTest"             Runs all k8s Istio tests
/test-only --focus="K8sKafkaPolicyTest"       Runs all k8s Kafka tests
/test-only --focus="K8sPolicyTest"            Runs all k8s policy tests
/test-only --focus="K8sServicesTest"          Runs all k8s services tests
/test-only --focus="K8sUpdates"               Runs k8s update tests
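To make the argument handling concrete, here is a minimal sketch of a helper that assembles a /test-only comment. The function name test_only_comment is hypothetical. Versions that are omitted are left out of the comment so that the documented defaults apply, and the kernel check uses only the values named in the note above.

```shell
#!/bin/sh
# Hypothetical helper: assemble a /test-only comment.
# Usage: test_only_comment FOCUS [K8S_VERSION] [KERNEL_VERSION]
test_only_comment() {
    focus=$1 k8s=${2:-} kernel=${3:-}
    # Accept only the kernel values the note lists as currently possible.
    case "$kernel" in
        ""|net-next|49|419) ;;
        *)
            printf 'unsupported kernel version: %s\n' "$kernel" >&2
            return 1 ;;
    esac
    comment="/test-only --focus=\"$focus\""
    if [ -n "$k8s" ]; then comment="$comment --k8s_version=$k8s"; fi
    if [ -n "$kernel" ]; then comment="$comment --kernel_version=$kernel"; fi
    printf '%s\n' "$comment"
}

# Reproduces the example comment from the note.
test_only_comment 'K8sDatapathConfig.*Check connectivity with automatic direct nodes routes' 1.18 net-next
```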

The Runtime test suite is still run via the /test-focus command.

/test-focus Runtime    Runs all runtime tests

Note

It is not possible to run specific tests within the runtime test suite.

Cilium-PR-Ginkgo-Tests-Kernel

Runs the Kubernetes e2e tests with a 4.19 kernel. The configuration for this job is contained within ginkgo-kernel.Jenkinsfile.

Cilium-PR-Ginkgo-Tests-k8s

Runs the Kubernetes e2e tests against all Kubernetes versions that are not currently tested as part of each pull-request but which Cilium still supports, as well as the most recently released versions of Kubernetes that might not yet be declared stable by Kubernetes upstream. Check the contents of ginkgo-kubernetes-all.Jenkinsfile in the branch of Cilium for which you are running tests to see which Kubernetes versions will be tested against.

Ginkgo-CI-Tests-Pipeline


Packer-CI-Build

As part of Cilium development, we use a custom base box with pre-installed libraries and tools that support our daily workflow. That base box is built with Packer and is hosted in the packer-ci-build GitHub repository.

New versions of this box can be created via Jenkins Packer Build, where new builds of the image will be pushed to Vagrant Cloud. The version of the image corresponds to the BUILD_ID environment variable in the Jenkins job. That version ID will be used in Cilium Vagrantfiles.

Changes to this image are made via contributions to the packer-ci-build repository. Authorized GitHub users can trigger builds with a GitHub comment on the PR containing the trigger phrase /build. If a new box needs to be built against a Cilium branch other than master, authorized developers can run the build with custom parameters: go to Build with parameters in the job and set the base branch as needed.

This box needs to be updated when a developer needs a new dependency that is not installed in the current version of the box, or when a dependency that is cached within the box becomes stale.

Make sure that you update the vagrant box versions in vagrant_box_defaults.rb after the new box is built and tested.

Once you change the image versions locally, create a branch named pr/update-packer-ci-build and open a PR against github.com/cilium/cilium. It is important that you use that branch name so that the VM images are cached into packet.net before the branch is merged.

Once this PR is merged, ask Cilium’s CI team to ensure:

  1. The autoscaler provisioning code is up to date.

  2. All Jenkins nodes are scaled down and then back up.

Testing matrix

Up-to-date CI testing information regarding k8s - kernel version pairs can always be found in the Cilium CI matrix.

CI Failure Triage

This section describes the process to triage CI failures. We define 3 categories:

Keyword       Description
Flake         Failure due to a temporary situation such as loss of connectivity to external services or a bug in a system component, e.g. quay.io is down, VM race conditions, kube-dns bug, …
CI-Bug        Bug in the test itself that renders the test unreliable, e.g. a timing issue when importing a policy and failing to block until the policy is enforced before connectivity is verified.
Regression    Failure is due to a regression. All failures in the CI that are not caused by bugs in the test are considered regressions.

Pipelines subject to triage

Build/test failures for the following Jenkins pipelines must be reported as GitHub issues using the process below:

Pipeline                             Description
Ginkgo-Tests-Validated-master        Runs whenever a PR is merged into master
Ginkgo-CI-Tests-Pipeline             Runs every two hours on the master branch
Vagrant-Master-Boxes-Packer-Build    Runs on merge into packer-ci-build repository.
Release-branch                       Runs various Ginkgo tests on merge into branch “v1.12”

Triage process

  1. Discover untriaged Jenkins failures via the jenkins-failures.sh script. It defaults to checking the previous 24 hours, but this can be changed by setting the SINCE environment variable to a unix timestamp. The script checks the various test pipelines that need triage.

    $ contrib/scripts/jenkins-failures.sh

    Note

    You can quickly assign SINCE with statements like SINCE=`date -d -3days +%s`

  2. Investigate the failure you are interested in and determine if it is a CI-Bug, Flake, or a Regression as defined in the table above.

    1. Search GitHub issues to see if the bug has already been filed. Make sure to include closed issues in your search, as a CI issue can be considered solved and then reappear. Good search terms are:

      • The test name, e.g.

        k8s-1.7.K8sValidatedKafkaPolicyTest Kafka Policy Tests KafkaPolicies (from (k8s-1.7.xml))

      • The line on which the test failed, e.g.

        github.com/cilium/cilium/test/k8s/kafka_policies.go:202

      • The error message, e.g.

        Failed to produce from empire-hq on topic deathstar-plan
  3. If a corresponding GitHub issue exists, update it with:

    1. A link to the failing Jenkins build (note that the build information is eventually deleted).

    2. Attach the zipfile downloaded from Jenkins with logs from the failing tests. A zipfile for all tests is also available.

    3. Check how much time has passed since the last reported occurrence of this failure and move this issue to the correct column in the CI flakes project board.

  4. If no existing GitHub issue was found, file a new GitHub issue:

    1. Attach zipfile downloaded from Jenkins with logs from failing test

    2. If the failure is a new regression or a real bug:

      1. Title: <Short bug description>

      2. Labels kind/bug and needs/triage.

    3. If the failure is a new CI-Bug or Flake, or if you are unsure:

      1. Title CI: <testname>: <cause>, e.g. CI: K8sValidatedPolicyTest Namespaces: cannot curl service

      2. Labels kind/bug/CI and needs/triage

      3. Include a link to the failing Jenkins build (note that the build information is eventually deleted).

      4. Attach zipfile downloaded from Jenkins with logs from failing test

      5. Include the test name and whole Stacktrace section to help others find this issue.

      6. Add issue to CI flakes project.

    Note

    Be extra careful when you see a new flake on a PR, and want to open an issue. It’s much more difficult to debug these without context around the PR and the changes it introduced. When creating an issue for a PR flake, include a description of the code change, the PR, or the diff. If it isn’t related to the PR, then it should already happen in master, and a new issue isn’t needed.

  5. Edit the description of the Jenkins build to mark it as triaged. This will exclude it from future jenkins-failures.sh output.

    1. Login -> Click on build -> Edit Build Information

    2. Add the failure type and GH issue number. Use the table describing the failure categories, at the beginning of this section, to help categorize them.

    Note

    This step can only be performed with an account on Jenkins. If you are interested in CI failure reviews and do not have an account yet, ping us on Slack.

Examples:

  • Flake, quay.io is down

  • Flake, DNS not ready, #3333

  • CI-Bug, K8sValidatedPolicyTest: Namespaces, pod not ready, #9939

  • Regression, k8s host policy, #1111
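Two parts of the triage process can be sketched as small shell helpers: computing a SINCE timestamp and formatting the build description. Both function names are hypothetical and neither exists in the Cilium repository. The first relies on GNU date, and the second only mirrors the format of the examples above.

```shell
#!/bin/sh
# Hypothetical helpers for the triage steps above.

# since_days_ago N: unix timestamp for N days before now (requires GNU date).
since_days_ago() {
    date -d "-$1 days" +%s
}

# triage_description CATEGORY SUMMARY [ISSUE]: build description in the style
# of the examples above, e.g. "Flake, DNS not ready, #3333".
triage_description() {
    case "$1" in
        Flake|CI-Bug|Regression) ;;
        *)
            printf 'unknown category: %s\n' "$1" >&2
            return 1 ;;
    esac
    if [ -n "${3:-}" ]; then
        printf '%s, %s, #%s\n' "$1" "$2" "$3"
    else
        printf '%s, %s\n' "$1" "$2"
    fi
}

triage_description Flake "DNS not ready" 3333   # prints Flake, DNS not ready, #3333
```

From a Cilium checkout, the first helper could be combined with the script as SINCE=$(since_days_ago 3) contrib/scripts/jenkins-failures.sh.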

Bisect process

If you are unable to triage the issue, you may try to use the bisect job to find when things went awry in Jenkins.

  1. Log in to Jenkins

  2. Go to https://jenkins.cilium.io/job/bisect-cilium/configure.

  3. Under Git Bisect build step fill in Good start revision and Bad end revision.

  4. Write a description of what you are looking for under Search Identifier.

  5. Adjust Retry number and Min Successful Runs to account for current CI flakiness.

  6. Save the configuration.

  7. Click “Build Now” in https://jenkins.cilium.io/job/bisect-cilium/.

  8. This may take over a day, depending on how many underlying builds are created. The result will be in the bisect-cilium console output. The actual builds happen in the https://jenkins.cilium.io/job/cilium-revision/ job.
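For readers unfamiliar with bisecting, the sketch below shows the general idea: a binary search between a known-good and a known-bad revision, with each candidate checked several times to tolerate flakes. It is not the implementation of the bisect-cilium job. All names are hypothetical, check_revision is a stand-in for a real build, and the retry logic is only one possible reading of the Retry number and Min Successful Runs settings.

```shell
#!/bin/sh
# Sketch of a bisect with retries; not the bisect-cilium job itself.

# check_revision REV: stand-in for a real build; here revisions 6 and later fail.
check_revision() { [ "$1" -lt 6 ]; }

# revision_is_good REV RETRIES MIN_OK: run the check up to RETRIES times and
# report good as soon as MIN_OK runs have passed.
revision_is_good() {
    ok=0 run=0
    while [ "$run" -lt "$2" ]; do
        if check_revision "$1"; then ok=$((ok + 1)); fi
        if [ "$ok" -ge "$3" ]; then return 0; fi
        run=$((run + 1))
    done
    return 1
}

# nth N ARGS...: print the N-th argument (1-based).
nth() { shift "$1"; printf '%s\n' "$1"; }

# first_bad RETRIES MIN_OK REV...: revisions are ordered oldest to newest,
# the first is known good and the last is known bad.
first_bad() {
    retries=$1 min_ok=$2
    shift 2
    lo=1 hi=$#
    while [ $((hi - lo)) -gt 1 ]; do
        mid=$(( (lo + hi) / 2 ))
        if revision_is_good "$(nth "$mid" "$@")" "$retries" "$min_ok"; then
            lo=$mid
        else
            hi=$mid
        fi
    done
    nth "$hi" "$@"
}

first_bad 3 2 1 2 3 4 5 6 7 8   # prints 6
```

Each halving step costs up to RETRIES builds, so a long revision range on a flaky CI can add up to many builds.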

Infrastructure details

Logging into VM running tests

  1. If you have access to credentials for Jenkins, log into the Jenkins slave running the test workload.

  2. Identify the vagrant box running the specific test:

    $ vagrant global-status
    id       name                           provider   state   directory
    -------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    6e68c6c  k8s1-build-PR-1588-6           virtualbox running /root/jenkins/workspace/cilium_cilium_PR-1588-CWL743UTZEF6CPEZCNXQVSZVEW32FR3CMGKGY6667CU7X43AAZ4Q/tests/k8s
    ec5962a  cilium-master-build-PR-1588-6  virtualbox running /root/jenkins/workspace/cilium_cilium_PR-1588-CWL743UTZEF6CPEZCNXQVSZVEW32FR3CMGKGY6667CU7X43AAZ4Q
    bfaffaa  k8s2-build-PR-1588-6           virtualbox running /root/jenkins/workspace/cilium_cilium_PR-1588-CWL743UTZEF6CPEZCNXQVSZVEW32FR3CMGKGY6667CU7X43AAZ4Q/tests/k8s
    3fa346c  k8s1-build-PR-1588-7           virtualbox running /root/jenkins/workspace/cilium_cilium_PR-1588-CWL743UTZEF6CPEZCNXQVSZVEW32FR3CMGKGY6667CU7X43AAZ4Q@2/tests/k8s
    b7ded3c  cilium-master-build-PR-1588-7  virtualbox running /root/jenkins/workspace/cilium_cilium_PR-1588-CWL743UTZEF6CPEZCNXQVSZVEW32FR3CMGKGY6667CU7X43AAZ4Q@2

  3. Log into the specific VM:

    $ JOB_BASE_NAME=PR-1588 BUILD_NUMBER=6 vagrant ssh 6e68c6c
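Looking up the box ID by hand can be replaced with a small filter over the vagrant global-status output. This is a sketch: the function name box_id is hypothetical, and the naming pattern <vm>-build-<JOB_BASE_NAME>-<BUILD_NUMBER> is inferred from the sample output above rather than documented behavior. The sample rows below are copied from that output, with the directory column shortened to "...".

```shell
#!/bin/sh
# Hypothetical helper: read `vagrant global-status` output on stdin and print
# the ID of the box named <vm>-build-<JOB_BASE_NAME>-<BUILD_NUMBER>.
# Usage: box_id VM JOB_BASE_NAME BUILD_NUMBER
box_id() {
    awk -v name="$1-build-$2-$3" '$2 == name { print $1 }'
}

# Sample rows taken from the output above (directory column shortened).
sample='id       name                           provider   state   directory
6e68c6c  k8s1-build-PR-1588-6           virtualbox running ...
ec5962a  cilium-master-build-PR-1588-6  virtualbox running ...
3fa346c  k8s1-build-PR-1588-7           virtualbox running ...'

printf '%s\n' "$sample" | box_id k8s1 PR-1588 6   # prints 6e68c6c
```

On a Jenkins node, the same filter could be fed from the live command, for example vagrant global-status | box_id k8s1 "$JOB_BASE_NAME" "$BUILD_NUMBER".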

Jenkinsfiles Extensions

Cilium uses a custom Jenkins helper library to gather metadata from PRs and simplify our Jenkinsfiles. The exported methods are:

  • ispr(): return true if the current build is a PR.

  • setIfPr(string, string): return the first argument if the current build is a PR, otherwise return the second one.

  • BuildIfLabel(String label, String Job): trigger a new Job if the PR has that specific Label.

  • Status(String status, String context): set pull request check status on the given context, example Status("SUCCESS", "$JOB_BASE_NAME")