Contributing to Kubeflow
How to start contributing to Kubeflow
Welcome to the Kubeflow project!
Getting started as a Kubeflow contributor
This document is the single source of truth for how to contribute to the code base.We’d love to accept your patches and contributions to this project. There arejust a few small guidelines you need to follow.
Sign the CLA
Contributions to this project must be accompanied by a Contributor License Agreement (CLA).You (or your employer) retain the copyright to your contribution.This gives us permission to use and redistribute your contributions aspart of the project. Head over to https://cla.developers.google.com/ to seeyour current agreements on file or to sign a new one.
You generally only need to submit a CLA once, so if you’ve already submitted one(even if it was for a different project), you probably don’t need to do itagain.
Follow the code of conduct
Please make sure to read and observe our Code of Conduct.
Consider participating in Kubeflow user research
Maggie Lynn, a user experience researcher, is conducting user studies to inform future developments for Kubeflow. These typically involve a one hour study session conducted online with a thank you gift for providing your feedback. As a member of the Kubeflow community, your feedback and expertise are extremely valuable to us, so if you have time in the next month, please consider participating. To gather your interest, availability, and some basic information about you, please fill out this form where you’ll find out more details about this research opportunity: https://goo.gl/forms/sv5sRo3UfsgeUEjK2
Joining the community
Follow these instructions if you want to
- Become a member of the Kubeflow GitHub org (see below)
- Become part of the Kubeflow build cop or release teams
- Be recognized as an individual or organization contributing to Kubeflow
Joining the Kubeflow GitHub Org
Before asking to join the community, we ask that you first make a small number of contributionsto demonstrate your intent to continue contributing to Kubeflow.
There are are a number of ways to contribute to Kubeflow
- Submit PRs
- File issues reporting bugs or providing feedback
- Answer questions on Slack or GitHub issues
You can use this table to see how many contributionsyou’ve made
- Note: This only counts GitHub related ways of contributing
When you are ready to join
- Send a PR adding yourself as a member in org.yaml
- After the PR is merged an admin will send you an invitation
- This is a manual process that’s generally run a couple times a week
- If a week passes without receiving an invitation reach out on kubeflow#community
Companies/organizations
If you would like your company or organization to be acknowledged for contributing toKubeflow or participating in the community (being a user counts) please send a PRadding the relevant info tomember_organizations.yaml.
If you want your employee’s GitHub contributions to be attributed to your company please ask them to setthe company field in their GitHub profile.
Community discussions
There are many ways to contribute! Join one of our communication channels,attend a community meeting, get to know the community. Read the details inour community guide.
Your first contribution
Find something to work on
Help is always welcome! For example, documentation (like the text you are readingnow) can always use improvement. There’s always code that can be clarified andvariables or functions that can be renamed or commented. There’s always a needfor more test coverage. You get the idea - if you ever see something you thinkshould be fixed, you should own it. Here is how you get started.
Starter issues
To find Kubeflow issues that make good entry points:
- Start with issues labeled good first issue. For example, see the goodfirst issues in the kubeflow/websiterepositoryfor doc updates, and in the kubeflow/kubeflowrepositoryfor updates to the core Kubeflow code.
- For issues that require deeper knowledge of one or more technical aspects,look at issues labeled help wanted. For example, see these issues in thekubeflow/kubeflowrepository
- Examine the issues in any of theKubeflow repositories.
Owners files and PR workflow
Our PR workflow is nearly identical to Kubernetes’. Most of these instructions are amodified version of Kubernetes’ contributorsand ownersguides.
Overview of OWNERS files
OWNERS files are used to designate responsibility over different parts of the Kubeflow codebase.Today, we use them to assign the reviewer and approver roles used in our two-phase codereview process. Our OWNERS files were inspired by Chromium OWNERSfiles, which in turninspired GitHub’s CODEOWNERS files.
The velocity of a project that uses code review is limited by the number of people capable ofreviewing code. The quality of a person’s code review is limited by their familiarity with the codeunder review. Our goal is to address both of these concerns through the prudent use and maintenanceof OWNERS files
OWNERS
Each directory that contains a unit of independent code or content may also contain an OWNERS file.This file applies to everything within the directory, including the OWNERS file itself, siblingfiles, and child directories.
OWNERS files are in YAML format and support the following keys:
approvers
: a list of GitHub usernames or aliases that can/approve
a PRlabels
: a list of GitHub labels to automatically apply to a PRoptions
: a map of options for how to interpret this OWNERS file, currently only one:no_parent_owners
: defaults tofalse
if not present; iftrue
, exclude parent OWNERS files.Allows the use case wherea/deep/nested/OWNERS
file preventsa/OWNERS
file from having anyeffect ona/deep/nested/bit/of/code
reviewers
: a list of GitHub usernames or aliases that are good candidates to/lgtm
a PR
All users are expected to be assignable. In GitHub terms, this means they are either collaboratorsof the repo, or members of the organization to which the repo belongs.
A typical OWNERS file looks like:
approvers:
- alice
- bob # this is a comment
reviewers:
- alice
- carol # this is another comment
- sig-foo # this is an alias
OWNERS_ALIASES
Each repo may contain at its root an OWNERS_ALIAS file.
OWNERS_ALIAS files are in YAML format and support the following keys:
aliases
: a mapping of alias name to a list of GitHub usernames
We use aliases for groups instead of GitHub Teams, because changes to GitHub Teams are notpublicly auditable.
A sample OWNERS_ALIASES file looks like:
aliases:
sig-foo:
- david
- erin
sig-bar:
- bob
- frank
GitHub usernames and aliases listed in OWNERS files are case-insensitive.
The code review process
- The author submits a PR
- Phase 0: Automation suggests reviewers and approvers for the PR
- Determine the set of OWNERS files nearest to the code being changed
- Choose at least two suggested reviewers, trying to find a unique reviewer for every leafOWNERS file, and request their reviews on the PR
- Choose suggested approvers, one from each OWNERS file, and list them in a comment on the PR
- Phase 1: Humans review the PR
- Reviewers look for general code quality, correctness, sane software engineering, style, etc.
- Anyone in the organization can act as a reviewer with the exception of the individual whoopened the PR
- If the code changes look good to them, a reviewer types
/lgtm
in a PR comment or review;if they change their mind, they/lgtm cancel
- Once a reviewer has
/lgtm
‘ed, prow(@k8s-ci-robot) applies anlgtm
label to the PR
- Phase 2: Humans approve the PR
- The PR author
/assign
’s all suggested approvers to the PR, and optionally notifiesthem (eg: “pinging @foo for approval”) - Only people listed in the relevant OWNERS files, either directly or through an alias, can actas approvers, including the individual who opened the PR
- Approvers look for holistic acceptance criteria, including dependencies with other features,forwards/backwards compatibility, API and flag definitions, etc
- If the code changes look good to them, an approver types
/approve
in a PR comment orreview; if they change their mind, they/approve cancel
- prow (@k8s-ci-robot) updates itscomment in the PR to indicate which approvers still need to approve
- Once all approvers (one from each of the previously identified OWNERS files) have approved,prow (@k8s-ci-robot) applies an
approved
label
- The PR author
Phase 3: Automation merges the PR:
If all of the following are true:
- All required labels are present (eg:
lgtm
,approved
) - Any blocking labels are missing (eg: there is no
do-not-merge/hold
,needs-rebase
)
- All required labels are present (eg:
And if any of the following are true:
- there are no presubmit prow jobs configured for this repo
- there are presubmit prow jobs configured for this repo, and they all pass after automaticallybeing re-run one last time
- Then the PR will automatically be merged
Quirks of the process
There are a number of behaviors we’ve observed that while possible are discouraged, as they goagainst the intent of this review process. Some of these could be prevented in the future, but thisis the state of today.
- An approver’s
/lgtm
is simultaneously interpreted as an/approve
- While a convenient shortcut for some, it can be surprising that the same command is interpretedin one of two ways depending on who the commenter is
- Instead, explicitly write out
/lgtm
and/approve
to help observers, or save the/lgtm
fora reviewer - This goes against the idea of having at least two sets of eyes on a PR, and may be a sign thatthere are too few reviewers (who aren’t also approver)
- Technically, anyone who is a member of the Kubeflow GitHub organization can drive-by
/lgtm
aPR- Drive-by reviews from non-members are encouraged as a way of demonstrating experience andintent to become a collaborator or reviewer
- Drive-by
/lgtm
’s from members may be a sign that our OWNERS files are too small, or that theexisting reviewers are too unresponsive - This goes against the idea of specifying reviewers in the first place, to ensure thatauthor is getting actionable feedback from people knowledgeable with the code
- Reviewers, and approvers are unresponsive
- This causes a lot of frustration for authors who often have little visibility into why theirPR is being ignored
- Many reviewers and approvers are so overloaded by GitHub notifications that @mention’ingis unlikely to get a quick response
- If an author
/assign
’s a PR, reviewers and approvers will be made aware of it ontheir PR dashboard - An author can work around this by manually reading the relevant OWNERS files,
/unassign
‘ing unresponsive individuals, and/assign
‘ing others - This is a sign that our OWNERS files are stale; pruning the reviewers and approvers listswould help with this
- Authors are unresponsive
- This costs a tremendous amount of attention as context for an individual PR is lost over time
- This hurts the project in general as its general noise level increases over time
- Instead, close PR’s that are untouched after too long (we currently have a bot do this after 90days)
Automation using OWNERS files
prow
Prow receives events from GitHub, and reacts to them. It is effectively stateless. The followingpieces of prow are used to implement the code review process above.
- cmd: tide
- per-repo configuration:
labels
: list of labels required to be present for merge (eg:lgtm
)missingLabels
: list of labels required to be missing for merge (eg:do-not-merge/hold
)reviewApprovedRequired
: defaults tofalse
; when true, require that there must be at leastone approved pull request reviewpresent for mergemerge_method
: defaults tomerge
; whensquash
orrebase
, use that merge method insteadwhen clicking a PR’s merge button- merges PR’s once they meet the appropriate criteria as configured above
- if there are any presubmit prow jobs for the repo the PR is against, they will be re-run onefinal time just prior to merge
- plugin: assign
- assigns GitHub users in response to
/assign
comments on a PR - unassigns GitHub users in response to
/unassign
comments on a PR
- assigns GitHub users in response to
- plugin: approve
- per-repo configuration:
issue_required
: defaults tofalse
; whentrue
, require that the PR description link toan issue, or that at least one approver issues a/approve no-isse
implicit_self_approve
: defaults tofalse
; whentrue
, if the PR author is in relevantOWNERS files, act as if they have implicitly/approve
’d- adds the
approved
label once an approver for each of the requiredOWNERS files has/approve
’d - comments as required OWNERS files are satisfied
- removes outdated approval status comments
- plugin: blunderbuss
- determines reviewers and requests their reviews on PR’s
- plugin: lgtm
- adds the
lgtm
label when a reviewer comments/lgtm
on a PR - the PR author may not
/lgtm
their own PR
- adds the
- pkg: k8s.io/test-infra/prow/repoowners
- parses OWNERS and OWNERS_ALIAS files
- if the
no_parent_owners
option is encountered, parent owners are excluded from havingany influence over files adjacent to or underneath of the current OWNERS file
Maintaining OWNERS files
OWNERS files should be regularly maintained.
We encourage people to self-nominate or self-remove from OWNERS files via PR’s. Ideally in the futurewe could use metrics-driven automation to assist in this process.
We should strive to:
- grow the number of OWNERS files
- add new people to OWNERS files
- ensure OWNERS files only contain org members and repo collaborators
- ensure OWNERS files only contain people are actively contributing to or reviewing the code they own
- remove inactive people from OWNERS files
Bad examples of OWNERS usage:
- directories that lack OWNERS files, resulting in too many hitting root OWNERS
- OWNERS files that have a single person as both approver and reviewer
- OWNERS files that haven’t been touched in over 6 months
- OWNERS files that have non-collaborators present
Good examples of OWNERS usage:
- there are more
reviewers
thanapprovers
- the
approvers
are not in thereviewers
section - OWNERS files that are regularly updated (at least once per release)