Getting familiar with the code and tools
Now that you have an issue you want to fix, enhancement to add, or documentation to improve, you need to learn how to work with GitHub and the pandas code base.
Version control, Git, and GitHub
To the new user, working with Git is one of the more daunting aspects of contributing to pandas. It can very quickly become overwhelming, but sticking to the guidelines below will help keep the process straightforward and mostly trouble free. As always, if you are having difficulties please feel free to ask for help.
The code is hosted on GitHub. To contribute you will need to sign up for a free GitHub account. We use Git for version control to allow many people to work together on the project.
Some great resources for learning Git:
- the GitHub help pages.
- the NumPy documentation.
- Matthew Brett’s Pydagogue.
Getting started with Git
GitHub has instructions for installing Git, setting up your SSH key, and configuring Git. Complete all of these steps before you start, so that you can work seamlessly between your local repository and GitHub.
Forking
You will need your own fork to work on the code. Go to the pandas project page and hit the Fork button. You will want to clone your fork to your machine:
git clone https://github.com/your-user-name/pandas.git pandas-yourname
cd pandas-yourname
git remote add upstream https://github.com/pandas-dev/pandas.git
This creates the directory pandas-yourname and connects your repository to the upstream (main project) pandas repository.
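To confirm that both remotes are in place after cloning, you can inspect the output of `git remote -v`. The following is a minimal sketch rather than part of the pandas tooling: the helper names `parse_remotes`, `missing_remotes`, and `current_remotes` are hypothetical, and the sketch assumes `git` is on your PATH and that you run it from inside the cloned directory.

```python
import subprocess


def parse_remotes(remote_output):
    """Map each remote name to its fetch URL, given `git remote -v` output."""
    remotes = {}
    for line in remote_output.splitlines():
        parts = line.split()
        # Each line looks like "<name> <url> (fetch)" or "<name> <url> (push)".
        if len(parts) >= 3 and parts[2] == "(fetch)":
            remotes[parts[0]] = parts[1]
    return remotes


def missing_remotes(remotes, required=("origin", "upstream")):
    """Return the required remote names that are not configured."""
    return [name for name in required if name not in remotes]


def current_remotes():
    """Read the remotes of the repository in the current directory."""
    result = subprocess.run(
        ["git", "remote", "-v"], capture_output=True, text=True, check=True
    )
    return parse_remotes(result.stdout)
```

Calling `missing_remotes(current_remotes())` from inside pandas-yourname should return an empty list once the `git remote add upstream` step has been run.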
Creating a development environment
To test out code changes, you’ll need to build pandas from source, which requires a C compiler and a Python environment. If you’re only making documentation changes, you can skip to Contributing to the documentation, but you won’t be able to build the documentation locally before pushing your changes.
Installing a C Compiler
pandas uses C extensions (mostly written using Cython) to speed up certain operations. To install pandas from source, you need to compile these C extensions, which means you need a C compiler. How you install one depends on your platform. Follow the CPython contributing guidelines for getting a compiler installed. You don’t need to do any of the ./configure or make steps; you only need to install the compiler.
For Windows developers, the following links may be helpful.
- https://blogs.msdn.microsoft.com/pythonengineering/2016/04/11/unable-to-find-vcvarsall-bat/
- https://github.com/conda/conda-recipes/wiki/Building-from-Source-on-Windows-32-bit-and-64-bit
- https://cowboyprogrammer.org/building-python-wheels-for-windows/
- https://blog.ionelmc.ro/2014/12/21/compiling-python-extensions-on-windows/
- https://support.enthought.com/hc/en-us/articles/204469260-Building-Python-extensions-with-Canopy
Let us know if you have any difficulties by opening an issue or reaching out on Gitter.
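Before moving on, it can help to check whether a compiler is visible from your shell. The sketch below only looks for common compiler executables on your PATH; the candidate list and the function name `find_c_compilers` are assumptions made for illustration, and finding an executable does not guarantee that the build will succeed.

```python
import shutil

# Candidate compiler executables. This list is an assumption; adjust it
# for your platform and toolchain.
CANDIDATES = ("cc", "gcc", "clang", "cl")


def find_c_compilers(candidates=CANDIDATES):
    """Return {name: path} for each candidate executable found on PATH."""
    found = {}
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            found[name] = path
    return found
```

An empty result suggests that no compiler is reachable from the current shell. On Windows, the Visual C++ compiler is typically only on the PATH inside a developer command prompt, so an empty result there is not conclusive.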
Creating a Python Environment
Now that you have a C compiler, create an isolated pandas development environment:
- Install either Anaconda or miniconda
- Make sure your conda is up to date (conda update conda)
- Make sure that you have cloned the repository
- cd to the pandas source directory
We’ll now kick off a three-step process:
- Install the build dependencies
- Build and install pandas
- Install the optional dependencies
# Create and activate the build environment
conda env create -f ci/environment-dev.yaml
conda activate pandas-dev
# or with older versions of Anaconda:
source activate pandas-dev
# Build and install pandas
python setup.py build_ext --inplace -j 4
python -m pip install -e .
# Install the rest of the optional dependencies
conda install -c defaults -c conda-forge --file=ci/requirements-optional-conda.txt
At this point you should be able to import pandas from your locally built version:
$ python # start an interpreter
>>> import pandas
>>> print(pandas.__version__)
0.22.0.dev0+29.g4ad6d4d74
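A development version string is a good sign, but you can also check where the module was loaded from to confirm that Python picked up your local build rather than a previously installed copy. The helper below is a minimal sketch; the name `is_imported_from` is hypothetical and not part of pandas.

```python
from pathlib import Path


def is_imported_from(module_file, source_dir):
    """Return True if module_file lives somewhere inside source_dir."""
    module_path = Path(module_file).resolve()
    source_path = Path(source_dir).resolve()
    # Path.parents lists every ancestor directory of the module file.
    return source_path in module_path.parents
```

For example, after `import pandas`, calling `is_imported_from(pandas.__file__, ".")` from the pandas source directory should return True for an in-place build.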
The conda commands above create a new environment. They do not touch any of your existing environments or any existing Python installation.
To view your environments:
conda info -e
To return to your root environment:
conda deactivate
See the full conda docs here.
Creating a Python Environment (pip)
If you aren’t using conda for your development environment, follow these instructions. You’ll need to have at least Python 3.5 installed on your system.
# Create a virtual environment
# Use an ENV_DIR of your choice. We'll use ~/virtualenvs/pandas-dev
# Any parent directories should already exist
python3 -m venv ~/virtualenvs/pandas-dev
# Activate the virtualenv
. ~/virtualenvs/pandas-dev/bin/activate
# Install the build dependencies
python -m pip install -r ci/requirements_dev.txt
# Build and install pandas
python setup.py build_ext --inplace -j 4
python -m pip install -e .
# Install additional dependencies
python -m pip install -r ci/requirements-optional-pip.txt
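If a step above fails, two quick things to rule out are an interpreter older than the stated minimum and a virtual environment that is not actually active. The following sketch checks both; the function names are hypothetical, and the second check assumes the environment was created with `python3 -m venv` as shown above.

```python
import sys

MIN_VERSION = (3, 5)  # minimum Python version stated in the text above


def meets_minimum(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


def in_virtual_environment():
    """Return True when running inside an environment created by venv."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix
```

Both functions should return True when run with the `python` from the activated ~/virtualenvs/pandas-dev environment.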
Creating a branch
You want your master branch to reflect only production-ready code, so create a feature branch for making your changes. For example:
git branch shiny-new-feature
git checkout shiny-new-feature
The above can be simplified to:
git checkout -b shiny-new-feature
This changes your working directory to the shiny-new-feature branch. Keep any changes in this branch specific to one bug or feature so it is clear what the branch brings to pandas. You can have many shiny-new-features and switch between them using the git checkout command.
When creating this branch, make sure your master branch is up to date with the latest upstream master version. To update your local master branch, you can do:
git checkout master
git pull upstream master --ff-only
When you want to update the feature branch with changes in master after you created the branch, check the section on updating a PR.
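Putting the steps in this section in order (update master first, then branch from it) gives the sequence sketched below. This is an illustration only: the helper names `new_branch_commands` and `run_commands` are hypothetical, and the sketch assumes the `upstream` remote was added as described under Forking.

```python
import subprocess


def new_branch_commands(branch_name, base="master", remote="upstream"):
    """Return the git commands, in order, for starting a fresh feature branch."""
    return [
        ["git", "checkout", base],
        # --ff-only refuses to create a merge commit if local history diverged.
        ["git", "pull", remote, base, "--ff-only"],
        ["git", "checkout", "-b", branch_name],
    ]


def run_commands(commands):
    """Run each command in turn, raising an error at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)
```

Calling `run_commands(new_branch_commands("shiny-new-feature"))` from inside your clone runs the same commands shown above. Building the command list separately from running it makes it easy to print and review the commands first.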