6.7.8. Best practices for .orig.tar.{gz,bz2,xz} files

There are two kinds of original source tarballs: pristine source and repackaged upstream source.

6.7.8.1. Pristine source

The defining characteristic of a pristine source tarball is that the .orig.tar.{gz,bz2,xz} file is byte-for-byte identical to a tarball officially distributed by the upstream author. [1] This makes it possible to use checksums to easily verify that all changes between Debian’s version and upstream’s are contained in the Debian diff. Also, if the original source is huge, upstream authors and others who already have the upstream tarball can save download time if they want to inspect your packaging in detail.
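The checksum comparison mentioned above can be sketched as follows. This is only an illustration: the function name same_tarball and the file names in the usage line are hypothetical, and SHA-256 is just one reasonable choice of digest.

```shell
#!/bin/sh
# Sketch: succeed only if two tarballs have the same SHA-256 digest,
# i.e. the .orig tarball is byte-for-byte identical to upstream's.
# same_tarball UPSTREAM_TARBALL ORIG_TARBALL
same_tarball() {
    # Read from stdin so the output holds the digest without a file name.
    upstream_sum=$(sha256sum < "$1" | cut -d' ' -f1)
    debian_sum=$(sha256sum < "$2" | cut -d' ' -f1)
    [ "$upstream_sum" = "$debian_sum" ]
}
```

For example, `same_tarball foo-1.0.tar.gz foo_1.0.orig.tar.gz && echo pristine` would print "pristine" only when the two files are identical.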

There are no universally accepted guidelines that upstream authors follow regarding the directory structure inside their tarball, but dpkg-source is nevertheless able to deal with most upstream tarballs as pristine source. Its strategy is equivalent to the following:

  1. It unpacks the tarball in an empty temporary directory by doing

         zcat path/to/packagename_upstream-version.orig.tar.gz | tar xf -

  2. If, after this, the temporary directory contains nothing but one directory and no other files, dpkg-source renames that directory to packagename-upstream-version(.orig). The name of the top-level directory in the tarball does not matter, and is forgotten.

  3. Otherwise, the upstream tarball must have been packaged without a common top-level directory (shame on the upstream author!). In this case, dpkg-source renames the temporary directory itself to packagename-upstream-version(.orig).
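The three steps above can be approximated in a few lines of shell. This is a rough sketch of the described strategy, not dpkg-source's actual code: the function name unpack_orig is hypothetical, only gzip-compressed tarballs are handled, GNU find is assumed, and file names containing newlines are not handled.

```shell
#!/bin/sh
# Sketch of the unpacking strategy described above.
# unpack_orig TARBALL TARGET_DIR
#   TARBALL    : a gzip-compressed tarball
#   TARGET_DIR : e.g. packagename-upstream-version.orig; must not exist yet
unpack_orig() {
    tarball=$1
    target=$2
    tmp=$(mktemp -d) || return 1
    # Step 1: unpack into an empty temporary directory.
    zcat "$tarball" | (cd "$tmp" && tar xf -) || return 1
    # Look at the top-level entries, including hidden ones.
    count=$(find "$tmp" -mindepth 1 -maxdepth 1 | wc -l)
    only=$(find "$tmp" -mindepth 1 -maxdepth 1)
    if [ "$count" -eq 1 ] && [ -d "$only" ] && [ ! -L "$only" ]; then
        # Step 2: a single top-level directory; its original name is dropped.
        mv "$only" "$target" && rmdir "$tmp"
    else
        # Step 3: no common top-level directory; rename the temporary
        # directory itself.
        mv "$tmp" "$target"
    fi
}
```

Either way the caller ends up with one directory of the expected name, whatever layout upstream chose.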

6.7.8.2. Repackaged upstream source

You should upload packages with a pristine source tarball if possible, but there are various reasons why it might not be possible. This is the case if upstream does not distribute the source as gzipped tar at all, or if upstream’s tarball contains non-DFSG-free material that you must remove before uploading.

In these cases the developer must construct a suitable .orig.tar.{gz,bz2,xz} file themselves. We refer to such a tarball as a repackaged upstream source. Note that a repackaged upstream source is different from a Debian-native package. A repackaged source still comes with Debian-specific changes in a separate .diff.gz or .debian.tar.{gz,bz2,xz} and still has a version number composed of upstream-version and debian-version.

There may be cases where it is desirable to repackage the source even though upstream distributes a .tar.{gz,bz2,xz} that could in principle be used in its pristine form. The most obvious is if significant space savings can be achieved by recompressing the tar archive or by removing genuinely useless cruft from the upstream archive. Use your own discretion here, but be prepared to defend your decision if you repackage source that could have been pristine.

A repackaged .orig.tar.{gz,bz2,xz}

  1. should be documented in the resulting source package. Detailed information on how the repackaged source was obtained, and on how this can be reproduced, should be provided in debian/copyright. It is also a good idea to provide a get-orig-source target in your debian/rules file that repeats the process, as described in the Policy Manual, Main building script: debian/rules <https://www.debian.org/doc/debian-policy/ch-source.html#s-debianrules>.

  2. should not contain any file that does not come from the upstream author(s), or whose contents have been changed by you. [2]

  3. should, except where impossible for legal reasons, preserve the entire building and portability infrastructure provided by the upstream author. For example, it is not a sufficient reason for omitting a file that it is used only when building on MS-DOS. Similarly, a Makefile provided by upstream should not be omitted even if the first thing your debian/rules does is to overwrite it by running a configure script.

    (Rationale: It is common for Debian users who need to build software for non-Debian platforms to fetch the source from a Debian mirror rather than trying to locate a canonical upstream distribution point).

  4. may use packagename-upstream-version+dfsg (or any other suffix which is added to the tarball name) as the name of the top-level directory in its tarball. This makes it possible to distinguish pristine tarballs from repackaged ones.

  5. should be compressed with xz (or gzip or bzip2) with maximal compression.
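A repackaging step that follows these guidelines might look like the sketch below. It is an illustration under stated assumptions, not a prescribed procedure: the function name repack_dfsg and its arguments are hypothetical, the upstream tarball is assumed to be gzip-compressed with exactly one top-level directory, and GNU find is assumed. Nothing is added or modified; files are only removed, the top-level directory is renamed with the +dfsg suffix, and the result is compressed with xz at its highest preset.

```shell
#!/bin/sh
# Sketch: repackage an upstream tarball as a +dfsg .orig.tar.xz.
# repack_dfsg UPSTREAM_TGZ PACKAGE VERSION EXCLUDED_PATH...
# EXCLUDED_PATH is relative to the tarball's top-level directory.
repack_dfsg() {
    upstream=$1
    pkg=$2
    ver=$3
    shift 3
    out="$PWD/${pkg}_${ver}+dfsg.orig.tar.xz"
    tmp=$(mktemp -d) || return 1
    zcat "$upstream" | (cd "$tmp" && tar xf -) || return 1
    # Assumption: exactly one top-level directory in the upstream tarball.
    top=$(find "$tmp" -mindepth 1 -maxdepth 1 -type d)
    newtop="$tmp/${pkg}-${ver}+dfsg"
    mv "$top" "$newtop" || return 1
    # Remove only what cannot be shipped; everything else stays untouched.
    for path in "$@"; do
        rm -rf "${newtop:?}/$path"
    done
    # Recompress with xz at its highest preset.
    (cd "$tmp" && tar cf - "${pkg}-${ver}+dfsg") | xz -9 > "$out" || return 1
    rm -rf "$tmp"
}
```

A call such as `repack_dfsg foo-1.0.tar.gz foo 1.0 nonfree` would write foo_1.0+dfsg.orig.tar.xz to the current directory. Keeping the exact invocation in debian/copyright, or behind a get-orig-source target, is one way to make the process reproducible.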

6.7.8.3. Changing binary files

Sometimes it is necessary to change binary files contained in the original tarball, or to add binary files that are not in it. This is fully supported when using source packages in “3.0 (quilt)” format; see the dpkg-source(1) manual page for details. When using the older format “1.0”, binary files can’t be stored in the .diff.gz, so you must store a uuencoded (or similarly encoded) version of the file(s) and decode it at build time in debian/rules (and move it to its official location).
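For the “1.0” case, the encode-then-decode workflow can be sketched as below. This is a minimal illustration: base64 from coreutils stands in for uuencode as the "similar" encoding, and the function names and all paths are hypothetical.

```shell
#!/bin/sh
# Sketch: keep a binary file as text in a "1.0" source package and
# restore it at build time.

# Run once by the maintainer: store the text-encoded form (for example
# under debian/) so that it can live in the .diff.gz.
# encode_binary BINARY_FILE ENCODED_FILE
encode_binary() {
    base64 "$1" > "$2"
}

# Called from debian/rules at build time: decode the stored text and
# write the binary to its final location.
# decode_binary ENCODED_FILE DESTINATION
decode_binary() {
    mkdir -p "$(dirname "$2")" && base64 -d "$1" > "$2"
}
```

A debian/rules recipe could then run something like `decode_binary debian/logo.png.b64 share/logo.png` before the build proper starts.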