tarfile - Read and write tar archive files
Source code: Lib/tarfile.py
The tarfile module makes it possible to read and write tar archives, including those using gzip, bz2 and lzma compression. Use the zipfile module to read or write .zip files, or the higher-level functions in shutil.
Some facts and figures:
- reads and writes gzip, bz2 and lzma compressed archives if the respective modules are available.
- read/write support for the POSIX.1-1988 (ustar) format.
- read/write support for the GNU tar format including longname and longlink extensions, read-only support for all variants of the sparse extension including restoration of sparse files.
- read/write support for the POSIX.1-2001 (pax) format.
- handles directories, regular files, hardlinks, symbolic links, fifos, character devices and block devices and is able to acquire and restore file information like timestamp, access permissions and owner.
Changed in version 3.3: Added support for lzma compression.
tarfile.open(name=None, mode='r', fileobj=None, bufsize=10240, **kwargs)
Return a TarFile object for the pathname name. For detailed information on TarFile objects and the keyword arguments that are allowed, see TarFile Objects.
mode has to be a string of the form 'filemode[:compression]'; it defaults to 'r'. Here is a full list of mode combinations:
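Mode            Action
'r' or 'r:*'    Open for reading with transparent compression (recommended).
'r:'            Open for reading exclusively without compression.
'r:gz'          Open for reading with gzip compression.
'r:bz2'         Open for reading with bzip2 compression.
'r:xz'          Open for reading with lzma compression.
'x' or 'x:'     Create a tarfile exclusively without compression. Raise a FileExistsError exception if it already exists.
'x:gz'          Create a tarfile with gzip compression. Raise a FileExistsError exception if it already exists.
'x:bz2'         Create a tarfile with bzip2 compression. Raise a FileExistsError exception if it already exists.
'x:xz'          Create a tarfile with lzma compression. Raise a FileExistsError exception if it already exists.
'a' or 'a:'     Open for appending with no compression. The file is created if it does not exist.
'w' or 'w:'     Open for uncompressed writing.
'w:gz'          Open for gzip compressed writing.
'w:bz2'         Open for bzip2 compressed writing.
'w:xz'          Open for lzma compressed writing.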
Note that 'a:gz', 'a:bz2' or 'a:xz' is not possible. If mode is not suitable to open a certain (compressed) file for reading, ReadError is raised. Use mode 'r' to avoid this. If a compression method is not supported, CompressionError is raised.
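The mode rules above can be tried out with a small sketch. The file name demo.tar.gz and the member hello.txt are hypothetical, chosen only for illustration; the archive lives in a temporary directory so nothing on disk is touched permanently.

```python
import io
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.tar.gz")  # hypothetical file name

    # 'x:gz' creates a new gzip-compressed archive and refuses to overwrite.
    with tarfile.open(path, "x:gz") as tar:
        payload = b"hello"
        info = tarfile.TarInfo("hello.txt")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))

    try:
        tarfile.open(path, "x:gz")
        second_create = "created"
    except FileExistsError:
        second_create = "FileExistsError"

    # 'r:' insists on an uncompressed archive, so the gzip file is rejected.
    try:
        tarfile.open(path, "r:")
        strict_read = "opened"
    except tarfile.ReadError:
        strict_read = "ReadError"

    # Plain 'r' (same as 'r:*') detects the compression transparently.
    with tarfile.open(path, "r") as tar:
        transparent_names = tar.getnames()

print(second_create, strict_read, transparent_names)
```

Unless a specific compression must be enforced, plain 'r' is usually the more forgiving choice for reading.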
If fileobj is specified, it is used as an alternative to a file object opened in binary mode for name. It is supposed to be at position 0.
For modes 'w:gz', 'r:gz', 'w:bz2', 'r:bz2', 'x:gz', 'x:bz2', tarfile.open() accepts the keyword argument compresslevel (default 9) to specify the compression level of the file.
For special purposes, there is a second format for mode: 'filemode|[compression]'. tarfile.open() will return a TarFile object that processes its data as a stream of blocks. No random seeking will be done on the file. If given, fileobj may be any object that has a read() or write() method (depending on the mode). bufsize specifies the blocksize and defaults to 20 * 512 bytes. Use this variant in combination with e.g. sys.stdin, a socket file object or a tape device. However, such a TarFile object is limited in that it does not allow random access, see Examples. The currently possible modes:
Mode       Action
'r|*'      Open a stream of tar blocks for reading with transparent compression.
'r|'       Open a stream of uncompressed tar blocks for reading.
'r|gz'     Open a gzip compressed stream for reading.
'r|bz2'    Open a bzip2 compressed stream for reading.
'r|xz'     Open an lzma compressed stream for reading.
'w|'       Open an uncompressed stream for writing.
'w|gz'     Open a gzip compressed stream for writing.
'w|bz2'    Open a bzip2 compressed stream for writing.
'w|xz'     Open an lzma compressed stream for writing.
Changed in version 3.5: The 'x' (exclusive creation) mode was added.
Changed in version 3.6: The name parameter accepts a path-like object.
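As a minimal sketch of the stream variant, the following writes a gzip-compressed stream into an in-memory buffer and reads it back strictly in order. The io.BytesIO buffer stands in for a socket or pipe, and the member names are made up for the example.

```python
import io
import tarfile

buffer = io.BytesIO()

# 'w|gz' writes a gzip-compressed stream; only write() is needed on fileobj.
with tarfile.open(fileobj=buffer, mode="w|gz") as tar:
    for name, payload in [("a.txt", b"alpha"), ("b.txt", b"beta")]:
        info = tarfile.TarInfo(name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))

buffer.seek(0)

# 'r|*' reads the stream sequentially and detects the compression itself.
contents = {}
with tarfile.open(fileobj=buffer, mode="r|*") as tar:
    for member in tar:
        # Read each member while iterating: a stream cannot seek backwards.
        contents[member.name] = tar.extractfile(member).read()

print(contents)
```

Each member is consumed before moving on to the next one, which is the access pattern a non-seekable stream allows.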
class tarfile.TarFile
Class for reading and writing tar archives. Do not use this class directly: use tarfile.open() instead. See TarFile Objects.
tarfile.is_tarfile(name)
Return True if name is a tar archive file that the tarfile module can read.
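A short check with tarfile.is_tarfile() might look like the sketch below. Both file names are hypothetical and are created in a temporary directory just for the demonstration.

```python
import io
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    archive_path = os.path.join(tmp, "demo.tar")   # hypothetical names
    text_path = os.path.join(tmp, "notes.txt")

    with tarfile.open(archive_path, "w") as tar:
        payload = b"hello"
        info = tarfile.TarInfo("hello.txt")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))

    with open(text_path, "w") as handle:
        handle.write("this is not a tar archive\n")

    archive_ok = tarfile.is_tarfile(archive_path)
    text_ok = tarfile.is_tarfile(text_path)

print(archive_ok, text_ok)
```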
The tarfile module defines the following exceptions:
exception tarfile.TarError
Base class for all tarfile exceptions.
exception tarfile.ReadError
Is raised when a tar archive is opened, that either cannot be handled by the tarfile module or is somehow invalid.
exception tarfile.CompressionError
Is raised when a compression method is not supported or when the data cannot be decoded properly.
exception tarfile.StreamError
Is raised for the limitations that are typical for stream-like TarFile objects.
exception tarfile.ExtractError
Is raised for non-fatal errors when using TarFile.extract(), but only if TarFile.errorlevel == 2.
exception tarfile.HeaderError
Is raised by TarInfo.frombuf() if the buffer it gets is invalid.
The following constants are available at the module level:
tarfile.ENCODING
The default character encoding: 'utf-8' on Windows, the value returned by sys.getfilesystemencoding() otherwise.
Each of the following constants defines a tar archive format that the tarfile module is able to create. See section Supported tar formats for details.
tarfile.USTAR_FORMAT
POSIX.1-1988 (ustar) format.
tarfile.GNU_FORMAT
GNU tar format.
tarfile.PAX_FORMAT
POSIX.1-2001 (pax) format.
tarfile.DEFAULT_FORMAT
The default format for creating archives. This is currently PAX_FORMAT.
Changed in version 3.8: The default format for new archives was changed to PAX_FORMAT from GNU_FORMAT.
See also
- Module zipfile: Documentation of the zipfile standard module.
- Archiving operations: Documentation of the higher-level archiving facilities provided by the standard shutil module.
- GNU tar manual, Basic Tar Format: Documentation for tar archive files, including GNU tar extensions.
TarFile Objects
The TarFile object provides an interface to a tar archive. A tar archive is a sequence of blocks. An archive member (a stored file) is made up of a header block followed by data blocks. It is possible to store a file in a tar archive several times. Each archive member is represented by a TarInfo object, see TarInfo Objects for details.
A TarFile object can be used as a context manager in a with statement. It will automatically be closed when the block is completed. Please note that in the event of an exception an archive opened for writing will not be finalized; only the internally used file object will be closed. See the Examples section for a use case.
New in version 3.2: Added support for the context management protocol.
class tarfile.TarFile(name=None, mode='r', fileobj=None, format=DEFAULT_FORMAT, tarinfo=TarInfo, dereference=False, ignore_zeros=False, encoding=ENCODING, errors='surrogateescape', pax_headers=None, debug=0, errorlevel=0)
All following arguments are optional and can be accessed as instance attributes as well.
name is the pathname of the archive. name may be a path-like object. It can be omitted if fileobj is given. In this case, the file object's name attribute is used if it exists.
mode is either 'r' to read from an existing archive, 'a' to append data to an existing file, 'w' to create a new file overwriting an existing one, or 'x' to create a new file only if it does not already exist.
If fileobj is given, it is used for reading or writing data. If it can be determined, mode is overridden by fileobj's mode. fileobj will be used from position 0.
Note
fileobj is not closed when TarFile is closed.
format controls the archive format for writing. It must be one of the constants USTAR_FORMAT, GNU_FORMAT or PAX_FORMAT that are defined at module level. When reading, format will be automatically detected, even if different formats are present in a single archive.
The tarinfo argument can be used to replace the default TarInfo class with a different one.
If dereference is False, add symbolic and hard links to the archive. If it is True, add the content of the target files to the archive. This has no effect on systems that do not support symbolic links.
If ignore_zeros is False, treat an empty block as the end of the archive. If it is True, skip empty (and invalid) blocks and try to get as many members as possible. This is only useful for reading concatenated or damaged archives.
debug can be set from 0 (no debug messages) up to 3 (all debug messages). The messages are written to sys.stderr.
If errorlevel is 0, all errors are ignored when using TarFile.extract(). Nevertheless, they appear as error messages in the debug output, when debugging is enabled. If 1, all fatal errors are raised as OSError exceptions. If 2, all non-fatal errors are raised as TarError exceptions as well.
The encoding and errors arguments define the character encoding to be used for reading or writing the archive and how conversion errors are going to be handled. The default settings will work for most users. See section Unicode issues for in-depth information.
The pax_headers argument is an optional dictionary of strings which will be added as a pax global header if format is PAX_FORMAT.
Changed in version 3.2: Use 'surrogateescape' as the default for the errors argument.
Changed in version 3.5: The 'x' (exclusive creation) mode was added.
Changed in version 3.6: The name parameter accepts a path-like object.
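To illustrate the ignore_zeros argument, here is a small sketch that glues two archives together in memory and reads the result with and without the flag. The helper make_archive and the member names are hypothetical, introduced only for this example.

```python
import io
import tarfile


def make_archive(name, payload):
    """Return the bytes of an uncompressed tar archive holding one member."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        info = tarfile.TarInfo(name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


# Two complete archives glued together, as `cat a.tar b.tar` would produce.
joined = make_archive("a.txt", b"first") + make_archive("b.txt", b"second")

# By default the zero blocks closing the first archive end the listing.
with tarfile.open(fileobj=io.BytesIO(joined), mode="r:") as tar:
    default_names = tar.getnames()

# With ignore_zeros=True the reader skips them and finds the second archive.
with tarfile.open(fileobj=io.BytesIO(joined), mode="r:", ignore_zeros=True) as tar:
    lenient_names = tar.getnames()

print(default_names)
print(lenient_names)
```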
classmethod TarFile.open(…)
Alternative constructor. The tarfile.open() function is actually a shortcut to this classmethod.
TarFile.getmember(name)
Return a TarInfo object for member name. If name cannot be found in the archive, KeyError is raised.
Note
If a member occurs more than once in the archive, its last occurrence is assumed to be the most up-to-date version.
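The lookup behaviour described above can be sketched with an in-memory archive that stores the same name twice. The name config.ini and both payloads are invented for illustration.

```python
import io
import tarfile

buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as tar:
    # Store the same name twice with different contents.
    for payload in (b"old", b"newer"):
        info = tarfile.TarInfo("config.ini")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))

buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r:") as tar:
    # getmember() returns the last occurrence of a repeated name.
    latest_size = tar.getmember("config.ini").size
    try:
        tar.getmember("missing.txt")
        missing = "found"
    except KeyError:
        missing = "KeyError"

print(latest_size, missing)
```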
TarFile.getmembers()
Return the members of the archive as a list of TarInfo objects. The list has the same order as the members in the archive.
TarFile.getnames()
Return the members as a list of their names. It has the same order as the list returned by getmembers().
TarFile.list(verbose=True, *, members=None)
Print a table of contents to sys.stdout. If verbose is False, only the names of the members are printed. If it is True, output similar to that of ls -l is produced. If optional members is given, it must be a subset of the list returned by getmembers().
Changed in version 3.5: Added the members parameter.
TarFile.next()
Return the next member of the archive as a TarInfo object, when TarFile is opened for reading. Return None if there is no more available.
TarFile.extractall(path=".", members=None, *, numeric_owner=False)
Extract all members from the archive to the current working directory or directory path. If optional members is given, it must be a subset of the list returned by getmembers(). Directory information like owner, modification time and permissions are set after all members have been extracted. This is done to work around two problems: A directory's modification time is reset each time a file is created in it. And, if a directory's permissions do not allow writing, extracting files to it will fail.
If numeric_owner is True, the uid and gid numbers from the tarfile are used to set the owner/group for the extracted files. Otherwise, the named values from the tarfile are used.
Warning
Never extract archives from untrusted sources without prior inspection. It is possible that files are created outside of path, e.g. members that have absolute filenames starting with "/" or filenames with two dots "..".
Changed in version 3.5: Added the numeric_owner parameter.
Changed in version 3.6: The path parameter accepts a path-like object.
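One possible form of the "prior inspection" mentioned in the warning is sketched below. The helper members_inside is hypothetical, not part of tarfile, and it only checks member names: it does not examine symbolic or hard link targets, so it should be read as a starting point rather than a complete defence.

```python
import io
import os
import tarfile
import tempfile


def members_inside(tar, destination):
    """Yield only members whose target path stays inside destination."""
    root = os.path.realpath(destination)
    for member in tar.getmembers():
        target = os.path.realpath(os.path.join(root, member.name))
        if os.path.commonpath([root, target]) == root:
            yield member


# Build a demo archive in memory with one harmless and two suspicious names.
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as tar:
    for name in ("ok.txt", "../escape.txt", "/absolute.txt"):
        info = tarfile.TarInfo(name)
        info.size = 4
        tar.addfile(info, io.BytesIO(b"data"))

buffer.seek(0)
with tempfile.TemporaryDirectory() as destination:
    with tarfile.open(fileobj=buffer, mode="r:") as tar:
        safe = list(members_inside(tar, destination))
        tar.extractall(path=destination, members=safe)
    extracted = sorted(os.listdir(destination))

print([member.name for member in safe], extracted)
```

Passing the filtered list through the members argument keeps the deferred handling of directory attributes that extractall() provides.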
TarFile.extract(member, path="", set_attrs=True, *, numeric_owner=False)
Extract a member from the archive to the current working directory, using its full name. Its file information is extracted as accurately as possible. member may be a filename or a TarInfo object. You can specify a different directory using path. path may be a path-like object. File attributes (owner, mtime, mode) are set unless set_attrs is false.
If numeric_owner is True, the uid and gid numbers from the tarfile are used to set the owner/group for the extracted files. Otherwise, the named values from the tarfile are used.
Note
The extract() method does not take care of several extraction issues. In most cases you should consider using the extractall() method.
Warning
See the warning for extractall().
Changed in version 3.2: Added the set_attrs parameter.
Changed in version 3.5: Added the numeric_owner parameter.
Changed in version 3.6: The path parameter accepts a path-like object.
TarFile.extractfile(member)
Extract a member from the archive as a file object. member may be a filename or a TarInfo object. If member is a regular file or a link, an io.BufferedReader object is returned. Otherwise, None is returned.
Changed in version 3.3: Return an io.BufferedReader object.
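A minimal sketch of extractfile(), reading a member's bytes without writing anything to disk. The archive is built in memory and the member names notes.txt and docs are made up for the example.

```python
import io
import tarfile

buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as tar:
    payload = b"line one\nline two\n"
    info = tarfile.TarInfo("notes.txt")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

    folder = tarfile.TarInfo("docs")
    folder.type = tarfile.DIRTYPE
    tar.addfile(folder)

buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r:") as tar:
    reader = tar.extractfile("notes.txt")
    lines = reader.read().decode("ascii").splitlines()
    directory_reader = tar.extractfile("docs")  # not a regular file or link

print(lines, directory_reader)
```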
TarFile.add(name, arcname=None, recursive=True, *, filter=None)
Add the file name to the archive. name may be any type of file (directory, fifo, symbolic link, etc.). If given, arcname specifies an alternative name for the file in the archive. Directories are added recursively by default. This can be avoided by setting recursive to False. Recursion adds entries in sorted order. If filter is given, it should be a function that takes a TarInfo object argument and returns the changed TarInfo object. If it instead returns None the TarInfo object will be excluded from the archive. See Examples for an example.
Changed in version 3.2: Added the filter parameter.
Changed in version 3.7: Recursion adds entries in sorted order.
TarFile.addfile(tarinfo, fileobj=None)
Add the TarInfo object tarinfo to the archive. If fileobj is given, it should be a binary file, and tarinfo.size bytes are read from it and added to the archive. You can create TarInfo objects directly, or by using gettarinfo().
TarFile.gettarinfo(name=None, arcname=None, fileobj=None)
Create a TarInfo object from the result of os.stat() or equivalent on an existing file. The file is either named by name, or specified as a file object fileobj with a file descriptor. name may be a path-like object. If given, arcname specifies an alternative name for the file in the archive, otherwise, the name is taken from fileobj's name attribute, or the name argument. The name should be a text string.
You can modify some of the TarInfo's attributes before you add it using addfile(). If the file object is not an ordinary file object positioned at the beginning of the file, attributes such as size may need modifying. This is the case for objects such as GzipFile. The name may also be modified, in which case arcname could be a dummy string.
Changed in version 3.6: The name parameter accepts a path-like object.
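The interplay of gettarinfo() and addfile() might be sketched as follows. The source file report.txt and the archive name docs/report.txt are hypothetical; resetting mtime and the owner names is just one example of adjusting attributes before adding.

```python
import io
import os
import tarfile
import tempfile

content = b"quarterly numbers\n"

with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "report.txt")  # hypothetical file
    with open(source, "wb") as handle:
        handle.write(content)

    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        # gettarinfo() fills size, mtime, mode and owner from os.stat().
        info = tar.gettarinfo(source, arcname="docs/report.txt")
        info.mtime = 0                # tweak attributes before adding
        info.uname = info.gname = ""
        with open(source, "rb") as handle:
            tar.addfile(info, handle)

buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r:") as tar:
    member = tar.getmember("docs/report.txt")
    stored = (member.name, member.size, member.mtime)
    stored_bytes = tar.extractfile(member).read()

print(stored)
```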
TarFile.close()
Close the TarFile. In write mode, two finishing zero blocks are appended to the archive.
TarFile.pax_headers
A dictionary containing key-value pairs of pax global headers.
TarInfo Objects
A TarInfo object represents one member in a TarFile. Aside from storing all required attributes of a file (like file type, size, time, permissions, owner etc.), it provides some useful methods to determine its type. It does not contain the file's data itself.
TarInfo objects are returned by TarFile's methods getmember(), getmembers() and gettarinfo().
class tarfile.TarInfo(name="")
Create a TarInfo object.
classmethod TarInfo.frombuf(buf, encoding, errors)
Create and return a TarInfo object from string buffer buf.
Raises HeaderError if the buffer is invalid.
classmethod TarInfo.fromtarfile(tarfile)
Read the next member from the TarFile object tarfile and return it as a TarInfo object.
TarInfo.tobuf(format=DEFAULT_FORMAT, encoding=ENCODING, errors='surrogateescape')
Create a string buffer from a TarInfo object. For information on the arguments see the constructor of the TarFile class.
Changed in version 3.2: Use 'surrogateescape' as the default for the errors argument.
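A round trip through tobuf() and frombuf() can be sketched like this. The member name is hypothetical, and the ustar format is chosen on the assumption that a short ASCII name then fits into a single 512-byte header block; in Python 3 the "string buffer" is a bytes object.

```python
import tarfile

info = tarfile.TarInfo("hello.txt")
info.size = 5
info.mtime = 0

# With the ustar format a short ASCII name needs exactly one 512-byte block.
header = info.tobuf(format=tarfile.USTAR_FORMAT)

# frombuf() parses the header block back into a new TarInfo object.
parsed = tarfile.TarInfo.frombuf(header, tarfile.ENCODING, "surrogateescape")

# A block of garbage has no valid checksum field and is rejected.
try:
    tarfile.TarInfo.frombuf(b"x" * 512, tarfile.ENCODING, "surrogateescape")
    garbage = "accepted"
except tarfile.HeaderError:
    garbage = "HeaderError"

print(len(header), parsed.name, parsed.size, garbage)
```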
A TarInfo object has the following public data attributes:
TarInfo.name
Name of the archive member.
TarInfo.size
Size in bytes.
TarInfo.mtime
Time of last modification.
TarInfo.mode
Permission bits.
TarInfo.type
File type. type is usually one of these constants: REGTYPE, AREGTYPE, LNKTYPE, SYMTYPE, DIRTYPE, FIFOTYPE, CONTTYPE, CHRTYPE, BLKTYPE, GNUTYPE_SPARSE. To determine the type of a TarInfo object more conveniently, use the is*() methods below.
TarInfo.linkname
Name of the target file name, which is only present in TarInfo objects of type LNKTYPE and SYMTYPE.
TarInfo.uid
User ID of the user who originally stored this member.
TarInfo.gid
Group ID of the user who originally stored this member.
TarInfo.uname
User name.
TarInfo.gname
Group name.
TarInfo.pax_headers
A dictionary containing key-value pairs of an associated pax extended header.
A TarInfo object also provides some convenient query methods:
TarInfo.isfile()
Return True if the TarInfo object is a regular file.
TarInfo.isreg()
Same as isfile().
TarInfo.isdir()
Return True if it is a directory.
TarInfo.issym()
Return True if it is a symbolic link.
TarInfo.islnk()
Return True if it is a hard link.
TarInfo.ischr()
Return True if it is a character device.
TarInfo.isblk()
Return True if it is a block device.
TarInfo.isfifo()
Return True if it is a FIFO.
TarInfo.isdev()
Return True if it is one of character device, block device or FIFO.
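The query methods can be combined into a small classifier, sketched here against an in-memory archive. The helper describe and all member names are hypothetical and exist only for this illustration.

```python
import io
import tarfile

buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as tar:
    folder = tarfile.TarInfo("project")
    folder.type = tarfile.DIRTYPE
    tar.addfile(folder)

    payload = b"print('hi')\n"
    script = tarfile.TarInfo("project/main.py")
    script.size = len(payload)
    tar.addfile(script, io.BytesIO(payload))

    link = tarfile.TarInfo("project/latest.py")
    link.type = tarfile.SYMTYPE
    link.linkname = "main.py"
    tar.addfile(link)


def describe(member):
    """Map a TarInfo object to a short label using its is*() methods."""
    if member.isdir():
        return "directory"
    if member.issym():
        return "symbolic link"
    if member.isfile():
        return "regular file"
    return "other"


buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r:") as tar:
    kinds = {member.name: describe(member) for member in tar}

print(kinds)
```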
Command-Line Interface
New in version 3.4.
The tarfile module provides a simple command-line interface to interact with tar archives.
If you want to create a new tar archive, specify its name after the -c option and then list the filename(s) that should be included:
    $ python -m tarfile -c monty.tar spam.txt eggs.txt
Passing a directory is also acceptable:
    $ python -m tarfile -c monty.tar life-of-brian_1979/
If you want to extract a tar archive into the current directory, use the -e option:
    $ python -m tarfile -e monty.tar
You can also extract a tar archive into a different directory by passing the directory's name:
    $ python -m tarfile -e monty.tar other-dir/
For a list of the files in a tar archive, use the -l option:
    $ python -m tarfile -l monty.tar
Command-line options
-l <tarfile>
--list <tarfile>
List files in a tarfile.
-c <tarfile> <source1> … <sourceN>
--create <tarfile> <source1> … <sourceN>
Create tarfile from source files.
-e <tarfile> [<output_dir>]
--extract <tarfile> [<output_dir>]
Extract tarfile into the current directory if output_dir is not specified.
-t <tarfile>
--test <tarfile>
Test whether the tarfile is valid or not.
-v, --verbose
Verbose output.
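The same interface can also be driven from a script. The sketch below runs the -l option through subprocess with the current interpreter; the archive monty.tar is created in a temporary directory and its single member spam.txt is invented for the example.

```python
import io
import os
import subprocess
import sys
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    archive = os.path.join(tmp, "monty.tar")
    with tarfile.open(archive, "w") as tar:
        payload = b"spam\n"
        info = tarfile.TarInfo("spam.txt")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))

    # Equivalent to typing: python -m tarfile -l monty.tar
    result = subprocess.run(
        [sys.executable, "-m", "tarfile", "-l", archive],
        capture_output=True,
        text=True,
        check=True,
    )
    listed = result.stdout.split()

print(listed)
```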
Examples
How to extract an entire tar archive to the current working directory:
    import tarfile
    tar = tarfile.open("sample.tar.gz")
    tar.extractall()
    tar.close()
How to extract a subset of a tar archive with TarFile.extractall() using a generator function instead of a list:
    import os
    import tarfile

    def py_files(members):
        for tarinfo in members:
            if os.path.splitext(tarinfo.name)[1] == ".py":
                yield tarinfo

    tar = tarfile.open("sample.tar.gz")
    tar.extractall(members=py_files(tar))
    tar.close()
How to create an uncompressed tar archive from a list of filenames:
    import tarfile
    tar = tarfile.open("sample.tar", "w")
    for name in ["foo", "bar", "quux"]:
        tar.add(name)
    tar.close()
The same example using the with statement:
    import tarfile
    with tarfile.open("sample.tar", "w") as tar:
        for name in ["foo", "bar", "quux"]:
            tar.add(name)
How to read a gzip compressed tar archive and display some member information:
    import tarfile
    tar = tarfile.open("sample.tar.gz", "r:gz")
    for tarinfo in tar:
        # end=" " keeps a space between "is" and the description that follows
        print(tarinfo.name, "is", tarinfo.size, "bytes in size and is", end=" ")
        if tarinfo.isreg():
            print("a regular file.")
        elif tarinfo.isdir():
            print("a directory.")
        else:
            print("something else.")
    tar.close()
How to create an archive and reset the user information using the filter parameter in TarFile.add():
    import tarfile

    def reset(tarinfo):
        tarinfo.uid = tarinfo.gid = 0
        tarinfo.uname = tarinfo.gname = "root"
        return tarinfo

    tar = tarfile.open("sample.tar.gz", "w:gz")
    tar.add("foo", filter=reset)
    tar.close()
Supported tar formats
There are three tar formats that can be created with the tarfile module:
- The POSIX.1-1988 ustar format (USTAR_FORMAT). It supports filenames up to a length of at best 256 characters and linknames up to 100 characters. The maximum file size is 8 GiB. This is an old and limited but widely supported format.
- The GNU tar format (GNU_FORMAT). It supports long filenames and linknames, files bigger than 8 GiB and sparse files. It is the de facto standard on GNU/Linux systems. tarfile fully supports the GNU tar extensions for long names, sparse file support is read-only.
- The POSIX.1-2001 pax format (PAX_FORMAT). It is the most flexible format with virtually no limits. It supports long filenames and linknames, large files and stores pathnames in a portable way. Modern tar implementations, including GNU tar, bsdtar/libarchive and star, fully support extended pax features; some old or unmaintained libraries may not, but should treat pax archives as if they were in the universally-supported ustar format. It is the current default format for new archives.
  It extends the existing ustar format with extra headers for information that cannot be stored otherwise. There are two flavours of pax headers: Extended headers only affect the subsequent file header, global headers are valid for the complete archive and affect all following files. All the data in a pax header is encoded in UTF-8 for portability reasons.
There are some more variants of the tar format which can be read, but not created:
- The ancient V7 format. This is the first tar format from Unix Seventh Edition, storing only regular files and directories. Names must not be longer than 100 characters, there is no user/group name information. Some archives have miscalculated header checksums in case of fields with non-ASCII characters.
- The SunOS tar extended format. This format is a variant of the POSIX.1-2001 pax format, but is not compatible.
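The difference in name-length limits between the three writable formats can be observed with TarInfo.tobuf(). In this sketch the 300-character name is artificial and deliberately contains no "/", on the assumption that ustar then has no way to split it into a prefix and a name.

```python
import tarfile

long_name = "x" * 300          # no "/" in it, so ustar cannot split the name
info = tarfile.TarInfo(long_name)

try:
    info.tobuf(format=tarfile.USTAR_FORMAT)
    ustar_result = "ok"
except ValueError:
    ustar_result = "ValueError"

# GNU and pax store the long name in an extra header before the member header.
gnu_header = info.tobuf(format=tarfile.GNU_FORMAT)
pax_header = info.tobuf(format=tarfile.PAX_FORMAT)

print(ustar_result, len(gnu_header) > 512, len(pax_header) > 512)
```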
Unicode issues
The tar format was originally conceived to make backups on tape drives with the main focus on preserving file system information. Nowadays tar archives are commonly used for file distribution and exchanging archives over networks. One problem of the original format (which is the basis of all other formats) is that there is no concept of supporting different character encodings. For example, an ordinary tar archive created on a UTF-8 system cannot be read correctly on a Latin-1 system if it contains non-ASCII characters. Textual metadata (like filenames, linknames, user/group names) will appear damaged. Unfortunately, there is no way to autodetect the encoding of an archive. The pax format was designed to solve this problem. It stores non-ASCII metadata using the universal character encoding UTF-8.
The details of character conversion in tarfile are controlled by the encoding and errors keyword arguments of the TarFile class.
encoding defines the character encoding to use for the metadata in the archive. The default value is sys.getfilesystemencoding() or 'ascii' as a fallback. Depending on whether the archive is read or written, the metadata must be either decoded or encoded. If encoding is not set appropriately, this conversion may fail.
The errors argument defines how characters are treated that cannot be converted. Possible values are listed in section Error Handlers. The default scheme is 'surrogateescape' which Python also uses for its file system calls, see File Names, Command Line Arguments, and Environment Variables.
For PAX_FORMAT archives (the default), encoding is generally not needed because all the metadata is stored using UTF-8. encoding is only used in the rare cases when binary pax headers are decoded or when strings with surrogate characters are stored.
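The claim about pax archives can be checked with a short sketch: a non-ASCII member name is written in PAX_FORMAT and then read back by a reader deliberately configured with encoding="ascii". The member name is invented for the example.

```python
import io
import tarfile

name = "r\u00e9sum\u00e9-\u0444\u0430\u0439\u043b.txt"  # non-ASCII member name

buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w", format=tarfile.PAX_FORMAT) as tar:
    payload = b"data"
    info = tarfile.TarInfo(name)
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

# The pax extended header carries the name as UTF-8 bytes.
raw = buffer.getvalue()
stored_as_utf8 = name.encode("utf-8") in raw

# Even a reader configured for plain ASCII recovers the exact name,
# because the pax header overrides the legacy name field.
with tarfile.open(fileobj=io.BytesIO(raw), mode="r:", encoding="ascii") as tar:
    names = tar.getnames()

print(stored_as_utf8, names == [name])
```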