gzip, zipfile, tarfile modules: working with compressed files

In [1]:

    import os, shutil, glob
    import zlib, gzip, bz2, zipfile, tarfile

gzip

zlib module

zlib compresses and decompresses strings of bytes in memory:

In [2]:

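    original = b"this is a test string"

    compressed = zlib.compress(original)

    print(compressed)
    print(zlib.decompress(compressed))

    b'x\x9c...'   (compressed binary data, abbreviated)
    b'this is a test string'

It also provides two checksum functions, Adler-32 (adler32) and CRC-32 (crc32). The & 0xffffffff mask forces an unsigned 32-bit result. On Python 2 these functions could return negative numbers. On Python 3 they always return unsigned values, so the mask is harmless there.

In [3]:

    print(zlib.adler32(original) & 0xffffffff)

    1407780813

In [4]:

    print(zlib.crc32(original) & 0xffffffff)

    4236695221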
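The calls above work on a whole buffer at once. For data that arrives in pieces, zlib also offers compressobj/decompressobj and running checksums. The sketch below assumes Python 3. The chunk list is a made-up stand-in for data read from a large file.

```python
import zlib

# Hypothetical chunks, standing in for pieces read from a large file.
chunks = [b"this is ", b"a test ", b"string"]

# Incremental compression: feed each chunk, then flush the buffered remainder.
compressor = zlib.compressobj()
compressed = b"".join(compressor.compress(c) for c in chunks)
compressed += compressor.flush()

# Incremental decompression works the same way in reverse.
decompressor = zlib.decompressobj()
restored = decompressor.decompress(compressed) + decompressor.flush()

# Running checksums: pass the previous value as the second argument.
crc = 0      # CRC-32 starting value
adler = 1    # Adler-32 starting value
for c in chunks:
    crc = zlib.crc32(c, crc)
    adler = zlib.adler32(c, adler)

print(restored)                                                      # b'this is a test string'
print(crc == zlib.crc32(restored), adler == zlib.adler32(restored))  # True True
```

compressobj produces the same zlib stream format as zlib.compress, so zlib.decompress can also read the result.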

gzip module

The gzip module reads and writes files in the .gz format. The compression itself is provided by the zlib module.

Use gzip.open to read and write .gz files. Writing:

In [5]:

    content = b"Lots of content here"
    with gzip.open('file.txt.gz', 'wb') as f:
        f.write(content)

Reading:

In [6]:

    with gzip.open('file.txt.gz', 'rb') as f:
        file_content = f.read()

    print(file_content)

    b'Lots of content here'
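On Python 3, gzip.open also accepts text modes ("wt", "rt") together with an encoding. This lets you write and read str values directly. The sketch below makes that assumption. The file name notes.txt.gz is hypothetical and lives in a temporary directory.

```python
import gzip
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "notes.txt.gz")   # hypothetical file name

    # "wt"/"rt" put a text layer on top of the gzip stream: str in, str out.
    with gzip.open(path, "wt", encoding="utf-8") as f:
        f.write("first line\nsecond line\n")

    with gzip.open(path, "rt", encoding="utf-8") as f:
        lines = f.read().splitlines()

print(lines)  # ['first line', 'second line']
```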

Decompress the contents of the archive into a regular file:

In [7]:

    with gzip.open('file.txt.gz', 'rb') as f_in, open('file.txt', 'wb') as f_out:
        shutil.copyfileobj(f_in, f_out)

The directory should now contain file.txt, with this content:

In [8]:

    with open("file.txt") as f:
        print(f.read())

    Lots of content here
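You can compress an existing file into a .gz file with the same shutil.copyfileobj pattern, run in reverse. The sketch below assumes Python 3. The paths are hypothetical and kept in a temporary directory.

```python
import gzip
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "file.txt")   # hypothetical paths
    dst = src + ".gz"

    with open(src, "wb") as f:
        f.write(b"Lots of content here")

    # copyfileobj streams in chunks, so the whole file never has to fit in memory.
    with open(src, "rb") as f_in, gzip.open(dst, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)

    with gzip.open(dst, "rb") as f:
        roundtrip = f.read()

print(roundtrip)  # b'Lots of content here'
```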

In [9]:

    os.remove("file.txt.gz")

bz2 module

The bz2 module provides another way to compress data, using the bzip2 algorithm:

In [10]:

    original = b"this is a test string"

    compressed = bz2.compress(original)

    print(compressed)
    print(bz2.decompress(compressed))

    b'BZh91AY&SY...'   (compressed binary data, abbreviated)
    b'this is a test string'
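bz2 also has a file interface, bz2.open, which accepts the same kinds of mode strings as gzip.open. The sketch below assumes Python 3. The repetitive sample data is made up so that the effect of compression is visible. For a very short string like the one above, the bzip2 headers can make the output larger than the input.

```python
import bz2
import os
import tempfile

data = b"Lots of content here\n" * 1000   # made-up, highly repetitive sample

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "file.txt.bz2")   # hypothetical file name
    with bz2.open(path, "wb") as f:
        f.write(data)
    packed_size = os.path.getsize(path)        # measured after the file is closed

    with bz2.open(path, "rb") as f:
        restored = f.read()

print(restored == data)         # True
print(packed_size < len(data))  # True: repetitive input compresses well
```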

zipfile module

Make some copies of file.txt:

In [11]:

    for i in range(10):
        shutil.copy("file.txt", "file.txt." + str(i))

Put all of these copies into a single .zip file, deleting each copy once it has been added:

In [12]:

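    # ZIP_DEFLATED actually compresses the members; the default, ZIP_STORED,
    # only archives them without compression.
    with zipfile.ZipFile('files.zip', 'w', compression=zipfile.ZIP_DEFLATED) as f:
        for name in glob.glob("*.txt.[0-9]"):
            f.write(name)
            os.remove(name)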
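Members do not have to come from files on disk: ZipFile.writestr adds data straight from a str or bytes value. The sketch below assumes Python 3. It uses an in-memory io.BytesIO buffer in place of a real .zip file, and the member names are hypothetical.

```python
import io
import zipfile

buf = io.BytesIO()   # in-memory stand-in for a .zip file on disk

with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    # writestr takes an archive name plus str (encoded as UTF-8) or bytes data.
    zf.writestr("notes/hello.txt", "Lots of content here")
    zf.writestr("notes/data.bin", b"\x00\x01\x02")

with zipfile.ZipFile(buf) as zf:
    names = zf.namelist()
    hello = zf.read("notes/hello.txt")

print(names)  # ['notes/hello.txt', 'notes/data.bin']
print(hello)  # b'Lots of content here'
```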

Open the .zip file for reading, and use the namelist method to list the names of the files it contains:

In [13]:

    f = zipfile.ZipFile('files.zip', 'r')
    print(f.namelist())

    ['file.txt.9', 'file.txt.6', 'file.txt.2', 'file.txt.1', 'file.txt.5', 'file.txt.4', 'file.txt.3', 'file.txt.7', 'file.txt.8', 'file.txt.0']

Use f.read(name) to read the contents of the member called name:

In [14]:

    for name in f.namelist():
        # read() returns bytes; decode to print the text
        print(name, "content:", f.read(name).decode())

    f.close()

    file.txt.9 content: Lots of content here
    file.txt.6 content: Lots of content here
    file.txt.2 content: Lots of content here
    file.txt.1 content: Lots of content here
    file.txt.5 content: Lots of content here
    file.txt.4 content: Lots of content here
    file.txt.3 content: Lots of content here
    file.txt.7 content: Lots of content here
    file.txt.8 content: Lots of content here
    file.txt.0 content: Lots of content here
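ZipFile.infolist returns one ZipInfo object per member. Its file_size and compress_size attributes show how much each member was actually compressed. The sketch below assumes Python 3 and uses a made-up payload. It stores one member with the default ZIP_STORED and one with ZIP_DEFLATED so that the two can be compared.

```python
import io
import zipfile

payload = b"Lots of content here\n" * 200   # made-up, repetitive sample data
buf = io.BytesIO()

with zipfile.ZipFile(buf, "w") as zf:        # archive default: ZIP_STORED
    zf.writestr("stored.txt", payload)
    zf.writestr("deflated.txt", payload, compress_type=zipfile.ZIP_DEFLATED)

with zipfile.ZipFile(buf) as zf:
    # Map each member name to (uncompressed size, size inside the archive).
    sizes = {info.filename: (info.file_size, info.compress_size)
             for info in zf.infolist()}

for name, (raw, packed) in sizes.items():
    print(name, raw, packed)
```

For the stored member, the two sizes are equal. Only the deflated member is smaller than the original data.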

Use extract(name) to extract a single file, or extractall() to extract all of them.
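Here is a sketch of both calls, assuming Python 3. The archive demo.zip and its members are hypothetical and are created in a temporary directory.

```python
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    archive = os.path.join(tmp, "demo.zip")   # hypothetical archive
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("a.txt", "first")
        zf.writestr("sub/b.txt", "second")

    with zipfile.ZipFile(archive) as zf:
        # extract(name, path) writes one member and returns the path it created.
        one = zf.extract("a.txt", path=os.path.join(tmp, "single"))
        # extractall(path) writes every member, recreating sub-directories.
        zf.extractall(path=os.path.join(tmp, "all"))

    with open(one) as f:
        a_text = f.read()
    with open(os.path.join(tmp, "all", "sub", "b.txt")) as f:
        b_text = f.read()

print(a_text, b_text)  # first second
```

The Python documentation advises against extracting archives from untrusted sources without inspecting their member names first.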

tarfile module

The tarfile module supports reading and writing .tar files.

For example, file.txt can be added to a new archive like this:

In [15]:

    f = tarfile.open("file.txt.tar", "w")
    f.add("file.txt")
    f.close()
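tarfile can also compress the archive and read it back. The sketch below assumes Python 3 and uses hypothetical paths in a temporary directory. The mode "w:gz" produces a gzip-compressed .tar.gz, and "r:*" lets tarfile detect the compression when reading.

```python
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "file.txt")   # hypothetical paths
    with open(src, "w") as f:
        f.write("Lots of content here")

    archive = os.path.join(tmp, "file.txt.tar.gz")
    # "w:gz" writes a gzip-compressed tar; "w:bz2" and "w:xz" follow the same pattern.
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(src, arcname="file.txt")   # arcname avoids storing the temp path

    # "r:*" detects the compression method automatically.
    with tarfile.open(archive, "r:*") as tar:
        names = tar.getnames()
        member = tar.extractfile("file.txt")   # file-like object, nothing written to disk
        content = member.read()

print(names)    # ['file.txt']
print(content)  # b'Lots of content here'
```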

Clean up the generated files:

In [16]:

    os.remove("file.txt")
    os.remove("file.txt.tar")
    os.remove("files.zip")

Source: https://nbviewer.jupyter.org/github/lijin-THU/notes-python/blob/master/11-useful-tools/11.06-gzip,-zipfile,-tarfile.ipynb