Creating a Zip Archive of a Directory in Python

Introduction

When dealing with large amounts of data or files, you might find yourself needing to compress files into a more manageable format. One of the best ways to do this is by creating a zip archive.

In this article, we'll be exploring how you can create a zip archive of a directory using Python. Whether you're looking to save space, simplify file sharing, or just keep things organized, Python's zipfile module provides a to do this.

Creating a Zip Archive with Python

Python's standard library comes with a module named zipfile that provides methods for creating, reading, writing, appending, and listing contents of a ZIP file. This module is useful for creating a zip archive of a directory. We'll start by importing the zipfile and os modules:

import zipfile
import os

Now, let's create a function that will zip a directory:

def zip_directory(directory_path, zip_path):
    with zipfile.ZipFile(zip_path, 'w') as zipf:
        for root, dirs, files in os.walk(directory_path):
            for file in files:
                zipf.write(os.path.join(root, file), 
                           os.path.relpath(os.path.join(root, file), 
                                           os.path.join(directory_path, '..')))

In this function, we first open a new zip file in write mode. Then, we walk through the directory we want to zip. For each file in the directory, we use the write() method to add it to the zip file. The os.path.relpath() function is used so that we store the relative path of the file in the zip file, instead of the absolute path.

Let's test our function:

zip_directory('test_directory', 'archive.zip')

After running this code, you should see a new file named archive.zip in your current directory. This zip file contains all the files from test_directory.

Note: Be careful when specifying the paths. If the zip_path file already exists, it will be overwritten.

Python's zipfile module makes it easy to create a zip archive of a directory. With just a few lines of code, you can compress and organize your files.

In the following sections, we'll dive deeper into handling nested directories, large directories, and error handling. This may seem a bit backwards, but the above function is likely what most people came here for, so I wanted to show it first.

Using the zipfile Module

In Python, the zipfile module is the best tool for working with zip archives. It provides functions to read, write, append, and extract data from zip files. The module is part of Python's standard library, so there's no need to install anything extra.

Here's a simple example of how you can create a new zip file and add a file to it:

import zipfile

# Create a new zip file
zip_file = zipfile.ZipFile('example.zip', 'w')

# Add a file to the zip file
zip_file.write('test.txt')

# Close the zip file
zip_file.close()

In this code, we first import the zipfile module. Then, we create a new zip file named 'example.zip' in write mode ('w'). We add a file named 'test.txt' to the zip file using the write() method. Finally, we close the zip file using the close() method.

Creating a Zip Archive of a Directory

Creating a zip archive of a directory involves a bit more work, but it's still fairly easy with the zipfile module. You need to walk through the directory structure, adding each file to the zip archive.

import os
import zipfile

def zip_directory(folder_path, zip_file):
    for folder_name, subfolders, filenames in os.walk(folder_path):
        for filename in filenames:
            # Create complete filepath of file in directory
            file_path = os.path.join(folder_name, filename)
            # Add file to zip
            zip_file.write(file_path)

# Create a new zip file
zip_file = zipfile.ZipFile('example_directory.zip', 'w')

# Zip the directory
zip_directory('/path/to/directory', zip_file)

# Close the zip file
zip_file.close()

We first define a function zip_directory() that takes a folder path and a ZipFile object. It uses the os.walk() function to iterate over all files in the directory and its subdirectories. For each file, it constructs the full file path and adds the file to the zip archive.

The os.walk() function is a convenient way to traverse directories. It generates the file names in a directory tree by walking the tree either top-down or bottom-up.

Note: Be careful with the file paths when adding files to the zip archive. The write() method adds files to the archive with the exact path you provide. If you provide an absolute path, the file will be added with the full absolute path in the zip archive. This is usually not what you want. Instead, you typically want to add files with a relative path to the directory you're zipping.

In the main part of the script, we create a new zip file, call the zip_directory() function to add the directory to the zip file, and finally close the zip file.

Working with Nested Directories

When working with nested directories, the process of creating a zip archive is a bit more complicated. The first function we showed in this article actually handles this case as well, which we'll show again here:

import os
import zipfile

def zipdir(path, ziph):
    for root, dirs, files in os.walk(path):
        for file in files:
            ziph.write(os.path.join(root, file), 
                       os.path.relpath(os.path.join(root, file), 
                                       os.path.join(path, '..')))
            
zipf = zipfile.ZipFile('Python.zip', 'w', zipfile.ZIP_DEFLATED)
zipdir('/path/to/directory', zipf)
zipf.close()

The main difference is that we're actually creating the zip directory outside of the function and pass it as a parameter. Whether you do it within the function itself or not is up to personal preference.

Handling Large Directories

So what if we're dealing with a large directory? Zipping a large directory can consume a lot of memory and even crash your program if you don't take the right precautions.

Luckily, the zipfile module allows us to create a zip archive without loading all files into memory at once. By using the with statement, we can ensure that each file is closed and its memory freed after it's added to the archive.

Free eBook: Git Essentials

Check out our hands-on, practical guide to learning Git, with best-practices, industry-accepted standards, and included cheat sheet. Stop Googling Git commands and actually learn it!

import os
import zipfile

def zipdir(path, ziph):
    for root, dirs, files in os.walk(path):
        for file in files:
            with open(os.path.join(root, file), 'r') as fp:
                ziph.write(fp.read())

with zipfile.ZipFile('Python.zip', 'w', zipfile.ZIP_DEFLATED) as zipf:
    zipdir('/path/to/directory', zipf)

In this version, we're using the with statement when opening each file and when creating the zip archive. This will guarantee that each file is closed after it's read, freeing up the memory it was using. This way, we can safely zip large directories without running into memory issues.

Error Handling in zipfile

When working with zipfile in Python, we need to remember to handle exceptions so our program doesn't crash unexpectedly. The most common exceptions you might encounter are RuntimeError, ValueError, and FileNotFoundError.

Let's take a look at how we can handle these exceptions while creating a zip file:

import zipfile

try:
    with zipfile.ZipFile('example.zip', 'w') as myzip:
        myzip.write('non_existent_file.txt')
except FileNotFoundError:
    print('The file you are trying to zip does not exist.')
except RuntimeError as e:
    print('An unexpected error occurred:', str(e))
except zipfile.LargeZipFile:
    print('The file is too large to be compressed.')

FileNotFoundError is raised when the file we're trying to zip doesn't exist. RuntimeError is a general exception that might be raised for a number of reasons, so we print the exception message to understand what went wrong. zipfile.LargeZipFile is raised when the file we're trying to compress is too big.

Note: Python's zipfile module raises a LargeZipFile error when the file you're trying to compress is larger than 2 GB. If you're working with large files, you can prevent this error by calling ZipFile with the allowZip64=True argument.

Common Errors and Solutions

While working with the zipfile module, you might encounter several common errors. Let's explore some of these errors and their solutions:

FileNotFoundError

This error happens when the file or directory you're trying to zip does not exist. So always check if the file or directory exists before attempting to compress it.

IsADirectoryError

This error is raised when you're trying to write a directory to a zip file using ZipFile.write(). To avoid this, use os.walk() to traverse the directory and write the individual files instead.

PermissionError

As you probably guessed, this error happens when you don't have the necessary permissions to read the file or write to the directory. Make sure you have the correct permissions before trying to manipulate files or directories.

LargeZipFile

As mentioned earlier, this error is raised when the file you're trying to compress is larger than 2 GB. To prevent this error, call ZipFile with the allowZip64=True argument.

try:
    with zipfile.ZipFile('large_file.zip', 'w', allowZip64=True) as myzip:
        myzip.write('large_file.txt')
except zipfile.LargeZipFile:
    print('The file is too large to be compressed.')

In this snippet, we're using the allowZip64=True argument to allow zipping files larger than 2 GB.

Compressing Individual Files

With zipfile, not only can it compress directories, but it can also compress individual files. Let's say you have a file called document.txt that you want to compress. Here's how you'd do that:

import zipfile

with zipfile.ZipFile('compressed_file.zip', 'w') as myzip:
    myzip.write('document.txt')

In this code, we're creating a new zip archive named compressed_file.zip and just adding document.txt to it. The 'w' parameter means that we're opening the zip file in write mode.

Now, if you check your directory, you should see a new zip file named compressed_file.zip.

Extracting Zip Files

And finally, let's see how to reverse this zipping by extracting the files. Let's say we want to extract the document.txt file we just compressed. Here's how to do it:

import zipfile

with zipfile.ZipFile('compressed_file.zip', 'r') as myzip:
    myzip.extractall()

In this code snippet, we're opening the zip file in read mode ('r') and then calling the extractall() method. This method extracts all the files in the zip archive to the current directory.

Note: If you want to extract the files to a specific directory, you can pass the directory path as an argument to the extractall() method like so: myzip.extractall('/path/to/directory/').

Now, if you check your directory, you should see the document.txt file. That's all there is to it!

Conclusion

In this guide, we focused on creating and managing zip archives in Python. We explored the zipfile module, learned how to create a zip archive of a directory, and even dove into handling nested directories and large directories. We've also covered error handling within zipfile, common errors and their solutions, compressing individual files, and extracting zip files.

Last Updated: September 14th, 2023
Was this article helpful?

Improve your dev skills!

Get tutorials, guides, and dev jobs in your inbox.

No spam ever. Unsubscribe at any time. Read our Privacy Policy.

Project

Building Your First Convolutional Neural Network With Keras

# python# artificial intelligence# machine learning# tensorflow

Most resources start with pristine datasets, start at importing and finish at validation. There's much more to know. Why was a class predicted? Where was...

David Landup
David Landup
Details
Course

Data Visualization in Python with Matplotlib and Pandas

# python# pandas# matplotlib

Data Visualization in Python with Matplotlib and Pandas is a course designed to take absolute beginners to Pandas and Matplotlib, with basic Python knowledge, and...

David Landup
David Landup
Details

© 2013-2024 Stack Abuse. All rights reserved.

AboutDisclosurePrivacyTerms