Beginner 25 min Mac & PC

Working with Compressed Files

Learn to work with ZIP files and other compressed formats. These skills are essential for sharing data, backing up projects, and working with DH datasets.

What You'll Learn:

  • Understand different compression formats and their uses
  • Create and extract archives using GUI and command line tools
  • Implement best practices for DH data archiving
  • Troubleshoot common compression problems

Compressed files (like ZIP archives) are everywhere in digital humanities work. You’ll encounter them when downloading datasets, sharing research materials, backing up projects, and working with many DH tools. This guide will help you understand and confidently work with compressed files on both Mac and PC.

Why Compression Matters in DH

Knowledge Check: DH Compression Scenarios

Which scenario would benefit MOST from file compression?

Common DH Scenarios

  • Downloading datasets: Many archives provide data as ZIP files
  • Sharing research: Email attachments have size limits; compression helps
  • Backing up projects: Compress completed work for long-term storage
  • Software installation: Many DH tools are distributed as compressed archives
  • Collaborative work: Share entire project folders easily

Benefits of Compression

  • Smaller file sizes: Save storage space and transfer time
  • Bundling: Keep related files together in one package
  • Organization: Archive completed work cleanly
  • Compatibility: ZIP files open with built-in tools on Mac, PC, and most Linux systems
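
To make the "smaller file sizes" benefit concrete, here is a minimal Python sketch that measures how far a piece of text shrinks. The sample sentence and the function name compression_ratio are made up for illustration; real results depend heavily on the data. Plain text usually compresses well, while files that are already compressed (such as JPEG scans) shrink very little.

```python
import zlib


def compression_ratio(data: bytes, level: int = 6) -> float:
    """Return compressed size divided by original size (smaller is better)."""
    if not data:
        raise ValueError("data must not be empty")
    return len(zlib.compress(data, level)) / len(data)


# Hypothetical sample: transcription-like text with a lot of repetition.
sample = ("Dear Mother, I write to you from camp. " * 200).encode("utf-8")
ratio = compression_ratio(sample)
print(f"Compressed to {ratio:.1%} of the original size")
```

zlib uses the same Deflate method that standard ZIP files use, so the ratio gives a rough idea of what zipping the same text would achieve.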

Types of Compressed Files

Format Identification Quiz

You receive a file called "medieval-manuscripts.tar.gz". What can you tell about this file?

Common Formats You’ll Encounter

Format   Extension        Platform    Notes
ZIP      .zip             All         Most common, widely supported
RAR      .rar             All         Good compression, requires special software
7-Zip    .7z              All         Excellent compression ratio
Gzip     .gz              Mac/Linux   Common for single files
Tar      .tar             Mac/Linux   Archive format, often combined with gzip
Tar.gz   .tar.gz or .tgz  Mac/Linux   Tar archive compressed with gzip
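
The table can be turned into a small lookup. The sketch below guesses a format from the filename alone; the function name describe_archive and the wording of the descriptions are assumptions for illustration, and a filename is only a hint, since a file can be mislabeled.

```python
from pathlib import Path

# Descriptions follow the format table in this guide.
FORMATS = {
    ".zip": "ZIP archive",
    ".rar": "RAR archive",
    ".7z": "7-Zip archive",
    ".tar": "Tar archive (bundled, not compressed)",
    ".gz": "Gzip-compressed single file",
    ".tgz": "Tar archive compressed with gzip",
}


def describe_archive(filename: str) -> str:
    """Guess the archive format from the file extension(s)."""
    suffixes = [s.lower() for s in Path(filename).suffixes]
    # A double extension such as .tar.gz has to be checked before plain .gz.
    if suffixes[-2:] == [".tar", ".gz"]:
        return "Tar archive compressed with gzip"
    if suffixes and suffixes[-1] in FORMATS:
        return FORMATS[suffixes[-1]]
    return "Unknown format"


print(describe_archive("medieval-manuscripts.tar.gz"))
# prints: Tar archive compressed with gzip
```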

Platform-Specific Formats

Mac:

  • .dmg - Disk images (software installation)
  • .sit - StuffIt archives (legacy)

PC:

  • .cab - Cabinet files (Windows system files)
  • .msi - Windows installer packages

Hands-On Exercise: Creating Your First Archive

Exercise 1: Create a Research Archive

Step 1: Prepare Sample Files

Create a folder called "DH-Sample-Project" with these files:

  • README.txt - Project description
  • data.csv - Sample dataset
  • analysis.py - Python script

Step 2: Create Archive (Choose Your Platform)

Mac Method:

  1. Right-click on the "DH-Sample-Project" folder
  2. Select "Compress DH-Sample-Project"
  3. Rename the resulting "DH-Sample-Project.zip" to "dh-sample-project-2024.zip"

PC Method:

  1. Right-click on the "DH-Sample-Project" folder
  2. Select "Send to" → "Compressed (zipped) folder"
  3. Rename the new ZIP file to "dh-sample-project-2024.zip"

Step 3: Test Your Archive

  1. Move the original folder to a different location
  2. Double-click your ZIP file to extract it
  3. Verify that all files are present and can be opened
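
The same exercise can be scripted, which is useful once you archive projects regularly. The following is a minimal sketch using Python's standard library; the helper name zip_folder is hypothetical, and the sample files are written to a temporary folder so nothing on your machine is touched.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def zip_folder(folder: Path, archive_stem: Path) -> Path:
    """Zip `folder` (keeping the folder name inside the archive).

    `archive_stem` is the archive path without the .zip extension.
    """
    created = shutil.make_archive(
        str(archive_stem), "zip", root_dir=folder.parent, base_dir=folder.name
    )
    return Path(created)


with tempfile.TemporaryDirectory() as tmp:
    # Step 1: prepare the sample files from the exercise.
    project = Path(tmp) / "DH-Sample-Project"
    project.mkdir()
    (project / "README.txt").write_text("Project description\n", encoding="utf-8")
    (project / "data.csv").write_text("id,title\n1,Sample\n", encoding="utf-8")
    (project / "analysis.py").write_text("print('hello')\n", encoding="utf-8")

    # Step 2: create the archive.
    archive = zip_folder(project, Path(tmp) / "dh-sample-project-2024")

    # Step 3: list what ended up inside it.
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            print(name)
```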

Command Line Compression


Mac/Linux Commands

Try these commands in a terminal:

# List files in current directory
ls

# Create a ZIP archive
zip research.zip *.txt

# List contents of ZIP file
unzip -l research.zip

# Extract ZIP file
unzip research.zip

# Compress a folder recursively
zip -r project.zip DH-Project/
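
The format table also lists .tar.gz, which the commands above do not cover. On Mac/Linux the usual command is tar -czf project.tar.gz DH-Project/. The sketch below does the same with Python's tarfile module; the helper name make_tar_gz and the sample file are illustrative assumptions.

```python
import tarfile
import tempfile
from pathlib import Path


def make_tar_gz(folder: Path, archive: Path) -> list[str]:
    """Bundle `folder` into a gzip-compressed tar archive and return its member names."""
    with tarfile.open(archive, "w:gz") as tar:
        # arcname keeps only the folder name, not the full path on disk.
        tar.add(folder, arcname=folder.name)
    with tarfile.open(archive, "r:gz") as tar:
        return tar.getnames()


with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp) / "DH-Project"
    project.mkdir()
    (project / "notes.txt").write_text("Sample notes\n", encoding="utf-8")
    for name in make_tar_gz(project, Path(tmp) / "project.tar.gz"):
        print(name)
```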

PC PowerShell Commands

# List files
dir

# Create ZIP archive
Compress-Archive -Path "*.txt" -DestinationPath "research.zip"

# Extract ZIP file
Expand-Archive -Path "research.zip" -DestinationPath "."

Advanced Techniques

Best Practices Quiz

You're about to share a 500MB dataset with collaborators worldwide. What's the BEST approach?

Password Protection

Mac (Terminal):

# Create password-protected ZIP
zip -er secure.zip confidential-files/

# Extract password-protected ZIP
unzip secure.zip

PC (PowerShell with 7-Zip):

# Install 7-Zip first. Using -p with no password makes 7-Zip prompt for one,
# which keeps the password out of your command history:
& "C:\Program Files\7-Zip\7z.exe" a -p secure.7z files/

Compression Levels

Higher compression levels produce smaller files but take longer to run.

Mac:

# Fast compression (larger file); -r is needed to include the folder's contents
zip -r -1 fast.zip files/

# Maximum compression (smaller file, slower)
zip -r -9 small.zip files/
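
To see the trade-off for yourself, the sketch below zips the same file at level 1 and level 9 and reports both sizes. The helper name zip_size and the generated sample text are assumptions for illustration; the size difference on your own data may be larger or smaller.

```python
import tempfile
import zipfile
from pathlib import Path


def zip_size(source: Path, archive: Path, level: int) -> int:
    """Zip one file at the given Deflate level (1-9) and return the archive size in bytes."""
    with zipfile.ZipFile(
        archive, "w", compression=zipfile.ZIP_DEFLATED, compresslevel=level
    ) as zf:
        zf.write(source, arcname=source.name)
    return archive.stat().st_size


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical sample text standing in for a transcription file.
    source = Path(tmp) / "letters.txt"
    source.write_text(
        "".join(f"Letter {i}: Dear Mother, I write to you from camp.\n" for i in range(2000)),
        encoding="utf-8",
    )
    fast = zip_size(source, Path(tmp) / "fast.zip", level=1)
    small = zip_size(source, Path(tmp) / "small.zip", level=9)
    print(f"original: {source.stat().st_size} bytes, level 1: {fast}, level 9: {small}")
```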

Real-World DH Scenario

Exercise 2: Digital Archive Workflow

📚 Scenario

You've completed transcribing 50 Civil War letters for a digital humanities project. You need to:

  • Archive the complete project
  • Share it with your research team
  • Prepare it for long-term preservation

Step 1: Organize Project Structure

civil-war-letters/
├── README.txt
├── raw-scans/
│   ├── letter-001.jpg
│   └── letter-002.jpg
├── transcriptions/
│   ├── letter-001.txt
│   └── letter-002.txt
├── metadata/
│   └── catalog.csv
└── scripts/
    └── text-analysis.py
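
If you set up projects like this often, the folder layout above can be created by a short script. This is a minimal sketch; the function name create_skeleton and the placeholder README text are assumptions, and the demo writes into a temporary folder.

```python
import tempfile
from pathlib import Path

# Subfolder names taken from the project structure above.
SUBFOLDERS = ["raw-scans", "transcriptions", "metadata", "scripts"]


def create_skeleton(root: Path) -> list[Path]:
    """Create the project folder, its subfolders, and a placeholder README.txt."""
    root.mkdir(parents=True, exist_ok=True)
    created = []
    for name in SUBFOLDERS:
        sub = root / name
        sub.mkdir(exist_ok=True)
        created.append(sub)
    readme = root / "README.txt"
    if not readme.exists():  # never overwrite an existing README
        readme.write_text("Project description goes here.\n", encoding="utf-8")
    return created


with tempfile.TemporaryDirectory() as tmp:
    for folder in create_skeleton(Path(tmp) / "civil-war-letters"):
        print(folder.name)
```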

Step 2: Create Documentation

Write a README.txt that includes:

  • Project description and dates
  • Creator contact information
  • File descriptions
  • Usage rights and citation info

Step 3: Archive with Best Practices

  1. Use a descriptive filename: civil-war-letters-smith-2024-09-27.zip
  2. Test extraction on a different computer
  3. Verify that all files open correctly
  4. Create a backup copy
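
One way to confirm that a backup copy is identical to the original is to compare checksums. Checksums are not part of the steps above; they are an optional addition, and the helper names sha256_of and backup_with_check are hypothetical. A minimal sketch:

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, read in chunks to handle large archives."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_with_check(archive: Path, backup_dir: Path) -> Path:
    """Copy `archive` into `backup_dir` and confirm the copy matches the original."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    copy = Path(shutil.copy2(archive, backup_dir / archive.name))
    if sha256_of(copy) != sha256_of(archive):
        raise OSError(f"Backup of {archive.name} does not match the original")
    return copy


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in file; in practice this would be your real ZIP archive.
    archive = Path(tmp) / "civil-war-letters-smith-2024-09-27.zip"
    archive.write_bytes(b"placeholder archive bytes")
    copy = backup_with_check(archive, Path(tmp) / "backups")
    print(f"Backup verified: {copy.name}")
```

Storing the checksum alongside the archive also lets you detect silent corruption years later.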

Common Problems and Solutions

Problem 1: “Archive is Damaged/Corrupted”

Causes:

  • Incomplete download
  • Transfer error
  • Storage media failure

Solutions:

  1. Re-download the file
  2. Try a different extraction tool
  3. Use a repair utility (WinRAR has a Repair command; on Mac/Linux, zip -F attempts to fix a damaged ZIP)
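
Before trying a repair, it helps to confirm whether the archive is actually damaged. The sketch below checks a ZIP file with Python's zipfile module; the function name check_zip and its status messages are assumptions for illustration. The demo simulates an incomplete download by cutting a valid archive in half.

```python
import tempfile
import zipfile
from pathlib import Path


def check_zip(path: Path) -> str:
    """Return "ok" if every member passes its CRC check, else a short description."""
    if not zipfile.is_zipfile(path):
        return "not a readable ZIP file"
    try:
        with zipfile.ZipFile(path) as zf:
            bad_member = zf.testzip()  # name of the first corrupt member, or None
    except zipfile.BadZipFile as err:
        return f"damaged: {err}"
    return "ok" if bad_member is None else f"damaged member: {bad_member}"


with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "good.zip"
    with zipfile.ZipFile(good, "w") as zf:
        zf.writestr("letter-001.txt", "Dear Mother")

    # Simulate an incomplete download by keeping only the first half of the file.
    truncated = Path(tmp) / "truncated.zip"
    data = good.read_bytes()
    truncated.write_bytes(data[: len(data) // 2])

    print("good.zip:", check_zip(good))
    print("truncated.zip:", check_zip(truncated))
```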

Problem 2: Can’t Extract All Files

Causes:

  • Insufficient disk space
  • File permission issues
  • Long file paths (PC)

Solutions:

  1. Check the available disk space
  2. Extract to a different location
  3. Run the extraction tool as administrator (PC)
  4. Use a shorter destination path
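
The disk-space check can be done before extracting, because a ZIP file records the uncompressed size of each member. A minimal sketch, with the hypothetical helper names space_needed and has_room:

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def space_needed(archive: Path) -> int:
    """Total uncompressed size, in bytes, of all members in a ZIP archive."""
    with zipfile.ZipFile(archive) as zf:
        return sum(info.file_size for info in zf.infolist())


def has_room(archive: Path, destination: Path) -> bool:
    """True if the drive holding `destination` has enough free space to extract."""
    return space_needed(archive) <= shutil.disk_usage(destination).free


with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "sample.zip"
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("catalog.csv", "id,title\n" * 1000)
    print(f"Needs {space_needed(archive)} bytes; enough room: {has_room(archive, Path(tmp))}")
```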

Problem 3: Password Issues

Solutions:

  1. Check for correct capitalization
  2. Try copying/pasting password
  3. Verify with sender if password is correct

Third-Party Tools

Cross-Platform:

  • 7-Zip (free) - Handles many formats, excellent compression
  • PeaZip (free) - User-friendly interface, many formats

Mac-Specific:

  • The Unarchiver (free) - Handles many formats Finder can’t
  • Keka (free/paid) - Modern interface, good format support

PC-Specific:

  • WinRAR (paid, with a free trial) - Creates and extracts RAR files

Best Practices for DH Work

1. Naming Conventions

Good: victorian-novels-corpus-2024-09-27.zip
Bad:  stuff.zip
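
A small helper can apply this naming pattern consistently. The sketch below assumes the pattern is "lowercase words joined by hyphens, then an ISO date"; the function name archive_name is hypothetical.

```python
import re
from datetime import date


def archive_name(project: str, creation_date: date, extension: str = ".zip") -> str:
    """Build a descriptive, sortable archive filename from a project title and date."""
    # Lowercase the title and replace every run of other characters with one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", project.lower()).strip("-")
    return f"{slug}-{creation_date.isoformat()}{extension}"


print(archive_name("Victorian Novels Corpus", date(2024, 9, 27)))
# prints: victorian-novels-corpus-2024-09-27.zip
```

Putting the date in year-month-day order means the files sort chronologically in any file browser.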

2. Documentation

Include a README.txt file in your archives:

Contents of victorian-novels-corpus.zip
=====================================
Created: 2024-09-27
Creator: Sarah Johnson
Contact: sarah@university.edu

Contents:
/raw-texts/     - Original OCR files
/cleaned/       - Processed text files
/metadata/      - Bibliographic information
README.txt      - This file

Notes:
- All texts in UTF-8 encoding
- See metadata/sources.csv for complete bibliography

3. Test Your Archives

Always test that archives extract properly:

  1. Create the archive
  2. Extract it to a different location
  3. Verify all files are present and functional
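
These three steps can be automated for ZIP files. The sketch below extracts an archive into a temporary folder and reports any member that is missing or has the wrong size; the function name verify_round_trip is hypothetical. It confirms that files extract intact, not that their contents are what you intended, so opening a few files by hand is still worthwhile.

```python
import tempfile
import zipfile
from pathlib import Path


def verify_round_trip(archive: Path) -> list[str]:
    """Extract `archive` to a scratch folder and return the names of problem members.

    An empty list means every file was extracted with the expected size.
    """
    problems = []
    with tempfile.TemporaryDirectory() as scratch, zipfile.ZipFile(archive) as zf:
        zf.extractall(scratch)  # raises an error if a member fails its CRC check
        for info in zf.infolist():
            if info.is_dir():
                continue
            extracted = Path(scratch) / info.filename
            if not extracted.is_file() or extracted.stat().st_size != info.file_size:
                problems.append(info.filename)
    return problems


with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "victorian-novels-corpus-2024-09-27.zip"
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("README.txt", "Contents of victorian-novels-corpus.zip\n")
        zf.writestr("cleaned/sample.txt", "Sample text\n")
    issues = verify_round_trip(archive)
    print("Archive OK" if not issues else f"Problems with: {issues}")
```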

Proficiency Check: Compression Skills

A colleague sends you "research-data.rar" but you can't open it with your system's built-in tools. What should you do?

Next Steps

Understanding compression prepares you for the next topic, working with different file formats. There you’ll learn how to choose the right format for different types of DH data and how to ensure compatibility across platforms and tools.


Remember: Compression is a fundamental skill for managing digital materials. Practice with small files first, then apply these techniques to your real research data.