Working with Compressed Files
Learn to work with ZIP files and other compressed formats, an essential skill for data sharing, backup, and working with DH datasets.
What You'll Learn:
- Understand different compression formats and their uses
- Create and extract archives using GUI and command line tools
- Implement best practices for DH data archiving
- Troubleshoot common compression problems
Compressed files (like ZIP archives) are everywhere in digital humanities work. You'll encounter them when downloading datasets, sharing research materials, backing up projects, and working with many DH tools. This guide will help you understand and confidently work with compressed files on both Mac and PC.
Why Compression Matters in DH
Common DH Scenarios
- Downloading datasets: Many archives provide data as ZIP files
- Sharing research: Email attachments have size limits; compression helps
- Backing up projects: Compress completed work for long-term storage
- Software installation: Many DH tools are distributed as compressed archives
- Collaborative work: Share entire project folders easily
Benefits of Compression
- Smaller file sizes: Save storage space and transfer time
- Bundling: Keep related files together in one package
- Organization: Archive completed work cleanly
- Compatibility: ZIP files work across all platforms
Types of Compressed Files
Common Formats You'll Encounter
| Format | Extension | Platform | Notes |
|---|---|---|---|
| ZIP | .zip | All | Most common, widely supported |
| RAR | .rar | All | Good compression, requires special software |
| 7-Zip | .7z | All | Excellent compression ratio |
| Gzip | .gz | Mac/Linux | Common for single files |
| Tar | .tar | Mac/Linux | Archive format, often combined with gzip |
| Tar.gz | .tar.gz or .tgz | Mac/Linux | Tar archive compressed with gzip |
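File extensions can be missing or wrong, so it sometimes helps to look at the first few bytes of a file instead. The sketch below is one possible way to do that in Python; the function name `identify_archive` and the small signature table are illustrative assumptions, and the table covers only the formats listed above.

```python
import tarfile
from pathlib import Path

# Leading bytes ("magic numbers") for some of the formats in the table.
MAGIC_NUMBERS = {
    b"PK\x03\x04": "ZIP",
    b"\x1f\x8b": "Gzip",
    b"7z\xbc\xaf\x27\x1c": "7-Zip",
    b"Rar!\x1a\x07": "RAR",
}


def identify_archive(path):
    """Guess an archive's format from its contents rather than its name."""
    path = Path(path)
    with path.open("rb") as handle:
        header = handle.read(8)
    for magic, name in MAGIC_NUMBERS.items():
        if header.startswith(magic):
            # A gzip stream may wrap a tar archive (.tar.gz or .tgz).
            if name == "Gzip" and tarfile.is_tarfile(str(path)):
                return "Tar.gz"
            return name
    # Plain tar files have no signature at the very start of the file,
    # so let the tarfile module try to parse the header instead.
    if tarfile.is_tarfile(str(path)):
        return "Tar"
    return "unknown"
```

Calling `identify_archive("medieval-manuscripts.tar.gz")` on a real tar.gz file should report "Tar.gz" even if the file has been renamed. An empty ZIP archive starts with different bytes and would be reported as "unknown" by this simplified check.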
Platform-Specific Formats
Mac:
- .dmg - Disk images (software installation)
- .sit - StuffIt archives (legacy)
PC:
- .cab - Cabinet files (Windows system files)
- .msi - Windows installer packages
Hands-On Exercise: Creating Your First Archive
Exercise 1: Create a Research Archive
Step 1: Prepare Sample Files
Create a folder called "DH-Sample-Project" with these files:
- README.txt - Project description
- data.csv - Sample dataset
- analysis.py - Python script
Step 2: Create Archive (Choose Your Platform)
Mac Method:
- Right-click on the "DH-Sample-Project" folder
- Select "Compress DH-Sample-Project"
- Rename to "dh-sample-project-2024.zip"
PC Method:
- Right-click on the "DH-Sample-Project" folder
- Select "Send to" → "Compressed (zipped) folder"
- Rename to "dh-sample-project-2024.zip"
Step 3: Test Your Archive
- Move original folder to a different location
- Double-click your ZIP file to extract
- Verify all files are present and can be opened
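If you would rather script this exercise, the same create-then-test cycle can be sketched with Python's standard library. This is only one possible approach: the function name `create_and_check` is made up for illustration, and it assumes the archive is written somewhere outside the project folder.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def create_and_check(project_dir, archive_stem):
    """Zip project_dir, extract the ZIP elsewhere, and list what came out."""
    project_dir = Path(project_dir)
    # make_archive appends ".zip" to archive_stem and returns the final path.
    archive_path = shutil.make_archive(
        str(archive_stem),
        "zip",
        root_dir=str(project_dir.parent),
        base_dir=project_dir.name,
    )
    # Extract into a scratch folder so the original files are never touched.
    with tempfile.TemporaryDirectory() as scratch:
        with zipfile.ZipFile(archive_path) as archive:
            archive.extractall(scratch)
        extracted = sorted(
            p.relative_to(scratch).as_posix()
            for p in Path(scratch).rglob("*")
            if p.is_file()
        )
    return archive_path, extracted
```

For example, `create_and_check("DH-Sample-Project", "dh-sample-project-2024")` would write dh-sample-project-2024.zip and return the relative paths of the files found after extraction, which you can compare against the three files from Step 1.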
Command Line Compression
Mac/Linux Commands
Try these commands in a terminal:
# List files in current directory
ls
# Create a ZIP archive
zip research.zip *.txt
# List contents of ZIP file
unzip -l research.zip
# Extract ZIP file
unzip research.zip
# Compress a folder recursively
zip -r project.zip DH-Project/
PC PowerShell Commands
# List files
dir
# Create ZIP archive
Compress-Archive -Path "*.txt" -DestinationPath "research.zip"
# Extract ZIP file
Expand-Archive -Path "research.zip" -DestinationPath "."
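The commands above differ between platforms. If you already use Python in your project, its built-in `zipfile` module offers rough equivalents that behave the same on Mac and PC. The three helper names below are illustrative, not part of any library.

```python
import zipfile
from pathlib import Path


def zip_text_files(folder, archive_path):
    """Roughly `zip research.zip *.txt`: add each .txt file in folder."""
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for text_file in sorted(Path(folder).glob("*.txt")):
            # arcname stores just the file name, not the full path.
            archive.write(text_file, arcname=text_file.name)


def list_contents(archive_path):
    """Roughly `unzip -l research.zip`: return (name, size in bytes) pairs."""
    with zipfile.ZipFile(archive_path) as archive:
        return [(info.filename, info.file_size) for info in archive.infolist()]


def extract_all(archive_path, destination):
    """Roughly `unzip research.zip` or Expand-Archive: unpack everything."""
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(destination)
```

A typical session would call `zip_text_files(".", "research.zip")`, inspect the result with `list_contents("research.zip")`, and then unpack it with `extract_all("research.zip", "extracted")`.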
Advanced Techniques
Password Protection
Mac (Terminal):
# Create password-protected ZIP
zip -er secure.zip confidential-files/
# Extract password-protected ZIP
unzip secure.zip
PC (PowerShell with 7-Zip):
# Install 7-Zip first, then:
& "C:\Program Files\7-Zip\7z.exe" a -pPASSWORD secure.7z files/
Compression Levels
Higher compression = smaller files but slower processing
Mac:
# Fast compression (less compression); -r includes the folder's contents
zip -1 -r fast.zip files/
# Maximum compression
zip -9 -r small.zip files/
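To see the trade-off for your own material, you can measure it. The sketch below builds a ZIP in memory at a chosen Deflate level and reports its size; `zipped_size` is a hypothetical helper name, and the member name "sample.txt" is arbitrary.

```python
import io
import zipfile


def zipped_size(data, level):
    """Size in bytes of a ZIP holding `data`, compressed at Deflate level 1-9."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(
        buffer, "w", compression=zipfile.ZIP_DEFLATED, compresslevel=level
    ) as archive:
        archive.writestr("sample.txt", data)
    return len(buffer.getvalue())
```

Reading one of your transcriptions with `Path("letter-001.txt").read_bytes()` and comparing `zipped_size(data, 1)` with `zipped_size(data, 9)` gives a feel for how much the higher level saves. Plain text usually shrinks a lot at any level, while already-compressed files such as JPEG scans barely change.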
Real-World DH Scenario
Exercise 2: Digital Archive Workflow
Scenario
You've completed transcribing 50 Civil War letters for a digital humanities project. You need to:
- Archive the complete project
- Share it with your research team
- Prepare it for long-term preservation
Step 1: Organize Project Structure
civil-war-letters/
├── README.txt
├── raw-scans/
│   ├── letter-001.jpg
│   └── letter-002.jpg
├── transcriptions/
│   ├── letter-001.txt
│   └── letter-002.txt
├── metadata/
│   └── catalog.csv
└── scripts/
    └── text-analysis.py
Step 2: Create Documentation
Write a README.txt that includes:
- Project description and dates
- Creator contact information
- File descriptions
- Usage rights and citation info
Step 3: Archive with Best Practices
- Use descriptive filename: civil-war-letters-smith-2024-09-27.zip
- Test extraction on different computer
- Verify all files open correctly
- Create backup copy
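Two of these steps are easy to automate: building the dated filename, and confirming that a backup copy is identical to the original. The sketch below is one way to do both. Both function names are illustrative, and using a SHA-256 checksum to compare copies is an extra suggestion rather than a step from the list above.

```python
import hashlib
from datetime import date


def archive_name(project, creator, when=None):
    """Build a name like civil-war-letters-smith-2024-09-27.zip."""
    when = when or date.today()
    return f"{project}-{creator}-{when.isoformat()}.zip"


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

If `sha256_of` returns the same value for the archive and for its backup copy, the two files are byte-for-byte identical. Recording that value in your project notes also lets collaborators check their download later.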
Common Problems and Solutions
Problem 1: "Archive is Damaged/Corrupted"
Causes:
- Incomplete download
- Transfer error
- Storage media failure
Solutions:
- Re-download the file
- Try different extraction tool
- Use a repair utility (for example, WinRAR's "Repair archive" command, or zip -F / zip -FF on Mac/Linux)
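Before re-downloading or reaching for another tool, it can help to confirm that the archive really is damaged and to see which file is affected. Python's `zipfile` module can read every member and check its stored CRC. The wrapper below, `check_zip`, is a hypothetical helper that turns the result into a short message.

```python
import zipfile
import zlib


def check_zip(path):
    """Return "ok" or a short description of what is wrong with a ZIP."""
    try:
        with zipfile.ZipFile(path) as archive:
            # testzip() reads every member and returns the first bad name.
            bad_member = archive.testzip()
    except (zipfile.BadZipFile, zlib.error) as error:
        return f"damaged or not a ZIP: {error}"
    if bad_member is not None:
        return f"corrupted member: {bad_member}"
    return "ok"
```

An incomplete download usually fails the first check, because the index at the end of the file is missing. A transfer error in the middle of the file is more likely to show up as a single corrupted member.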
Problem 2: Can't Extract All Files
Causes:
- Insufficient disk space
- File permission issues
- Long file paths (PC)
Solutions:
- Check available disk space
- Extract to different location
- Run as administrator (PC)
- Use shorter destination path
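Two of these causes can be checked before you extract anything: whether the destination has enough free space, and whether any extracted path would be too long for Windows. The sketch below assumes the traditional Windows limit of 260 characters; `preflight` is an illustrative name, and the destination folder must already exist.

```python
import os
import shutil
import zipfile

WINDOWS_MAX_PATH = 260  # traditional Windows path-length limit, in characters


def preflight(archive_path, destination="."):
    """Return a list of warnings about extracting archive_path into destination."""
    warnings = []
    destination = os.path.abspath(destination)
    with zipfile.ZipFile(archive_path) as archive:
        # file_size is the uncompressed size of each member.
        needed = sum(info.file_size for info in archive.infolist())
        longest = max(
            (len(os.path.join(destination, name)) for name in archive.namelist()),
            default=0,
        )
    free = shutil.disk_usage(destination).free
    if needed > free:
        warnings.append(f"not enough space: need {needed} bytes, {free} free")
    if longest >= WINDOWS_MAX_PATH:
        warnings.append(f"longest extracted path would be {longest} characters")
    return warnings
```

An empty list means neither problem was detected. If the path warning appears, extracting to a short destination such as a folder directly under the drive root is the simplest fix.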
Problem 3: Password Issues
Solutions:
- Check for correct capitalization
- Try copying/pasting password
- Verify with sender if password is correct
Third-Party Tools
Recommended Software
Cross-Platform:
- 7-Zip (free) - Handles many formats, excellent compression
- PeaZip (free) - User-friendly interface, many formats
Mac-Specific:
- The Unarchiver (free) - Handles many formats Finder can't
- Keka (free/paid) - Modern interface, good format support
PC-Specific:
- WinRAR (paid, with a free trial) - Widely used; creates and extracts RAR files
Best Practices for DH Work
1. Naming Conventions
Good: victorian-novels-corpus-2024-09-27.zip
Bad: stuff.zip
2. Documentation
Include a README.txt file in your archives:
Contents of victorian-novels-corpus.zip
=====================================
Created: 2024-09-27
Creator: Sarah Johnson
Contact: sarah@university.edu
Contents:
/raw-texts/ - Original OCR files
/cleaned/ - Processed text files
/metadata/ - Bibliographic information
README.txt - This file
Notes:
- All texts in UTF-8 encoding
- See metadata/sources.csv for complete bibliography
3. Test Your Archives
Always test that archives extract properly:
- Create the archive
- Extract it to a different location
- Verify all files are present and functional
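Checking that files are "present and functional" by hand gets tedious for large corpora. A script can at least confirm that every file in the source folder is in the archive with identical contents. The sketch below assumes the archive stores paths relative to the source folder, with no extra top-level folder; `verify_against_source` is a made-up name.

```python
import zipfile
from pathlib import Path


def verify_against_source(source_dir, archive_path):
    """Return a list of files that are missing from, or differ in, the archive."""
    source_dir = Path(source_dir)
    problems = []
    with zipfile.ZipFile(archive_path) as archive:
        archived = set(archive.namelist())
        for path in sorted(source_dir.rglob("*")):
            if not path.is_file():
                continue
            # ZIP member names always use forward slashes.
            name = path.relative_to(source_dir).as_posix()
            if name not in archived:
                problems.append(f"missing: {name}")
            elif archive.read(name) != path.read_bytes():
                problems.append(f"differs: {name}")
    return problems
```

An empty list means the archive matches the folder. This reads each file fully into memory, which is fine for text corpora but may be slow for very large scans.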
Next Steps
Understanding compression prepares you for working with different file formats, where you'll learn about choosing the right format for different types of DH data and ensuring compatibility across platforms and tools.
Remember: Compression is a fundamental skill for managing digital materials. Practice with small files first, then apply these techniques to your real research data.