Thursday, January 27, 2022

Java Tutorial: Compressing and Decompressing ZIP File

Overview

In this tutorial, we'll discuss how to compress files and directories into a ZIP file and how to decompress a ZIP file. The java.util.zip package provides classes for reading and writing the standard ZIP and GZIP file formats. It also includes classes for compressing and decompressing data using the DEFLATE compression algorithm, which is used by the ZIP and GZIP file formats.

Additionally, there are utility classes for computing the CRC-32, CRC-32C and Adler-32 checksums of arbitrary input streams. More information can be found in the documentation.
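
As a quick illustration of those checksum classes, here is a minimal sketch that feeds the same bytes to all three. The class name ChecksumSketch and the helper checksumOf are made up for this example, and CRC32C assumes Java 9 or newer.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Adler32;
import java.util.zip.CRC32;
import java.util.zip.CRC32C;
import java.util.zip.Checksum;

class ChecksumSketch {

  // Feeds the bytes to the given checksum and returns the resulting value.
  static long checksumOf(Checksum checksum, byte[] data) {
    checksum.update(data, 0, data.length);
    return checksum.getValue();
  }

  public static void main(String[] args) {
    // hypothetical sample data
    byte[] data = "123456789".getBytes(StandardCharsets.US_ASCII);
    System.out.printf("CRC-32:   %08x%n", checksumOf(new CRC32(), data));
    System.out.printf("CRC-32C:  %08x%n", checksumOf(new CRC32C(), data));
    System.out.printf("Adler-32: %08x%n", checksumOf(new Adler32(), data));
  }
}
```

All three classes implement the Checksum interface, which is why one helper method can serve them all.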

Compressing Files

To compress files, we need to set up an output stream for the compressed ZIP file and an input stream for each file that goes into it. This example demonstrates compressing files using ZipOutputStream.
import java.io.FileOutputStream;
import java.io.FileInputStream;
import java.util.zip.ZipOutputStream;
import java.util.zip.ZipEntry;
import java.io.File;
import java.io.IOException;

public class SampleClass{
  
  void putToZip(ZipOutputStream zos,
                File source, String entryName)
                throws IOException{
                
    for(File f : source.listFiles()){
        
        if(f.isDirectory()){
          
          if(f.getName().endsWith("/"))
            zos.putNextEntry(
            new ZipEntry(
            entryName+f.getName()));
          else
            zos.putNextEntry(
            new ZipEntry(
            entryName+f.getName()+"/"));
          
          zos.closeEntry();
          
          putToZip(zos, f, 
          entryName+f.getName()+"/");
          continue;
        }
        
        FileInputStream fis = null;
        try{
          fis = new FileInputStream(f);
          ZipEntry ze = new ZipEntry(
          entryName+f.getName());
          zos.putNextEntry(ze);
        
          byte[] buffer = new byte[1024];
          int length;
          //write only the bytes that were actually read
          while((length = fis.read(buffer)) >= 0)
            zos.write(buffer, 0, length);
       }finally{
         if(fis != null)
           fis.close();
         zos.closeEntry();
       }
        
    }
  }
  
  public static void main(String[] args)
                      throws IOException{
    File source = new File("C:\\test\\Files");
    File output = 
    new File("C:\\test\\output\\"+
             source.getName()+".zip");
    
    if(!source.exists()){
       System.out.println("Source doesn't exist!");
       return;
    }
    if(output.exists()){
      System.out.println
      (source.getName()+".zip "+" already exists!");
      return;
    }
    
    try(
    FileOutputStream fos =
    new FileOutputStream(output);
    ZipOutputStream zos = new ZipOutputStream(fos)){
      new SampleClass()
      .putToZip(zos, source, "");
    }
    System.out.println("Operation Complete!");
  }
}
First off, we need the source and the destination of the ZIP file. In the example above, the source is C:\test\Files, which is a directory. Next, create a FileOutputStream and attach it to a ZipOutputStream. Before writing a file to the ZipOutputStream, we need to create an entry for it using ZipEntry. Directories have no content of their own, so their entry is all that needs to be written.

When putting a directory in a ZipEntry, its name needs to end with "/". File.getName() normally returns the name without a trailing separator, but it's still better to check whether "/" is already part of the directory name so we don't accidentally add another one.

The putNextEntry() method begins writing a new ZIP file entry and positions the ZipOutputStream at the start of the entry data. If a previous entry is still active, this method closes it first. Even so, it's good practice to close each entry ourselves once we are done with it.

closeEntry() closes the current ZIP entry and positions the stream for writing the next entry. In the example above, putToZip is a recursive method: it calls itself for every subdirectory it finds.
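
The interplay between putNextEntry() and closeEntry() can be sketched with a small archive that is built in memory. This is only an illustration: the class name EntrySketch and the entry names are made up, and a ByteArrayOutputStream stands in for the FileOutputStream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class EntrySketch {

  // Builds a small ZIP archive in memory: one directory entry and two files.
  static byte[] buildZip() throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
      zos.putNextEntry(new ZipEntry("docs/")); // directory: name ends with "/"
      zos.closeEntry();                         // explicit close

      zos.putNextEntry(new ZipEntry("docs/a.txt"));
      zos.write("first".getBytes(StandardCharsets.UTF_8));
      // no closeEntry() here: the next putNextEntry() closes "docs/a.txt"

      zos.putNextEntry(new ZipEntry("docs/b.txt"));
      zos.write("second".getBytes(StandardCharsets.UTF_8));
      zos.closeEntry();
    }
    return bytes.toByteArray();
  }

  // Reads the archive back and returns the entry names in order.
  static List<String> entryNames(byte[] zip) throws IOException {
    List<String> names = new ArrayList<>();
    try (ZipInputStream zis =
         new ZipInputStream(new ByteArrayInputStream(zip))) {
      for (ZipEntry ze = zis.getNextEntry(); ze != null;
           ze = zis.getNextEntry())
        names.add(ze.getName());
    }
    return names;
  }

  public static void main(String[] args) throws IOException {
    System.out.println(entryNames(buildZip()));
  }
}
```

All three entries come back in the order they were written, including the one that was never closed explicitly.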

Decompressing Zip File

To decompress a ZIP file, we need an input stream for reading the data in the ZIP file and an output stream for writing that data to files. This example demonstrates decompressing files using ZipInputStream.
import java.io.FileOutputStream;
import java.io.FileInputStream;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipEntry;
import java.io.File;
import java.io.IOException;

public class SampleClass{
  
  public static void main(String[] args)
                      throws IOException{
    File output = new File("C:\\test\\output");
    File source = 
    new File("C:\\test\\output\\Files.zip");
    
    if(!source.exists()){
       System.out.println("Source doesn't exist!");
       return;
    }
    
    if(!output.exists()){
      System.out.println
      ("output folder doesn't exist!");
      return;
    }
    else if(!output.isDirectory()){
      System.out.println
      ("Destination must be a directory!");
      return;
    }
    
    try(
    FileInputStream fis = 
    new FileInputStream(source);
    ZipInputStream zis = new ZipInputStream(fis)){
      ZipEntry ze = zis.getNextEntry();
      
      byte[] buffer = new byte[1024];
      while(ze != null){
        /*zip slip guard*/
        File entryFile = 
        new File(output, ze.getName());
        
        String entryPath = 
        entryFile.getCanonicalPath();
        String outputPath = 
        output.getCanonicalPath();
        
        if(!entryPath
        .startsWith(
         outputPath + File.separator)){
          System.err.println
          ("File destination is invalid!");
          return;
        }
        /**/
        
        if(ze.isDirectory())
          if(!entryFile.mkdirs()){
            System.out.println
            ("Failed to create directories or "+
             "directories already exist!");
            System.out.println
            ("Operation Aborted!");
            return;
          }
          else{
            ze = zis.getNextEntry();
            continue;
          }
        
        try(
        FileOutputStream fos =
        new FileOutputStream(entryFile)){
         int len = 0;
         while((len = zis.read(buffer)) >= 0) 
             fos.write(buffer, 0, len);
        }
        ze = zis.getNextEntry();
      }
    System.out.println("Operation Complete!");
    }
  }
}
First off, we need a ZIP file and a destination folder. Then we create a FileInputStream and attach it to a ZipInputStream to read the entries and the content of the ZIP file. The zip slip guard snippet protects our program from the Zip Slip vulnerability, where an entry name such as ../../evil.txt would make the program write outside the destination folder.
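
The same guard can also be written with java.nio.file.Path, which compares whole path components instead of strings. This is a sketch rather than a drop-in replacement: the names ZipSlipSketch and resolveSafely are made up, and normalize() only removes "." and ".." segments, so unlike getCanonicalPath() it does not resolve symbolic links.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;

class ZipSlipSketch {

  // Resolves an entry name against the destination folder and rejects
  // any name that would end up outside of it (for example "../evil.txt").
  static Path resolveSafely(Path destination, String entryName)
                            throws IOException {
    Path root = destination.toAbsolutePath().normalize();
    Path target = root.resolve(entryName).normalize();
    if (!target.startsWith(root))
      throw new IOException(
        "Entry is outside of the destination: " + entryName);
    return target;
  }

  public static void main(String[] args) throws IOException {
    Path destination = Paths.get("output"); // hypothetical folder
    System.out.println(resolveSafely(destination, "docs/readme.txt"));
    try {
      resolveSafely(destination, "../evil.txt");
    } catch (IOException e) {
      System.out.println("Rejected: " + e.getMessage());
    }
  }
}
```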

Next, we need to check whether an entry is a directory. If it is, we need to create the directory on disk before writing any content into it. The isDirectory() method of ZipEntry checks whether an entry is a directory entry: an entry is a directory if its name ends with "/".
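
A tiny sketch of that rule, using made-up entry names. The class name DirectoryEntrySketch and the helper isDirectoryName are only for illustration.

```java
import java.util.zip.ZipEntry;

class DirectoryEntrySketch {

  // Returns true if a ZIP entry with this name counts as a directory.
  static boolean isDirectoryName(String entryName) {
    return new ZipEntry(entryName).isDirectory();
  }

  public static void main(String[] args) {
    System.out.println(isDirectoryName("docs/"));           // true
    System.out.println(isDirectoryName("docs/readme.txt")); // false
  }
}
```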

getNextEntry() reads the next entry in a ZIP file and returns null if there are no more entries left. write(byte[] b, int off, int len) writes len bytes from the specified byte array, starting at offset off. In the example above, fos.write(buffer, 0, len) is not equivalent to fos.write(buffer): the latter always writes the whole buffer, even when the last read() filled only part of it, and would append leftover bytes to the extracted file.
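
To see what the len argument does in a copy loop, here is a minimal sketch that counts the bytes written with and without it. The class name WriteLengthSketch, the copy helper and the 1500-byte payload are made up for the demonstration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

class WriteLengthSketch {

  // Copies the stream and returns the number of bytes written.
  // When exactLength is false the whole buffer is written on every pass,
  // including stale bytes left over after a short read.
  static int copy(InputStream in, boolean exactLength) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buffer = new byte[1024];
    int len;
    while ((len = in.read(buffer)) >= 0) {
      if (exactLength)
        out.write(buffer, 0, len);
      else
        out.write(buffer);
    }
    return out.size();
  }

  public static void main(String[] args) throws IOException {
    byte[] data = new byte[1500]; // not a multiple of the buffer size
    System.out.println(copy(new ByteArrayInputStream(data), true));  // 1500
    System.out.println(copy(new ByteArrayInputStream(data), false)); // 2048
  }
}
```

The second read returns only 476 bytes, so writing the full buffer adds 548 bytes that were never part of the input.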

CheckedInputStream and CheckedOutputStream

CheckedInputStream is an input stream that also maintains a checksum of the data being read whereas CheckedOutputStream is an output stream that also maintains a checksum of the data being written. The checksum can then be used to verify the integrity of the processed data.

This example demonstrates CheckedInputStream and CheckedOutputStream.
import java.util.zip.CheckedInputStream;
import java.util.zip.CheckedOutputStream;
import java.util.zip.CRC32;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.File;

public class SampleClass{

  public static void main(String[] args)
                      throws IOException{
    File source = 
    new File("C:\\test\\output\\Files.zip");
    File copy = 
    new File("C:\\test\\output\\FilesCopy.zip");
    
    try(CheckedInputStream cis = 
    new CheckedInputStream(
    new FileInputStream(source),new CRC32());
    CheckedOutputStream cos = 
    new CheckedOutputStream(
    new FileOutputStream(copy),new CRC32())){
      byte[] buffer = new byte[1024];
      int len;
      //write only the bytes that were actually read
      while((len = cis.read(buffer)) >= 0)
        cos.write(buffer, 0, len);
        
      long sourceCheckSum = 
      cis.getChecksum().getValue();
      
      long copyCheckSum =
      cos.getChecksum().getValue();
      
      System.out.println
      ("source: "+sourceCheckSum);
      System.out.println
      ("copy: "+copyCheckSum);
    }
    
  }
}

Let's assume that Files.zip was created by ZipOutputStream. Because the loop writes exactly the bytes it has read, the source and the copy contain the same bytes and the two checksums are identical.

A common mistake is to ignore the value returned by read() and call cos.write(buffer) instead. The last read usually fills only part of the buffer, so the whole buffer, including stale bytes from the previous pass, is written and the copy ends up longer than the source. In that case the checksums differ:

Result(may vary)
source: 3855373666
copy: 2474678416

Now run that variant again, but this time let FilesCopy.zip be the source. The padded copy has a length that is a multiple of the buffer size, so every read normally fills the buffer completely, and the source and the copy have the same checksum.

getChecksum() returns the Checksum object that the stream keeps up to date. Note that CheckedInputStream and CheckedOutputStream are not limited to ZIP files; they can be used for any kind of data. Also, Java supports three types of checksums: CRC-32, CRC-32C and Adler-32.
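
As a sketch of that last point, the helper below runs plain text through a CheckedInputStream with Adler-32 and compares the result with a checksum computed directly. The names CheckedStreamSketch and checksumOf are made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.Adler32;
import java.util.zip.CheckedInputStream;
import java.util.zip.Checksum;

class CheckedStreamSketch {

  // Reads the stream to the end and returns the checksum of everything read.
  static long checksumOf(InputStream in, Checksum checksum)
                         throws IOException {
    try (CheckedInputStream cis = new CheckedInputStream(in, checksum)) {
      byte[] buffer = new byte[1024];
      while (cis.read(buffer) >= 0) {
        // nothing to do: reading updates the checksum as a side effect
      }
      return cis.getChecksum().getValue();
    }
  }

  public static void main(String[] args) throws IOException {
    byte[] text =
      "plain text, not a zip file".getBytes(StandardCharsets.UTF_8);
    long viaStream =
      checksumOf(new ByteArrayInputStream(text), new Adler32());

    Adler32 direct = new Adler32();
    direct.update(text, 0, text.length);
    System.out.println(viaStream == direct.getValue()); // true
  }
}
```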

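When we want to create an uncompressed ZIP file, we need to change the ZipOutputStream compression level. To do that, we call the setLevel() method with Deflater.NO_COMPRESSION as the argument. ZipOutputStream computes the CRC-32 of each entry by itself, so this level needs no extra checksum handling; only entries whose method is ZipEntry.STORED need their size and CRC-32 set before they are written. If we also want a checksum of the archive file as a whole, we can place a CheckedOutputStream between the two streams. This snippet demonstrates setting the compression level to Deflater.NO_COMPRESSION.
//sourceZip is the File of the archive to create
FileOutputStream fos = new FileOutputStream(sourceZip);
//optional: keeps a CRC-32 of every byte written to the archive
CheckedOutputStream checksum = 
new CheckedOutputStream(fos,new CRC32());
ZipOutputStream zos = new ZipOutputStream(checksum);
zos.setLevel(Deflater.NO_COMPRESSION);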
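
Another way to get uncompressed entries is the STORED method of ZipEntry. Unlike setLevel(), it requires the size and the CRC-32 of the data to be set on the entry before it is written. The following is a minimal in-memory sketch of that approach; the names StoredEntrySketch and storeUncompressed and the sample entry are made up.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

class StoredEntrySketch {

  // Writes a single uncompressed (STORED) entry to an in-memory archive.
  static byte[] storeUncompressed(String name, byte[] data)
                                  throws IOException {
    CRC32 crc = new CRC32();
    crc.update(data, 0, data.length);

    ZipEntry ze = new ZipEntry(name);
    ze.setMethod(ZipEntry.STORED);
    // STORED entries must know their size and CRC-32 up front,
    // otherwise putNextEntry() throws a ZipException
    ze.setSize(data.length);
    ze.setCompressedSize(data.length);
    ze.setCrc(crc.getValue());

    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
      zos.putNextEntry(ze);
      zos.write(data, 0, data.length);
      zos.closeEntry();
    }
    return bytes.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    byte[] data = "hello hello hello".getBytes(StandardCharsets.UTF_8);
    byte[] zip = storeUncompressed("hello.txt", data);
    System.out.println("data: " + data.length + " bytes");
    System.out.println("archive: " + zip.length + " bytes");
  }
}
```

The archive is always larger than the data because the ZIP headers are added on top of the unmodified bytes.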

Splitting Zip File

Java doesn't provide any tools for splitting a ZIP file. However, there are ways to do it. If you have lots of time on your hands, you may read this ZIP specification. This article has a nice explanation of ZIP headers. The constants (e.g. CENATX, CENEXT, etc.) that you see in some classes of the java.util.zip package, like ZipEntry, describe the layout of those headers: signatures, header sizes and field offsets.
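
To make those constants a little more concrete, here is a small sketch that builds a tiny archive in memory and compares its first four bytes with ZipEntry.LOCSIG, the signature of a local file header. The class name SignatureSketch, its helpers and the entry name are made up for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

class SignatureSketch {

  // Builds a minimal archive with one empty entry.
  static byte[] tinyZip() throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
      zos.putNextEntry(new ZipEntry("empty.txt"));
      zos.closeEntry();
    }
    return bytes.toByteArray();
  }

  // Interprets the first four bytes as an unsigned little-endian number.
  static long firstSignature(byte[] zip) {
    return (zip[0] & 0xFFL)
         | (zip[1] & 0xFFL) << 8
         | (zip[2] & 0xFFL) << 16
         | (zip[3] & 0xFFL) << 24;
  }

  public static void main(String[] args) throws IOException {
    long signature = firstSignature(tinyZip());
    System.out.println(Long.toHexString(signature));
    System.out.println(signature == ZipEntry.LOCSIG);
  }
}
```

A tool that splits archives has to find and rewrite headers like this one, which is why a library is usually the more practical route.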

This article has solutions for splitting ZIP files. One of them is the zip4j library. It's better to use a library like zip4j if you don't have time to read the ZIP specification.
