Gzip vs Bzip2. Some notes when moving files from Windows 7 to Mac OSX

For a metagenomic bioinformatics course I needed to create a virtual machine (VM) with all the software that will be used during the course. I started with the QIIME virtual machine disk, since it already has a nice bunch of software installed, and then installed the extra software for the course. This includes MEGAN5, Metaxa2, the metabarcoding tool obitools, and much more. I also added some tutorial datasets. This entire VM is available here.

As you can expect, this virtual machine is large. The current size of the disk is about 19 GB. I created the VM on a Windows laptop, but I needed to transfer it to my old iMac. That machine is hooked up to the university network, which makes it easy to upload the VM to my personal webpage at UIO so it can be downloaded.

To make it relatively painless to retrieve this VM online I needed to compress it. The VM archive was made on a Windows 7 machine, but retrieval should be easy on Mac OS X or Linux as well.

I first tried gzip, which comes with the 7-Zip software. After transferring the gzip archive, I extracted it first with the Mac archiver software and then with gunzip from the command line. In both cases the extracted VM disk had a size of over 20 GB, which is larger than the original. When I tried to load the VM into VirtualBox, it failed. That suggests the VM disk was corrupted during either the gzip compression or the decompression phase. So I could not use gzip.
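A checksum comparison is one way to narrow down where such corruption happens. The sketch below is not something I ran at the time; it is a minimal Python example, and the function name `sha256_of` is just an illustrative choice. It computes a SHA-256 digest of a file in chunks, so a disk image of this size never has to fit in memory.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of the file at `path`.

    The file is read in chunks of `chunk_size` bytes, so very large
    files (such as a VM disk) are never loaded into memory at once.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Running this on the original disk on the Windows machine and again on the extracted disk on the Mac, and comparing the two hex strings, would show whether the extracted file is byte-for-byte identical. Running it on the archive itself before and after the transfer would show whether the transfer step was the problem.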

Since the 7-Zip software comes with many compression options, I could choose what to try next. I did not want to use tar, since most people are not familiar with it, and that would defeat the whole goal of making these bioinformatics tools easily available.

So I then checked bzip2. It is installed on my Mac running OS X 10.7.5. A bzip2 archive can be extracted from the terminal, but also with the Mac archiver. In my experience with this huge VM, it worked both ways without errors, and the VM could be started with VirtualBox.
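For those who prefer a script over 7-Zip or the archiver, the same kind of bzip2 archive can be written and read with Python's standard `bz2` module. This is only a sketch of the idea, not what I actually used; the function names `compress_bz2` and `decompress_bz2` are my own for this example. Both functions stream the data, so the whole disk is never held in memory.

```python
import bz2
import shutil


def compress_bz2(src, dst):
    """Compress the file `src` into a bzip2 archive at `dst`."""
    with open(src, "rb") as fin, bz2.open(dst, "wb") as fout:
        # copyfileobj copies in blocks, so large files are streamed.
        shutil.copyfileobj(fin, fout)


def decompress_bz2(src, dst):
    """Extract the bzip2 archive `src` into the file `dst`."""
    with bz2.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
```

A call such as `compress_bz2("course_vm.vdi", "course_vm.vdi.bz2")` on one machine and `decompress_bz2("course_vm.vdi.bz2", "course_vm.vdi")` on the other would do the round trip; the file names here are placeholders, not the real names of my VM disk.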

So in conclusion, bzip2 is relatively easy to use for archives that are used on different platforms. Of course I have not tested this latter statement in depth, and you should tell me if I am jumping to conclusions here.


