The long-standing way to compress a file on Unix-y systems has been gzip (or the -z option to tar, or similar), with gunzip to reverse the process.
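For reference, the basic round trip looks like this (file names are just for illustration):

```shell
# Make a sample file to play with
printf 'hello, compression\n' > sample.txt

# Compress: gzip replaces sample.txt with sample.txt.gz
gzip sample.txt

# Decompress: gunzip (or gzip -d) reverses the process
gunzip sample.txt.gz

# Or compress to stdout, keeping the original intact
gzip -c sample.txt > sample.txt.gz
```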
The compression landscape has shifted a little, though.
New programs have come along.
You are likely to be compressing very-much-larger files.
You likely have very-much-larger disks, and frankly don't care so much.
Let's review the "new programs". The first one that turned heads was
bzip2. It provided (and provides) useful extra
compression, admittedly at a cost in time. You know you've made it in
compression-land when you garner a GNU
tar flag -- -j, in this case.
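That flag, in action (directory and archive names are illustrative):

```shell
# Make a directory to archive
mkdir -p project && echo "data" > project/notes.txt

# Create a bzip2-compressed tarball via GNU tar's -j flag
tar -cjf project.tar.bz2 project/

# The long-hand equivalent, piping through bzip2 explicitly
tar -cf - project/ | bzip2 > project2.tar.bz2
```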
The newest kid on the compression block is lzma. (The
"Lempel-Ziv-Markov chain algorithm" is also what's behind the
7zip compression tool; in fact, the compression
may be identical, with only the interface being different.) We know
LZMA is important because, yes, it has a GNU
tar option... too bad it's been jumping around in recent versions of tar. Why? Well, a modern variant (fork?) of LZMA is sweeping the nation: xz.
Sooo... In GNU tar version 1.20, the option was --lzma (possibly with -J as the short form). In version 1.22, the option is --xz, and -J switches to being its short form.
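Given that moving target, the safest bet is to skip the tar flag entirely and pipe through the compressor yourself; the pipeline form works with any tar version:

```shell
# Make something to archive
mkdir -p project && echo "data" > project/notes.txt

# tar 1.22 and later: --xz, with -J as its short form
tar -cJf project.tar.xz project/

# Version-proof alternative: pipe through xz explicitly
tar -cf - project/ | xz > project-pipe.tar.xz
xz -dc project-pipe.tar.xz | tar -tf -   # list the contents back
```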
The most space you are going to gain, vs gzip, is about 15%. However, with bzip2 you are likely to spend several times more CPU time to get there. Lzma compresses just as well (space-wise) and in far, far less time.
For IT work,
gzip remains an excellent choice. Why? Because, yes,
you do want to compress that stonking-big backup, but, no, you don't
want to triple the CPU-time cost.
For I-care-about-space-savings work, xz is now the place to be.
If you really care about space savings, you're strongly advised to run some tests of your own. The type of data, amount of memory available, etc., etc. can all really matter.
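A quick-and-dirty way to run such a test is a sketch like the following (it generates its own throwaway sample; point f at your own data instead, since results depend heavily on what you feed it):

```shell
#!/bin/sh
# Crude compressor shoot-out: compressed size and wall-clock time per tool.
# sample.dat is a throwaway test file; substitute your own data.
f=sample.dat
seq 1 200000 > "$f"

echo "original: $(wc -c < "$f") bytes"
for tool in gzip bzip2 xz; do
    command -v "$tool" >/dev/null 2>&1 || continue
    start=$(date +%s)
    "$tool" -c "$f" > "$f.$tool"     # all three accept -c (write to stdout)
    end=$(date +%s)
    echo "$tool: $(wc -c < "$f.$tool") bytes in $((end - start))s"
done
```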
Incidentally, this compression lark is a good fit for those multiple
(useless) CPU cores that vendors are selling us. Each of these
tools has a "parallel" version: pigz for gzip, pbzip2 for bzip2, and something like a -mmt=on option for 7zip (check its fast-shifting documentation). For further idle amusement, see
Jeff Atwood's blog item.
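A sketch of putting those cores to work, assuming pigz is installed (its output is an ordinary .gz file) and a reasonably recent xz (5.2 or later for threading):

```shell
# A throwaway file big enough to bother compressing
seq 1 200000 > big.dat

# pigz: parallel gzip; plain gunzip can read the result
if command -v pigz >/dev/null 2>&1; then
    pigz -p 4 -k big.dat       # compress on 4 cores -> big.dat.gz
fi

# xz 5.2+ takes -T for threads; 0 means one thread per core
xz -T0 -k big.dat              # -> big.dat.xz; -k keeps the original
```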
Finally, if you like compression, why not just do the whole filesystem (and relive the days of DoubleSpace on MS-DOS)?
SquashFS is very widely used in the embedded world. The only problem is, it's just for read-only filesystems. For ext2/ext3 read-write filesystems, there are add-on compression patches (urk). If you want to try it all in user space, there's something based on the super-flexible FUSE: compFUSEd. None of these excites me; I'd rather buy a bigger disk.
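If you want a taste of SquashFS anyway, here is a minimal sketch, assuming the squashfs-tools package is installed (the mount step needs root, so it is shown as a comment):

```shell
# A small directory tree to turn into an image
mkdir -p rootfs && echo "hello" > rootfs/file.txt

# Build a compressed, read-only SquashFS image from it
if command -v mksquashfs >/dev/null 2>&1; then
    mksquashfs rootfs image.squashfs -no-progress
fi

# Mounting needs root, and the result is read-only by design:
#   mount -o loop image.squashfs /mnt
```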
[An earlier version of this note appeared in Verilab's internal newsletter.]