LZMA stands for Lempel-Ziv-Markov chain Algorithm. lzma is a tool, like bzip2 and gzip, for compressing and decompressing files. It tends to be significantly faster and more efficient than bzip2. As we know, gzip's compression ratio is worse than that of bzip2 (and lzma).
In this article, let us see how to use lzma, an effective compression utility that offers a significantly better compression ratio and faster operation.
Compress the input text file using lzma (-c writes the result to stdout)
$ lzma -c --stdout sample.txt >sample.lzma
Decompress the lzma file using the -d option
$ lzma -d --stdout sample.lzma >sample.txt
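To confirm that the two commands above are lossless, you can compress a file, decompress it to a different name and compare the two byte by byte. A minimal sketch, assuming an `lzma` binary (from LZMA Utils or XZ Utils) is on the PATH; the function name `lzma_roundtrip` and the `.out` suffix are hypothetical.

```shell
# Hypothetical helper: round-trip a file through lzma and verify the result.
# Assumes an `lzma` binary (LZMA Utils or XZ Utils) is on the PATH.
lzma_roundtrip() {
    src=$1
    # -c writes to stdout, so the original file is left in place
    lzma -c "$src" > "$src.lzma" || return 1
    # -d decompresses; write to a new name so the input is not overwritten
    lzma -d -c "$src.lzma" > "$src.out" || return 1
    # cmp -s prints nothing and succeeds only if the files are byte-identical
    if cmp -s "$src" "$src.out"; then
        echo ok
    else
        echo mismatch
    fi
}
```

For example, `lzma_roundtrip sample.txt` prints `ok` when the decompressed copy matches the original.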
Comparison between bzip2 and lzma compression tools
To understand the effectiveness of lzma, let us compress and decompress a 1 MB sample.txt with both lzma and bzip2 and compare the results. These tests were run on a machine with 1 GB of RAM and a Pentium 4 processor.
Size of the sample.txt input file:
$ ls -l sample.txt
-rw-r--r-- 1 bala bala 1048576 2010-05-14 19:43 sample.txt
Note: We prefixed the compression and decompression commands with the time command to measure their CPU usage.
Compress the sample.txt using bzip2
bzip2 compresses by default, so no option is needed. It replaces sample.txt with sample.txt.bz2.
$ time bzip2 sample.txt
real 0m27.874s
user 0m13.981s
sys 0m0.148s
$ ls -l sample.txt.bz2
-rw-r--r-- 1 bala bala 1750 2010-05-14 19:43 sample.txt.bz2
After bzip2 compression, the output file is 1750 bytes.
Decompress the sample.txt using bunzip2
Decompress the file with the bunzip2 utility. It also doesn't need any option.
$ time bunzip2 sample.txt.bz2
real 0m0.232s
user 0m0.128s
sys 0m0.020s
Compress the sample.txt using lzma
Now, let us compress sample.txt using the lzma command with the following options (compression is lzma's default mode):
- -c to write the compressed output to stdout
- --stdout, the long form of -c (passing both is redundant but harmless)
$ time lzma -c --stdout sample.txt >sample.lzma
real 0m2.035s
user 0m1.544s
sys 0m0.132s
$ ls -l sample.lzma
-rw-r--r-- 1 bala bala 543 2010-05-14 19:48 sample.lzma
After compression, lzma produces a 543-byte output file, considerably smaller than the 1750 bytes produced by bzip2. Also, as seen above, lzma used far less CPU time than bzip2 (1.544 s of user time versus 13.981 s).
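The size comparison is easier to read as a compression ratio (original size divided by compressed size). A minimal sketch, assuming a POSIX shell with `wc` and `awk`; the function name `compression_ratio` is hypothetical.

```shell
# Hypothetical helper: print original_size / compressed_size, 2 decimals.
compression_ratio() {
    orig=$(wc -c < "$1")   # size of the uncompressed file in bytes
    comp=$(wc -c < "$2")   # size of the compressed file in bytes
    awk -v o="$orig" -v c="$comp" 'BEGIN { printf "%.2f\n", o / c }'
}
```

With the sizes reported above (1048576 bytes in, 1750 bytes for bzip2 and 543 bytes for lzma), the ratios work out to roughly 599:1 for bzip2 and 1931:1 for lzma.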
Decompress the sample.txt using lzma
Decompress the .lzma file using the lzma command with the following options:
- -d to decompress
- --stdout to write the decompressed output to stdout
$ time lzma -d --stdout sample.lzma >sample.txt
real 0m0.043s
user 0m0.016s
sys 0m0.004s
As seen above, lzma decompression (0.043 s real time) is about five times faster than bunzip2 (0.232 s).
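To repeat the bzip2 versus lzma size comparison on your own files, a small function can compress the same input with both tools and print each result's size. A minimal sketch, assuming `bzip2` and `lzma` are installed; the function name `compare_sizes` is hypothetical. Both tools write to stdout with -c, so the input is kept (unlike the plain `bzip2 sample.txt` run above, which replaces it).

```shell
# Hypothetical helper: print the bzip2 and lzma compressed sizes of one file.
# Assumes bzip2 and lzma are on the PATH; default levels are used for both.
compare_sizes() {
    # $(( ... )) strips any padding that some wc implementations add
    bz=$(( $(bzip2 -c "$1" | wc -c) ))
    lz=$(( $(lzma -c "$1" | wc -c) ))
    printf 'bzip2 %s\n' "$bz"
    printf 'lzma %s\n' "$lz"
}
```

Prefix the call with time to measure speed as well. Results depend heavily on the input, so test with data that resembles what you actually compress.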
Different Levels of Lzma Compression
- lzma provides compression levels from -1 to -9.
- -9 gives the highest compression ratio, but it needs more time and system resources. The levels apply only to compression; you do not pass them when decompressing.
- -1 is the lowest compression level, and it runs much faster.
For a quick lzma compression at the lowest level, do the following:
$ lzma -1 -c --stdout sample.txt >sample.lzma
$ ls -l sample.lzma
-rw-r--r-- 1 bala bala 548 2010-05-14 20:47 sample.lzma
Note: --fast is an alias for -1. (In XZ Utils, --fast maps to -0 instead.)
-9 is the highest compression level, and it takes longer to compress than the lower levels. For an intensive compression at the highest level, do the following:
$ lzma -9 -c --stdout sample.txt >sample.lzma
$ ls -l sample.lzma
-rw-r--r-- 1 bala bala 543 2010-05-14 20:55 sample.lzma
Note: --best is an alias for -9.
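To see how much each level buys on a particular file, you can loop over all nine levels and print the compressed size for each. A minimal sketch, assuming `lzma` from LZMA Utils or XZ Utils; the function name `lzma_levels` is hypothetical. Output goes through a pipe to wc, so no .lzma files are left behind. Be aware that the highest levels can need several hundred MiB of RAM to compress.

```shell
# Hypothetical helper: print the compressed size of a file at levels -1 .. -9.
# Assumes an `lzma` binary (LZMA Utils or XZ Utils) is on the PATH.
lzma_levels() {
    for level in 1 2 3 4 5 6 7 8 9; do
        # compress to stdout and count bytes; $(( )) strips wc padding
        size=$(( $(lzma -"$level" -c "$1" | wc -c) ))
        printf 'level -%s: %s bytes\n' "$level" "$size"
    done
}
```

On a file like the sample above, the differences between levels may be only a few bytes (548 versus 543); on larger, more varied input the gap is usually wider.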
Comments on this entry are closed.
Hi! Maybe you would like to compare to the XZ compression (which is based on LZMA):
http://tukaani.org/xz/
Here is a short comparative:
http://tukaani.org/lzma/benchmarks.html
As you can see, if you don't mind the time it takes to compress, you can get the best ratios with XZ. Also, its decompression time is lower than that of bzip2, and it's supported by the tar archiver 😉
Thank you for this valuable guide! Before reading it, I always compressed my PDFs, text-based documents and backups with bzip2. Now I'm switching for sure 🙂
Note: LZMA utils[1] are deprecated in favor of XZ utils[2]. Maybe your system aliased them.
1: http://tukaani.org/lzma/
2: http://tukaani.org/xz/
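Following up on the note that LZMA Utils has been superseded by XZ Utils: the xz command can still write the legacy .lzma format with `--format=lzma`, and those files decompress with `lzma -d`. A minimal sketch, assuming XZ Utils is installed; the function name `xz_formats` is hypothetical. It prints the size of the same input in both formats.

```shell
# Hypothetical helper: compare the legacy .lzma and the newer .xz container
# sizes for one file. Assumes XZ Utils (`xz`) is on the PATH.
xz_formats() {
    # --format=lzma produces the old LZMA Utils (.lzma) format
    legacy=$(( $(xz --format=lzma -c "$1" | wc -c) ))
    # the default format is .xz
    modern=$(( $(xz -c "$1" | wc -c) ))
    printf 'lzma %s\n' "$legacy"
    printf 'xz %s\n' "$modern"
}
```

The .xz format adds a container with integrity checks around the compressed data, so on tiny inputs it can come out slightly larger than .lzma.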
I also learned a useful command. It's very useful! Thank you.
How do I compress a directory?
rogue:
First you have to tar it, then run lzma on the tar file. Simple 🙂 Just as with bzip2.
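The tar-then-lzma approach can be done in a single pipeline, without writing an intermediate .tar file. A minimal sketch, assuming `tar` and `lzma` are installed; the function name `lzma_dir` is hypothetical.

```shell
# Hypothetical helper: archive a directory with tar and compress it with lzma.
# Assumes tar and lzma are on the PATH. Produces <dir>.tar.lzma next to <dir>.
lzma_dir() {
    # strip a trailing slash so the output is not written inside the directory
    dir=${1%/}
    # tar writes the archive to stdout (-f -); lzma compresses it from stdin
    tar -cf - "$dir" | lzma -c > "$dir.tar.lzma"
}
```

To extract, reverse the pipeline: `lzma -d -c mydir.tar.lzma | tar -xf -`. GNU tar can also handle both steps itself with its `--lzma` option.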
In my experience, bzip2 is faster and often produces smaller files than lzma:
$ ls -l 100_0758.mov*
-rwxr-xr-x 1 chris chris 10246430 14-Apr-1903 05:49:04 100_0758.mov
$ time lzma -9 100_0758.mov
real 0m5.418s
user 0m5.159s
sys 0m0.219s
$ ls -l 100_0758.mov*
-rwxr-xr-x 1 chris chris 10094080 14-Apr-1903 05:49:04 100_0758.mov.lzma
$ lzma -d 100_0758.mov*
$ time bzip2 -9 100_0758.mov
real 0m4.486s
user 0m4.412s
sys 0m0.038s
$ ls -l 100_0758.mov*
-rwxr-xr-x 1 chris chris 10025989 14-Apr-1903 05:49:04 100_0758.mov.bz2
With a text file, lzma does produce smaller files, but it still takes longer than bzip2.
Chris:
bzip2 and lzma are both meant for compressing text data, not binary data. On non-binary data, they both produce better results than, for example, rar. Keep that in mind. So if you now run your benchmarks on, for example, a PDF file that contains a lot of text, you will see the difference.
They are both designed for all types of files.
The lzma man page states, “lzma provides notably better compression ratio than bzip2 especially with files having other than plain text content.”
Very interesting, thanks for this article. I will try lzma more often.
There are other things to consider besides the compression ratio. bzip2 has an advantage over gzip: it works in blocks, so if a compressed file is damaged or only partially transmitted over the network, the remaining blocks are unaffected and can be recovered (see the man page).
What about lzma?
I think you compressed an empty file. That is hardly a comprehensive benchmark, or anything that really shows LZMA can be better than bzip2… Try compressing ~10 MB of source code; that will tell others about real-life use.
Anonymous, OK, I have compressed the full Zend Framework library with API and reference documentation, which weighs in at 16,368 items totalling 187.5 MB. With bzip2 it's 21.3 MB and with lzma it's 18.9 MB, which is only 12% smaller.
So if you need to pipe it through ssh and decompress and load it at the same time, I still suggest bzip2. When you need it for storage, lzma is perfect.
Hi Sir, this informative tutorial is really very helpful for day-to-day backup utilities on the server, and it saves space. Thank you,