Database Backup Compression

From MythTV Official Wiki
Revision as of 17:07, 16 November 2009 by Sphery (talk | contribs) (Add link to XZ Utils, explain which compression xz uses)


On my Athlon X2 5200+, I compressed this SQL backup (about 307MiB) with gzip, bzip2, and xz:

$ ls -l ~/backup/mythconverg-1214-20090403154601.sql
-rw-r--r-- 1 me group 321488312 Apr  3 16:34 /home/me/backup/mythconverg-1214-20090403154601.sql

Using gzip takes 14.3sec and creates a 54MiB file (approximately 18% the size of the uncompressed backup). Using bzip2 takes 2min 1.2sec and creates a 45MiB file (approximately 84% the size of the gzip-compressed file or 15% the size of the uncompressed backup). Using xz (from the XZ Utils package), which uses the same LZMA compression algorithms used by 7-zip, takes 8min 59sec and creates a 30MiB file (approximately 56% the size of the gzip-compressed file or 10% the size of the uncompressed backup).
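The sizes and percentages above can be rechecked from the byte counts in the ls -l listings on this page. This is only a minimal sketch, assuming a POSIX shell with awk available; the helper names pct and mib are made up for illustration.

```shell
#!/bin/sh
# Byte counts copied from the ls -l listings on this page.
orig=321488312
gz=56496450
bz2=47531259
xz=31731304

# pct PART WHOLE: print PART as a percentage of WHOLE, one decimal place.
pct() {
    awk -v p="$1" -v w="$2" 'BEGIN { printf "%.1f", 100 * p / w }'
}

# mib BYTES: print BYTES converted to MiB (1 MiB = 1048576 bytes).
mib() {
    awk -v b="$1" 'BEGIN { printf "%.1f", b / 1048576 }'
}

echo "gzip:  $(mib "$gz") MiB, $(pct "$gz" "$orig")% of uncompressed"
echo "bzip2: $(mib "$bz2") MiB, $(pct "$bz2" "$orig")% of uncompressed, $(pct "$bz2" "$gz")% of gzip"
echo "xz:    $(mib "$xz") MiB, $(pct "$xz" "$orig")% of uncompressed, $(pct "$xz" "$gz")% of gzip"
echo "xz saves $(mib $((gz - xz))) MiB over gzip"
```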

So gzip is still the winner (at least for my needs): it's extremely fast, and the 23.6MiB that xz saves over gzip is /not/ worth the processor time (let alone the energy cost of using a better compression algorithm[*]).


$ time gzip ~/backup/mythconverg-1214-20090403154601.sql

real    0m14.329s
user    0m13.907s
sys     0m0.421s
$ ls -l ~/backup/mythconverg-1214-20090403154601.sql*
-rw-r--r-- 1 me group 56496450 Apr  3 16:34 /home/me/backup/mythconverg-1214-20090403154601.sql.gz


$ time bzip2 ~/backup/mythconverg-1214-20090403154601.sql

real    2m1.168s
user    2m0.704s
sys     0m0.445s
$ ls -l ~/backup/mythconverg-1214-20090403154601.sql*
-rw-r--r-- 1 me group 47531259 Apr  3 16:34 /home/me/backup/mythconverg-1214-20090403154601.sql.bz2


$ time xz ~/backup/mythconverg-1214-20090403154601.sql

real    8m59.496s
user    8m58.710s
sys     0m0.654s
$ ls -l ~/backup/mythconverg-1214-20090403154601.sql*
-rw-r--r-- 1 me group 31731304 Apr  3 16:34 /home/me/backup/mythconverg-1214-20090403154601.sql.xz
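
The three runs above each compress the backup in place, so the file has to be decompressed again between tests. A rough way to repeat the comparison without touching the original is sketched below. It assumes a POSIX-style shell, that mktemp is available, and that date supports +%s (common, but not guaranteed everywhere). The function name bench_compress is hypothetical, and elapsed times are whole seconds only, so this is coarser than the time output shown above.

```shell
#!/bin/sh
# bench_compress FILE: run each installed compressor on FILE and print
# "tool elapsed_seconds compressed_bytes", leaving FILE itself unchanged.
# (bench_compress is a made-up helper name, not part of MythTV.)
bench_compress() {
    src=$1
    for tool in gzip bzip2 xz; do
        # Skip tools that are not installed on this system.
        command -v "$tool" >/dev/null 2>&1 || continue
        out=$(mktemp) || return 1
        start=$(date +%s)
        # -c writes the compressed data to stdout instead of replacing FILE.
        "$tool" -c "$src" > "$out"
        end=$(date +%s)
        # Strip any padding that some wc implementations add.
        size=$(wc -c < "$out" | tr -d ' ')
        echo "$tool $((end - start)) $size"
        rm -f "$out"
    done
}
```

It could be called as bench_compress ~/backup/mythconverg-1214-20090403154601.sql; the temporary compressed copies are deleted after each measurement.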

[*]Cost of the drive space saved (24765146 bytes) at $0.10/GB = $0.0024765146, or at $0.08/GB = $0.00198121168. Assuming 8min of additional 100% CPU usage with a difference of 18W between idle and 100% CPU usage, that's 2.4Wh = 0.0024kWh, so at $0.10/kWh it costs $0.00024 to perform the compression. At first glance, the compression seems to be 10 times cheaper than the drive space, but that energy is spent again on every backup, while the drive space is reclaimed whenever old backups are deleted or rotated, so the drive space turns out to be cheaper over time.
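
The footnote's arithmetic can be reproduced as follows. This is a sketch under the footnote's own assumptions (decimal gigabytes, 8 extra minutes at 18W, $0.10/GB and $0.10/kWh); the helper names storage_cost and energy_cost are invented for illustration.

```shell
#!/bin/sh
# Inputs taken from the footnote and the listings on this page.
saved_bytes=$((56496450 - 31731304))   # gzip size minus xz size
extra_minutes=8                        # extra CPU time assumed for xz
extra_watts=18                         # assumed idle-to-full-load difference

# storage_cost BYTES DOLLARS_PER_GB: price of BYTES, using 1 GB = 10^9 bytes.
storage_cost() {
    awk -v b="$1" -v p="$2" 'BEGIN { printf "%.10f", b / 1e9 * p }'
}

# energy_cost WATTS MINUTES DOLLARS_PER_KWH: price of WATTS drawn for MINUTES.
energy_cost() {
    awk -v w="$1" -v m="$2" -v p="$3" 'BEGIN { printf "%.5f", w * m / 60 / 1000 * p }'
}

space=$(storage_cost "$saved_bytes" 0.10)
energy=$(energy_cost "$extra_watts" "$extra_minutes" 0.10)

echo "drive space saved by xz: \$$space"
echo "extra energy per xz run: \$$energy"
# Number of xz runs whose extra energy adds up to the price of the space saved.
awk -v s="$space" -v e="$energy" 'BEGIN { printf "runs to match: %.1f\n", s / e }'
```

The last line shows why the one-off comparison is misleading: after roughly ten backups, the extra energy has cost as much as the space that a single xz-compressed file saves.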