Database Backup Compression

From MythTV Official Wiki
Revision as of 21:47, 16 November 2009 by Sphery (talk | contribs) (Rearrange to make the info easier to see.)

On my Athlon X2 5200+, compressing the SQL backup:

$ ls -l ~/backup/mythconverg-1214-20090403154601.sql
-rw-r--r-- 1 me group 321488312 Apr  3 16:34 /home/me/backup/mythconverg-1214-20090403154601.sql     

using gzip, bzip2, and xz (from the XZ Utils package, which uses the same LZMA compression algorithms used by 7-zip) gives the following results:

Compressor           Compression Time               Size After Compression  Size Compared to Uncompressed  Size Compared to gzip'ed
None (uncompressed)  N/A                            307 MiB                 100%                           569.0%
gzip                 14.3 seconds                   54 MiB                  17.6%                          100%
bzip2                121.2 seconds (2 min 1.2 sec)  45 MiB                  14.8%                          84.1%
xz                   539 seconds (8 min 59 sec)     30 MiB                  9.9%                           56.2%
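
The percentage columns can be rechecked from the byte counts in the ls -l listings on this page. The following is only a sketch: the helper name ratio is made up for illustration, and it assumes a POSIX shell with awk available.

```shell
#!/bin/sh
# Byte counts copied from the ls -l listings on this page.
orig=321488312   # uncompressed .sql
gz=56496450      # .sql.gz

# ratio A B: print A as a percentage of B with one decimal place.
# (Hypothetical helper, not part of any MythTV tool.)
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f%%\n", 100 * a / b }'
}

# One "name:bytes" entry per compressor.
for entry in "gzip:56496450" "bzip2:47531259" "xz:31731304"; do
    name=${entry%%:*}
    size=${entry##*:}
    printf '%-6s %s of uncompressed, %s of gzip\n' \
        "$name" "$(ratio "$size" "$orig")" "$(ratio "$size" "$gz")"
done
```

Substituting the sizes of your own backups shows whether the same proportions hold for your database.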

So, for mythconverg backup compression, when factoring in both space savings and CPU usage, gzip seems to be the winner for most users' needs. It is extremely fast, and the space saved by using xz instead of gzip (23.6 MiB on a rather large 307 MiB backup file) is /not/ worth the extra processor time, let alone the energy cost[*], of the better compression algorithm. The same would not hold true for other uses, where files are compressed infrequently and each compressed file is transmitted many times over high-cost bandwidth, so that bandwidth is more expensive than CPU time.
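
To repeat the comparison on your own backup without replacing the original file each time, something like the following may work. It is a rough sketch, not the method used for the measurements on this page: the function name bench is hypothetical, timing is in whole seconds from date +%s (much coarser than time), and it relies on each compressor accepting -c to write to standard output, which gzip, bzip2, and xz all do.

```shell
#!/bin/sh
# bench FILE COMPRESSOR...
# For each compressor, print: name, elapsed whole seconds, compressed bytes.
# The input file is left untouched because -c sends the compressed data to
# stdout, where it is only counted, never stored.
# (Hypothetical helper for illustration.)
bench() {
    file=$1
    shift
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool: not installed, skipping" >&2
            continue
        fi
        start=$(date +%s)
        # tr strips the padding some wc implementations add.
        bytes=$("$tool" -c "$file" | wc -c | tr -d ' ')
        end=$(date +%s)
        printf '%s %d %d\n' "$tool" "$((end - start))" "$bytes"
    done
}
```

For example, bench ~/backup/mythconverg-1214-20090403154601.sql gzip bzip2 xz would print one line per compressor. Results will vary with the CPU and with the contents of the database.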

gzip

$ time gzip ~/backup/mythconverg-1214-20090403154601.sql

real    0m14.329s
user    0m13.907s
sys     0m0.421s
$ ls -l ~/backup/mythconverg-1214-20090403154601.sql*
-rw-r--r-- 1 me group 56496450 Apr  3 16:34 /home/me/backup/mythconverg-1214-20090403154601.sql.gz

bzip2

$ time bzip2 ~/backup/mythconverg-1214-20090403154601.sql

real    2m1.168s
user    2m0.704s
sys     0m0.445s
$ ls -l ~/backup/mythconverg-1214-20090403154601.sql*
-rw-r--r-- 1 me group 47531259 Apr  3 16:34 /home/me/backup/mythconverg-1214-20090403154601.sql.bz2

xz

$ time xz ~/backup/mythconverg-1214-20090403154601.sql

real    8m59.496s
user    8m58.710s
sys     0m0.654s
$ ls -l ~/backup/mythconverg-1214-20090403154601.sql*
-rw-r--r-- 1 me group 31731304 Apr  3 16:34 /home/me/backup/mythconverg-1214-20090403154601.sql.xz

[*] The 24,765,146 bytes that xz saves over gzip cost $0.0024765146 in drive space at $0.10/GB, or $0.00198121168 at $0.08/GB. Assuming 8 min of additional 100% CPU usage and an 18 W difference between idle and 100% CPU usage, the extra compression uses 2.4 Wh = 0.0024 kWh, which costs $0.00024 at $0.10/kWh. At first glance, the compression seems to be 10 times cheaper than the drive space. However, the energy is spent again on every backup, while the drive space is reused as old backups are deleted or rotated, so with frequent backups the drive space turns out to be cheaper over time.
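
The footnote's arithmetic can be reproduced with a few lines of shell and awk. This is only a sketch under the footnote's own assumptions (decimal gigabytes, 8 minutes of extra CPU time, an 18 W difference, $0.10/kWh). The function names storage_cost and energy_cost are invented for illustration.

```shell
#!/bin/sh
# Bytes saved by xz compared with gzip, from the ls -l listings on this page.
saved_bytes=$((56496450 - 31731304))

# storage_cost BYTES DOLLARS_PER_GB
# Cost of storing BYTES, with 1 GB taken as 10^9 bytes.
storage_cost() {
    awk -v b="$1" -v p="$2" 'BEGIN { printf "%.6f\n", b / 1e9 * p }'
}

# energy_cost MINUTES WATTS DOLLARS_PER_KWH
# Cost of drawing WATTS extra for MINUTES.
energy_cost() {
    awk -v m="$1" -v w="$2" -v p="$3" \
        'BEGIN { printf "%.6f\n", m / 60 * w / 1000 * p }'
}

echo "bytes saved:     $saved_bytes"
echo "storage at 0.10: \$$(storage_cost "$saved_bytes" 0.10)"
echo "storage at 0.08: \$$(storage_cost "$saved_bytes" 0.08)"
echo "energy per run:  \$$(energy_cost 8 18 0.10)"
```

The energy figure is a cost per backup, while the storage figure is a one-time cost for space that rotated backups keep reusing. Plugging in your own electricity price, power draw, and drive price shows how the comparison shifts.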