Database Backup Compression

On my Athlon X2 5200+, compressing the SQL backup:

 $ ls -l ~/backup/mythconverg-1214-20090403154601.sql
 -rw-r--r-- 1 me group 321488312 Apr  3 16:34 /home/me/backup/mythconverg-1214-20090403154601.sql

using gzip, bzip2, and xz (from the [http://tukaani.org/xz/ XZ Utils] package, which uses the same LZMA compression algorithm used by 7-Zip) gives the following results:

{| border=1 cellspacing=0 cellpadding=5
|'''Compressor'''
|'''Compression Time'''
|'''Size After Compression'''
|'''Size Compared to Uncompressed'''
|'''Size Compared to gzip'ed'''
|-
|None (uncompressed)
|N/A
|307 MiB
|100%
|569.0%
|-
|gzip
|14.3 seconds
|53 MiB
|17.6%
|100%
|-
|bzip2
|121.2 seconds (2min 1.2sec)
|45 MiB
|14.8%
|84.1%
|-
|xz
|539 seconds (8min 59sec)
|30 MiB
|9.9%
|56.2%
|}

So, for mythconverg backup compression, when factoring in both space savings and CPU usage, it seems gzip is the winner (for most users' needs): it's extremely fast, and the space saved by using xz instead of gzip--23.6MiB on a rather large 307MiB backup file--is /not/ worth the processor time (let alone the energy cost[*]) of the better compression algorithm. (For other usage--where a file is compressed once but transmitted many times over bandwidth that costs more than the CPU time--the same would not hold true.)
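
To reproduce the comparison on your own hardware, the loop below (a minimal sketch; the path is the example file from above, so adjust it to your own backup) runs each compressor over the same file and prints times and sizes:

 #!/bin/bash
 # Compress the same backup with each tool and compare times and sizes.
 # Writing via -c to a separate output file (the .gzip/.bzip2/.xz names are
 # arbitrary) leaves the original .sql intact between runs.
 SQL=$HOME/backup/mythconverg-1214-20090403154601.sql
 for c in gzip bzip2 xz; do
     echo "=== $c ==="
     time $c -c "$SQL" > "$SQL.$c"
 done
 ls -l "$SQL"*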

== gzip ==

 $ time gzip ~/backup/mythconverg-1214-20090403154601.sql
 
 real    0m14.329s
 user    0m13.907s
 sys     0m0.421s
 $ ls -l ~/backup/mythconverg-1214-20090403154601.sql*
 -rw-r--r-- 1 me group 56496450 Apr  3 16:34 /home/me/backup/mythconverg-1214-20090403154601.sql.gz
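
Note that the intermediate uncompressed file can be skipped entirely by piping the dump straight into gzip. A minimal sketch, assuming the usual mythtv database user (mysqldump will prompt for the password):

 $ mysqldump -umythtv -p mythconverg | gzip > ~/backup/mythconverg-$(date +%Y%m%d%H%M%S).sql.gz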

== bzip2 ==

 $ time bzip2 ~/backup/mythconverg-1214-20090403154601.sql
 
 real    2m1.168s
 user    2m0.704s
 sys     0m0.445s
 $ ls -l ~/backup/mythconverg-1214-20090403154601.sql*
 -rw-r--r-- 1 me group 47531259 Apr  3 16:34 /home/me/backup/mythconverg-1214-20090403154601.sql.bz2
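
bzip2 uses only one core; on a multi-core machine like the X2 above, the separate pbzip2 package (a parallel bzip2 implementation that produces compatible .bz2 files) should cut the wall-clock time substantially. Usage is the same (not timed here):

 $ time pbzip2 ~/backup/mythconverg-1214-20090403154601.sql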

== xz ==

 $ time xz ~/backup/mythconverg-1214-20090403154601.sql
 
 real    8m59.496s
 user    8m58.710s
 sys     0m0.654s
 $ ls -l ~/backup/mythconverg-1214-20090403154601.sql*
 -rw-r--r-- 1 me group 31731304 Apr  3 16:34 /home/me/backup/mythconverg-1214-20090403154601.sql.xz
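
xz's run time is tunable: the run above used xz's default -6 preset, and a lower preset such as -2 gives up some compression ratio for a large speedup (again, not timed here):

 $ time xz -2 ~/backup/mythconverg-1214-20090403154601.sql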

[*] Cost of the drive space saved by using xz instead of gzip (24,765,146 bytes) at $0.10/GB = $0.0024765146, or at $0.08/GB = $0.001981211680. Assuming 8min of additional 100% CPU usage and a difference of 18W between idle and 100% CPU usage, that's 2.4Wh = 0.0024kWh, so at $0.10/kWh, it costs $0.00024 to perform the better compression. At first glance, the compression seems to be 10 times cheaper than the drive space, but with frequent backups the energy cost is paid on every run, while rotating/deleting old backups means the extra drive space is bought only once--so the drive space turns out to be cheaper over time.
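
The footnote's arithmetic can be checked with bc: the xz'ed file saves 56496450 - 31731304 = 24,765,146 bytes over the gzip'ed one, and 8 extra minutes at 18W works out as follows:

 $ # drive space saved by xz over gzip, priced at $0.10/GB
 $ echo "scale=10; (56496450 - 31731304) / 10^9 * 0.10" | bc
 .0024765146
 $ # energy for 8 extra minutes at 18W above idle, priced at $0.10/kWh
 $ echo "scale=6; 18 * 8 / 60 / 1000 * 0.10" | bc
 .000240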