From MythTV Official Wiki
{{Wikipedia}}'''RAID''' or Redundant Array of Inexpensive Disks is a mechanism for using multiple disk drives to provide redundant [[file storage]].
  
 
== Quick Overview ==

===Performance Expectations===
There are many 'opinions' on what RAID level is best for performance, since it can vary greatly depending on many factors, but with all things being equal, here are the facts about the most common RAID levels:
 
*RAID 5 is the slowest*
*RAID 1 is in the middle (speed is equivalent to not using RAID)
*RAID 0 (and RAID 0+1, 10, 1+0, etc.) is the fastest

For reading big files, like videos, RAID 5 can be quite fast.
  
 
And here's why:
 
=====RAID 5=====
RAID 5 (striping with parity) is the slowest because your computer has to calculate parity for every write operation ''and then'' write that parity to disk. This can be considerable overhead with software RAID or budget RAID controllers. ''Parity'' is the distributed data that allows you to lose any hard drive without losing data. Read speeds are very good, better than RAID 1. *You can significantly increase RAID 5 performance, sometimes even making it faster than RAID 1, if you do the following:
 
*Use a hardware RAID solution
*Use a Battery-Backed Write-back Cache, or BBWBC (only available on high-end RAID controllers). These only help data bursts, up to the size of the cache (typically 16-256 MB), not extended write operations.
*Add more disks
 
In general, unless you have server-class hardware and SCSI disks, you can always expect RAID 5 ''writing'' to be slower than RAID 1 or 0. ''Reading'' is very fast, and can be similar to RAID 0.

If you use Linux to create big files sequentially, Linux MD RAID 5 is very fast, as there is no need to read the parity information, the parity information is calculated anew from the data of the new file. RAID 5 is then almost as fast as RAID 0.
  
 
=====RAID 1=====
RAID 1 (mirroring) is typically neither faster nor slower than using a single disk for writing. For reading it can use all the disks in parallel and thus improve performance.
  
 
=====RAID 0=====
RAID 0 (striping) is the fastest at both reading and writing because your computer reads and writes different data to two disks at the same time, theoretically doubling the performance of RAID 1. (Note: RAID 0 ''is not'' the same as disks spanning, or extending, a volume. Spanning is not RAID and provides no performance change or redundancy.) Software RAID 0 can be nearly as fast as hardware RAID 0. You can significantly increase the speed of RAID 0 by:

*Adding more disks
  
 
=====RAID 0+1 (or 1+0, 01, 10)=====
RAID 0+1 (mirroring a striped set) is half as fast as RAID 0 with the same number of disks. Like RAID 0, you can make it even faster by adding more disks. Note: There is some argument over the difference in performance between RAID 1+0 and RAID 0+1. In other words, should you stripe a mirror, or mirror a stripe? You can safely ignore those who argue this, as it really doesn't matter on the subject of '''performance'''. Most hardware vendors stripe first; 0+1. However, on the subject of '''redundancy''', RAID 1+0 has a higher chance than RAID 0+1 of surviving certain hardware failures.
  
=====Linux MD RAID10 F2=====
Linux MD RAID10 in the f2 layout is as fast as RAID 0 for reading, and half as fast as RAID 0 for writing.

Please note that Linux MD raid10 is something different from what is called RAID10 or RAID1+0 in other places. First, it is not a nested raid of first a RAID1 and then a RAID0; instead it is a single array, created by only one mdadm command. Second, you may make a striped and mirrored array consisting of only 2 drives, where traditional RAID1+0 requires 4 drives. This means that Linux MD raid10 (in the f2 layout) has double the reading speed of RAID1 and RAID1+0, given the same number of drives.
  
 
===Capacity===
Assuming all disks are of equal size (if this isn't the case, use the size of your smallest disk), where N = number of disks and C = disk capacity:

*RAID 0 = N x C  (Total capacity of all disks, or 100% efficient)
 
*RAID 1 = 1 x C  (Capacity of one disk, efficiency varies, no more than 50% efficient, decreasing as drives are added)
*RAID 5 = (N x C) - C  (Total capacity minus one disk, efficiency varies, but no less than 66% and increases as drives are added)
*RAID 0+1, RAID 1+0 or Linux MD raid10 = (N x C) / 2  (Half the total capacity of all disks, or 50% efficient)
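As a quick sanity check, these formulas can be written as a small shell function. This is a sketch; the `raid_capacity` name and GB units are illustrative, not part of any RAID tool:

```shell
# Usable capacity in GB for N equal disks of C GB each.
# Usage: raid_capacity LEVEL N C   (LEVEL is 0, 1, 5 or 10)
raid_capacity() {
  level=$1; n=$2; c=$3
  case $level in
    0)  echo $(( n * c )) ;;        # striping: every disk's capacity is usable
    1)  echo "$c" ;;                # mirroring: one disk's worth, N copies
    5)  echo $(( (n - 1) * c )) ;;  # one disk's worth lost to parity
    10) echo $(( n * c / 2 )) ;;    # mirrored stripes: half of the total
  esac
}

raid_capacity 5 4 300    # a 4 x 300 GB RAID 5 yields 900
```

The same arithmetic applies whatever the disk size, as long as all disks match; with mixed sizes, substitute the smallest disk for C as noted above.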
  
 
===Redundancy===
*RAID 0 = No redundancy. Losing any disk results in total data loss.
 
*RAID 1 = Lose all but 1 disk without any data loss.
*RAID 5 = Lose 1 disk without any data loss.
*RAID 0+1 = Lose up to half your disks (''if'' they are on the same stripe set) without any data loss. If one disk from each stripe set is lost, all data is lost.
*RAID10 (Linux MD) = Lose up to half of your disks, similar to RAID 1+0, but for larger numbers of disks the chances of surviving multiple disk crashes are better, as all disks are in one set, instead of a RAID 0 array of a number of RAID 1 arrays.
  
RAID 0+1 and RAID 1+0 may look very similar, but there is a distinct difference when it comes to redundancy. In a 4-disk configuration, both can survive losing 1 disk without any data loss. And both configurations will fail if you lose 3 disks. However, the probabilities of losing the RAID array are different if you lose 2 disks. Linux MD raid10 has the same properties as RAID 1+0.
  
The number of combinations of 2-drive losses is C(4,2) = 4!/(2! * 2!) = 6. This means that if 2 drives fail, there are only 6 different ways this can happen.
  
 
To determine the probabilities, you must count the number of configurations that would actually cause the entire RAID volume to go down.

==== RAID 0+1 (4 disks) ====
<pre>
       [----RAID 1------]
   ____|____        ____|____
  | RAID 0  |      | RAID 0  |
  |____|____|      |____|____|
</pre>
  
If one (out of two) drives from the first stripe fails '''''AND''''' one (out of two) drives from the second stripe fails, then the entire RAID 0+1 array will go down.
 
* x|ok --- x|ok
* x|ok --- ok|x
* ok|x --- x|ok
* ok|x --- ok|x
Since there are 4 ways this can happen, P<sub>failure(0+1)</sub> = 4/6 = 67%
  
==== RAID 1+0 (4 disks) ====
<pre>
       [----RAID 0------]
   ____|____        ____|____
  | RAID 1  |      | RAID 1  |
  |____|____|      |____|____|
</pre>

If both drives on the first mirror fail '''''OR''''' both drives on the second mirror fail, then the entire RAID 1+0 array will go down.
 
* x|x --- ok|ok
* ok|ok --- x|x
Since there are 2 ways this can happen, P<sub>failure(1+0)</sub> = (1 + 1) / C(4,2) = 2/6 = 33%
  
 
This means that RAID 1+0 has twice the probability of surviving the loss of two drives compared to RAID 0+1.

The Linux MD raid10 has another layout; an array of 4 drives created with

 mdadm -C /dev/md0 -n 4 -l 10 -p f2 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1

will give the following layout:

<pre>
     [------------MD RAID10-----------]
  ___|___   ____|___    ___|___    ___|___
 | disk1 |  | disk2 |  | disk3 |  | disk4 |
 |___|___|  |___|___|  |___|___|  |___|___|
</pre>
This layout has the same probability of surviving a 2-disk crash as the RAID 1+0 mentioned above, namely 67%.
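The failure counts above can be verified by brute force. The sketch below (the helper name is ours, not from any tool) enumerates all C(4,2) = 6 two-disk failures for the 4-disk layouts: RAID 0+1 dies when the failed disks sit in different stripes, while RAID 1+0 dies only when both halves of one mirror fail.

```shell
# Disks 1,2 form the first pair and disks 3,4 the second
# (the pairs are stripes in RAID 0+1, mirrors in RAID 1+0).
pair() { if [ "$1" -le 2 ]; then echo 1; else echo 2; fi; }

fail01=0; fail10=0; total=0
for a in 1 2 3; do
  for b in $(seq $((a + 1)) 4); do
    total=$((total + 1))
    if [ "$(pair "$a")" != "$(pair "$b")" ]; then
      fail01=$((fail01 + 1))   # 0+1: one disk lost in each stripe -> array dead
    else
      fail10=$((fail10 + 1))   # 1+0: both disks of one mirror lost -> array dead
    fi
  done
done
echo "RAID 0+1 fails in $fail01/$total cases, RAID 1+0 in $fail10/$total"
# prints: RAID 0+1 fails in 4/6 cases, RAID 1+0 in 2/6
```

This reproduces the 67% vs 33% failure probabilities derived by hand above.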
  
 
===SCSI vs IDE===
In general, SCSI outperforms IDE in RAID arrays because it is much better at handling multiple data reads/writes at the same time. If you must use IDE, use the fastest controllers available (SATA) and the fastest disks available. Also, put each disk on its own controller; avoid placing a disk on both channels of any IDE controller.
  
 
'''Notes'''
* At a given platter RPM, it's not the drive that performs better, but the subsystem. SCSI has the advantage of greater availability of higher-RPM (10K and 15K) drives, though such drives can also be found with SATA interfaces.
 
* SATA also provides command queueing, the method used by SCSI to better handle multiple access requests.
* SATA uses an individual cable per drive, which means it handles cable failure better than SCSI.
  
 
===Which one do I choose?===
If you don't care about your data, go with RAID 0 for speed.

If you don't want to lose your data, use RAID 1 or MD raid10,f2 for 2 disks; raid10,f2 is twice as fast for reading. If you have 3 or more disks, it's really a toss-up between RAID 5 (minimum 3 disks), RAID 1+0 (minimum 4 disks, multiples of two thereafter), and raid10,f2, which also allows an odd number of drives. It comes down to speed and cost. If you don't need speed and don't have much money, go with RAID 5. If you need speed, get the money and go with RAID 1+0 or MD raid10,f2 (raid10,f2 is twice as fast for reading as RAID 1+0). In regards to Myth, several have tried RAID 5 and the results are mixed. If you only have 1 or 2 SD tuners, RAID 5 should be fine. Once you get multiple HD tuners or multiple frontends, RAID 5 often can't keep up. Your results may vary.
  
 
== RAID For Recordings Drive ==
A few options exist for using RAID for the recording drives, depending on the goal you have for your recordings: speed, redundancy or both. RAID 0 or raid10,f2 will allow you to gain the most speed from your drives (raid10,f2 will also give you redundancy), while RAID 0+1 will give you some speed and 1:1 redundancy. RAID 5 gives you the most capacity for your money, but write speeds can be pretty bad.
  
 
== RAID For Archives Drive ==
Having an independent drive array for archival of shows one wishes to keep allows the user to set up a RAID for speed for the recordings drive and a RAID for backup for the archival drive. This way, once a show has been recorded, commercial flagged and possibly even transcoded to another format or for permanent commercial removal, it can be moved to the archive. In such a case, RAID 5, raid10,f2 and RAID 1+0 make the most sense. If you plan on a large amount of access to the archive, a raid10,f2 or RAID 1+0 may make more sense, as it will most easily keep up with the transfer rate requirement while still allowing for redundancy, but at the cost of obtaining the required number of drives. RAID 5 will have a slight speed advantage over just having numerous drives (JBOD, Just a Bunch Of Disks in hardware RAID, linear in mdadm), but will also have the advantage of getting the most archival bang-for-your-buck while still maintaining parity for the case of a lost drive.
  
 
== Setup (Software RAID) ==
For setting up hardware RAID, see your RAID controller's documentation. The array will then appear as a single disk within your OS. For software RAID, creating a RAID array with [http://man-wiki.net/index.php/8:mdadm mdadm] is quite easy. The Linux RAID HOWTO [http://linux-raid.osdl.org/index.php/Performance Performance] or Software-RAID HOWTO [http://www.tldp.org/HOWTO/Software-RAID-HOWTO-9.html Performance] section will help here, as different RAID types have different best values for chunk and block sizes. Since we will be dealing with only large files (recorded MPEGs, music files, etc.) it is recommended to choose the largest chunk and block values that combine for the highest performance.
  
 
=== Partitioning ===
Before a RAID array can be created on a disk it must be partitioned; you can use cfdisk, fdisk, sfdisk or parted. The easiest way is to create a full-drive partition.
  
 
{{Note box|You must, however, also set the type to "fd" or "Linux raid autodetect"!}}
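As a sketch of this step, sfdisk can script the partitioning. Here it is demonstrated against a file-backed image so no real disk is touched; `disk.img` is a stand-in for a real device such as /dev/sda:

```shell
# Create a small file-backed image to practice on.
truncate -s 100M disk.img

# One partition spanning the whole "drive", type fd (Linux raid autodetect).
echo 'type=fd' | sfdisk disk.img

# Show the resulting partition table; the partition should list type=fd.
sfdisk --dump disk.img
```

On a real drive, substitute the block device for disk.img and double-check the device name first, as this overwrites the partition table.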
 
  
=== RAID 1+0 ===

RAID 1+0 is really the creation of 2 or more arrays. First you create the number of RAID 1 (mirrored) arrays you wish to have:
 
<pre><nowiki>
# mdadm -v --create /dev/md0 --chunk=32 --level=raid1 --raid-devices=2 /dev/sda1 /dev/sdb1
# mdadm -v --create /dev/md1 --chunk=32 --level=raid1 --raid-devices=2 /dev/sdc1 /dev/sdd1
</nowiki></pre>

Then you create a RAID 0 array striped across the mirrors:

<pre><nowiki>
# mdadm -v --create /dev/md2 --chunk=32 --level=raid0 --raid-devices=2 /dev/md0 /dev/md1
</nowiki></pre>

Alternatively, Linux MD raid10 builds the striped and mirrored array in a single command:

<pre><nowiki>
# mdadm -v --create /dev/md0 --chunk=256 --level=raid10 --layout=f2 --raid-devices=4 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
</nowiki></pre>
  
 
Note that this can even be done with only 2 drives (-n 2).
For newer drives as of 2008 it is recommended by the people on the linux-raid kernel mailing list to use chunk sizes between 256 KiB and 1 MiB.
  
 
=== RAID Creation Confirmation ===
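As a generic sketch (the device name is a placeholder, and these commands require an existing array on the machine), you can confirm the array was created and watch its initial build with:

```shell
# Overall state and sync/resync progress of all md arrays.
cat /proc/mdstat

# Detailed state of a single array: level, layout, member disks, health.
mdadm --detail /dev/md0
```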
  
 
=== RAID Filesystem Creation ===
Once your RAID array is created you can place a filesystem on it. JFS and XFS are the two recommended filesystems for large-file arrays, especially for the recordings drive in Myth. Again, we will use the Software-RAID HOWTO [http://www.tldp.org/HOWTO/Software-RAID-HOWTO-9.html Performance] section and go with a 4K (4096) block size.
  
 
For XFS (replace md0 with your final RAID array if using a mixed-mode array),
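a minimal sketch, assuming the array is /dev/md0 and the 4K block size chosen above, would be:

```shell
# Sketch only: -b size sets the 4096-byte block size; /dev/md0 is a
# placeholder for your array. This destroys any data on the device.
mkfs.xfs -b size=4096 /dev/md0
```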
  
 
=== Monitoring ===
Most distributions have an init.d daemon set up to monitor your mdadm arrays and notify you when anything of note occurs.
  
 
=== Software RAID Online Capacity Expansion (OCE) (for RAID 5 with XFS) ===

Online Capacity Expansion (OCE) allows you to add another hard drive to an already defined and running RAID array, for example adding a fifth drive to a 4-drive RAID 5 array. OCE reshapes the data so it will span all 5 drives and then allows you to use a filesystem grow command to make use of the new space. This is all done while the RAID system is active, and even allows you to continue to use your drives while you are adding a new one. Previously this feature was available on high-end hardware RAID cards only.
  
 
I was able to do a RAID 5 disk expansion in mdadm software RAID with no ill effects, growing the array from 300 GB x 4 RAID 5 to 300 GB x 5 RAID 5. No LVM, just an md0 mdadm device, and I was still able to make 2 simultaneous HD recordings and watch an HD recording while this was going on.

Page used as a guide: http://scotgate.org/?p=107

For me the process was:

<pre><nowiki>
mdadm --add /dev/md0 /dev/sde1
</nowiki></pre>

This would be different for each user, based on the name of the RAID device and the drive you are adding to it. Then:

<pre><nowiki>
mdadm --grow /dev/md0 --raid-devices=5
</nowiki></pre>

To find details about your RAID reshaping status, use:

<pre><nowiki>
cat /proc/mdstat
</nowiki></pre>

To speed up reshaping, use the following (fill in whatever speed you want where the 100000 is; the default is 10000):

<pre><nowiki>
echo -n 100000 > /proc/sys/dev/raid/speed_limit_max
</nowiki></pre>

The speed_limit_max entry controls how fast the RAID array rebuilds (how much of the array's bandwidth is available to the rebuild process). Make that bandwidth number higher and it goes faster, but it uses more of the throughput of the hard drives (leaving less available to, say, Myth recording on the degraded array). The array will be in a degraded state until reshaping is finished.

Now it's time to grow your XFS filesystem (or substitute the grow command for your filesystem):
  
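A minimal sketch (the mount point is a placeholder for your recordings directory): XFS grows online, so the filesystem stays mounted while it expands to fill the reshaped array:

```shell
# Grow the mounted XFS filesystem to fill the enlarged device;
# /mnt/store is a placeholder for your own mount point.
xfs_growfs /mnt/store
```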
 
= Links =

Great page with information on the different hardware and software RAID chipsets and their current Linux support:
* [http://linuxmafia.com/faq/Hardware/sata.html Serial ATA (SATA) chipsets Linux support status]

Wikipedia entry for RAID:
* [[Wikipedia:Redundant array of independent disks|Redundant array of independent disks]]

Revision as of 23:40, 1 July 2008

Wikipedia-logo-en.png
Wikipedia has an article on:
RAID or Redundant Array of Inexpensive Disks is a mechanism for using multiple disk drives to provide redundant [[file sto

rage]].

Quick Overview

Performance Expectations

There are many 'opinions' on what RAID level is best for performance, since it can vary greatly depending on many factors, but with all thing s being equal, here's the facts about the most common RAID levels:

  • RAID 5 is the slowest*
  • RAID 1 is in the middle (speed is equivelent to not using RAID)
  • RAID 0 (and RAID 0+1, 10, 1+0, etc) is the fastest

For reading big files, like videos, RAID 5 can be quite fast.

And here's why:

RAID 5

RAID 5 (striping with parity) is the slowest because your computer has to calculate parity for every write operation and then write that parity to disk. This can be considerable overhead with software RAID or budget RAID controllers. Parity is the distrubed data that allows

you to lose any hard drive but still not lose data. Read speads are very good, better than RAID 1. *You can significantly increase RAID 5 performance, sometimes even get it faster than RAID 1 performance, if you do the following:
  • Use hardware RAID solution
  • Use a Battery-Backed Write-back Cache, or BBWBC (only available on high-end RAID controllers). These only help data bursts, up to the size of the cache (typically 16-256 MB), not extended write operations.
  • Add more disks

In general, unless you have server-class hardware and SCSI disks, you can always expect RAID 5 writing to be slower than RAID 1 or 0. Reading is very fast, can be similar to RAID 0.

If you use Linux to create big files sequentially, Linux MD RAID 5 is very fast, as there is no need to read the parity information, the parity information is calculated anew from the data of the new file. RAID 5 is then almost as fast as RAID 0.

RAID 1

RAID 1 (mirroring) is typically not faster nor slower than using a single disk for writing. For reading it could use all the disks in parallel and thus improve the performance.

RAID 0

RAID 0 (striping) is the fastest with reading and writing because your computer reads and writes different data to two disks at the same time, theoretically doubling the performance of RAID 1. (Note: RAID 0 is not the same as disks spanning, or extending, a volume. Spanning is not RAID and provides no performance change or redundancy.) Software RAID 0 can be nearly as fast as hardware RAID 0. You can significantly increase the speed of RAID 0 by:

  • Addings more disks
RAID 0+1 (or 1+0, 01, 10)

RAID 0+1 (mirroring a striped set) is half as fast as RAID 0 with the same number of disks. Like RAID 0, you can make it even faster by adding more disks. Note: There is some argument over the difference in performance between RAID 10 and RAID 01. In other words, should you stripe a mirror, or mirror a stripe? You can safely ignore those that argue this, as it really doesn't matter on the subject of performance. Most hardware vendors stripe first; 0+1. However, on the subject of redundancy, RAID 10 has higher chances than RAID 0+1 of surviving ce rtain hardware failures.

Linux MD RAID10 F2

Linux MD RAID10 in the F2 layout is as fast as RAID 0 for reading, and half as fast as RAID 0 for writing. Please note that Linux MD raid10 is something different from what is called RAID10 or RAID1+0 in other places. First, this is not a nested ra id of first a RAID1 and then a RAID0; instead this is one set of array only, created by only one mdadm command. Second, you may make a stripe d and mirrored array consisting of only 2 drives, where traditional RAID1+0 requires 4 drives. This means that the reading speed of Linux MD raid10 (in the f2 layout) has double the reading speed compared to RAID1 and RAID1+0, given the same number of drives.

Capacity

Assuming all disk are of equal size (If this isn't the case, use the size of your smallest disk), where N = number of disks and C = Disk Capacity:

  • RAID 0 = N x C (Total capacity of all disks, or 100% efficient)
  • RAID 1 = 1 x C (Capacity of one disk, efficiency varies, no more than 50% efficient, decreasing as drives are added)
  • RAID 5 = (N x C) - C (Total capacity minus one disk, efficiency varies, but no less than 66% and increases as drives are added)
  • RAID 0+1, RAID 1+0 or Linux MD raid10 = (N x C) / 2 (Half the total capacity of all disks, or 50% efficient)

Redundancy

  • RAID 0 = No redundancy. Losing any disk results in total data loss
  • RAID 1 = Lose all but 1 disk without any data loss.
  • RAID 5 = Lose 1 disk without any data loss.
  • RAID 0+1 = Lose up to half your disks (if they are on the same stripe set) without any data loss. If one disk from each stripe set is lo

st, all data is lost.

  • RAID10 (Linux MD) = Lose up to half of your disks, similar to RAID 1+0, but for bigger number of disks the chances of surviving more disk cr

ashes are better, as all disks are in one set of disks, instead of a RAID 0 array of a number of RAID 1 arrays.

RAID 0+1 and RAID 1+0 may look very similar, but there is a distinct difference when it comes to redundancy. In a 4-disk configuration, both can survive losing 1 disk without any data loss. And both configurations will fail if you lose 3 disks. However, the probabilities of losing the RAID array are different if you lose 2 disks. The Linux MD raid10 has the same properties ad RAID 1+0.

The number of combinations of 2-drive losses is C(4,2) = 4!/(2! * 2!) = 6. This means that if 2 drives fail, there are only 6 different ways this can happen.

To determine the probabilities, you must count the number of configurations that would actually cause the entire RAID volume to go down.

RAID 0+1 (4 disks)

      [----RAID 1------]
  ____|____        ____|____
 | RAID 0  |      | RAID 0  |
 |____|____|      |____|____|

If one (out of two) drives from the first stripe fails AND one (out of two) drives from the second stripe fails, then the entire RAID 0+1 array will go down.

  • x|ok --- x|ok
  • x|ok --- ok|x
  • ok|x --- x|ok
  • ok|x --- ok|x

Since there are 4 ways this can happen, Pfailure(0+1) = 4/6 = 67%

RAID 1+0 (4 disks)

      [----RAID 0------]
  ____|____        ____|____
 | RAID 1  |      | RAID 1  |
 |____|____|      |____|____|

If both drives on the first mirror fail OR both drives on the second mirror fail, then an entire RAID 10 array will go down.

  • x|x --- ok|ok
  • ok|ok --- x|x

Since there are 2 ways this can happen, Pfailure(1+0) = 2/6 = 33%

This means that RAID 1+0 is twice as likely as RAID 0+1 to survive the loss of two drives.
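The counting argument above is small enough to verify by brute force. This sketch enumerates all C(4,2) = 6 possible two-drive failures for the two 4-disk layouts; the disk numbering is an assumption for illustration, with disks 1-2 forming the first stripe/mirror pair and disks 3-4 the second:

```shell
#!/bin/sh
total=0 fatal01=0 fatal10=0
for a in 1 2 3; do
  b=$(( a + 1 ))
  while [ "$b" -le 4 ]; do
    total=$(( total + 1 ))
    # RAID 0+1: fatal when one disk from each stripe set ({1,2} and {3,4}) fails
    [ "$a" -le 2 ] && [ "$b" -ge 3 ] && fatal01=$(( fatal01 + 1 ))
    # RAID 1+0: fatal only when both disks of one mirror ({1,2} or {3,4}) fail
    { [ "$a" -eq 1 ] && [ "$b" -eq 2 ]; } || { [ "$a" -eq 3 ] && [ "$b" -eq 4 ]; } \
      && fatal10=$(( fatal10 + 1 ))
    b=$(( b + 1 ))
  done
done
echo "combinations=$total fatal(0+1)=$fatal01 fatal(1+0)=$fatal10"
# → combinations=6 fatal(0+1)=4 fatal(1+0)=2
```

The counts match the hand enumeration: 4 of 6 two-drive failures kill RAID 0+1, but only 2 of 6 kill RAID 1+0.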

The Linux MD raid10 uses a different layout. An array of 4 drives created with

  mdadm -C /dev/md0 -n 4 -l 10 -p f2 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1

will have the following layout:

     [------------MD RAID10-----------]
  ___|___   ____|___    ___|___    ___|___
 | disk1 |  | disk2 |  | disk3 |  | disk4 |
 |___|___|  |___|___|  |___|___|  |___|___|

This layout has the same probability of surviving a 2-disk crash as the RAID 1+0 mentioned above, namely 67%.

SCSI vs IDE

In general, SCSI outperforms IDE in RAID arrays because it is much better at handling multiple simultaneous reads and writes. If you must use IDE, use the fastest controllers available (SATA) and the fastest disks available. Also, put each disk on its own controller; avoid placing disks on both channels of any one IDE controller.

Notes

  • At a given platter RPM, it's not the drive that performs better, but the subsystem. SCSI has the advantage of greater availability of higher-RPM (10K and 15K) drives, though these can also be found with SATA interfaces.

  • SATA also provides command queueing, the method used by SCSI to better handle multiple access requests.
  • SATA uses an individual cable per drive, which means it handles cable failure better than SCSI.

Which one do I choose?

If you don't care about your data, go with RAID 0 for speed. If you don't want to lose your data, use RAID 1 or MD raid10,f2 for 2 disks; raid10,f2 is twice as fast for reading. If you have 3 or more disks, it's really a toss-up between RAID 5 (minimum 3 disks), RAID 1+0 (minimum 4 disks, multiples of two thereafter), and raid10,f2, which also allows an odd number of drives. It comes down to speed and cost. If you don't need speed and don't have much money, go with RAID 5. If you need speed, get the money and go with RAID 1+0 or MD raid10,f2 (raid10,f2 reads twice as fast as RAID 1+0). In regards to Myth, several have tried RAID 5 and the results are mixed. If you only have 1 or 2 SD tuners, RAID 5 should be fine. Once you get multiple HD tuners or multiple frontends, RAID 5 often can't keep up. Your results may vary.

RAID For Recordings Drive

A few options exist for using RAID for the recording drives, depending on your goal for your recordings: speed, redundancy, or both. RAID 0 or raid10,f2 will allow you to gain the most speed from your drives (raid10,f2 will also give you redundancy), while RAID 0+1 will give you some speed and 1:1 redundancy. RAID 5 gives you the most capacity for your money, but write speeds can be pretty bad.

RAID For Archives Drive

Having an independent drive array for archival of shows one wishes to keep allows the user to set up one RAID for speed for the recordings drive and another RAID for backup for the archival drive. This way, once a show has been recorded, commercial-flagged, and possibly even transcoded to another format or for permanent commercial removal, it can be moved to the archive. In such a case, RAID 5, raid10,f2 and RAID 1+0 make the most sense. If you plan on a large amount of access to the archive, raid10,f2 or RAID 1+0 may make more sense, as they will most easily keep up with the transfer rate requirement while still allowing for redundancy, but at the cost of the number of drives required. RAID 5 will have a slight speed advantage over just having numerous drives (JBOD, Just a Bunch Of Disks, in hardware RAID; linear in mdadm), but will also have the advantage of getting the most archival bang-for-your-buck while still maintaining parity in case of a lost drive.

Setup (Software RAID)

For setting up hardware RAID, see your RAID controller's documentation; the array will then appear as a single disk within your OS. For software RAID, creating a RAID array with mdadm is quite easy. The Linux RAID HOWTO [http://linux-raid.osdl.org/index.php/Performance Performance] or Software-RAID HOWTO Performance section will help here, as different RAID types have different best values for chunk and block sizes. Since we will be dealing with only large files (recorded MPEGs, music files, etc.), it is recommended to choose the largest chunk and block values that combine for the highest performance.

Partitioning

Before a RAID array can be created on a disk it must be partitioned; you can use cfdisk, fdisk, sfdisk or parted. The easiest way is to create a single full-drive partition.


Note: You must, however, also set the partition type to "fd" ("Linux raid autodetect")!
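A non-destructive way to practice this is against a scratch image file rather than a real drive. The sketch below assumes sfdisk from util-linux; the same `type=fd` script line works on an actual /dev/sdX device:

```shell
#!/bin/sh
# Create a small scratch image so no real disk is touched
truncate -s 64M /tmp/raiddemo.img

# One partition spanning the whole "disk", type fd (Linux raid autodetect)
echo 'type=fd' | sfdisk /tmp/raiddemo.img

# Dump the partition table to confirm the type was set
sfdisk -d /tmp/raiddemo.img
```

The dump should show a single partition with `type=fd`.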

RAID 5

The following line will create a RAID array with the following characteristics:

  • RAID 5 on /dev/md0
  • 3 drives, /dev/sda1, /dev/sdb1, and /dev/sdc1
  • chunk size = 32K
  • no spare
  • verbose level of output
# mdadm -v --create /dev/md0 --force --chunk=32 --level=raid5 \
      --spare-devices=0 --raid-devices=3 /dev/sda1 /dev/sdb1 /dev/sdc1

RAID 1+0

RAID 1+0 is really the creation of 2 or more arrays. First you create the number of RAID 1, mirrored, arrays you wish to have,

# mdadm -v --create /dev/md0 --chunk=32 --level=raid1 --raid-devices=2 /dev/sda1 /dev/sdb1
# mdadm -v --create /dev/md1 --chunk=32 --level=raid1 --raid-devices=2 /dev/sdc1 /dev/sdd1

and so on until you have the number of drives you wish to concatenate into a RAID 0.

Once these have completed building (see below), you can create the RAID 0, striped, array,

# mdadm -v --create /dev/md2 --chunk=32 --level=raid0 --raid-devices=2 /dev/md0 /dev/md1

The Linux MD raid10 has another way to be created - it uses only one mdadm command. For an array of 4 drives use:

# mdadm -C /dev/md0 --chunk=256 -n 4 -l 10 -p f2 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1

Note that this can even be done with only 2 drives (-n 2). For newer drives as of 2008, the people on the linux-raid kernel mailing list recommend chunk sizes between 256 KiB and 1 MiB.

RAID Creation Confirmation

You will be prompted with the RAID parameters, and asked to continue,

mdadm: layout defaults to left-symmetric
mdadm: /dev/sdc1 appears to contain a reiserfs file system
    size = -192K
mdadm: size set to 293049600K
Continue creating array?

Status of RAID Creation

Upon confirmation you will only see

mdadm: array /dev/md0 started.

Once you run the command to create the RAID array, if you want to see the progress run,

# cat /proc/mdstat

and you will see something along the lines of,

# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid5] [multipath] [raid6] [raid10]
md0 : active raid5 sdc1[3] sdb1[1] sda1[0]
      586099200 blocks level 5, 32k chunk, algorithm 2 [3/2] [UU_]
      [=>...................]  recovery =  5.8% (17266560/293049600) finish=69.8min speed=65760K/sec

unused devices: <none>

Generate Config File

Now we need to setup '/etc/mdadm.conf', this can be done by copying the output of

# mdadm --detail --scan

to '/etc/mdadm.conf', which should end up looking similar to,

ARRAY /dev/md0 level=raid5 num-devices=3 UUID=2d918524:a32c7867:11db7af5:0053440d
devices=/dev/sda1,/dev/sdb1,/dev/sdc1

RAID Filesystem Creation

Once your RAID array is created you can place a filesystem on it. JFS and XFS are the two recommended filesystems for large-file arrays, especially for the recordings drive in Myth. Again, we will use the Software-RAID HOWTO [http://www.tldp.org/HOWTO/Software-RAID-HOWTO-9.html Performance] section and go with a 4K (4096) block size.

For XFS (replace md0 with your final RAID array if using a mixed mode array),

mkfs.xfs -b size=4096 -L Recordings /dev/md0 -f

or for JFS,

mkfs.jfs -c -L Recordings /dev/md0 -f

(It is not recommended to use a JFS partition for your boot drive when using GRUB)

Mounting

That's it, now you are ready to mount the filesystem! You can add a line to your /etc/fstab similar to,

/dev/md0       /MythTV/tv             xfs     defaults        0       0

or

/dev/md0       /MythTV/tv             jfs     defaults        0       0

which will mount the filesystem upon boot and allow the automount option of mount to work, so go ahead and mount the filesystem,

# mount -a

Monitoring

Most distributions have an init.d daemon set up to monitor your mdadm arrays and notify you when anything of note occurs.

Software Raid Online Capacity Expansion (OCE) (for raid 5 with XFS)

Online Capacity Expansion (OCE) allows you to add another hard drive to an already defined and running RAID array; for example, adding a fifth drive to a 4-drive RAID 5 array. OCE reshapes the data so it spans all 5 drives and then allows you to use a filesystem grow command to make use of the new space. This is all done while the RAID system is active, and you can even continue to use your drives while adding the new one. Previously this feature was available only on high-end hardware RAID cards.

I was able to do a RAID 5 disk expansion in mdadm software RAID with no ill effects, following the page below as a guide, and it worked perfectly. It took about 6 hours to reshape the array, from 300 GB x 4 RAID 5 to 300 GB x 5 RAID 5. No LVM, just an md0 mdadm device, and I was still able to make 2 simultaneous HD recordings and watch an HD recording while this was going on.

page used as guide:

http://scotgate.org/?p=107

for me the process was:

mdadm --add /dev/md0 /dev/sde1

This would be different for each user based on the name of the raid filesystem and the drive you are wanting to add to it.

then:

mdadm --grow /dev/md0 --raid-devices=5

to find details about your raid reshaping status use:

cat /proc/mdstat

to speed up reshaping use (fill in whatever speed you want where the 100000 is; the default is 10000):

echo -n 100000 > /proc/sys/dev/raid/speed_limit_max

The speed_limit_max entry there controls how fast the raid array rebuilds (how much of the array's bandwidth is available to the rebuild process). Make that bandwidth number higher and it goes faster but uses more of the throughput of the hard drives (leaving less available to say Myth recording on the degraded array). The array will be in a degraded state until reshaping is finished.


Now it's time to grow your XFS filesystem (or substitute the grow command for your filesystem):

xfs_growfs (path to mounted raid filesystem) 

Spinning down hard drives

Area is outdated. See wiki entry on spindown drives here: http://mythtv.org/wiki/index.php/Spindown_drives

Links

Great page with information on the different hardware and software RAID chipsets and their current Linux support

Wikipedia Entry for RAID

mdadm MAN page (via man-wiki)

Software-RAID HOWTO

Linux MD RAID HOWTO