File storage

From MythTV Official Wiki
 
'''File storage''' refers to the broad topic of hardware, software and the methodology behind keeping MythTV recordings on a computer (and computer network).
 
Pretty much any reasonably modern hard drive will be more than adequate, in both space and speed, for MythTV. With the introduction of [[Storage Groups]] in version 0.21, you can use as many drives as you like without the hassle of LVM or RAID.
 
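A back-of-the-envelope estimate shows how far a pile of drives in a Storage Group will go. The sketch below divides total capacity by an assumed recording rate. The default of about 1GB per hour is roughly what an MPEG-2 hardware encoder such as the PVR-250 produces at modest quality settings. The drive sizes in the example are hypothetical. Real rates vary with capture hardware, quality settings and broadcast format, and the sketch ignores filesystem overhead and any free space you want to keep in reserve.

```python
def recording_hours(drive_sizes_gb, gb_per_hour=1.0):
    """Rough hours of recordings that fit on a set of drives.

    Assumes the whole capacity of every drive is usable for recordings;
    filesystem overhead and any reserved free space are ignored.
    """
    if gb_per_hour <= 0:
        raise ValueError("gb_per_hour must be positive")
    return sum(drive_sizes_gb) / gb_per_hour


if __name__ == "__main__":
    drives_gb = [80, 120, 250]  # hypothetical leftover drives, sizes in GB
    hours = recording_hours(drives_gb)
    print(f"about {hours:.0f} hours, or {hours / 24:.1f} days of continuous recording")
```

Doubling the recording rate (for example, with higher-quality or high-definition captures) halves the result. Check the actual size of a few of your own recordings before buying drives.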
==Manufacturers==
 
{{Note box|This section will be full of hearsay, personal bias and much that is probably apocryphal, so take it with a pinch of salt, and always ask around. If you are adding anecdotal datapoints about a specific manufacturer, please indent one level and ''sign'' your comments.}}
  
 
The current [Mar-07] major manufacturers of consumer hard drives are Seagate, Western Digital, Maxtor (now owned by Seagate), IBM/Hitachi and Samsung. All of them make IDE drives of up to 400GB, with either the older parallel (PATA) or the newer serial (SATA) interface. Drives are also available with SCSI interfaces, though those have not caught up on the size front.
 
*Western Digital drives are slightly better performers than the Seagates, at the expense of being noisier
*All of the Maxtor drives I have used have been ''very'' noisy, and not particularly reliable
**Note: All of the Maxtor drives ''I'' have used have been extremely reliable, and not very noisy at all. I have had only 2 fail in the six (?) years I have been using them in my computers and in computers I build for customers. One was due to overheating (in a tiny space in a hot area); the other, I think, was just a manufacturing defect. Both times, Maxtor replaced the drives (and upgraded them, for free) with no hassle, and quite quickly, I might add. Just another personal opinion =) --[[User:TylerDrake|Tyler Drake]]
**My Maxtor [[Diamond Max]] 10 250GB is as quiet as a church mouse. --DavidC
**I've had about an 80% failure rate with Maxtors (40-250GB PATA) in the couple of dozen I've put in customers' boxes. They seem quiet, though. My failure rate with Maxtors approaches 100% when they are attached to a Promise TX2000 RAID controller. They normally last around a year, just long enough to be out of warranty. I've had excellent luck with Seagates, and you've got to love that 5-year warranty.
**Since buying Maxtor, Seagate has turned it into its "lower" line of drives, with a 3-year warranty instead of Seagate's usual 5 years. This also seems to have improved their quality.
**My Maxtor 6L060L3 (a 60GB 7200 RPM U/133 D740X with FDB, Fluid Dynamic Bearings) was extremely quiet and, at the time, fast. The only time it squeaked was as it was dying (an early failure, replaced under warranty), and my replacement was just as good. My Maxtor 96147U8 (60GB 5400 RPM U/66) was noticeably slower and made a reasonable amount of noise. I had also borrowed a WD450AA from a friend, and that was extremely loud for a slow drive (<10k RPM), comparable to some 10k RPM drives I dealt with at work. For reliability across the board (laptop to enterprise drives), I prefer Seagate. -Gene
*IBM/Hitachi are recovering from a somewhat tarnished reputation from their "Deathstar" line of hard drives, and are producing some very good SATA drives, although I've never used any myself
*Samsung Spinpoint drives are getting rave reviews (fast, quiet ''and'' reliable) from a lot of users here in the UK, although again I've never used them myself.
 
Based on my current bias, I can recommend the Seagate Barracuda drives for situations where you want quiet drives, although I prefer Western Digital for situations where noise isn't much of a problem.
 
Comparing cheap economy drives with a short warranty to more expensive, higher-end drives with a longer warranty is misguided. Those drives are not comparable: they are intended for different markets and so are built to different standards. Using that comparison as a basis to criticize a brand as a whole is even more ridiculous. It is the same thinking as: "My Chevy Vega kept breaking and was junk, therefore all GM cars are junk. My Lincoln Town Car was fantastic and never broke, therefore all Ford cars are great and last forever."
 
==Interfaces==
 
*SCSI
*SAS

Bear in mind that a drive's performance as a video source depends on its 'sustained transfer rate', which has nothing to do with the interface type or speed. The sustained transfer rate is limited by the 'media transfer rate', the rate at which data moves between the heads and the disk surfaces. This is a ''physical'' limitation shared by all hard drives regardless of interface. The only things that affect it are the data density on the disk, the physical size (read/write area) of the heads, and the rotational speed of the drive. As a result, all drives with the same rotational speed and data density will have approximately the same media transfer rate. A high-end 7200 RPM drive can achieve a maximum media transfer rate of approximately 80 MB/sec ''regardless'' of whether its interface is SATA, PATA or SCSI.

Drive manufacturers rarely draw attention to this. Their advertising emphasizes the interface's speed instead, even though interface speed is only an advantage during data bursts. Most manufacturers have even stopped listing media transfer rates in their specification tables.

For drives of a given rotational speed and data density, the only way to improve sustained throughput is to stripe the data across several drives, for example with a RAID level that uses striping. Striping across two drives lets both transfer data at the same time, so the total media transfer rate is roughly doubled.
  
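To put the 80 MB/sec figure in perspective, the sketch below estimates how many recording or playback streams a drive, or a striped set of drives, could feed in theory. It makes three illustrative assumptions: a recording rate of about 1GB per hour (roughly what an MPEG-2 hardware encoder such as the PVR-250 produces), decimal units, and throughput that scales linearly with the number of striped drives. It ignores seek overhead, which lowers the real figure considerably when several streams are read and written at once.

```python
import math


def stream_rate_mb_s(gb_per_hour):
    """Convert a recording rate in GB/hour to MB/s (decimal units: 1 GB = 1000 MB)."""
    return gb_per_hour * 1000 / 3600


def max_streams(media_rate_mb_s, gb_per_hour, striped_drives=1):
    """Theoretical upper bound on concurrent streams a drive set can sustain.

    Assumes throughput scales linearly with the number of striped drives and
    ignores seek overhead between interleaved streams.
    """
    mb_per_hour = media_rate_mb_s * striped_drives * 3600
    return math.floor(mb_per_hour / (gb_per_hour * 1000))


if __name__ == "__main__":
    print(f"one stream needs about {stream_rate_mb_s(1.0):.2f} MB/s")
    print("single 80 MB/s drive:", max_streams(80, 1.0))
    print("two-drive stripe:", max_streams(80, 1.0, striped_drives=2))
```

Treat this as a loose upper bound. It shows that a single recording at this rate uses only a small fraction of a drive's sustained rate.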
 
===(P)ATA===
 
As a much older standard, PATA is supported on almost all x86 hardware. This interface was originally called ATA, but when Serial ATA (SATA) was introduced it was renamed Parallel ATA. Recently, motherboard makers have moved PATA support to third-party controller chips, and many boards now offer only one PATA port (two drives).
  
 
===SATA===
 
Most new hard drives and motherboards come with support for the newer Serial ATA (SATA) interface. SATA is a superior standard: it supports many features previously found only in SCSI and uses much smaller, thinner cables than PATA, amongst other improvements. However, some SATA controllers have closed-source Linux drivers or none at all. This has left some Linux-based systems unable to use SATA adequately because of poorly supported controllers. The situation is no longer as serious as it once was, but you should check your hardware's driver support to be sure.
  
For the current status of SATA support under Linux you can check [http://linuxmafia.com/faq/Hardware/sata.html Serial ATA (SATA) on Linux].
  
Please bear in mind that the built-in software RAID functions on SATA chips will usually not work in Linux without extensive fooling around with the kernel (if at all). Because Linux provides its own software RAID features, this isn't a big loss for a dedicated Linux box (such as a MythTV system), but if you dual-boot, you may not be able to use the controller's software RAID.
  
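One way to see which kernel driver is actually handling each SATA (or PATA) port is to look at the SCSI host entries in sysfs, since libata presents its ports as SCSI hosts. The sketch below assumes a Linux 2.6-style sysfs layout, where each /sys/class/scsi_host/hostN entry contains a proc_name file naming the driver. Treat it as a quick check under that assumption rather than a definitive tool. The function name is hypothetical.

```python
import os


def scsi_host_drivers(sysfs_root="/sys/class/scsi_host"):
    """Map each SCSI host (libata SATA/PATA ports included) to its driver name.

    Reads the proc_name attribute of every hostN entry. Returns an empty
    dict when the directory is missing (for example, on a non-Linux system).
    """
    drivers = {}
    if not os.path.isdir(sysfs_root):
        return drivers
    for host in sorted(os.listdir(sysfs_root)):
        try:
            with open(os.path.join(sysfs_root, host, "proc_name")) as f:
                drivers[host] = f.read().strip()
        except OSError:
            drivers[host] = "unknown"
    return drivers


if __name__ == "__main__":
    for host, driver in scsi_host_drivers().items():
        print(f"{host}: {driver}")
```

If a SATA port you expect to see is missing, or is handled by a generic driver, check your controller's entry in the status page linked above.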
 
===SCSI===
 
SCSI stands for Small Computer System Interface, and is/was a competing hard drive interface to IDE/ATA. However, back in the mists of time, SCSI was assigned to the "high-end hard drive" side of things, and it is now much more expensive than ATA technology. That said, if you compare raw MTBF (mean time between failures) figures, you will see that cheaper ATA drives are only now catching up to SCSI drive specs.
  
Only the highest-end server and workstation motherboards come with built-in SCSI host adapters, so these usually have to be added with a PCI card, which is not cheap in itself. The drives themselves are very expensive indeed, and they offer much less storage capacity than a modern PATA or SATA drive. However, SCSI disks are incredibly fast and very reliable, though as we can see, this comes at a huge price. To be honest, there is very little chance of even an extensive MythTV setup requiring a SCSI system. SCSI excels in massive multi-user environments like databases and web/mail servers, but its advantages in a single-user setup are hard to distinguish. With the recent addition of Western Digital's enterprise-class "Raptor" SATA drives, you can approach SCSI speeds without shelling out a king's ransom, although their size is limited to 74GB at the time of writing.
  
 
One thing of note is that SCSI drives are very ''very'' loud due to their very high rotation speed (10,000 or 15,000rpm) and so are going to be relegated to the backend under the stairs pretty quickly. Raptor drives are quieter, but still far louder than your average IDE drive.
 
[ Editorial comment: SCSI's not ''that'' bad a choice, particularly if you can get used drives cheaply on eBay, and you are building an [[Under The Stairs]] backend box -- instead of the 2 or 4 drives you can put on most IDE controllers, you can put 15 on a SCSI controller -- and multiple channel controllers are available. So it is a matter of scale and buying savvy as much as anything else. -- [[User:Baylink|Bay Link]] [[(2004-10-01T18:06:44Z)]]
*The problem is that 15 drives are only useful for mass storage, and the price/size ratio attainable with SATA is much better than with SCSI. For MythTV purposes SATA makes the most sense technically and financially, with PATA a close second if you are not concerned with overall speed (i.e. as an archive array). <strike>That said I do have a 4xHDD U320 SCSI setup as my personal desktop...</strike> --[[User:Steveadeff|Steve Adeff]] 16:23, 8 June 2006 (UTC)]
  
 
===SAS===
 
*SAS Expanders provide the ability to hook up drives the same way we network computers using a switch, although over shorter distances (several meters).
*2.5" and 3.5" drives are available

- Seagate Barracuda ES2 Serial Attached SCSI one terabyte drives can be found for around $270 - maybe even $250. They spin at 7200 rpm. Not a bad choice for MythTV systems that are going to be always on. [[User:RedmondTux|RedmondTux]]
  
 
===External Links===
 
Note that, particularly if you are prone to monkey with CVS Myth or advanced beta and alpha test drivers, you will be ''much'' happier if you put /var/log on its own partition, so that runaway logging cannot fill up your root filesystem.
 
Most partitions, in the sense just described, can exist instead as LVM volumes (see the section on LVM below).

When partitioning a disk, you must first decide on a partitioning scheme. For x86 and x86-64 systems, the Master Boot Record (MBR) partitioning system has long been the standard. The MBR system, however, stores partition start points and sizes as 32-bit sector numbers, which with the usual 512-byte sectors top out at 2TiB (about 2.2TB). If you use a hardware RAID configuration, your virtual disks may exceed this size, and even single disks exceeding 2TB are likely to be available by the end of 2009. Therefore, you may need to use the newer GUID Partition Table (GPT) system if you plan to use lots of storage. GPT is already the standard on Intel-based Macintoshes. Using GPT requires partitioning with GPT-aware utilities, such as GNU Parted, rather than fdisk. You may also need to track down a patched version of the GRUB boot loader. Check that your distribution supports installation to GPT disks if you intend to use this system. In some cases it may be simpler to install Linux on a (relatively) small MBR-partitioned disk and reserve GPT for the disk or RAID array that holds your recordings. If your individual disks or RAID arrays are smaller than 2TB, chances are the older MBR system will work fine.
  
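The 2TB figure comes straight from the arithmetic of 32-bit sector numbers. A minimal sketch, assuming the traditional 512-byte sector size (disks that expose larger logical sectors would raise the limit proportionally); the helper name and the disk sizes are hypothetical:

```python
SECTOR_BYTES = 512          # traditional sector size assumed here
MBR_MAX_SECTORS = 2 ** 32   # MBR partition entries hold 32-bit sector values
MBR_LIMIT_BYTES = SECTOR_BYTES * MBR_MAX_SECTORS


def needs_gpt(disk_bytes):
    """True if a disk (or RAID virtual disk) is larger than MBR can fully address."""
    return disk_bytes > MBR_LIMIT_BYTES


if __name__ == "__main__":
    print(MBR_LIMIT_BYTES, "bytes")                     # 2199023255552 bytes (2 TiB)
    print(round(MBR_LIMIT_BYTES / 1000 ** 4, 2), "TB")  # the same limit in decimal TB
    print("1.5TB disk needs GPT:", needs_gpt(1_500 * 1000 ** 3))  # hypothetical disk
    print("3TB array needs GPT:", needs_gpt(3_000 * 1000 ** 3))   # hypothetical array
```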
 
==File systems==
 
As you probably know, Linux has a bewildering array of file systems available, most of which excel at a particular task. You are of course free to format your drives with whatever file system you choose, but here is some general info about the most popular file systems:
*'''ext2''' is the "old standard" file system. It is fairly speedy, but it lacks a journal, which would speed up file system checks after a power loss or system crash. As a result, a file system check (fsck) can take an age, although ext2 can be seamlessly upgraded to ext3. (All of the file systems below include a journal.) These days, ext2 is best reserved for very small partitions, such as a 50-200MB (note MB, not GB) /boot partition, if you create one.
*'''ext3''' is an extension to the ext2 file system which introduced journaling as well as other improvements. It is a bit of a jack-of-all-trades of a file system, and does not excel at anything in particular, apart from very thorough testing!
*'''ext4''' is the next-generation filesystem in this line. It adds features enabling support for larger files and filesystems, as well as better performance with large files. Ext4 moved out of "experimental" status only with the 2.6.28 kernel, and its stability and suitability for use on a MythTV box have yet to be extensively explored by the community, as of early 2009.
*'''ReiserFS''' is a high performance file system that is especially good at dealing with directories containing lots of small files. This makes it a good choice for your system partitions, although it does not perform as well with large files. On partitions bigger than 200GB, where MythTV continually removes files (expiring shows) and writes new ones (new recordings), performance degrades over time because free blocks become harder to find. ReiserFS (Reiser3) is considered stable and feature-complete. However, [http://en.wikipedia.org/wiki/Namesys Namesys], the company that developed ReiserFS, has ceased commercial activities.
*'''[[JFS]]''' was originally developed by IBM for their AIX operating system, and was later donated to Linux. JFS is incredibly good at dealing with the huge files that MythTV generates, and can delete pretty much any file in under a second (ext3 can take as long as 15 seconds to delete really big files). JFS is a very good file system to use for storing your videos on, and it is very conservative with CPU usage.
*'''[[XFS]]''' is another "foreign" filesystem, developed by SGI for their IRIX operating system and once again donated to Linux. Like JFS, it is exceptionally good at dealing with large files, and it has the highest throughput of any Linux filesystem, albeit at a higher CPU load. XFS also makes an excellent choice as storage for your movie files. (Note that XFS filesystems can be ''grown'', but not shrunk, at the present time; this can occasionally be problematic. Note also that XFS file systems are checked and repaired with xfs_repair, not fsck; if you are going to use XFS, and [[User:Baylink|Bay Link]] recommends that you do, ''read'' about it first.)
*'''[[Btrfs]]''' (pronounced "butter-eff-ess") is the up-and-coming Linux filesystem. It's Linux's answer to ZFS, which is popular on Solaris. Although Btrfs has many advanced features, such as copy-on-write operation, online defragmentation, and snapshots, it's still very new and has not been extensively tested by the MythTV community, as of October 2009.
  
To use any of these file systems, you'll need support for them compiled into the kernel along with the relevant userland utilities. The file system driver(s) of your partitions must either be compiled directly into the kernel (not as modules) ''or'' compiled as modules and included in an initial RAM disk (initrd). The former approach is usually easier to set up; initrd configuration adds steps to the kernel compilation process and can sometimes go wrong. If you build your filesystem drivers as modules and don't build an initrd, the kernel won't be able to read the filesystems on which the filesystem drivers are stored! If you use your distribution's standard precompiled kernel, you don't need to worry about this.
  
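To see which filesystems your running kernel currently supports, you can read /proc/filesystems. The sketch below parses that file. A filesystem built as a module only appears there once the module has been loaded, so a missing entry does not necessarily mean support is absent. The helper names are hypothetical.

```python
import os


def supported_filesystems(proc_text):
    """Parse /proc/filesystems content into a set of filesystem names.

    Each line is either '\t<name>' or 'nodev\t<name>'; 'nodev' marks
    filesystems that are not backed by a block device.
    """
    names = set()
    for line in proc_text.splitlines():
        fields = line.split()
        if fields:
            names.add(fields[-1])
    return names


def missing_filesystems(wanted, proc_path="/proc/filesystems"):
    """Return the wanted filesystems not currently registered with the kernel."""
    with open(proc_path) as f:
        have = supported_filesystems(f.read())
    return sorted(set(wanted) - have)


if __name__ == "__main__" and os.path.exists("/proc/filesystems"):
    print("not built in or loaded:", missing_filesystems(["ext3", "jfs", "xfs"]))
```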
Some distributions come with a choice of only one or two filesystems, although if you rebuild your kernel it is possible to enable all of them (including support for Windows FAT32 and NTFS if you need it!). New, exotic and improved filesystems are cropping up all the time; hot on the horizon is Reiser4, which promises to be a very high performing and flexible system, although it is far from stable yet.
  
 
Many filesystems allow tweaking of the block size at format time - selecting a large block size will make more efficient use of your hard drive space when dealing with large files, whereas a small block size is better suited for your system partitions. If in doubt, read the manual thoroughly or just go with the defaults, since you cannot change the block size without reformatting the drive.
  
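The space side of this trade-off is easy to quantify. Files are stored in whole blocks, so each file wastes part of its last block. The sketch below (hypothetical helper names and file sizes) computes that waste. It deliberately ignores metadata overhead and features such as ReiserFS tail packing. What it shows is why small blocks suit partitions full of small files, while for multi-gigabyte recordings the waste is negligible whatever the block size.

```python
def allocated_bytes(file_size, block_size):
    """Disk space a file occupies when storage is handed out in whole blocks."""
    blocks = -(-file_size // block_size)  # ceiling division on integers
    return blocks * block_size


def slack_fraction(file_sizes, block_size):
    """Fraction of allocated space wasted in partially filled final blocks."""
    used = sum(file_sizes)
    allocated = sum(allocated_bytes(size, block_size) for size in file_sizes)
    return (allocated - used) / allocated if allocated else 0.0


if __name__ == "__main__":
    small_files = [1500] * 1000          # hypothetical config/log files
    recording = [2 * 1000 ** 3 + 123]    # one hypothetical ~2GB recording
    for bs in (1024, 4096):
        print(bs, f"small files: {slack_fraction(small_files, bs):.1%}",
              f"recording: {slack_fraction(recording, bs):.6%}")
```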
Filesystem mount options can sometimes affect performance. For instance, when using XFS, the allocsize option sets how much space the filesystem preallocates beyond the end of a file while it is being written. Setting this to a large value (as in allocsize=512m) can reduce fragmentation, and therefore improve performance, when large files are written to the filesystem.

In short, a good choice is ext3 or ReiserFS for your system partitions and JFS or XFS for your MythTV storage. If you have a separate /boot partition, ext2 is a good option, since ext3's journal provides little benefit for a partition of this size but consumes a lot of disk space relative to it. Note that the XFS implementations on SuSE 9.0 and 9.1 were both a bit flaky, which can make installations and upgrades difficult if you do not know the magic. (I will put the magic here when I relocate it. --[[User:Baylink|Bay Link]])
  
 
==Advanced storage==
 
===Storage Groups===
[[Storage Groups]] is a feature, introduced in version 0.21, that allows multiple hard drives to be used for storing recordings and other media. It provides an easier, cheaper and safer alternative to LVM, and it may also replace RAID in certain setups.
 
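When several directories belong to one Storage Group, MythTV has to decide where each new recording goes. Its actual selection rules are configurable and more involved than anything shown here. Purely as an illustration of the idea, the sketch below implements a simple "most free space wins" policy. The function name and the example directories are hypothetical.

```python
import shutil
import tempfile


def pick_directory(directories, free_bytes=None):
    """Return the directory with the most free space.

    free_bytes maps a directory to its free space in bytes; by default it
    asks the OS via shutil.disk_usage, which requires the path to exist.
    """
    if not directories:
        raise ValueError("no storage directories configured")
    if free_bytes is None:
        def free_bytes(d):
            return shutil.disk_usage(d).free
    return max(directories, key=free_bytes)


if __name__ == "__main__":
    # Stand-in directories instead of real recording drives.
    print(pick_directory([tempfile.gettempdir(), "."]))
```

Injecting free_bytes keeps the policy testable without real drives and makes it easy to swap in a different rule, such as weighting by disk load.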
===LVM===
 
LVM stands for the Logical Volume Manager. It provides two basic advantages over conventional partitions:
  
* You can use it to make two or more separate hard drives (or partitions on those drives) appear as one huge hard drive to the operating system. LVM can optionally stripe the partitions together, meaning that accesses to the two disks are interleaved. This can improve performance in a manner similar to some RAID configurations.
  
The terminology of LVM is perhaps a little advanced to go into here, so if you want a good explanation of it you can read the [http://www.tldp.org/HOWTO/LVM-HOWTO/ LVM HOWTO]. In short, you can dedicate either individual partitions or entire hard drives (the "physical volumes") for use by the LVM, which allows you to map them into one or more "volume groups", from which you then carve out "logical volumes" to install filesystems upon. Probably the best thing about LVM is that, as long as the filesystem you pick is capable of being resized, you can extend the volume group(s) and logical volume(s) over bigger and more hard drives without losing or having to copy any data, which makes it a great choice if you want to keep your expandability options open.
* Filesystems are stored in logical volumes within the partitions used by LVM. These logical volumes may be resized, added, and deleted without regard for their locations or precisely where the data you allocate will be stored. (The logical volumes act much like files in a filesystem.) This feature makes it easy to add storage space to the filesystems that need it. You can, for instance, add a new disk to an existing system and then grow your MythTV recordings filesystem without having to copy data or otherwise disrupt your existing recordings.
  
===RAID===
LVM has certain drawbacks, of course:
Originally, [[RAID]] stood for Redundant Array of Inexpensive Discs, although the word Independent has since been substituted for Inexpensive (probably because most RAID setups use ''very'' expensive SCSI discs ;). What this basically means is that data is spread across multiple hard drives in such a way that if one of the hard drives explodes or is eaten by the cat, you will be able to reconstruct the lost data from the other hard drives. One of the lesser functions of RAID is to produce higher performance filesystems by spreading read/write load across multiple discs as well. For a very clear and concise RAID tutorial, you can read these pages http://www.acnc.com/04_00.html, but in the meantime here is a brief rundown of the most common RAID levels along with examples of storage capacity:
  
*'''RAID0''', also known as striping, is not true RAID, in that it offers no redundancy. If one of the discs in the array fails, all of the data in the array is lost. RAID0 scales linearly with every drive added; two 80GB drives will produce a single 160GB filesystem. Please note that RAID0 is distinctly different from LVM!
* It adds complexity. In addition to creating partitions in a conventional way, you must use several utilities to build up the LVM data structures before you can begin using your disks.
  
*'''RAID1''', also known as mirroring, involves copying data to two identical hard drives rather than just one. If one drive dies, the other will remain fully functional with all of your data intact. Two 80GB drives will produce a single 80GB filesystem.
* Not all distributions provide easy support for LVM. Some versions of Ubuntu lack LVM support "out of the box," for instance. (You can work around this problem, but doing so requires additional expertise.)
  
*'''RAID0+1''' is a combination of mirroring and striping using four drives offering very fast read/write speeds. Four 80GB drives would combine to form a 160GB RAID0+1 filesystem.
* If you use LVM to span multiple physical disks, your data becomes more prone to damage should one disk fail -- the breakdown of one physical disk may make data stored on the good disk inaccessible.
  
Most of you will have seen these RAID levels advertised as being built into almost all modern motherboards; unfortunately this kind of RAID is achieved under proprietary drivers (all RAID calculations being done in software), typically available only for Windows. Fear not however, because the Linux kernel contains its own software RAID drivers, which (if the rumours I hear are true) perform even better than the proprietary software RAID drivers. Again, you can enable these in your kernel.
* Emergency recovery becomes more complex. Your recovery tools must support LVM (most modern recovery CDs/DVDs do, fortunately), but you may need to execute extra commands to access your data.
  
You will also note more exotic RAID levels such as RAID5 and RAID10. These are quite tricky to do in software (although it is possible), and are usually left to high-end dedicated RAID controllers. Previously, these were only available in high-end SCSI RAID setups, although recently 3ware have released an excellent series of cards that allow high end RAID features on much cheaper and larger capacity SATA discs. These cards are fully supported under Linux and offer excellent performance, and I use two of them at home. If you are looking for a relatively cheap and hassle-free IDE RAID setup under Linux, 3ware's are a very good choice. (P.S. thanks for the cheque, 3ware!)
* Booting Linux can become more complex, because you must either have LVM support on an initial RAM disk (initrd) or you must provide the basic LVM drivers and tools on a non-LVM partition. (Note that you can install your basic Linux system on a non-LVM disk and reserve LVM for your MythTV recordings and database filesystems alone, if you like. This configuration will minimize this drawback of LVM.)
  
*'''RAID5''' is a good compromise on RAID10, and spans data and parity data over three or more drives, giving redundancy and good read/write performance.
Despite these drawbacks, LVM's advantages make LVM appealing for many users. MythTV 0.21's storage groups are another option for increasing storage flexibility. You will need to have LVM enabled in your kernel to use LVM, as well as having the userland LVM utilities installed. The 2.6 kernel series implements the much improved LVM2.
  
*'''RAID10''' is very like the high performing RAID0+1, but with better redundancy (it can survive up to two simultaneous drive failures, whereas 0+1 can sustain only one hard drive failure).
The setup details of LVM are a little advanced to go into here, so if you want a good explanation of it you can read the [http://www.tldp.org/HOWTO/LVM-HOWTO/ LVM HOWTO]. In short, you can dedicate either individual partitions or entire hard drives (the "physical volumes") for use by the LVM, which allows you to map them into one or more "volume groups", from which you then carve out "logical volumes" to install filesystems upon.
  
In the end, if you are not that worried about losing your data (or if you keep good backups), any kind of RAID is overkill. A good compromise can be reached if you place all your system directories on a RAID of some sort (which will protect all of your time consuming configuration &mdash; my workstation is in the process of being switched over to RAID1 on two Western Digital Raptors) whilst placing the TV storage on a single disc. But if you have enough money and inclination, you can RAID your whole setup &mdash; I am particularly paranoid, and plan to upgrade my backend to using a 3ware and four 250GB drives in RAID10 to (hopefully) put an end to my currently non-existent storage problems.
===RAID===
Originally, [[RAID]] stood for Redundant Array of Inexpensive Discs, although the word Independent has since been substituted for Inexpensive (probably because most RAID setups use ''very'' expensive SCSI discs ;). What this basically means is that data is spread across multiple hard drives in such a way that if one of the hard drives explodes or is eaten by the cat, you will be able to reconstruct the lost data from the other hard drives. One of the lesser functions of RAID is to produce higher performance filesystems by spreading read/write load across multiple discs as well. For a very clear and concise RAID tutorial, you can read these pages http://www.acnc.com/04_00.html, but in the meantime here is a brief rundown of the most common RAID levels along with examples of storage capacity:
  
===Network filesystems===
*'''RAID0''', also known as striping, is not true RAID, in that it offers no redundancy. If one of the discs in the array fails, all of the data in the array is lost. '''RAID0''' scales linearly with every drive added; two 80GB drives will produce a single 160GB filesystem. Please note that '''RAID0''' is distinctly different from LVM!
  
As the name implies, these are mechanisms for accessing a file system (and therefore files) across a [[network]].
*'''RAID1''', also known as mirroring, involves copying data to two identical hard drives rather than just one. If one drive dies, the other will remain fully functional with all of your data intact. Two 80GB drives will produce a single 80GB filesystem.
  
Wireless LANs have come a long way, but if you're going to use remote files 100Mbit/s or better is probably the minimal requirement.
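As a rough sanity check on bandwidth figures like this, the sketch below converts a recording size into an average bitrate and estimates how many such streams fit on a link. It assumes about 1 GB per recorded hour (the MPEG-2 figure quoted for a PVR-250 in an earlier revision of this page), decimal units, and a hypothetical 70% usable-throughput factor; all three are illustrative assumptions, not measurements.

```python
# Rough check of link bandwidth against recording bitrate.
# Assumptions (illustrative only): ~1 GB per recorded hour, decimal units,
# and a hypothetical 70% of the nominal link rate actually being usable.

def stream_mbit_per_s(gb_per_hour: float) -> float:
    """Average bitrate, in Mbit/s, of a recording that uses gb_per_hour."""
    return gb_per_hour * 1000 * 8 / 3600  # GB -> MB -> Mbit, spread over an hour


def max_streams(link_mbit_per_s: float, gb_per_hour: float,
                efficiency: float = 0.7) -> int:
    """Whole number of such streams that fit in the usable part of a link."""
    usable = link_mbit_per_s * efficiency
    return int(usable // stream_mbit_per_s(gb_per_hour))


if __name__ == "__main__":
    print(f"1 GB/hour is about {stream_mbit_per_s(1.0):.1f} Mbit/s")
    # 11 and 54 are nominal 802.11b/g rates; 100 is Fast Ethernet.
    for link in (11, 54, 100):
        print(f"{link:>3} Mbit/s link: about {max_streams(link, 1.0)} streams")
```

Real wireless throughput is usually well below the nominal rate, and higher-bitrate recordings shrink these numbers quickly, so plug in your own recording size and measured link speed rather than trusting the defaults.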
*'''RAID0+1''' and '''RAID10''' are two basic forms of nested arrays.  '''0+1''' is a mirror of stripes, while '''10''' is a stripe of mirrors.  While both methods are equally simple to execute, '''0+1''' is more commonly found on inexpensive software RAID included with consumer motherboards.  Conversely, '''10''' is the more reliable mode, requiring only one functional drive of each mirror set, while '''0+1''' requires one fully functional stripe.
  
As with anything else, there are several network file system protocols. To confuse things a bit, the one most useful to MythTV users is the long established and generic NFS (which itself is an acronym for Network File System). I guess you could use Samba (the Windows compatible sharing system), or perhaps both in combination, but unless you've got a very good reason to (you are using an [[Xbox_Frontend|Xbox Frontend]], for example), stay away.
*'''RAID5''' and '''RAID6''' are more complex forms of redundancy, and as such are typically only found on higher end cards. Similar to '''RAID0''', data is striped across the drives, but each stripe includes one redundant block of parity (two in the case of '''RAID6'''), used to calculate the missing data in the event of a failed drive. Traditionally, this is computationally intensive, with high end cards having custom ASICs to handle the calculations; however, modern CPUs, particularly those with multiple cores, have no problem performing this function in software. Because parity must be calculated across the entire stripe, this form of RAID suffers from poor write performance when executing multiple writes smaller than one stripe. Read performance is nearly as high as '''RAID0'''.
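The capacity examples and the parity idea above can be made concrete with a short sketch. It assumes identical drives and sizes in GB. The parity demo shows only the single XOR parity used for '''RAID5'''-style redundancy; '''RAID6''''s second parity block uses a different code and is modelled here only in the capacity calculation.

```python
# Usable capacity of common RAID levels, plus a tiny demonstration of the
# single-parity idea behind RAID5. Assumptions: all drives are identical,
# sizes are in GB, and RAID6's second parity block (which is not plain XOR)
# appears only in the capacity calculation.

from functools import reduce
from typing import List


def usable_capacity(level: str, drives: int, size_gb: int) -> int:
    """Usable space in GB for an array of identical drives."""
    if level == "RAID0":
        return drives * size_gb          # striping, no redundancy
    if level == "RAID1":
        return size_gb                   # every drive holds a full copy
    if level in ("RAID0+1", "RAID10"):
        if drives < 4 or drives % 2:
            raise ValueError("nested levels need an even number of drives (4 or more)")
        return drives // 2 * size_gb     # half of the drives hold copies
    if level == "RAID5":
        if drives < 3:
            raise ValueError("RAID5 needs at least 3 drives")
        return (drives - 1) * size_gb    # one drive's worth of parity
    if level == "RAID6":
        if drives < 4:
            raise ValueError("RAID6 needs at least 4 drives")
        return (drives - 2) * size_gb    # two drives' worth of parity
    raise ValueError(f"unknown RAID level: {level}")


def xor_parity(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks (RAID5-style parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


if __name__ == "__main__":
    for level, n in (("RAID0", 2), ("RAID1", 2), ("RAID10", 4), ("RAID5", 3), ("RAID6", 4)):
        print(f"{level}: {n} x 80GB -> {usable_capacity(level, n, 80)}GB usable")

    # One stripe across three data drives, plus its parity block.
    data = [b"MythTV", b"record", b"ings!!"]
    parity = xor_parity(data)
    # Pretend drive 1 died: XOR of the surviving blocks and the parity rebuilds it.
    rebuilt = xor_parity([data[0], data[2], parity])
    print("rebuilt block:", rebuilt)
```

XOR works for reconstruction because any block XORed with itself cancels out, so combining every surviving block with the parity leaves exactly the missing one.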
  
In a pure Linux environment NFS can be doubly useful, as one can set up a [[Diskless_Frontend|diskless frontend]] (or indeed, a [[Diskless_Backend|diskless backend]] or a combination of the two). Removing all physical drives (aside from maybe a DVD drive) from the box will make it much quieter.
In the end, if you are not that worried about losing your data (or if you keep good backups), any kind of RAID is overkill. A good compromise can be reached if you place all your system directories on a RAID of some sort (which will protect all of your time consuming configuration &mdash; my workstation is in the process of being switched over to RAID1 on two Western Digital Raptors) whilst placing the TV storage on a single disc. But if you have enough money and inclination, you can RAID your whole setup &mdash; I am particularly paranoid, and plan to upgrade my backend to using a 3ware and four 250GB drives in RAID10 to (hopefully) put an end to my currently non-existent storage problems.
  
===Supermount===
===Network filesystems===
  
===Storage Groups===
As the name implies, these are mechanisms for locally accessing a remote file system (and therefore files) across a [[network]]. MythTV will internally stream content from backends to remote frontends, so for most purposes these will be unnecessary. This capability is limited to content defined on the backend using [[Storage_Groups|Storage Directories]], which currently limits it to the recording and video libraries. Music and artwork have not yet been migrated to this new design, and require filesystem access on each frontend. Backends can only record to locally mounted file systems, and will not stream a new recording to a remote backend for storage.
Storage Groups are a feature introduced in version 0.21. The user designates a list of directories in mythtv-setup, which MythTV uses for storage. These directories may exist on the same or different drive partitions, drives, logical volumes, RAID arrays, or network storage locations. The coder, Chris Pinkham, explains his thought process:
  
MythTV's Storage Groups will search for a recording in all directories
If one does require a networked file system, the two common options are [[NFS]] and [[CIFS]]. NFS is the native protocol used by Linux and other POSIX compliant operating systems. CIFS is more commonly known as Windows File Sharing. CIFS offers much more configurability in terms of security and access restrictions, while NFS is lighter-weight and faster. More importantly, however, NFS is designed around the same filesystem properties as other Linux filesystems, while CIFS has a very foreign design and incurs some complications wherever there is no direct translation from one parameter to another. NFS should be preferred over CIFS unless there are specific requirements that demand the use of CIFS.
referenced in the Storage Groups config. It first searches for the file in the Storage Group you put the file in originally, but it will search all directories in the Storage Groups table eventually if it does not find the file where it is supposed to be.
  
===Supermount===

One of the things in the back of my mind when coding Storage Groups was archival.  You could have an Archive Storage Group that points to a directory or directories that are used for archiving recordings.  If you wanted to move a recording from the Default Storage Group into the Archive Group, just mv the file and Myth will find it.  Since the Archive Group isn't referenced in any of your recording profiles, then no new recordings would go there, only recordings you moved there yourself.  Since Myth falls back to searching all directories, it would find these recordings without you having to touch the database at all after moving the file.  I started writing a builtin 'Migrate' job in the JobQueue that will be used for moving recordings between Storage Groups, but put it on the back burner a while back.  Eventually you'll be able to do this mv within Myth from the JobQueue menu. The AutoExpirer might be able to take advantage of this eventually as well, so it could migrate a recording instead of deleting it or migrate recordings to keep free disk space equally spread out over the directories in a Storage Group.
- Chris Pinkham
(from mythtv-users list, April 25, 2007)
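The lookup order described in this quote (the file's own Storage Group first, then every other configured directory) can be sketched as below. This is a hypothetical model, not MythTV's actual code: a plain dictionary of group name to directory list stands in for the Storage Groups table, and the recording file name is made up for the demo.

```python
# Sketch of the Storage Group lookup order: try the directories of the
# recording's own group first, then fall back to every other group.
# Hypothetical: `groups` stands in for the Storage Groups table as a plain
# dict of group name -> list of directories; this is not MythTV's code.

import os
import tempfile
from typing import Dict, List, Optional


def find_recording(basename: str, groups: Dict[str, List[str]],
                   original_group: str) -> Optional[str]:
    """Return the path where `basename` is found, or None."""
    # 1. The group the file was originally recorded into.
    search_order = [groups.get(original_group, [])]
    # 2. Fallback: the directories of every other group.
    search_order += [dirs for name, dirs in groups.items() if name != original_group]
    for directories in search_order:
        for directory in directories:
            path = os.path.join(directory, basename)
            if os.path.isfile(path):
                return path
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        default_dir = os.path.join(root, "recordings")
        archive_dir = os.path.join(root, "archive")
        os.makedirs(default_dir)
        os.makedirs(archive_dir)
        # Simulate "just mv the file" into the Archive group (made-up name).
        open(os.path.join(archive_dir, "1001_20070425200000.mpg"), "w").close()
        groups = {"Default": [default_dir], "Archive": [archive_dir]}
        print(find_recording("1001_20070425200000.mpg", groups, "Default"))
```

Because the fallback scans every group, a recording moved by hand into the Archive directory is still found even though the database still says it belongs to Default.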
  
 
===Example setup===

Revision as of 22:33, 15 September 2012

File storage refers to the broad topic of hardware, software and the methodology behind keeping MythTV recordings on a computer (and computer network).

Pretty much any reasonably modern hard drive will be more than adequate both in space and speed for MythTV. With the introduction of Storage Groups in 0.21 you can use as many drives as you like without the hassle of LVM or RAID.

Manufacturers

Note: This section will be full of hearsay, personal bias and much that is probably apocryphal, so take it with a pinch of salt, and always ask around. If you are adding anecdotal datapoints about a specific manufacturer, please indent one level and sign your comments.

The current [Mar-07] major manufacturers of consumer hard drives are Seagate, Western Digital, Maxtor (now owned by Seagate), IBM/Hitachi and Samsung, who all make IDE drives of up to 400GB, available with parallel (PATA) or the newer serial (SATA) interfaces, as well as SCSI drives, though those have not caught up on the size front.

  • Seagate Barracudas are generally renowned to be quiet and reliable, having recently returned to offering a standard 5yr warranty
  • Western Digital drives are slightly better performers than the Seagates, at the expense of being noisier
  • All of the Maxtor drives I have used have been very noisy, and not particularly reliable
    • Note: All of the Maxtor drives I have used have been extremely reliable, and not very noisy at all. I have had only 2 fail in the six (?) years I have been using them in my computers and computers I build for customers. One was due to overheating (in a tiny amount of space in a hot area). The other, I think, was just a manufacturing defect. Both times, Maxtor replaced the drives (and upgraded them, for free) for me with no hassle, and quite quickly, I might add. Just another personal opinion =) --Tyler Drake
    • My Maxtor Diamond Max 10 250GB is as quiet as a church mouse. --DavidC
    • I've had about an 80% failure rate with Maxtors (40-250GB PATA) in the couple of dozen I've put in customers' boxes. They seem quiet though. My failure rate with Maxtors approaches 100% when they are attached to a TX2000 Promise RAID controller. They normally last around a year, just long enough to be out of warranty. I've had excellent luck with Seagates and you've got to love that 5yr warranty.
    • Since buying Maxtor, Seagate has turned it into its "lower" line of drives, with a 3 year warranty instead of the usual 5 year. But this also seems to have improved their quality.
    • My Maxtor 6L060L3 (A 60GB 7200 U/133 D740X with FDB; Fluid Dynamic Bearings) was extremely silent and at the time fast. The only time it squeaked was as it was dying (infantile; warranty replaced) but my replacement was equal. My Maxtor 96147U8 (60G 5400 U/66) was noticeably slower and made a reasonable amount of noise. I had also borrowed a WD450AA from a friend and that was extremely loud for a slow drive (<10k RPM). It was comparable to some 10ks I dealt with at work. For reliability across the board (laptop to enterprise drives), I prefer Seagate. -Gene
  • IBM/Hitachi are recovering from a somewhat tarnished reputation from their "Deathstar" line of hard drives, and are producing some very good SATA drives, although I've never used any myself
  • Samsung Spinpoint drives are getting rave reviews (fast, quiet and reliable) from a lot of users here in the UK, although again I've never used them myself.

Based on my current bias, I can recommend the Seagate Barracuda drives for situations where you want quiet drives, although I prefer Western Digital for situations where noise isn't much of a problem.

Comparing cheap economy drives with a short warranty to higher end, more expensive drives with a longer warranty is nothing less than inept. Those drives are not comparable. They are intended for different markets and so are built to different standards. Using that comparison as a basis to criticize the brand as a whole is even more ridiculous. The same thinking follows in these statements: My Chevy Vega kept breaking and was junk, therefore all GM cars are junk. My Lincoln Town Car was fantastic and never ever broke, therefore all Ford cars are great and last forever.


Interfaces

There are four types of interfaces that are most commonly used these days. These interfaces are:

  • (P)ATA
  • SATA
  • SCSI
  • SAS

Bear in mind that a drive's performance as a video source depends on its 'sustained transfer rate', which has nothing to do with the interface type or speed. The sustained transfer rate is limited by the 'media transfer rate': the rate of data transfer between the heads and the disc surfaces. It is a physical limitation shared by all hard drives regardless of the interface. The only things that affect it are the data density on the disc, the physical size (read/write area) of the heads, and the rotational speed of the drive, so all drives with the same rotational speed and data density will have approximately the same media transfer rate. A high end 7200 RPM drive can achieve a maximum media transfer rate of approximately 80 MB/sec whether the interface is SATA, PATA, or SCSI. Drive manufacturers don't want you to know this, and divert attention from it by emphasizing the interface's speed in their ads; the interface's speed is only an advantage during data bursts. Most manufacturers went so far as to stop listing media transfer rates in their specification tables. With drives of a given rotational velocity and data density, the only way to improve overall system performance is to use a form of RAID that uses striping. This effectively uses two drives simultaneously, so that the total media transfer rate is doubled.
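To put the 80 MB/sec figure in MythTV terms, the sketch below computes a theoretical ceiling on simultaneous recordings. It assumes about 1 GB per recorded hour (the MPEG-2 figure quoted for a PVR-250 in an earlier revision of this page), decimal units, zero seek overhead, and the ideal "N striped drives give N times the media rate" scaling described above; it is an upper bound for comparison, not a tuner count you should expect to reach.

```python
# Back-of-the-envelope: how many recordings can a drive's media transfer rate
# sustain? Assumptions (illustrative only): ~1 GB per recorded hour, decimal
# units, no seek overhead, and ideal linear scaling when striping drives.

def concurrent_streams(media_mb_per_s: float, gb_per_hour: float,
                       striped_drives: int = 1) -> int:
    """Upper bound on simultaneous streams, ignoring seeks and filesystem overhead."""
    # Work in MB per hour so the example numbers divide exactly.
    capacity_mb_per_hour = media_mb_per_s * 3600 * striped_drives
    stream_mb_per_hour = gb_per_hour * 1000
    return int(capacity_mb_per_hour // stream_mb_per_hour)


if __name__ == "__main__":
    print("single drive:", concurrent_streams(80, 1.0))
    print("two-drive stripe:", concurrent_streams(80, 1.0, striped_drives=2))
```

In practice the heads must seek back and forth between streams, which cuts this figure substantially; the point of the exercise is that raw media rate is rarely the bottleneck for standard-definition MPEG-2 recordings.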

(P)ATA

As a much older standard, PATA is supported on almost all x86 hardware. This interface was originally called ATA, but when Serial ATA (SATA) was introduced it was renamed Parallel ATA. Recently, motherboard makers have shifted this function to a third-party controller, and boards offer only one PATA port (two drives).

SATA

Most new hard drives and motherboards come with support for the newer Serial ATA (SATA) interface. Although SATA is a superior standard (it supports a lot of the SCSI subset, and features much smaller, thinner cables than PATA, amongst other improvements), some SATA controllers have closed-source or no Linux drivers. This has resulted in some Linux-based systems being unable to use SATA adequately due to poorly functioning controllers. This situation is no longer as serious as it once was, but you should check your hardware driver support to be sure.

For the current status of SATA support under Linux you can check Serial ATA (SATA) on Linux.

Please bear in mind that the built-in software RAID functions on SATA chips will usually not work in Linux without extensive fooling around with the kernel (if at all). Because Linux provides its own software RAID features, this isn't a big loss for a dedicated Linux box (such as a MythTV system), but if you dual-boot, you may not be able to use the controller's software RAID.

SCSI

SCSI stands for Small Computer System Interface, and is/was a competing hard drive interface to IDE/ATA. However, back in the mists of time, SCSI was designated to the "high end hard drive" side of things, and is now much more expensive than ATA technology. Then again, if you look at figures such as raw drive MTBF hours, you will see that cheaper ATA drives are only now barely catching up to SCSI drive specs.

None but the highest end server and workstation motherboards come with built-in SCSI host adapters, so these usually have to be added by means of a PCI card, which in themselves are not cheap. The drives are also very expensive indeed, and they offer much reduced storage capacity compared to a modern PATA or SATA drive. However, SCSI disks are incredibly fast and very reliable -- but as we can see, it comes at a huge price. To be honest, there is very little chance of even an extensive MythTV setup requiring a SCSI system -- SCSI excels in massive multi-user environments like databases and web/mail servers, but the advantages under a single user setup are hard to distinguish. With the recent addition of Western Digital's enterprise-class "Raptor" SATA drives, you can approach SCSI speeds without shelling out a king's ransom, although their size is limited to 74GB at the time of writing.

One thing of note is that SCSI drives are very very loud due to their very high rotation speed (10,000 or 15,000rpm) and so are going to be relegated to the backend under the stairs pretty quickly. Raptor drives are quieter, but still far louder than your average IDE drive.

[ Editorial comment: SCSI's not that bad a choice, particularly if you can get used drives cheaply on eBay, and you are building an Under The Stairs backend box -- instead of the 2 or 4 drives you can put on most IDE controllers, you can put 15 on a SCSI controller -- and multiple channel controllers are available. So it is a matter of scale and buying savvy as much as anything else. -- Bay Link (2004-10-01T18:06:44Z)

  • The problem is that 15 drives are only useful for mass storage, and the price/size ratio attainable through SATA is much better than with SCSI. For MythTV purposes SATA makes the most sense technically and financially, with PATA a close second if you are not concerned with overall speed (i.e. as an archive array). That said, I do have a 4xHDD U320 SCSI setup as my personal desktop... --Steve Adeff 16:23, 8 June 2006 (UTC)]

SAS

SAS, or Serial Attached SCSI, is a new technology that takes the best of SCSI and SATA, and in many ways it compares to Fibre Channel (e.g. SAN technology) and USB as well. Most modern servers already ship with SAS instead of SCSI, and it is eventually expected to become the desktop standard. As of today it has a bus bandwidth of 3 Gbps, and is on target to increase to 12 Gbps by 2011. Individual SAS drives today have a transfer rate of 300 MB/sec, just under the SCSI rate of 320 MB/sec, but each drive gets the full 300 MB/sec to the host, instead of sharing it as with SCSI, SATA and PATA. Current benchmarks show comparable performance to the best 15K Ultra320 SCSI drives, and in some areas SAS far surpasses SCSI performance. Some other cool features are:

  • The SAS interface is backwards compatible with all SATA drives.
  • It can support over 16,000 devices on a single bus, compared to 16 with SCSI and 1 with SATA
  • SAS Expanders provide the ability to hook up drives the same way we network computers using a switch, although over shorter distances (several meters).
  • 2.5" and 3.5" drives are available

- Seagate Barracuda ES2 Serial Attached SCSI one terabyte drives can be found for around $270 - maybe even $250. They spin at 7200 rpm. Not a bad choice for MythTV systems that are going to be always on. RedmondTux

External Links

Partitions

The Unix model of filesystems is much more flexible than that under Windows, and Linux is no exception, allowing you to seamlessly integrate hard drives, partitions and different formats here, there and everywhere. Personally, I'm a big fan of multiple partition setups because, amongst other things, they allow you to tailor the filesystem (see below) to the files that are going to live on it. At the very least, I would advocate three partitions:

  • /boot is where the kernel and bootloader things live. I usually format this ext2 since it is very rarely written to.
  • / is the root of your filesystem, pretty much the equivalent of "C:\" under Windows; all of your programs and files will live on the root filesystem somewhere (such as /usr and /var/log), unless you specify a particular directory tree to live under a separate filesystem (like the /boot shown above).
  • The third partition would be where you store all of your MythTV files (as well as music and external video, if that is where you want it). If you want to see my partitioning setup for one of my backends, you can look at the "Advanced storage: example setup" section below.

Note that, particularly if you are prone to monkey with CVS Myth or advanced beta and alpha test drivers, you will be much happier if you put /var/log on its own partition.

Most partitions, in the sense just described, can exist instead as LVM volumes (see the section on LVM below).

When partitioning a disk, you must first decide on a partitioning scheme. For x86 and x86-64 systems, the Master Boot Record (MBR) partitioning system has long been the standard. The MBR system, however, uses data structures that top out at 2TB. If you use a hardware RAID configuration, your virtual disks may exceed this size. Even single disks exceeding 2TB are likely to be available by the end of 2009. Therefore, you may need to use the newer GUID Partition Table (GPT) system if you plan to use lots of storage. GPT is already the standard on Intel-based Macintoshes. Using GPT requires partitioning with GPT-aware utilities, such as GNU Parted rather than fdisk. You may also need to track down a patched version of the GRUB boot loader. Check that your distribution supports installation to GPT disks if you intend to use this system. In some cases it may be simpler to install Linux on a (relatively) small MBR-partitioned disk and reserve the GPT system for the disk or RAID array that holds your recordings. If your individual disks or RAID arrays are smaller than 2TB, chances are the older MBR system will work fine.
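The 2TB ceiling follows from simple arithmetic: MBR partition entries store the starting sector and the sector count as 32-bit values, so only 2^32 sectors can be addressed. The sketch below assumes traditional 512-byte sectors, which gives 2 TiB (about 2.2 decimal TB); the 1, 2 and 3 TB disk sizes are just example inputs.

```python
# Why MBR tops out near 2TB: partition entries store the starting sector and
# the sector count as 32-bit values. Assumes traditional 512-byte sectors.

SECTOR_BYTES = 512
MAX_SECTORS = 2 ** 32  # 32-bit sector addresses reach 2**32 distinct sectors


def mbr_limit_bytes(sector_bytes: int = SECTOR_BYTES) -> int:
    """Largest disk size, in bytes, that an MBR partition table can fully address."""
    return MAX_SECTORS * sector_bytes


def needs_gpt(disk_bytes: int, sector_bytes: int = SECTOR_BYTES) -> bool:
    """True if the disk (or RAID virtual disk) is too big for MBR."""
    return disk_bytes > mbr_limit_bytes(sector_bytes)


if __name__ == "__main__":
    limit = mbr_limit_bytes()
    print(f"MBR limit: {limit} bytes = {limit / 2**40:.0f} TiB = {limit / 10**12:.2f} TB")
    for tb in (1, 2, 3):
        print(f"{tb} TB disk needs GPT: {needs_gpt(tb * 10**12)}")
```

Because drive makers quote decimal terabytes, a "2TB" disk still fits under the limit, while anything marketed as 3TB or larger does not.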

File systems

As you probably know, Linux has a bewildering array of file systems available, most of which excel at a particular task. You are of course free to format your drives with whatever file system you choose, but here is some general info about the most popular file systems:

  • ext2 is the "old standard" file system. It is fairly speedy, but does not come with journaling to speed up filesystem checks after a power loss or system crash. This means the system can take an age to run through a file system check (fsck), although ext2 can be seamlessly upgraded to ext3. (All the below filesystems include a journal.) These days, ext2 is best reserved for use on very small partitions, such as a 50-200MB (note MB, not GB) /boot partition, if you create one.
  • ext3 is an extension to the ext2 file system which introduced journaling as well as other improvements. It is a bit of a jack-of-all-trades of a file system, and does not excel at anything in particular, apart from very thorough testing!
  • ext4 is the next-generation filesystem in this line. It adds features enabling support for larger files and filesystems, as well as better performance with large files. Ext4 only moved out of "experimental" status with the 2.6.28 kernel, and as of early 2009 its stability and suitability for use on a MythTV box have yet to be extensively explored by the community.
  • ReiserFS is a high performance file system that is especially good at dealing with directories with lots of small files, which makes it a good choice for your system partitions, although it does not perform as well with large files. On partitions larger than 200GB, where MythTV continually removes files (expiring shows) and adds new ones (recorded shows), performance degrades over time because the filesystem has trouble finding free blocks. ReiserFS (Reiser3) is considered stable and feature-complete. However, Namesys, the company which develops ReiserFS, has ceased commercial activities.
  • JFS was originally developed by IBM for their AIX operating system, and was later donated to Linux. JFS is incredibly good at dealing with the huge files that MythTV generates, and can delete pretty much any file in under a second (ext3 can take as long as 15 seconds to delete really big files). JFS is a very good file system to use for storing your videos on, and it is very conservative with CPU usage.
  • XFS is another "foreign" filesystem, developed by SGI for their IRIX operating system, and once again donated to Linux. Like JFS, it is exceptionally good at dealing with large files, and has the highest throughput of any Linux filesystem, albeit at a higher CPU loading. XFS also makes an excellent choice as storage for your movie files. (Note that XFS filesystems can be grown, but not shrunk, at the present time; this can occasionally be problematic. Note also that file system cleanings are forced using xfs_repair, not fsck; if you are going to use XFS, and Bay Link recommends that you do, read about it first.)
  • Btrfs (pronounced "butter-eff-ess") is the up-and-coming Linux filesystem. It's Linux's answer to ZFS, which is popular on Solaris. Although Btrfs has many advanced features, such as copy-on-write operation, online defragmentation, and snapshots, it's still very new and has not been extensively tested by the MythTV community, as of October 2009.

To use any of these file systems, you'll need support for them compiled into the kernel along with the relevant userland utilities. The file system driver(s) of your partitions must either be compiled directly into the kernel (not as modules) or compiled as modules and included in an initial RAM disk (initrd). The former approach is usually easier to set up; initrd configuration adds steps to the kernel compilation process and can sometimes go wrong. If you build your filesystem drivers as modules and don't build an initrd, the kernel won't be able to read the filesystems on which the filesystem drivers are stored! If you use your distribution's standard precompiled kernel, you don't need to worry about this.

Some distributions come with a choice of only one or two filesystems, although if you rebuild your kernel it is possible to enable all of them (including support for Windows FAT32 and NTFS if you need it!). New, exotic and improved filesystems are cropping up all the time; hot on the horizon is Reiser4, which promises to be a very high performing and flexible system, although it is far from stable yet.

Many filesystems allow the block size to be tweaked at format time. A large block size reduces allocation overhead when dealing with large files such as recordings, whereas a small block size wastes less space on the many small files found on system partitions. If in doubt, read the manual thoroughly or just go with the defaults, since you cannot change the block size without reformatting the drive.
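
To see why block size matters mostly for small files, the sketch below computes how much space a file actually occupies once it is rounded up to whole blocks. It is a simplification: it ignores filesystem metadata and tail packing (which ReiserFS, for example, uses to store small file tails more compactly), and the file sizes are illustrative.

```python
def allocated_bytes(file_size: int, block_size: int) -> int:
    """Disk space used by a file: its size rounded up to whole blocks."""
    return -(-file_size // block_size) * block_size  # integer ceiling division


def slack_bytes(file_size: int, block_size: int) -> int:
    """Space allocated to the file but not holding any of its data."""
    return allocated_bytes(file_size, block_size) - file_size


# A 1,500-byte config file versus a 1,000,000,000-byte (about 1GB) recording.
for block_size in (1024, 4096):
    print(block_size,
          slack_bytes(1_500, block_size),
          slack_bytes(1_000_000_000, block_size))
# 1024 548 512
# 4096 2596 1536
```

With 4KB blocks the small file wastes 2,596 bytes, more than its own size, while the recording wastes only 1,536 bytes out of a gigabyte.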

Filesystem mount options can sometimes affect performance. For instance, when using XFS, the allocsize option sets how much space the filesystem preallocates beyond the end of a file while it is being written. Setting this to a large value (as in allocsize=512m) can reduce fragmentation and therefore improve performance when large files are stored on the filesystem.
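
To confirm that an option such as allocsize actually took effect, you can look at the mounted filesystems in /proc/mounts, whose fields are device, mount point, type, options and two numbers. The Python sketch below (the function name and sample lines are made up) reports the allocsize value for every XFS mount it finds. The kernel may print the size in a different unit from the one you wrote in /etc/fstab, and may omit options it considers defaults, so treat the output as a hint rather than a guarantee.

```python
def xfs_allocsize(mounts_text: str) -> dict:
    """Map each XFS mount point to its allocsize option (None if not shown).

    mounts_text uses the /proc/mounts layout:
    device mountpoint fstype options dump pass
    Mount points containing spaces (encoded as \\040) are not decoded here.
    """
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[2] != "xfs":
            continue  # skip blank/short lines and non-XFS filesystems
        allocsize = None
        for option in fields[3].split(","):
            key, _, value = option.partition("=")
            if key == "allocsize":
                allocsize = value
        result[fields[1]] = allocsize
    return result


# Hypothetical sample in /proc/mounts format.
sample = (
    "/dev/hda2 / reiserfs rw 0 0\n"
    "/dev/sdb1 /mnt/tvstore xfs rw,noatime,allocsize=512m 0 0\n"
)
print(xfs_allocsize(sample))  # {'/mnt/tvstore': '512m'}
```

On a live system you would pass it the contents of /proc/mounts instead of the sample string.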

In short, a good choice is ext3 or ReiserFS for your system partitions and JFS or XFS for your MythTV storage. If you have a separate /boot partition, ext2 is a good option, since ext3's journal provides little benefit for a partition of this size but takes up a significant share of its disk space. Note that the XFS implementations on SuSE 9.0 and 9.1 were both a bit flaky; this can make installations and upgrades difficult if you do not know the magic. (I will put the magic here when I relocate it. --Bay Link)

Advanced storage

Storage Groups

Storage Groups are a feature, introduced in version 0.21, that allows multiple hard drives to be used for storing recordings and other media. They provide an easier, cheaper and safer alternative to LVM, and may also replace RAID in certain setups.
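
The idea behind Storage Groups is that one group can contain several directories, each on a different drive, and the backend decides which of them each new recording goes to. MythTV's own selection rules are configurable and more involved than this; the Python sketch below only illustrates the general idea with a hypothetical "most free space wins" policy, using shutil.disk_usage from the standard library. The function name and directory names are made up.

```python
import shutil
from typing import Callable, Iterable, Optional


def pick_storage_dir(
    directories: Iterable[str],
    free_bytes: Optional[Callable[[str], int]] = None,
) -> str:
    """Return the directory with the most free space (hypothetical policy).

    free_bytes can be swapped out for testing; by default it asks the OS.
    """
    if free_bytes is None:
        free_bytes = lambda d: shutil.disk_usage(d).free
    candidates = []
    for directory in directories:
        try:
            candidates.append((free_bytes(directory), directory))
        except OSError:
            continue  # skip directories that are missing or not mounted
    if not candidates:
        raise RuntimeError("no usable storage directory")
    # max() with a key returns the first directory among equally free ones
    return max(candidates, key=lambda c: c[0])[1]
```

For example, pick_storage_dir(["/mnt/disk1/recordings", "/mnt/disk2/recordings"]) would return whichever of those two (hypothetical) directories currently has more free space.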

LVM

LVM stands for the Logical Volume Manager. It provides two basic advantages over conventional partitions:

  • You can use it to make two or more separate hard drives (or partitions on those drives) appear as one huge hard drive to the operating system. LVM can optionally stripe the partitions together, meaning that accesses to the two disks are interleaved. This can improve performance in a manner similar to some RAID configurations.
  • Filesystems are stored in logical volumes within the partitions used by LVM. These logical volumes may be resized, added, and deleted without regard for their locations or precisely where the data you allocate will be stored. (The logical volumes act much like files in a filesystem.) This feature makes it easy to add storage space to the filesystems that need it. You can, for instance, add a new disk to an existing system and then grow your MythTV recordings filesystem without having to copy data or otherwise disrupt your existing recordings.
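
The difference between simply joining drives and striping them comes down to how a position in the logical volume is mapped onto the physical drives. The Python sketch below models both mappings in bytes; real LVM works in larger units (extents, and for striped volumes a stripe size chosen when the volume is created), and the 64KB stripe size used here is only an illustrative value. The function names are hypothetical.

```python
def linear_location(offset: int, device_sizes: list) -> tuple:
    """Concatenated volume: fill the first device, then the next, and so on."""
    for device, size in enumerate(device_sizes):
        if offset < size:
            return device, offset
        offset -= size
    raise ValueError("offset is beyond the end of the volume")


def striped_location(offset: int, stripe_size: int, num_devices: int) -> tuple:
    """Striped volume: consecutive chunks go to the devices in turn."""
    chunk, within_chunk = divmod(offset, stripe_size)
    device = chunk % num_devices
    device_offset = (chunk // num_devices) * stripe_size + within_chunk
    return device, device_offset


STRIPE = 64 * 1024  # illustrative stripe size
for offset in (0, STRIPE, 2 * STRIPE, 200_000):
    print(offset, striped_location(offset, STRIPE, 2))
# 0 (0, 0)
# 65536 (1, 0)
# 131072 (0, 65536)
# 200000 (1, 68928)
```

In the striped layout a long sequential read alternates between the two drives, which is where the speed-up comes from; in the linear layout it stays on one drive until that drive's share of the volume is used up.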

LVM has certain drawbacks, of course:

  • It adds complexity. In addition to creating partitions in a conventional way, you must use several utilities to build up the LVM data structures before you can begin using your disks.
  • Not all distributions provide easy support for LVM. Some versions of Ubuntu lack LVM support "out of the box," for instance. (You can work around this problem, but doing so requires additional expertise.)
  • If you use LVM to span multiple physical disks, your data becomes more prone to damage should one disk fail: the breakdown of one physical disk may make data stored on the good disk inaccessible.
  • Emergency recovery becomes more complex. Your recovery tools must support LVM (most modern recovery CDs/DVDs do, fortunately), but you may need to execute extra commands to access your data.
  • Booting Linux can become more complex, because you must either have LVM support on an initial RAM disk (initrd) or you must provide the basic LVM drivers and tools on a non-LVM partition. (Note that you can install your basic Linux system on a non-LVM disk and reserve LVM for your MythTV recordings and database filesystems alone, if you like. This configuration will minimize this drawback of LVM.)
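
The extra risk of spanning disks can be put in rough numbers. Assuming each disk fails independently with the same probability p over some period, the chance that at least one of n disks fails is 1 - (1 - p)^n, so the risk to a spanned volume grows with every disk you add. The Python sketch below computes this; the 3% figure is an arbitrary illustration, not a measured failure rate, and a single failure does not necessarily destroy every file on the volume.

```python
def p_any_disk_fails(p_disk: float, n_disks: int) -> float:
    """Chance that at least one of n independent disks fails.

    Assumes every disk has the same failure probability p_disk over the
    period of interest and that failures are independent.
    """
    if not 0.0 <= p_disk <= 1.0:
        raise ValueError("p_disk must be between 0 and 1")
    return 1.0 - (1.0 - p_disk) ** n_disks


for n in (1, 2, 4):
    print(n, round(p_any_disk_fails(0.03, n), 4))
# 1 0.03
# 2 0.0591
# 4 0.1147
```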

Despite these drawbacks, LVM's advantages make LVM appealing for many users. MythTV 0.21's storage groups are another option for increasing storage flexibility. You will need to have LVM enabled in your kernel to use LVM, as well as having the userland LVM utilities installed. The 2.6 kernel series implements the much improved LVM2.

The setup details of LVM are a little advanced to go into here, so if you want a good explanation of it you can read the LVM HOWTO. In short, you can dedicate either individual partitions or entire hard drives (the "physical volumes") for use by the LVM, which allows you to map them into one or more "volume groups", from which you then carve out "logical volumes" to install filesystems upon.
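
The relationship between physical volumes, volume groups and logical volumes can be modelled in a few lines. The Python sketch below is a toy bookkeeping model, not an interface to the real LVM tools: the class and method names are hypothetical, the 4MB extent size is an assumed value, and the disk sizes are made up. It shows the property described earlier, namely that adding a disk to the volume group lets an existing logical volume grow without moving any data.

```python
from dataclasses import dataclass, field

EXTENT_MB = 4  # assumed extent size for this example


def to_extents(size_mb: int) -> int:
    """Round a size in MB up to a whole number of extents."""
    return -(-size_mb // EXTENT_MB)


@dataclass
class VolumeGroup:
    name: str
    pvs: dict = field(default_factory=dict)  # physical volume -> extents
    lvs: dict = field(default_factory=dict)  # logical volume -> extents

    def add_pv(self, pv_name: str, size_mb: int) -> None:
        # Only whole extents on a physical volume are usable.
        self.pvs[pv_name] = size_mb // EXTENT_MB

    def free_extents(self) -> int:
        return sum(self.pvs.values()) - sum(self.lvs.values())

    def _check_free(self, extents: int) -> None:
        if extents > self.free_extents():
            raise ValueError(f"volume group {self.name} has too few free extents")

    def create_lv(self, lv_name: str, size_mb: int) -> None:
        extents = to_extents(size_mb)
        self._check_free(extents)
        self.lvs[lv_name] = extents

    def extend_lv(self, lv_name: str, extra_mb: int) -> None:
        extents = to_extents(extra_mb)
        self._check_free(extents)
        self.lvs[lv_name] += extents

    def lv_size_mb(self, lv_name: str) -> int:
        return self.lvs[lv_name] * EXTENT_MB


vg = VolumeGroup("mythvg")
vg.add_pv("hda7", 60_000)
vg.add_pv("hdb", 120_000)
vg.create_lv("lvm0", 150_000)
vg.add_pv("hdc", 200_000)  # a newly added disk (hypothetical)
vg.extend_lv("lvm0", 100_000)
print(vg.lv_size_mb("lvm0"), vg.free_extents())  # 250000 32500
```

In a real system, extending the logical volume only makes the space available; the filesystem on it must then be grown with that filesystem's own tool.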

RAID

Originally, RAID stood for Redundant Array of Inexpensive Discs, although the word Inexpensive has since been replaced by Independent (probably because most RAID setups use very expensive SCSI discs ;). What this basically means is that data is spread across multiple hard drives in such a way that if one of the hard drives explodes or is eaten by the cat, you will be able to reconstruct the lost data from the other hard drives. A secondary function of RAID is to produce higher-performance filesystems by spreading the read/write load across multiple discs. For a very clear and concise RAID tutorial, you can read these pages http://www.acnc.com/04_00.html, but in the meantime here is a brief rundown of the most common RAID levels along with examples of storage capacity:

  • RAID0, also known as striping, is not true RAID, in that it offers no redundancy. If one of the discs in the array fails, all of the data in the array is lost. RAID0 scales linearly with every drive added; two 80GB drives will produce a single 160GB filesystem. Please note that RAID0 is distinctly different from LVM!
  • RAID1, also known as mirroring, involves copying data to two identical hard drives rather than just one. If one drive dies, the other will remain fully functional with all of your data intact. Two 80GB drives will produce a single 80GB filesystem.
  • RAID0+1 and RAID10 are two basic forms of nested arrays. 0+1 is a mirror of stripes, while 10 is a stripe of mirrors. While both methods are equally simple to execute, 0+1 is more commonly found on inexpensive software RAID included with consumer motherboards. Conversely, 10 is the more reliable mode, requiring only one functional drive of each mirror set, while 0+1 requires one fully functional stripe.
  • RAID5 and RAID6 are more complex forms of redundancy, and as such are typically only found on higher-end cards. As in RAID0, data is striped across the drives, but each stripe also includes one block of parity (two in the case of RAID6), which is used to calculate the missing data in the event of a failed drive. Traditionally these calculations were considered CPU-intensive, and high-end cards include custom ASICs to handle them; however, modern CPUs, particularly those with multiple cores, have no problem performing this function in software. Because parity must be calculated across the entire stripe, this form of RAID suffers from poor write performance when executing many writes smaller than one stripe. Read performance is nearly as high as RAID0.
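
The parity used by RAID5 is a byte-wise XOR across the data blocks of a stripe, which is why any single missing block can be recomputed from the others. The Python sketch below shows that calculation on tiny made-up blocks, plus the read-modify-write step that makes small writes slow: updating one block means reading the old data and old parity first. It deliberately leaves out parity rotation across drives and RAID6's second parity block, which uses a different calculation. The function names are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("blocks must all be the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def update_parity(old_parity, old_data, new_data):
    """Partial-stripe write: new parity from the old parity and old/new data."""
    return xor_blocks([old_parity, old_data, new_data])


# One stripe with three data blocks and one parity block.
data = [b"MYTH", b"TV!!", b"0.21"]
parity = xor_blocks(data)

# The drive holding data[1] fails: XOR of everything that survived rebuilds it.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt)  # b'TV!!'

# Rewriting only data[1] still requires reading its old contents and the parity.
new_parity = update_parity(parity, data[1], b"HDTV")
print(new_parity == xor_blocks([data[0], b"HDTV", data[2]]))  # True
```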

In the end, if you are not that worried about losing your data (or if you keep good backups), any kind of RAID is overkill. A good compromise is to place all your system directories on a RAID of some sort, which will protect all of your time-consuming configuration (my workstation is in the process of being switched over to RAID1 on two Western Digital Raptors), whilst placing the TV storage on a single disc. But if you have enough money and inclination, you can RAID your whole setup; I am particularly paranoid, and plan to upgrade my backend to a 3ware card and four 250GB drives in RAID10 to (hopefully) put an end to my currently non-existent storage problems.
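
The capacity examples in the RAID list, and the four-drive RAID10 plan, follow simple rules for arrays of identical drives. The Python sketch below encodes those rules of thumb, treating RAID10 and RAID0+1 as a stripe of two-drive mirrors; real controllers and software RAID may reserve some space for metadata, and arrays built from mixed drive sizes are not modelled. The function name is hypothetical.

```python
MIN_DRIVES = {"raid0": 2, "raid1": 2, "raid01": 4, "raid10": 4, "raid5": 3, "raid6": 4}


def usable_capacity(level: str, n_drives: int, drive_gb: int) -> int:
    """Usable space, in GB, of an array built from identical drives."""
    if level not in MIN_DRIVES:
        raise ValueError(f"unknown RAID level: {level}")
    if n_drives < MIN_DRIVES[level]:
        raise ValueError(f"{level} needs at least {MIN_DRIVES[level]} drives")
    if level == "raid0":
        return n_drives * drive_gb  # striping only, no redundancy
    if level == "raid1":
        return drive_gb  # every drive holds a full copy
    if level in ("raid01", "raid10"):
        if n_drives % 2:
            raise ValueError(f"{level} is modelled with an even number of drives")
        return (n_drives // 2) * drive_gb  # half the drives hold mirror copies
    if level == "raid5":
        return (n_drives - 1) * drive_gb  # one drive's worth of parity
    return (n_drives - 2) * drive_gb  # raid6: two drives' worth of parity


print(usable_capacity("raid0", 2, 80))    # 160
print(usable_capacity("raid1", 2, 80))    # 80
print(usable_capacity("raid10", 4, 250))  # 500
print(usable_capacity("raid5", 4, 250))   # 750
```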

Network filesystems

As the name implies, these are mechanisms for locally accessing a remote filesystem (and therefore files) across a network. MythTV internally streams content from backends to remote frontends, so for most purposes these will be unnecessary. This capability is limited to content defined on the backend using Storage Groups, which currently limits it to the recording and video libraries. Music and artwork have not yet been migrated to this new design, and require filesystem access on each frontend. Backends can only record to locally mounted filesystems, and will not stream a new recording to a remote backend for storage.

If you do require a networked filesystem, the two common options are NFS and CIFS. NFS is the native protocol used by Linux and other POSIX-compliant operating systems. CIFS is more commonly known as Windows File Sharing. CIFS offers much more configurability in terms of security and access restrictions, while NFS is more lightweight and faster. More importantly, however, NFS is designed around the same filesystem properties as other Linux filesystems, while CIFS has a very foreign design and incurs some complications where there is no direct translation from one parameter to another. NFS should be preferred over CIFS unless there are specific requirements that demand the use of CIFS.

Supermount

Example setup

Below I have detailed the partitioning setup for my own main backend, which has just been rejigged. My main rig consists of 80GB (hda) and 120GB (hdb) Seagate Barracudas:

  • hda1 is used for the /boot partition, and is 25MB in size, ext2 formatted
  • hda2 is used for the / partition, and is 580MB in size, ReiserFS formatted
  • hda3 is the swap partition
  • hda5 is used for the /usr partition, is 6GB in size and ReiserFS formatted
  • hda6 is /var and is 2.5GB in size (way too big!), ReiserFS formatted
  • hda7 and hdb are joined into a JFS-formatted LVM logical volume, /dev/mythvg/lvm0, of approximately 174GB, mounted on /home (TV data is stored in /home/mythtv/tvstore)
  • /home/mythtv/music and /home/mythtv/movies are read-only NFS mounts from my file server (which has everything stored on RAID1 on a 3ware 8506) which keep all my movies and music available to MythMusic and MythVideo
  • /usr/portage is also NFS mounted (read-write) to the file server, and used to distribute the portage cache and downloaded tarballs to my other Gentoo boxes, allowing for faster compiles and less rsyncing off the Gentoo servers, which will also allow me to use a smaller /usr partition.

It is probably a lot more complicated than is really necessary, but I felt like having a good experiment with things, and this is what I came up with.

There is also a walkthrough for setting up XFS on LVM on RAID5 here: LVM on RAID.