File storage refers to the broad topic of hardware, software and the methodology behind keeping MythTV recordings on a computer (and computer network).
Pretty much any reasonably modern hard drive will be more than adequate in both space and speed for MythTV. With the introduction of Storage Groups in 0.21 you can use as many drives as you like without the hassle of LVM or RAID.
The current major manufacturers of consumer hard drives are Seagate, Western Digital, Hitachi, Toshiba and Samsung, all of whom make drives ranging up to 6TB. These are available with serial (SATA) interfaces or as SCSI, though SCSI drives have not caught up on the size front.
Here are some user thoughts on hard drives. Please do not extrapolate these results. For meaningful results check this reliability study.
- Seagate Barracudas are generally renowned to be quiet and reliable, having recently returned to offering a standard 5yr warranty
- Western Digital drives are slightly better performers than the Seagates, at the expense of being noisier
- Hitachi are recovering from a somewhat tarnished reputation from their "Deathstar" line of hard drives, and are producing some very good SATA drives, although I've never used any myself
- Samsung Spinpoint drives are getting rave reviews (fast, quiet and reliable) from a lot of users here in the UK, although again I've never used them myself.
Three types of interface are most commonly used these days. The sections below cover SATA, SCSI and SAS in turn.
Bear in mind that a drive's performance as a video source depends on its 'sustained transfer rate', which has nothing to do with the interface type or speed. The sustained transfer rate is limited by the 'media transfer rate', which is the rate of data transfer between the heads and the disc surfaces. It is a physical limitation shared by all hard drives regardless of interface. The only things that affect it are the data density on the disc, the physical size (read/write area) of the heads, and the rotational speed of the drive. All drives with the same rotational speed and data density will have approximately the same media transfer rate. A high end 7200 RPM drive can achieve a maximum media transfer rate of approximately 80 MB/sec whether the interface is SATA or SCSI.
Drive manufacturers don't want you to know this and divert attention from it by emphasizing the interface's speed in their ads. The interface's speed is only an advantage during data bursts. Most manufacturers went so far as to stop listing the media transfer rates in their specification tables.
With drives of a given rotational speed and data density, the only way to improve overall system performance is to use a form of RAID that uses striping. Striping across two drives uses both simultaneously, so the total media transfer rate is roughly doubled.
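As a rough illustration of the point above, here is a minimal Python sketch that estimates how many simultaneous streams a given sustained rate could feed. The 80 MB/sec figure is the one quoted above; the 20 Mbit/sec stream bitrate is only an assumed round number for one HD stream, and the function names are made up for this example. The result is an ideal upper bound that ignores seek overhead, so real systems will manage fewer.

```python
def striped_rate_mb_s(media_rate_mb_s: float, drives: int = 1) -> float:
    """Ideal sustained rate when data is striped evenly over `drives` disks."""
    if drives < 1:
        raise ValueError("need at least one drive")
    return media_rate_mb_s * drives


def max_streams(rate_mb_s: float, stream_mbit_s: float) -> int:
    """Upper bound on simultaneous streams, ignoring seek overhead."""
    stream_mb_s = stream_mbit_s / 8  # megabits -> megabytes
    return int(rate_mb_s // stream_mb_s)


MEDIA_RATE = 80.0   # MB/sec, the figure quoted in the text for a 7200 RPM drive
STREAM_RATE = 20.0  # Mbit/sec, an ASSUMED bitrate for one HD stream

for drives in (1, 2):
    rate = striped_rate_mb_s(MEDIA_RATE, drives)
    streams = max_streams(rate, STREAM_RATE)
    print(f"{drives} drive(s): {rate:.0f} MB/sec, at most {streams} streams")
```

Note that the interface speed does not appear anywhere in the calculation, which is the point being made above.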
Most new hard drives and motherboards come with support for the newer Serial ATA (SATA) interface. Although SATA is a superior standard (it supports a lot of the SCSI subset, and features much smaller, thinner cables than PATA, amongst other improvements), some SATA controllers have closed-source or no Linux drivers. This has resulted in some Linux-based systems being unable to use SATA adequately due to poorly functioning controllers. This situation is no longer as serious as it once was, but you should check your hardware driver support to be sure.
For the current status of SATA support under Linux you can check Serial ATA (SATA) on Linux.
Please bear in mind that the built-in software RAID functions on SATA chips will usually not work in Linux without extensive fooling around with the kernel (if at all). Because Linux provides its own software RAID features, this isn't a big loss for a dedicated Linux box (such as a MythTV system), but if you dual-boot, you may not be able to use the controller's software RAID.
SCSI stands for Small Computer System Interface, and is/was a competing hard drive interface to IDE/ATA. However, back in the mists of time, SCSI was relegated to the "high end hard drive" side of things, and is now much more expensive than ATA technology. That said, if you look at things like raw drive MTBF hours, you will see that cheaper ATA drives are only now barely catching up to the SCSI drive specs.
None but the highest end server and workstation motherboards come with built-in SCSI host adapters, so one usually has to be added by means of a PCI card, which is itself not cheap. The cost of the hard drives is very high indeed, and they offer much less storage capacity than a modern PATA or SATA drive. However, SCSI disks are incredibly fast and very reliable -- but as we can see, this comes at a huge price. To be honest, there is very little chance of even an extensive MythTV setup requiring a SCSI system -- SCSI excels in massive multi-user environments like databases and web/mail servers, but the advantages under a single user setup are hard to distinguish. With the recent addition of Western Digital's enterprise-class "Raptor" SATA drives, you can approach SCSI speeds without shelling out a king's ransom, although their size is limited to 74GB at the time of writing.
One thing of note is that SCSI drives are very very loud due to their very high rotation speed (10,000 or 15,000rpm) and so are going to be relegated to the backend under the stairs pretty quickly. Raptor drives are quieter, but still far louder than your average IDE drive.
[ Editorial comment: SCSI's not that bad a choice, particularly if you can get used drives cheaply on eBay, and you are building an Under The Stairs backend box -- instead of the 2 or 4 drives you can put on most IDE controllers, you can put 15 on a SCSI controller -- and multiple channel controllers are available. So it is a matter of scale and buying savvy as much as anything else. -- Bay Link (2004-10-01T18:06:44Z)
- The problem is that 15 drives are only useful for mass storage reasons, and the price/size ratio attainable through SATA is much better than with SCSI. For MythTV purposes SATA makes the most sense technically and financially, with PATA a close second if you are not concerned with overall speed (i.e. as an archive array).
That said I do have a 4xHDD U320 SCSI setup as my personal desktop...--Steve Adeff 16:23, 8 June 2006 (UTC)]
SAS, or Serial Attached SCSI, is a new technology that takes the best of SCSI and SATA, and in many ways it compares to Fibre Channel (e.g. SAN technology) and USB as well. Most modern servers already ship with SAS instead of SCSI, and it is eventually expected to become the desktop standard. As of today it has a bus bandwidth of 3 Gbps, and is on target to increase to 12 Gbps by 2011. Individual SAS drives today have a transfer rate of 300 MB/sec, just under the SCSI rate of 320 MB/sec, but each drive gets the full 300 MB/sec to the host, instead of sharing the bus as with parallel SCSI and PATA. Current benchmarks show comparable performance to the best 15K Ultra320 SCSI drives, and in some areas SAS far surpasses SCSI performance. Some other cool features are:
- The SAS interface is backwards compatible with all SATA drives.
- It can support over 16,000 devices on a single bus, compared to 16 with SCSI and 1 with SATA
- SAS Expanders provide the ability to hook up drives the same way we network computers using a switch, although over shorter distances (several meters).
- 2.5" and 3.5" drives are available
- Seagate Barracuda ES2 Serial Attached SCSI one terabyte drives can be found for around $270 - maybe even $250. They spin at 7200 rpm. Not a bad choice for MythTV systems that are going to be always on. RedmondTux
The Unix model of filesystems is much more flexible than that under Windows, and Linux is no exception, allowing you to seamlessly integrate hard drives and partitions in different formats here, there and everywhere. Personally, I'm a big fan of multiple partition setups because, amongst other things, it allows you to tailor the filesystem (see below) to the files that are going to live on it. At a minimum I would advocate three partitions:
- /boot is where the kernel and bootloader things live. I usually format this ext2 since it is very rarely written to.
- / is the root of your filesystem, pretty much the equivalent of "C:\" under Windows; all of your programs and files will live on the root filesystem somewhere (such as /var/log), unless you specify a particular directory tree to live under a separate filesystem (like the /boot shown above).
- The third partition would be where you store all of your MythTV files (as well as music and external video, if that is where you want it). If you want to see my partitioning setup for one of my backends, you can look at the "Advanced storage: example setup" section below.
Note that, particularly if you are prone to monkey with CVS Myth or advanced beta and alpha test drivers, you will be much happier if you put /var/log on its own partition.
Most partitions, in the sense just described, can exist instead as LVM volumes (see the section on LVM below).
When partitioning a disk, you must first decide on a partitioning scheme. For x86 and x86-64 systems, the Master Boot Record (MBR) partitioning system has long been the standard. The MBR system, however, uses data structures that top out at 2TB. If you use a hardware RAID configuration, your virtual disks may exceed this size. Even single disks exceeding 2TB are likely to be available by the end of 2009. Therefore, you may need to use the newer GUID Partition Table (GPT) system if you plan to use lots of storage. GPT is already the standard on Intel-based Macintoshes. Using GPT requires partitioning with GPT-aware utilities, such as GNU Parted rather than fdisk. You may also need to track down a patched version of the GRUB boot loader. Check that your distribution supports installation to GPT disks if you intend to use this system. In some cases it may be simpler to install Linux on a (relatively) small MBR-partitioned disk and reserve the GPT system for the disk or RAID array that holds your recordings. If your individual disks or RAID arrays are smaller than 2TB, chances are the older MBR system will work fine.
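The 2TB ceiling follows from simple arithmetic: MBR records each partition's start and length as 32-bit sector counts. The Python sketch below works this out, assuming traditional 512-byte sectors (drives with larger logical sectors shift the limit); the function names are made up for this illustration.

```python
MBR_SECTOR_COUNT_BITS = 32  # MBR stores partition start/length as 32-bit sector counts


def mbr_limit_bytes(sector_bytes: int = 512) -> int:
    """Largest size addressable by MBR, assuming `sector_bytes` per sector."""
    return (2 ** MBR_SECTOR_COUNT_BITS) * sector_bytes


def needs_gpt(disk_bytes: int, sector_bytes: int = 512) -> bool:
    """True if a disk or RAID array is too large to address fully with MBR."""
    return disk_bytes > mbr_limit_bytes(sector_bytes)


TB = 10 ** 12  # drive makers quote decimal terabytes

print(mbr_limit_bytes())   # 2199023255552 bytes, i.e. 2 TiB
print(needs_gpt(2 * TB))   # False: a "2TB" drive still fits under the limit
print(needs_gpt(3 * TB))   # True: a 3TB drive or array needs GPT
```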
As you probably know, Linux has a bewildering array of file systems available, most of which excel at a particular task. You are of course free to format your drives with whatever file system you choose, but here is some general info about the most popular file systems:
- ext4 is the standard filesystem in Fedora, amongst other distributions. It includes features enabling support for larger files and filesystems, as well as better performance with large files. It is stable and well suited to use on a MythTV box.
- JFS was originally developed by IBM for their AIX operating system, and was later donated to Linux. JFS is incredibly good at dealing with the huge files that MythTV generates, and can delete pretty much any file in under a second (ext3 can take as long as 15 seconds to delete really big files). JFS is a very good file system to use for storing your videos on, and it is very conservative with CPU usage.
- XFS is another "foreign" filesystem, developed by SGI for their IRIX operating system, and once again donated to Linux. Like JFS, it is exceptionally good at dealing with large files, and has the highest throughput of any Linux filesystem, albeit at a higher CPU loading. XFS also makes an excellent choice as storage for your movie files. (Note that XFS filesystems can be grown, but not shrunk, at the present time; this can occasionally be problematic. Note also that file system repairs are run using xfs_repair, not fsck; if you are going to use XFS, and Bay Link recommends that you do, read about it first.)
- Btrfs (pronounced "butter-eff-ess") is the up-and-coming Linux filesystem. It's Linux's answer to ZFS, which is popular on Solaris. Although Btrfs has many advanced features, such as copy-on-write operation, online defragmentation, and snapshots, it's still very new and has not been extensively tested by the MythTV community, as of October 2009.
To use any of these file systems, you'll need support for them compiled into the kernel along with the relevant userland utilities. The file system driver(s) of your partitions must either be compiled directly into the kernel (not as modules) or compiled as modules and included in an initial RAM disk (initrd). The former approach is usually easier to set up; initrd configuration adds steps to the kernel compilation process and can sometimes go wrong. If you build your filesystem drivers as modules and don't build an initrd, the kernel won't be able to read the filesystems on which the filesystem drivers are stored! If you use your distribution's standard precompiled kernel, you don't need to worry about this.
Some distributions come with a choice of only one or two filesystems, although if you rebuild your kernel it is possible to enable all of them (including support for Windows FAT32 and NTFS if you need it!). New, exotic and improved filesystems are cropping up all the time; hot on the horizon is Reiser4, which promises to be a very high performing and flexible system, although it is far from stable yet.
Many filesystems allow tweaking of the block size at format time - selecting a large block size will make more efficient use of your hard drive space when dealing with large files, whereas a small block size is better suited for your system partitions. If in doubt, read the manual thoroughly or just go with the defaults, since you cannot change the block size without reformatting the drive.
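The trade-off can be seen with a little arithmetic. The Python sketch below compares two block sizes for a small file and for a large recording; the file sizes and block sizes are hypothetical values chosen for illustration, and real filesystems (extents, tail packing and so on) complicate the picture.

```python
def blocks_needed(file_bytes: int, block_bytes: int) -> int:
    """Number of whole blocks required to hold a file (ceiling division)."""
    return -(-file_bytes // block_bytes)


def slack_bytes(file_bytes: int, block_bytes: int) -> int:
    """Space allocated but unused in the file's last block."""
    return blocks_needed(file_bytes, block_bytes) * block_bytes - file_bytes


SMALL_FILE = 300           # bytes, a hypothetical small config file
RECORDING = 4 * 1024 ** 3  # bytes, a hypothetical 4 GiB recording

for block in (1024, 4096):
    print(
        f"block={block}: small file wastes {slack_bytes(SMALL_FILE, block)} bytes, "
        f"recording needs {blocks_needed(RECORDING, block)} blocks"
    )
```

Small blocks waste less space per small file, while large blocks mean far fewer blocks to track for each recording.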
Filesystem mount options can sometimes affect performance. For instance, when using XFS, the allocsize option can be used to set the size of the blocks that the filesystem uses when allocating new disk space. Setting this to a large value (as in allocsize=512m) can reduce fragmentation and therefore improve performance when large files are stored on the filesystem.
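For reference, mount options like this normally go in the fourth field of an /etc/fstab entry. The small Python sketch below just assembles such a line; the device name and mount point are hypothetical placeholders, and the trailing "0 0" (no dump, no boot-time fsck pass) is an assumption that you should check against your distribution's conventions.

```python
def xfs_fstab_line(device: str, mount_point: str, allocsize: str) -> str:
    """Build an /etc/fstab entry for an XFS filesystem with an allocsize option."""
    options = f"defaults,allocsize={allocsize}"
    # fstab fields: device, mount point, type, options, dump flag, fsck pass number
    return f"{device} {mount_point} xfs {options} 0 0"


# Hypothetical device and mount point, for illustration only.
print(xfs_fstab_line("/dev/sdb1", "/var/lib/mythtv", "512m"))
```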
In short, a good choice is ext3 or ReiserFS for your system partitions and JFS or XFS for your MythTV storage. If you have a separate /boot partition, ext2 is a good option, since ext3's journal provides little benefit for a partition of this size but consumes a lot of disk space. Note that the XFS implementations on SuSE 9.0 and 9.1 were both a bit flaky; this can make installations and upgrades difficult if you do not know the magic. (I will put the magic here when I relocate it. --Bay Link)
Storage Groups is a feature, introduced in version 0.21, allowing the use of multiple hard drives for the storage of recordings and other media. It provides an easier, cheaper and safer alternative to LVM. It may also replace RAID in certain setups.
LVM stands for the Logical Volume Manager. It provides two basic advantages over conventional partitions:
- You can use it to make two or more separate hard drives (or partitions on those drives) appear as one huge hard drive to the operating system. LVM can optionally stripe the partitions together, meaning that accesses to the two disks are interleaved. This can improve performance in a manner similar to some RAID configurations.
- Filesystems are stored in logical volumes within the partitions used by LVM. These logical volumes may be resized, added, and deleted without regard for their locations or precisely where the data you allocate will be stored. (The logical volumes act much like files in a filesystem.) This feature makes it easy to add storage space to the filesystems that need it. You can, for instance, add a new disk to an existing system and then grow your MythTV recordings filesystem without having to copy data or otherwise disrupt your existing recordings.
LVM has certain drawbacks, of course:
- It adds complexity. In addition to creating partitions in a conventional way, you must use several utilities to build up the LVM data structures before you can begin using your disks.
- Not all distributions provide easy support for LVM. Some versions of Ubuntu lack LVM support "out of the box," for instance. (You can work around this problem, but doing so requires additional expertise.)
- If you use LVM to span multiple physical disks, your data becomes more prone to damage should one disk fail -- the breakdown of one physical disk may make data stored on the good disk inaccessible.
- Emergency recovery becomes more complex. Your recovery tools must support LVM (most modern recovery CDs/DVDs do, fortunately), but you may need to execute extra commands to access your data.
- Booting Linux can become more complex, because you must either have LVM support on an initial RAM disk (initrd) or you must provide the basic LVM drivers and tools on a non-LVM partition. (Note that you can install your basic Linux system on a non-LVM disk and reserve LVM for your MythTV recordings and database filesystems alone, if you like. This configuration will minimize this drawback of LVM.)
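To put a rough number on the multi-disk risk mentioned in the list above: if a volume spans several disks and losing any one of them can make the whole volume unusable, the chance of trouble grows with each disk added. The Python sketch below assumes disks fail independently and uses a purely hypothetical 5% yearly failure chance per disk.

```python
def p_any_disk_fails(p_single: float, disks: int) -> float:
    """Probability that at least one of `disks` independent disks fails."""
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be a probability")
    if disks < 1:
        raise ValueError("need at least one disk")
    return 1.0 - (1.0 - p_single) ** disks


P_SINGLE = 0.05  # HYPOTHETICAL yearly failure chance for one disk

for disks in (1, 2, 4):
    print(f"{disks} disk(s): {p_any_disk_fails(P_SINGLE, disks):.4f}")
```

The independence assumption is optimistic for drives bought together and housed in the same case, so treat the output as a lower bound at best.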
Despite these drawbacks, LVM's advantages make LVM appealing for many users. MythTV 0.21's storage groups are another option for increasing storage flexibility. You will need to have LVM enabled in your kernel to use LVM, as well as having the userland LVM utilities installed. The 2.6 kernel series implements the much improved LVM2.
The setup details of LVM are a little advanced to go into here, so if you want a good explanation of it you can read the LVM HOWTO. In short, you can dedicate either individual partitions or entire hard drives (the "physical volumes") for use by the LVM, which allows you to map them into one or more "volume groups", from which you then carve out "logical volumes" to install filesystems upon.
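The relationship between the three layers can be modelled in a few lines of Python. This is only a toy model of the bookkeeping, not how LVM is implemented or driven; the class, the device names and the sizes are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class VolumeGroup:
    """Toy model: physical volumes pool their space, logical volumes draw on it."""
    name: str
    physical_volumes: dict = field(default_factory=dict)  # device -> size in GB
    logical_volumes: dict = field(default_factory=dict)   # LV name -> size in GB

    @property
    def free_gb(self) -> int:
        return sum(self.physical_volumes.values()) - sum(self.logical_volumes.values())

    def add_pv(self, device: str, size_gb: int) -> None:
        self.physical_volumes[device] = size_gb

    def create_lv(self, lv_name: str, size_gb: int) -> None:
        if size_gb > self.free_gb:
            raise ValueError("not enough free space in volume group")
        self.logical_volumes[lv_name] = size_gb

    def extend_lv(self, lv_name: str, extra_gb: int) -> None:
        if extra_gb > self.free_gb:
            raise ValueError("not enough free space in volume group")
        self.logical_volumes[lv_name] += extra_gb


vg = VolumeGroup("mythvg")
vg.add_pv("/dev/hda7", 50)       # hypothetical partition used as a physical volume
vg.create_lv("recordings", 50)   # the logical volume fills the group
vg.add_pv("/dev/hdb", 120)       # later, a whole new disk is added
vg.extend_lv("recordings", 120)  # grow the volume without copying data
print(vg.logical_volumes, vg.free_gb)
```

In real use the filesystem inside the logical volume also has to be grown after the volume itself is extended.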
Originally, RAID stood for Redundant Array of Inexpensive Discs, although the word Independent has since been substituted for Inexpensive (probably because most RAID setups use very expensive SCSI discs ;). What this basically means is that data is spread across multiple hard drives in such a way that if one of the hard drives explodes or is eaten by the cat, you will be able to reconstruct the lost data from the other hard drives. One of the lesser functions of RAID is to produce higher performance filesystems by spreading read/write load across multiple discs as well. For a very clear and concise RAID tutorial, you can read these pages http://www.acnc.com/04_00.html, but in the meantime here is a brief rundown of the most common RAID levels along with examples of storage capacity:
- RAID0, also known as striping, is not true RAID, in that it offers no redundancy. If one of the discs in the array fails, all of the data in the array is lost. RAID0 scales linearly with every drive added; two 80GB drives will produce a single 160GB filesystem. Please note that RAID0 is distinctly different from LVM!
- RAID1, also known as mirroring, involves copying data to two identical hard drives rather than just one. If one drive dies, the other will remain fully functional with all of your data intact. Two 80GB drives will produce a single 80GB filesystem.
- RAID0+1 and RAID10 are two basic forms of nested arrays. 0+1 is a mirror of stripes, while 10 is a stripe of mirrors. While both methods are equally simple to execute, 0+1 is more commonly found on inexpensive software RAID included with consumer motherboards. Conversely, 10 is the more reliable mode, requiring only one functional drive of each mirror set, while 0+1 requires one fully functional stripe.
- RAID5 and RAID6 are more complex forms of redundancy, and as such are typically only found on higher end cards. Data is striped as in RAID0, but each stripe also includes one redundant block of parity (two in the case of RAID6), used to calculate the missing data in the event of a failed drive. Traditionally, this is very intensive, with high end cards having custom ASICs to handle the calculations; however, modern CPUs, particularly those with multiple cores, have no problem performing this function in software. Because parity must be calculated across the entire stripe, this form of RAID suffers from poor write performance when executing multiple writes smaller than one stripe size. Read performance is nearly as high as RAID0.
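The capacity examples and the parity idea in the list above can be sketched in Python. The capacity function assumes identical drives and, for RAID10, two-way mirrors. The parity demonstration uses plain XOR, which is the usual single-parity scheme for RAID5; RAID6's second parity block needs different maths and is not shown. All names here are made up for the illustration.

```python
from functools import reduce


def usable_capacity_gb(level: str, drives: int, size_gb: int) -> int:
    """Usable capacity of an array of identical drives (assumptions in the lead-in)."""
    rules = {
        "raid0": (2, drives),        # pure striping, no redundancy
        "raid1": (2, 1),             # every drive holds the same copy
        "raid5": (3, drives - 1),    # one drive's worth of parity
        "raid6": (4, drives - 2),    # two drives' worth of parity
        "raid10": (4, drives // 2),  # stripe of two-way mirrors
    }
    if level not in rules:
        raise ValueError(f"unknown RAID level: {level}")
    min_drives, data_drives = rules[level]
    if drives < min_drives or (level == "raid10" and drives % 2):
        raise ValueError(f"{level} cannot be built from {drives} drives")
    return data_drives * size_gb


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


print(usable_capacity_gb("raid0", 2, 80))  # 160, as in the RAID0 example
print(usable_capacity_gb("raid1", 2, 80))  # 80, as in the RAID1 example

# One stripe over three data drives plus a parity block.
data = [b"myth", b"tv!!", b"rec1"]
parity = xor_blocks(data)

# Pretend the second drive died: rebuild its block from the survivors plus parity.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])  # True
```

The rebuild step reads every surviving block in the stripe, and the same dependence on the whole stripe explains the small-write penalty described above.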
In the end, if you are not that worried about losing your data (or if you keep good backups), any kind of RAID is overkill. A good compromise can be reached if you place all your system directories on a RAID of some sort (which will protect all of your time consuming configuration) whilst placing the TV storage on a single disc. But if you have enough money and inclination, you can RAID your whole setup.
Network file systems are, as the name implies, mechanisms for locally accessing a remote file system (and therefore files) across a network. MythTV will internally stream content from backends to remote frontends, so for most purposes these will be unnecessary. This capability is limited to content defined on the backend using Storage Directories, which currently limits it to the recording and video libraries. Music and artwork have not yet been migrated to this new design, and require filesystem access on each frontend. Backends can only record to locally mounted file systems, and will not stream a new recording to a remote backend for storage.
If one does require a networked file system, the two common options are NFS and CIFS. NFS is the native protocol used by Linux and other POSIX compliant operating systems. CIFS is more commonly known as Windows File Sharing. CIFS offers much more configurability in terms of security and access restrictions, while NFS will be lightweight and faster. More importantly however, NFS is designed around the same filesystem properties as other Linux filesystems, while CIFS has a very foreign design, and incurs some complications in places where there is no direct translation from one parameter to another. NFS should be preferred over CIFS unless there are specific requirements that demand the use of CIFS.
Below I have detailed the partitioning setup for my own main backend, which has just been rejigged. My main rig consists of 80GB (hda) and 120GB (hdb) Seagate Barracudas:
- hda1 is used for the /boot partition, and is 25MB in size, ext2 formatted
- hda2 is used for the / partition, and is 580MB in size, ReiserFS formatted
- hda3 is the swap partition
- hda5 is used for the /usr partition, is 6GB in size and ReiserFS formatted
- hda6 is /var and is 2.5GB in size (way too big!), ReiserFS formatted
- hda7 and hdb are joined into a JFS formatted LVM under /dev/mythvg/lvm0 of approximately 174GB, sitting on /home (TV data is stored in /home/mythtv/tvstore)
- /home/mythtv/music and /home/mythtv/movies are read-only NFS mounts from my file server (which has everything stored on RAID1 on a 3ware 8506) which keep all my movies and music available to MythMusic and MythVideo
- /usr/portage is also NFS mounted (read-write) to the file server, and used to distribute the portage cache and downloaded tarballs to my other Gentoo boxes, allowing for faster compiles and less rsyncing off the Gentoo servers, which will also allow me to use a smaller /usr partition.
It is probably a lot more complicated than is really necessary, but I felt like having a good experiment with things, and this is what I came up with.
There is also a walkthrough setting up an XFS on LVM on RAID5 system here: LVM on RAID.