Difference between revisions of "File storage"
Revision as of 01:27, 6 August 2012
File storage refers to the broad topic of hardware, software and the methodology behind keeping MythTV recordings on a computer (and computer network).
Pretty much any reasonably modern hard drive will be more than adequate both in space and speed for MythTV. With the introduction of Storage Groups in 0.21, you can use as many drives as you like without the hassle of LVM or RAID.
The current [Mar-07] major manufacturers of consumer hard drives are Seagate, Western Digital, Maxtor (Now owned by Seagate), IBM/Hitachi and Samsung who all make IDE devices ranging up to 400GB, available in parallel (PATA) or the newer serial (SATA) interfaces, or as SCSI, though those have not caught up on the size front.
- Seagate Barracudas are generally renowned to be quiet and reliable, having recently returned to offering a standard 5yr warranty
- Western Digital drives are slightly better performers than the Seagates, at the expense of being noisier
- All of the Maxtor drives I have used have been very noisy, and not particularly reliable
- Note: All of the Maxtor drives I have used have been extremely reliable, and not very noisy at all. I have had only 2 fail in the six (?) years I have been using them in my computers and computers I build for customers. One was due to overheating (in a tiny amount of space in a hot area). The other -- I think was just a bad manufacture. Both times, Maxtor replaced the drives (and upgraded them, for free) for me with no hassle, and quite quickly, I might add. Just another personal opinion =) --Tyler Drake
- My Maxtor Diamond Max 10 250GB is as quiet as a church mouse. --DavidC
- I've had about an 80% failure rate with Maxtors (40-250GB PATA) in the couple of dozen I've put in customers' boxes. They seem quiet though. My failure rate with Maxtors approaches 100% when they are attached to a TX2000 Promise RAID controller. They do normally last around a year so that they are out of warranty. I've had excellent luck with Seagates and you've got to love that 5yr warranty.
- Seagate has since bought Maxtor and turned them into its "lower" line of drives, with a 3 year warranty instead of the usual 5 year. But this also seems to have improved their quality.
- My Maxtor 6L060L3 (A 60GB 7200 U/133 D740X with FDB; Fluid Dynamic Bearings) was extremely silent and at the time fast. The only time it squeaked was as it was dying (infantile; warranty replaced) but my replacement was equal. My Maxtor 96147U8 (60G 5400 U/66) was noticeably slower and made a reasonable amount of noise. I had also borrowed a WD450AA from a friend and that was extremely loud for a slow drive (<10k RPM). It was comparable to some 10ks I dealt with at work. For reliability across the board (laptop to enterprise drives), I prefer Seagate. -Gene
- IBM/Hitachi are recovering from a somewhat tarnished reputation from their "Deathstar" line of hard drives, and are producing some very good SATA drives, although I've never used any myself
- Samsung Spinpoint drives are getting rave reviews (fast, quiet and reliable) from a lot of users here in the UK, although again I've never used them myself.
Based on my current bias, I can recommend the Seagate Barracuda drives for situations where you want quiet drives, although I prefer Western Digital for situations where noise isn't much of a problem.
Comparing cheap economy drives with a short warranty to higher end, more expensive drives with a longer warranty is nothing less than inept. Those drives are not comparable. They are intended for different markets and so are built to different standards. Using that comparison as a basis to criticize the brand as a whole is even more ridiculous. The same thinking follows in these statements: My Chevy Vega kept breaking and was junk, therefore all GM cars are junk. My Lincoln Town Car was fantastic and never ever broke, therefore all Ford cars are great and last forever.
There are four types of interfaces that are most commonly used these days. These interfaces are PATA, SATA, SCSI and SAS, each covered in turn below.
Bear in mind that a drive's performance as a video source depends on its 'sustained transfer rate', which has nothing to do with the interface type or speed. The sustained transfer rate is limited by the 'media transfer rate': the rate of data transfer between the heads and the disc surfaces. It is a physical limitation shared by all hard drives regardless of the interface. The only things that affect it are the data density on the disc, the physical size (read/write area) of the heads, and the rotational speed of the drive. All drives with the same rotational speed and data density will have approximately the same media transfer rate. A high end 7200 RPM drive can achieve a maximum media transfer rate of approximately 80 MB/sec whether the interface is SATA, PATA, or SCSI.

Drive manufacturers don't want you to know this and divert attention from it by emphasizing the interface's speed in their ads. The interface's speed is only an advantage during data bursts. Most manufacturers went so far as to stop listing media transfer rates in their specification tables.

With drives of a given rotational speed and data density, the only way to improve overall system performance is to use a form of RAID that uses striping. This effectively uses two drives simultaneously so that the total media transfer rate is doubled.
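As a rough illustration of the striping arithmetic, the sketch below estimates the combined media transfer rate of a striped set. The 80 MB/sec figure is the one quoted above; the function name is made up for this example, and perfect linear scaling is an assumption that real arrays will not quite reach.

```python
def striped_media_rate(per_drive_mb_s: float, drives: int) -> float:
    """Idealised combined media transfer rate of a striped set, in MB/sec.

    Assumes every drive contributes its full media transfer rate,
    which ignores controller and bus overhead.
    """
    if drives < 1:
        raise ValueError("need at least one drive")
    return per_drive_mb_s * drives


single = striped_media_rate(80, 1)
pair = striped_media_rate(80, 2)
print(f"one drive: {single:.0f} MB/sec, two striped drives: {pair:.0f} MB/sec")
```

The interface speed never appears in the calculation, which is the point being made: only the per-drive media rate and the number of drives working in parallel matter for sustained transfers.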
As a much older standard, PATA is universally supported on most x86 hardware. This interface was originally called ATA but when Serial ATA (SATA) was introduced it was renamed Parallel ATA. Recently, this function on motherboards has been shifted to a 3rd party controller, and boards only offer one port (two drives).
Most new hard drives and motherboards come with support for the newer Serial ATA (SATA) interface. Although SATA is a superior standard (it supports a lot of the SCSI subset, and features much smaller, thinner cables than PATA, amongst other improvements), some SATA controllers have closed-source or no Linux drivers. This has resulted in some Linux-based systems being unable to use SATA adequately due to poorly functioning controllers. This situation is no longer as serious as it once was, but you should check your hardware driver support to be sure.
For the current status of SATA support under Linux you can check Serial ATA (SATA) on Linux.
Please bear in mind that the built-in software RAID functions on SATA chips will usually not work in Linux without extensive fooling around with the kernel (if at all). Because Linux provides its own software RAID features, this isn't a big loss for a dedicated Linux box (such as a MythTV system), but if you dual-boot, you may not be able to use the controller's software RAID.
SCSI stands for Small Computer System Interface, and is/was a competing hard drive interface to IDE/ATA. However, back in the mists of time, SCSI was assigned to the "high end hard drive" side of things, and is now much more expensive than ATA technology. That said, if you look at things like raw drive MTBF hours, you will see that cheaper ATA drives are only now barely catching up to the SCSI drive specs.
None but the highest end server and workstation motherboards come with built-in SCSI host adapters, so these usually have to be added by means of a PCI card, which in itself is not cheap. The cost of the hard drives is very high indeed, and they offer much reduced storage capacity compared to a modern PATA or SATA drive. However, SCSI disks are incredibly fast and very reliable -- but as we can see, it comes at a huge price. To be honest, there is very little chance of even an extensive MythTV setup requiring a SCSI system -- SCSI excels in massive multi-user environments like databases and web/mail servers, but the advantages under a single user setup are hard to distinguish. With the recent addition of Western Digital's enterprise-class "Raptor" SATA drives, you can approach SCSI speeds without shelling out a king's ransom, although their size is limited to 74GB at the time of writing.
One thing of note is that SCSI drives are very very loud due to their very high rotation speed (10,000 or 15,000rpm) and so are going to be relegated to the backend under the stairs pretty quickly. Raptor drives are quieter, but still far louder than your average IDE drive.
[ Editorial comment: SCSI's not that bad a choice, particularly if you can get used drives cheaply on eBay, and you are building an Under The Stairs backend box -- instead of the 2 or 4 drives you can put on most IDE controllers, you can put 15 on a SCSI controller -- and multiple channel controllers are available. So it is a matter of scale and buying savvy as much as anything else. -- Bay Link (2004-10-01T18:06:44Z)
- The problem is that 15 drives are only useful for mass storage, and the price/size ratio attainable through SATA is much better than with SCSI. For MythTV purposes SATA makes the most sense technically and financially, with PATA a close second if you are not concerned with overall speed (i.e. as an archive array).
That said I do have a 4xHDD U320 SCSI setup as my personal desktop...--Steve Adeff 16:23, 8 June 2006 (UTC)]
SAS, or Serial Attached SCSI, is a new technology that takes the best of SCSI and SATA, and in many ways it compares to Fibre Channel (e.g. SAN technology) and USB as well. Most modern servers already ship with SAS instead of SCSI, and it is eventually expected to be the desktop standard. As of today it has a bus bandwidth of 3 Gbps, and is on target to increase to 12 Gbps by 2011. Individual SAS drives today have a transfer rate of 300 MB/sec, just under the SCSI rate of 320 MB/sec, but each drive gets the full 300 MB/sec to the host, instead of sharing it as with SCSI, SATA and PATA. Current benchmarks show comparable performance to the best 15K Ultra320 SCSI drives, and in some areas SAS far surpasses SCSI performance. Some other cool features are:
- The SAS interface is backwards compatible with all SATA drives.
- It can support over 16,000 devices on a single bus, compared to 16 with SCSI and 1 with SATA
- SAS Expanders provide the ability to hook up drives the same way we network computers using a switch, although over shorter distances (several meters).
- 2.5" and 3.5" drives are available
- Seagate Barracuda ES2 Serial Attached SCSI one terabyte drives can be found for around $270 - maybe even $250. They spin at 7200 rpm. Not a bad choice for MythTV systems that are going to be always on. RedmondTux
- wikipedia:Advanced Technology Attachment
- wikipedia:Serial ATA
- wikipedia:Serial Attached SCSI
The Unix model of filesystems is much more flexible than that under Windows, and Linux is no exception, allowing you to seamlessly integrate hard drives and partitions and different formats here, there and everywhere. Personally, I'm a big fan of multiple partition setups because, amongst other things, it allows you to tailor the filesystem (see below) to the files that are going to live on it. I would advocate at least three partitions:
- /boot is where the kernel and bootloader things live. I usually format this ext2 since it is very rarely written to.
- / is the root of your filesystem, pretty much the equivalent of "C:\" under Windows; all of your programs and files will live on the root filesystem somewhere (such as /var/log), unless you specify a particular directory tree to live under a separate filesystem (like the /boot shown above).
- The third partition would be where you store all of your MythTV files (as well as music and external video, if that is where you want it). If you want to see my partitioning setup for one of my backends, you can look at the "Advanced storage: example setup" section below.
Note that, particularly if you are prone to monkey with CVS Myth or advanced beta and alpha test drivers, you will be much happier if you put /var/log on its own partition.
Most partitions, in the sense just described, can exist instead as LVM volumes (see the section on LVM below).
When partitioning a disk, you must first decide on a partitioning scheme. For x86 and x86-64 systems, the Master Boot Record (MBR) partitioning system has long been the standard. The MBR system, however, uses data structures that top out at 2TB. If you use a hardware RAID configuration, your virtual disks may exceed this size. Even single disks exceeding 2TB are likely to be available by the end of 2009. Therefore, you may need to use the newer GUID Partition Table (GPT) system if you plan to use lots of storage. GPT is already the standard on Intel-based Macintoshes. Using GPT requires partitioning with GPT-aware utilities, such as GNU Parted rather than fdisk. You may also need to track down a patched version of the GRUB boot loader. Check that your distribution supports installation to GPT disks if you intend to use this system. In some cases it may be simpler to install Linux on a (relatively) small MBR-partitioned disk and reserve the GPT system for the disk or RAID array that holds your recordings. If your individual disks or RAID arrays are smaller than 2TB, chances are the older MBR system will work fine.
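The 2TB ceiling follows from simple arithmetic: MBR records partition positions and lengths as 32-bit sector counts. The sketch below works that out and checks whether a given disk or array would need GPT. It assumes the traditional 512-byte sector size; the function names are illustrative only.

```python
SECTOR_BYTES = 512           # assumed traditional sector size
MBR_SECTOR_FIELD_BITS = 32   # MBR stores sector counts in 32-bit fields


def mbr_max_bytes(sector_bytes: int = SECTOR_BYTES) -> int:
    """Largest capacity that MBR's 32-bit sector fields can describe."""
    return (2 ** MBR_SECTOR_FIELD_BITS) * sector_bytes


def needs_gpt(disk_bytes: int, sector_bytes: int = SECTOR_BYTES) -> bool:
    """True if the disk (or RAID virtual disk) is too large for MBR."""
    return disk_bytes > mbr_max_bytes(sector_bytes)


limit = mbr_max_bytes()
print(f"MBR limit: {limit} bytes ({limit / 2**40:.0f} TiB)")

# Hypothetical example sizes, using the decimal units drive makers quote.
for label, size in [("1TB single disk", 10**12), ("3TB RAID array", 3 * 10**12)]:
    print(label, "-> GPT required" if needs_gpt(size) else "-> MBR is fine")
```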
As you probably know, Linux has a bewildering array of file systems available, most of which excel at a particular task. You are of course free to format your drives with whatever file system you choose, but here is some general info about the most popular file systems:
- ext2 is the "old standard" file system. It is fairly speedy, but does not come with journaling to speed up filesystem checks after a power loss or system crash. This means the system can take an age to run through a file system check (fsck), although ext2 can be seamlessly upgraded to ext3. (All the below filesystems include a journal.) These days, ext2 is best reserved for use on very small partitions, such as a 50-200MB (note MB, not GB) /boot partition, if you create one.
- ext3 is an extension to the ext2 file system which introduced journaling as well as other improvements. It is a bit of a jack-of-all-trades of a file system, and does not excel at anything in particular, apart from very thorough testing!
- ext4 is the next-generation filesystem in this line. It adds features enabling support for larger files and filesystems, as well as better performance with large files. Ext4 has only moved out of "experimental" status with the 2.6.28 kernel, and its stability and suitability for use on a MythTV box have yet to be extensively explored by the community, as of early 2009.
- ReiserFS is a high performance file system that is especially good at dealing with directories with lots of small files, which makes it a good choice for your system partitions, although it does not perform as well with large files. On partitions bigger than 200GB where MythTV continually removes files (expiring shows) and writes new ones (recorded shows), performance degrades over time because of problems finding free blocks. ReiserFS (Reiser3) is considered stable and feature-complete. However, Namesys, the company which developed ReiserFS, has ceased commercial activities.
- JFS was originally developed by IBM for their AIX operating system, and was later donated to Linux. JFS is incredibly good at dealing with the huge files that MythTV generates, and can delete pretty much any file in under a second (ext3 can take as long as 15 seconds to delete really big files). JFS is a very good file system to use for storing your videos on, and it is very conservative with CPU usage.
- XFS is another "foreign" filesystem, developed by SGI for their IRIX operating system, and once again donated to Linux. Like JFS, it is exceptionally good at dealing with large files, and has the highest throughput of any Linux filesystem, albeit at a higher CPU loading. XFS also makes an excellent choice as storage for your movie files. (Note that XFS filesystems can be grown, but not shrunk, at the present time; this can occasionally be problematic. Note also that file system cleanings are forced using xfs_repair, not fsck; if you are going to use XFS, and Bay Link recommends that you do, read about it first.)
- Btrfs (pronounced "butter-eff-ess") is the up-and-coming Linux filesystem. It's Linux's answer to ZFS, which is popular on Solaris. Although Btrfs has many advanced features, such as copy-on-write operation, online defragmentation, and snapshots, it's still very new and has not been extensively tested by the MythTV community, as of October 2009.
To use any of these file systems, you'll need support for them compiled into the kernel along with the relevant userland utilities. The file system driver(s) of your partitions must either be compiled directly into the kernel (not as modules) or compiled as modules and included in an initial RAM disk (initrd). The former approach is usually easier to set up; initrd configuration adds steps to the kernel compilation process and can sometimes go wrong. If you build your filesystem drivers as modules and don't build an initrd, the kernel won't be able to read the filesystems on which the filesystem drivers are stored! If you use your distribution's standard precompiled kernel, you don't need to worry about this.
Some distributions come with a choice of only one or two filesystems, although if you rebuild your kernel it is possible to enable all of them (including support for Windows FAT32 and NTFS if you need it!). New, exotic and improved filesystems are cropping up all the time; hot on the horizon is Reiser4, which promises to be a very high performing and flexible system, although it is far from stable yet.
Many filesystems allow tweaking of the block size at format time - selecting a large block size will make more efficient use of your hard drive space when dealing with large files, whereas a small block size is better suited for your system partitions. If in doubt, read the manual thoroughly or just go with the defaults, since you cannot change the block size without reformatting the drive.
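To see why block size matters more on system partitions than on recording partitions, the sketch below computes the "slack" (allocated but unused space) for a file at a given block size. It is a simplified model that assumes every file occupies a whole number of blocks and ignores metadata and features such as tail packing; the file sizes are made-up examples.

```python
def allocated_bytes(file_bytes: int, block_bytes: int) -> int:
    """Space a file occupies when rounded up to whole blocks."""
    blocks = -(-file_bytes // block_bytes)  # ceiling division on integers
    return blocks * block_bytes


def slack_bytes(file_bytes: int, block_bytes: int) -> int:
    """Allocated space that holds no file data."""
    return allocated_bytes(file_bytes, block_bytes) - file_bytes


# Hypothetical files: a small config file and a 2 GiB recording.
examples = [("small config file", 300), ("2 GiB recording", 2 * 1024**3)]
for label, size in examples:
    for block in (1024, 4096):
        waste = slack_bytes(size, block)
        print(f"{label}, {block}-byte blocks: {waste} bytes of slack")
```

In this model the per-file waste is always less than one block, so it is negligible for a handful of multi-gigabyte recordings but adds up across the thousands of small files on a system partition.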
Filesystem mount options can sometimes affect performance. For instance, when using XFS, the allocsize option can be used to set the size of the blocks that the filesystem uses when allocating new disk space. Setting this to a large value (as in allocsize=512m) can reduce fragmentation and therefore improve performance when large files are stored on the filesystem.
In short, a good choice is ext3 or ReiserFS for your system partitions and JFS or XFS for your MythTV storage. If you have a separate /boot partition, ext2 is a good option, since ext3's journal provides little benefit for a partition of this size but consumes a lot of disk space. Note that the XFS implementations on SuSE 9.0 and 9.1 were both a bit flaky; this can make installations and upgrades difficult if you do not know the magic. (I will put the magic here when I relocate it. --Bay Link)
Storage Groups is a feature, introduced in version 0.21, allowing the use of multiple hard drives for the storage of recordings and other media. It provides an easier, cheaper and safer alternative to LVM. It may also replace RAID in certain setups. See the Storage Groups page for more information.
LVM stands for the Logical Volume Manager. It provides two basic advantages over conventional partitions:
- You can use it to make two or more separate hard drives (or partitions on those drives) appear as one huge hard drive to the operating system. LVM can optionally stripe the partitions together, meaning that accesses to the two disks are interleaved. This can improve performance in a manner similar to some RAID configurations.
- Filesystems are stored in logical volumes within the partitions used by LVM. These logical volumes may be resized, added, and deleted without regard for their locations or precisely where the data you allocate will be stored. (The logical volumes act much like files in a filesystem.) This feature makes it easy to add storage space to the filesystems that need it. You can, for instance, add a new disk to an existing system and then grow your MythTV recordings filesystem without having to copy data or otherwise disrupt your existing recordings.
LVM has certain drawbacks, of course:
- It adds complexity. In addition to creating partitions in a conventional way, you must use several utilities to build up the LVM data structures before you can begin using your disks.
- Not all distributions provide easy support for LVM. Some versions of Ubuntu lack LVM support "out of the box," for instance. (You can work around this problem, but doing so requires additional expertise.)
- If you use LVM to span multiple physical disks, your data becomes more prone to damage should one disk fail -- the breakdown of one physical disk may make data stored on the good disk inaccessible.
- Emergency recovery becomes more complex. Your recovery tools must support LVM (most modern recovery CDs/DVDs do, fortunately), but you may need to execute extra commands to access your data.
- Booting Linux can become more complex, because you must either have LVM support on an initial RAM disk (initrd) or you must provide the basic LVM drivers and tools on a non-LVM partition. (Note that you can install your basic Linux system on a non-LVM disk and reserve LVM for your MythTV recordings and database filesystems alone, if you like. This configuration will minimize this drawback of LVM.)
Despite these drawbacks, LVM's advantages make LVM appealing for many users. MythTV 0.21's storage groups are another option for increasing storage flexibility. You will need to have LVM enabled in your kernel to use LVM, as well as having the userland LVM utilities installed. The 2.6 kernel series implements the much improved LVM2.
The setup details of LVM are a little advanced to go into here, so if you want a good explanation of it you can read the LVM HOWTO. In short, you can dedicate either individual partitions or entire hard drives (the "physical volumes") for use by the LVM, which allows you to map them into one or more "volume groups", from which you then carve out "logical volumes" to install filesystems upon.
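The relationship between physical volumes, volume groups and logical volumes can be pictured with a toy model. The sketch below is not the LVM tooling, only an illustration of the bookkeeping: the class, its methods, the device names and all sizes are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class VolumeGroup:
    """Toy model of an LVM volume group; all sizes are in GB."""
    name: str
    physical_volumes: dict = field(default_factory=dict)  # device -> size
    logical_volumes: dict = field(default_factory=dict)   # LV name -> size

    def add_pv(self, device: str, size_gb: int) -> None:
        """Dedicate a partition or a whole drive to this volume group."""
        self.physical_volumes[device] = size_gb

    @property
    def total_gb(self) -> int:
        return sum(self.physical_volumes.values())

    @property
    def free_gb(self) -> int:
        return self.total_gb - sum(self.logical_volumes.values())

    def create_lv(self, lv_name: str, size_gb: int) -> None:
        """Carve a new logical volume out of the pooled free space."""
        if lv_name in self.logical_volumes:
            raise ValueError(f"logical volume {lv_name!r} already exists")
        if size_gb > self.free_gb:
            raise ValueError("not enough free space in the volume group")
        self.logical_volumes[lv_name] = size_gb

    def extend_lv(self, lv_name: str, extra_gb: int) -> None:
        """Grow an existing logical volume using free space in the pool."""
        if extra_gb > self.free_gb:
            raise ValueError("not enough free space in the volume group")
        self.logical_volumes[lv_name] += extra_gb


vg = VolumeGroup("mythvg")
vg.add_pv("hda7", 60)             # a partition (hypothetical size)
vg.add_pv("hdb", 120)             # a whole drive (hypothetical size)
vg.create_lv("recordings", 170)   # one volume spanning both devices
vg.add_pv("hdc", 250)             # later, a new disk joins the pool
vg.extend_lv("recordings", 250)   # grow the volume without copying data
print(vg.logical_volumes, "free:", vg.free_gb, "GB")
```

Note that the logical volume neither knows nor cares which device its space comes from, which is both the convenience and the risk described in the drawbacks above.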
Originally, RAID stood for Redundant Array of Inexpensive Discs, although the word Independent has since been substituted for Inexpensive (probably because most RAID setups use very expensive SCSI discs ;). What this basically means is that data is spread across multiple hard drives in such a way that if one of the hard drives explodes or is eaten by the cat, you will be able to reconstruct the lost data from the other hard drives. One of the lesser functions of RAID is to produce higher performance filesystems by spreading read/write load across multiple discs as well. For a very clear and concise RAID tutorial, you can read these pages http://www.acnc.com/04_00.html, but in the meantime here is a brief rundown of the most common RAID levels along with examples of storage capacity:
- RAID0, also known as striping, is not true RAID, in that it offers no redundancy. If one of the discs in the array fails, all of the data in the array is lost. RAID0 scales linearly with every drive added; two 80GB drives will produce a single 160GB filesystem. Please note that RAID0 is distinctly different from LVM!
- RAID1, also known as mirroring, involves copying data to two identical hard drives rather than just one. If one drive dies, the other will remain fully functional with all of your data intact. Two 80GB drives will produce a single 80GB filesystem.
- RAID0+1 and RAID10 are two basic forms of nested arrays. 0+1 is a mirror of stripes, while 10 is a stripe of mirrors. While both methods are equally simple to execute, 0+1 is more commonly found on inexpensive software RAID included with consumer motherboards. Conversely, 10 is the more reliable mode, requiring only one functional drive of each mirror set, while 0+1 requires one fully functional stripe.
- RAID5 and RAID6 are more complex forms of redundancy, and as such are typically only found on higher end cards. Data is striped as in RAID0, but each stripe also includes one redundant block of parity (two in the case of RAID6), used to calculate the missing data in the event of a failed drive. Traditionally, this has been very computationally intensive, with high end cards having custom ASICs to handle the calculations. Modern CPUs, however, and particularly those with multiple cores, have no problem performing this function in software. Because parity must be calculated across the entire stripe, this form of RAID suffers from poor write performance when executing multiple writes smaller than one stripe size. Read performance is nearly as high as RAID0.
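The capacity figures for the levels above can be summarised in a few lines. The sketch below assumes identical drives and the textbook layouts (one drive's worth of parity for RAID5, two for RAID6, half the drives for the nested mirror levels); the function name and the level labels are illustrative only.

```python
def raid_usable_gb(level: str, drive_gb: int, drives: int) -> int:
    """Usable capacity in GB of an array of identical drives."""
    if level == "0":
        minimum, data_drives = 2, drives        # no redundancy at all
    elif level == "1":
        minimum, data_drives = 2, 1             # every drive holds a full copy
    elif level in ("10", "0+1"):
        if drives % 2:
            raise ValueError("nested mirror levels need an even drive count")
        minimum, data_drives = 4, drives // 2   # half the drives are mirrors
    elif level == "5":
        minimum, data_drives = 3, drives - 1    # one drive's worth of parity
    elif level == "6":
        minimum, data_drives = 4, drives - 2    # two drives' worth of parity
    else:
        raise ValueError(f"unknown RAID level: {level!r}")
    if drives < minimum:
        raise ValueError(f"RAID{level} needs at least {minimum} drives")
    return data_drives * drive_gb


for level in ("0", "1"):
    print(f"RAID{level}, two 80GB drives:", raid_usable_gb(level, 80, 2), "GB")
for level in ("10", "5", "6"):
    print(f"RAID{level}, four 250GB drives:", raid_usable_gb(level, 250, 4), "GB")
```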
In the end, if you are not that worried about losing your data (or if you keep good backups), any kind of RAID is overkill. A good compromise can be reached if you place all your system directories on a RAID of some sort (which will protect all of your time consuming configuration — my workstation is in the process of being switched over to RAID1 on two Western Digital Raptors) whilst placing the TV storage on a single disc. But if you have enough money and inclination, you can RAID your whole setup — I am particularly paranoid, and plan to upgrade my backend to using a 3ware and four 250GB drives in RAID10 to (hopefully) put an end to my currently non-existent storage problems.
As the name implies, these are mechanisms for locally accessing a remote file system (and therefore files) across a network. MythTV will internally stream content from backends to remote frontends, so for most purposes these will be unnecessary. This capability is limited to content defined on the backend using Storage Directories, which currently limits it to the recording and video libraries. Music and artwork have not yet been migrated to this new design, and require filesystem access on each frontend. Backends can only record to locally mounted file systems, and will not stream a new recording to a remote backend for storage.
If one does require a networked file system, the two common options are NFS and CIFS. NFS is the native protocol used by Linux and other POSIX compliant operating systems. CIFS is more commonly known as Windows File Sharing. CIFS offers much more configurability in terms of security and access restrictions, while NFS will be lightweight and faster. More importantly however, NFS is designed around the same filesystem properties as other Linux filesystems, while CIFS has a very foreign design, and incurs some complications in places where there is no direct translation from one parameter to another. NFS should be preferred over CIFS unless there are specific requirements that demand the use of CIFS.
Below I have detailed the partitioning setup for my own main backend, which has just been rejigged. My main rig consists of 80GB (hda) and 120GB (hdb) Seagate Barracudas:
- hda1 is used for the /boot partition, and is 25MB in size, ext2 formatted
- hda2 is used for the / partition, and is 580MB in size, ReiserFS formatted
- hda3 is the swap partition
- hda5 is used for the /usr partition, is 6GB in size and ReiserFS formatted
- hda6 is /var and is 2.5GB in size (way too big!), ReiserFS formatted
- hda7 and hdb are joined into a JFS formatted LVM under /dev/mythvg/lvm0 of approximately 174GB, sitting on /home (TV data is stored in /home/mythtv/tvstore)
- /home/mythtv/music and /home/mythtv/movies are read-only NFS mounts from my file server (which has everything stored on RAID1 on a 3ware 8506) which keep all my movies and music available to MythMusic and MythVideo
- /usr/portage is also NFS mounted (read-write) to the file server, and used to distribute the portage cache and downloaded tarballs to my other Gentoo boxes, allowing for faster compiles and less rsyncing off the Gentoo servers, which will also allow me to use a smaller /usr partition.
It is probably a lot more complicated than is really necessary, but I felt like having a good experiment with things, and this is what I came up with.
There is also a walkthrough setting up an XFS on LVM on RAID5 system here: LVM on RAID.