Software-RAID mini-HOWTO <author>Linas Vepstas, <tt>linas@fc.net</tt> <date>v0.18 14 August 1997 <abstract> This document is a tutorial/HOWTO/FAQ for users of the Linux MD kernel extension, the associated tools, and their use. The MD extension implements RAID-0 (striping), RAID-1 (mirroring), RAID-4 and RAID-5 in software. That is, with MD, no special hardware or disk controllers are required to get many of the benefits of RAID. This document is <bf>NOT</bf> an introduction to RAID; you must find this elsewhere. </abstract> <!-- Table of contents --> <toc> <!-- Begin the document --> <p> <descrip> <tag>Preamble</tag> This document is copylefted by Linas Vepstas (<htmlurl url="mailto:linas@fc.net" name="linas@fc.net">). Permission to use, copy, distribute this document for any purpose is hereby granted, provided that the author's / editor's name and this notice appear in all copies and/or supporting documents; and that an unmodified version of this document is made freely available. This document is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY, either expressed or implied. While every effort has been taken to ensure the accuracy of the information documented herein, the author / editor / maintainer assumes NO RESPONSIBILITY for any errors, or for any damages, direct or consequential, as a result of the use of the information documented herein. RAID, although designed to improve system reliability by adding redundancy, can also lead to a false sense of security and confidence when used improperly. This false confidence can lead to even greater disasters. Know what you are doing, test, be knowledgeable and aware! </descrip> </p> <sect>Introduction <p> <enum> <item> <itemize> <item><bf>Q</bf>: What is RAID? <item><bf>A</bf>: RAID stands for ``Redundant Array of Inexpensive Disks'', and is meant to be a way of creating a fast and reliable disk-drive subsystem out of individual disks.
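As a rough illustration of where the ``reliable'' part can come from, here is a minimal Python sketch (not MD's code) of the XOR parity idea behind RAID-4 and RAID-5: one extra parity block per stripe is enough to recompute any single lost data block from the surviving ones. The block contents and the helper names (<tt>xor_blocks</tt>, <tt>parity</tt>, <tt>rebuild</tt>) are made up for illustration.

```python
from functools import reduce


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length blocks, byte by byte."""
    return bytes(x ^ y for x, y in zip(a, b))


def parity(blocks):
    """Parity block of one stripe: the XOR of all of its data blocks."""
    return reduce(xor_blocks, blocks)


def rebuild(surviving, parity_block):
    """Recompute the one missing data block of a stripe.

    XORing the parity block with every surviving data block cancels
    them out, leaving exactly the block that was lost.
    """
    return reduce(xor_blocks, surviving, parity_block)


# One stripe spread over three hypothetical data disks plus a parity disk.
stripe = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
p = parity(stripe)

# Pretend the second data disk died, then reconstruct its block.
lost = rebuild([stripe[0], stripe[2]], p)
print(lost == stripe[1])  # True
```

RAID-5 differs from RAID-4 mainly in rotating the parity block across all of the disks instead of keeping it on one dedicated disk; the arithmetic is the same.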
</itemize> <item> <itemize> <item><bf>Q</bf>: What is this document? <item><bf>A</bf>: This document is a tutorial/HOWTO/FAQ for users of the Linux MD kernel extension, the associated tools, and their use. The MD extension implements RAID-0 (striping), RAID-1 (mirroring), RAID-4 and RAID-5 in software. That is, with MD, no special hardware or disk controllers are required to get many of the benefits of RAID. This document is <bf>NOT</bf> an introduction to RAID; you must find this elsewhere. </itemize> <item> <itemize> <item><bf>Q</bf>: Where do I get it? <item><bf>A</bf>: Software-RAID is available in the Linux kernel, together with the md patch from <url url="http://luthien.nuclecu.unam.mx/~miguel/raid">. See also <url url="ftp://linux.kernel.org/pub/linux/daemons/raid/"> for tools and other goodies. </itemize> <item> <itemize> <item><bf>Q</bf>: Are there other Linux RAID references? <item><bf>A</bf>: <itemize> <item>Generic RAID overview: <url url="http://www.dpt.com/uraiddoc.html">. <item>General Linux RAID options: <url url="http://linas.org/linux/raid.html">. <item>Linux-RAID mailing list archive: <url url="http://www.linuxhq.com/lnxlists">. <item>Linux Software RAID Home Page: <url url="http://luthien.nuclecu.unam.mx/~miguel/raid">. <item>Linux Software RAID tools: <url url="ftp://linux.kernel.org/pub/linux/daemons/raid/">. <item>Linux RAID-Geschichten (Linux RAID stories): <url url="http://www.infodrom.north.de/~joey/Linux/raid/">. </itemize> </itemize> <item> <itemize> <item><bf>Q</bf>: Who do I blame for this document? <item><bf>A</bf>: Linas Vepstas slapped this thing together.
However, most of the information, and some of the words, were supplied by <itemize> <item>Bradley Ward Allen <<htmlurl url="mailto:ulmo@Q.Net" name="ulmo@Q.Net">> <item>Luca Berra <<htmlurl url="mailto:bluca@comedia.it" name="bluca@comedia.it">> <item>Anton Hristozov <<htmlurl url="mailto:anton@intransco.com" name="anton@intransco.com">> <item>Miguel de Icaza <<htmlurl url="mailto:miguel@luthien.nuclecu.unam.mx" name="miguel@luthien.nuclecu.unam.mx">> <item>Ingo Molnar <<htmlurl url="mailto:mingo@pc7537.hil.siemens.at" name="mingo@pc7537.hil.siemens.at">> <item>Alvin Oga <<htmlurl url="mailto:alvin@planet.fef.com" name="alvin@planet.fef.com">> <item>Gadi Oxman <<htmlurl url="mailto:gadio@netvision.net.il" name="gadio@netvision.net.il">> <item>Martin Schulze <<htmlurl url="mailto:joey@finlandia.infodrom.north.de" name="joey@finlandia.infodrom.north.de">> <item>Geoff Thompson <<htmlurl url="mailto:geofft@cs.waikato.ac.nz" name="geofft@cs.waikato.ac.nz">> <item>Edward Welbon <<htmlurl url="mailto:welbon@bga.com" name="welbon@bga.com">> <item>Rod Wilkens <<htmlurl url="mailto:rwilkens@border.net" name="rwilkens@border.net">> <item>Leonard N. Zubkoff <<htmlurl url="mailto:lnz@dandelion.com" name="lnz@dandelion.com">> </itemize> Thanks all for being there! </itemize> </enum> </p> <sect>Setup & Installation Considerations <p> <enum> <item> <itemize> <item><bf>Q</bf>: I must soon install Linux on a new system, and one requirement is to have RAID-1. What is the easiest way to do it? <item><bf>A</bf>: I keep rediscovering that file-system planning is one of the more difficult Unix configuration tasks. To answer your question, I can describe what we did. We planned the following setup: <itemize> <item>Two EIDE disks, 2.1 GB each. <tscreen> <verb>
disk  partition  mount pt.  size  device
 1        1      /          300M  /dev/hda1
 1        2      swap        64M  /dev/hda2
 1        3      /home      800M  /dev/hda3
 1        4      /var       900M  /dev/hda4
 2        1      /root      300M  /dev/hdc1
 2        2      swap        64M  /dev/hdc2
 2        3      /home      800M  /dev/hdc3
 2        4      /var       900M  /dev/hdc4
</verb> </tscreen> Each disk is on a separate controller (and ribbon cable). The theory is that a controller or ribbon-cable failure won't disable both disks. Perhaps there is also a performance boost from parallel operations. <item>Install Linux on <tt>/</tt> in <tt>/dev/hda1</tt>; this allows booting and the subsequent installation of the RAID patches, etc. <item><tt>/dev/hdc1</tt> will contain a ``cold'' copy of <tt>/dev/hda1</tt>. This is NOT a RAID copy, just a plain copy. It is there in case disk 1 fails completely: we can use a rescue disk, mark <tt>/dev/hdc1</tt> as bootable, and use that to keep going, without having to reinstall the system. The theory here is that in case of severe failure, I can still boot the system without worrying about RAID superblock corruption or other RAID failure modes and gotchas that I don't understand. <item><tt>/dev/hda3</tt> and <tt>/dev/hdc3</tt> will be mirrored as <tt>/dev/md0</tt>. <item><tt>/dev/hda4</tt> and <tt>/dev/hdc4</tt> will be mirrored as <tt>/dev/md1</tt>. <item>We picked <tt>/var</tt> and <tt>/home</tt> to be mirrored, and in separate partitions, under the following (convoluted?) logic: <itemize> <item><tt>/</tt> will contain non-changing data: for all practical purposes, it will be read-only without actually being read-only. <item><tt>/home</tt> will contain slowly changing data: an almost-read-only system. <item><tt>/var</tt> will contain rapidly changing data, including mail spools, database contents and web server logs. </itemize> The theory is that <bf>if</bf>, for some bizarre reason, the operating system goes wild, corruption is limited to one partition. Thus, if for some unlikely, hypothetical reason, the database starts scribbling everywhere, it might clobber mail and log files, but not <tt>/home</tt>.
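To make the mirrored pair concrete, here is a toy in-memory model in Python (a hypothetical sketch, not the MD driver) of what this plan expects from <tt>/dev/md0</tt>: every write goes to both <tt>/dev/hda3</tt> and <tt>/dev/hdc3</tt>, a read can be served by either copy, and once one half is marked failed the other keeps serving the data. The class and method names are invented, and the simple alternating read is only an assumed stand-in for MD's read balancing.

```python
class Mirror:
    """Toy RAID-1 set: each member maps block numbers to block contents."""

    def __init__(self, names):
        self.members = {name: {} for name in names}
        self.failed = set()
        self._next = 0  # round-robin pointer for spreading reads (an assumption)

    def active(self):
        """Members that have not been marked failed, in their original order."""
        return [name for name in self.members if name not in self.failed]

    def write(self, block, data):
        # A mirrored write is only complete once every working member has it.
        for name in self.active():
            self.members[name][block] = data

    def read(self, block):
        # Every working member holds a full copy, so alternate between them.
        names = self.active()
        if not names:
            raise OSError("no working mirror left")
        name = names[self._next % len(names)]
        self._next += 1
        return self.members[name][block]

    def fail(self, name):
        # Stop using this member, as the RAID driver does with a "bad" disk.
        self.failed.add(name)


# /dev/md0 from the plan above: /dev/hda3 mirrored with /dev/hdc3.
md0 = Mirror(["/dev/hda3", "/dev/hdc3"])
md0.write(0, b"home directory data")
md0.fail("/dev/hdc3")  # simulate losing the second disk
print(md0.read(0))     # still served by /dev/hda3
```

Note what the model does not protect against: a write full of garbage is faithfully copied to both halves, which is why the plan also separates <tt>/</tt>, <tt>/home</tt> and <tt>/var</tt> into different partitions.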
I am not entirely satisfied with my logic and reasoning, but it was the best I could do on short notice. I would like to have some scheme that verifies that files in <tt>/usr</tt> and <tt>/home</tt> are not changed, e.g. some MD5 signature scheme that is run regularly. The idea is to detect hacker intrusion as well as corruption. Similarly, the database contents are quite valuable, and I don't have a fault-tolerant plan for them that will let me sleep well at night. </itemize> So, to complete the answer to your question: <itemize> <item>Install Red Hat on disk 1, partition 1. Do NOT mount any of the other partitions. <item>Install RAID per the instructions. <item>Configure <tt>md0</tt> and <tt>md1</tt>. <item>Convince yourself that you know what to do in case of a disk failure! Discover sysadmin mistakes now, and not during an actual crisis. Experiment! (We turned off power during disk activity; this proved to be ugly but informative.) <item>Use some ugly mount/copy/unmount/rename/reboot scheme to move <tt>/var</tt> over to <tt>/dev/md1</tt>. Done carefully, this is not dangerous. <item>Enjoy! </itemize> </itemize> <item> <itemize> <item><bf>Q</bf>: Can I stripe/mirror the root partition (<tt>/</tt>)? Why can't I boot Linux directly from the <tt>md</tt> disks? <item><bf>A</bf>: Both Lilo and Loadlin need a partition that is neither striped nor mirrored to read the kernel image from. If you want to stripe or mirror the root partition (<tt>/</tt>), then create such an unstriped, unmirrored partition for the kernel. Typically, this is <tt>/boot</tt>. Then you either use the initial ramdisk support, or some old patches that were posted a while back, to allow your root device to be striped. Alternatively, use <tt>mkinitrd</tt> to build the ramdisk image; see below. Edward Welbon <<htmlurl url="mailto:welbon@bga.com" name="welbon@bga.com">> writes: <quote> ... all that is needed is a script to manage the boot setup.
To mount an <tt>md</tt> filesystem as root, the main thing is to build an initial file system image that has the needed modules and md tools to start <tt>md</tt>. I have a simple script that does this. </quote> <quote> For boot media, I have a small <bf>cheap</bf> SCSI disk (170MB; I got it used for $20). This disk runs on an AHA1452, but it could just as well be an inexpensive IDE disk on the native IDE interface. The disk need not be very fast, since it is mainly for boot. </quote> <quote> This disk has a small file system which contains the kernel and the file system image for <tt>initrd</tt>. The initial file system image has just enough stuff to allow me to load the RAID SCSI device driver module and start the RAID partition that will become root. I then do an <tscreen> <verb> echo 0x900 > /proc/sys/kernel/real-root-dev </verb> </tscreen> (<tt>0x900</tt> is for <tt>/dev/md0</tt>: major number 9, minor number 0) and exit <tt>linuxrc</tt>. The boot proceeds normally from there. </quote> <quote> I have built most support as modules, except for the AHA1452 driver that brings in the <tt>initrd</tt> filesystem. So I have a fairly small kernel. The method is perfectly reliable; I have been doing this since before 2.1.26 and have never had a problem that I could not easily recover from. The file systems even survived several 2.1.4[45] hard crashes with no real problems. </quote> <quote> At one time I had partitioned the RAID disks so that the initial cylinders of the first RAID disk held the kernel and the initial cylinders of the second RAID disk held the initial file system image; instead, I made the initial cylinders of the RAID disks swap, since they are the fastest cylinders (why waste them on boot?). </quote> <quote> The nice thing about having an inexpensive device dedicated to boot is that it is easy to boot from and can also serve as a rescue disk if necessary. If you are interested, you can take a look at the script that builds my initial ram disk image and then runs <tt>lilo</tt>.
<tscreen> <url url="http://www.realtime.net/~welbon/initrd.md.tar.gz"> </tscreen> It is current enough to show the picture. It isn't especially pretty and it could certainly build a much smaller filesystem image for the initial ram disk. It would be easy to make it more efficient. But it uses <tt>lilo</tt> as is. If you make any improvements, please forward a copy to me. 8-) </quote> </itemize> <item> <itemize> <item><bf>Q</bf>: <quote> I just tried to create some software RAID with raidtools 0.3, but I'm not able to <tt>mdrun</tt> a <tt>raid1</tt>, <tt>raid4</tt> or <tt>raid5</tt> device, just <tt>linear</tt> and <tt>raid0</tt> devices (although I applied the <tt>raid145</tt> kernel patch and the personalities are implemented). If I try to <tt>mdrun</tt> an <tt>mdadd</tt>'ed device, I get something like invalid raid superblock magic. That happens for two 100% matching partitions as well as for a larger secondary partition. </quote> Alex Pakter wrote: <quote> I have a similar situation, except that when I try to <tt>mdrun</tt> my device, I get the error: <tscreen> <verb> /dev/md0: Invalid argument </verb> </tscreen> </quote> <quote> Which I think is coming from inside the <tt>ioctl()</tt> call. I have applied the patch for raid1/4/5 functionality, and the <tt>md</tt> <tt>ioctl</tt> call should be able to support the RAID-1 functionality, but no luck. </quote> <item><bf>A</bf>: Geoff Thompson wrote: <quote> When I first encountered <tt>md</tt>, I built the RAID 0 and Linear personalities as modules, and then scratched my head when I got a similar error message. I finally clicked that I hadn't loaded the ``personality'' module that I was trying to use, and hence the error message. </quote> In short: before running <tt>mdrun</tt>, make sure that the module for the personality you want is loaded, or that the personality is compiled into the kernel. </itemize> </enum> </p> <sect>Error Recovery <p> <enum> <item> <itemize> <item><bf>Q</bf>: I have a RAID-1 (mirroring) setup, and lost power while there was disk activity. Now what do I do?
<item><bf>A</bf>: The redundancy of RAID levels is designed to protect against a <bf>disk</bf> failure, not against a <bf>power</bf> failure. To recover from this situation, you should do the following... xxx yyy zzz </itemize> <item> <itemize> <item><bf>Q</bf>: My RAID-1 device, <tt>/dev/md0</tt>, consists of two hard drive partitions: <tt>/dev/hda3</tt> and <tt>/dev/hdc3</tt>. Recently, the disk with <tt>/dev/hdc3</tt> failed, and was replaced with a new disk. My best friend, who doesn't understand RAID, said that the correct thing to do now is to ``<tt>dd if=/dev/hda3 of=/dev/hdc3</tt>''. I tried this, but things still don't work. <item><bf>A</bf>: You should keep your best friend away from your computer. Fortunately, no serious damage has been done. You can recover from this by running: <tscreen> <verb> mkraid raid1.conf -f --only-superblock </verb> </tscreen> By using <tt>dd</tt>, two identical copies of the partition were created. This is almost correct, except that the RAID-1 kernel extension expects the RAID superblocks to be different. Thus, when you try to reactivate RAID, the software will notice the problem, and deactivate one of the two partitions. By re-creating the superblock, you should have a fully usable system. </itemize> <item> <itemize> <item><bf>Q</bf>: My RAID-1 device, <tt>/dev/md0</tt>, consists of two hard drive partitions: <tt>/dev/hda3</tt> and <tt>/dev/hdc3</tt>. My best (girl?)friend, who doesn't understand RAID, ran <tt>fsck</tt> on <tt>/dev/hda3</tt> while I wasn't looking, and now the RAID won't work. What should I do? <item><bf>A</bf>: You should re-examine your concept of ``best friend''. In general, <tt>fsck</tt> should never be run on the individual partitions that compose a RAID array; check the <tt>md</tt> device (for example <tt>/dev/md0</tt>) instead.
Assuming that neither partition was heavily damaged, no data has been lost, and the RAID-1 device can be recovered as follows: <enum> <item>Make a backup of the file system on <tt>/dev/hda3</tt>. <item><tt>dd if=/dev/hda3 of=/dev/hdc3</tt> <item><tt>mkraid raid1.conf -f --only-superblock</tt> </enum> This should leave you with a working disk mirror. </itemize> <item> <itemize> <item><bf>Q</bf>: If a disk is slowly failing, will it be obvious which one it is? I am concerned that it won't be, and this confusion could lead to some dangerous decisions by a sysadmin. <item><bf>A</bf>: Once a disk fails, an error code will be returned from the low-level driver to the RAID driver. The RAID driver will mark it as ``bad'' in the RAID superblocks of the ``good'' disks (so we will later know which mirrors are good and which aren't), and continue RAID operation on the remaining operational mirrors. This, of course, assumes that the disk and the low-level driver can detect a read/write error, and will not, for example, silently corrupt data. This is true of current drives (error detection schemes are being used internally), and is the basis of RAID operation. </itemize> <item> <itemize> <item><bf>Q</bf>: What about hot-repair? <item><bf>A</bf>: There is a plan to add ``hot reconstruction'' at some point. With this feature, we can add several ``spare'' disks to the RAID set (be it level 1 or 4/5), and once a disk fails, we will reconstruct it on one of the spare disks at run time, without ever needing to shut down the array. Gadi Oxman <<htmlurl url="mailto:gadio@netvision.net.il" name="gadio@netvision.net.il">> writes: <quote> Currently, once the first disk is removed, the RAID set will be running in degraded mode.
To restore full operation mode, you need to: <itemize> <item>stop the array (<tt>mdstop /dev/md0</tt>) <item>replace the failed drive <item>run <tt>ckraid raid.conf</tt> to reconstruct its contents <item>run the array again (<tt>mdadd</tt>, <tt>mdrun</tt>). </itemize> At this point, the array will be running with all the drives, and again protects against a failure of a single drive. </quote> As of 22 July 1997, there is an alpha version of MD that allows <itemize> <item>hot reconstruction/resyncing for RAID-1 <item>a spare disk to be hot-added to an already running RAID-1 array </itemize> </itemize> <item> <itemize> <item><bf>Q</bf>: I would like to have an audible alarm for ``you schmuck, one disk in the mirror is down'', so that the novice sysadmin knows that there is a problem. <item><bf>A</bf>: The kernel logs the event with ``<tt>KERN_ALERT</tt>'' priority, so the alarm can be driven from the system log; for example, a <tt>kern.alert</tt> entry in <tt>/etc/syslog.conf</tt> can send these messages to the console or to a file of their own. Find the xxx software package for the error log files... </itemize> <item> <itemize> <item><bf>Q</bf>: Why is there no question 13? <item><bf>A</bf>: If you are concerned about RAID, High Availability, and UPS, then it's probably a good idea to be superstitious as well. </itemize> <item> <itemize> <item><bf>Q</bf>: How do I run RAID-5 in degraded mode (with one disk failed, and not yet replaced)? <item><bf>A</bf>: Gadi Oxman <<htmlurl url="mailto:gadio@netvision.net.il" name="gadio@netvision.net.il">> writes: <quote> Normally, to run a RAID-5 set of n drives you have to: <tscreen> <verb>
mdadd /dev/md0 /dev/disk1 ... /dev/disk(n-1)
mdrun -p5 /dev/md0
</verb> </tscreen> </quote> Even if one of the disks has failed, you still have to <tt>mdadd</tt> it as you would in a normal setup. Then, <quote> The array will be active in degraded mode with (n - 1) drives. If ``<tt>mdrun</tt>'' fails, the kernel has noticed an error (for example, several faulty drives, or an unclean shutdown). Use ``<tt>dmesg</tt>'' to display the kernel error messages from ``<tt>mdrun</tt>''.
</quote> If the RAID-5 set is corrupted due to a power loss, rather than a disk crash, one can try to recover by creating a new RAID superblock: <tscreen> <verb> mkraid -f --only-superblock raid5.conf </verb> </tscreen> A RAID array doesn't provide protection against a power failure or a kernel crash, and can't guarantee correct recovery. Rebuilding the superblock will simply cause the system to ignore the condition by marking all the drives as ``OK'', as if nothing happened. </itemize> </enum> </p> <sect>Performance, Tools & General Bone-headed Questions <p> <enum> <item> <itemize> <item><bf>Q</bf>: I have 2 Brand X super-duper hard disks and a Brand Y controller, and am considering using <tt>md</tt>. Does it significantly increase the throughput? Is the performance really noticeable? <item><bf>A</bf>: <descrip> <tag>Linux MD RAID-0 (striping) performance:</tag> Must wait for all of the disks to read/write their parts of the stripe. <tag>Linux MD RAID-1 (mirroring) read performance:</tag> MD implements read balancing. In a low-IO situation, this won't change performance. But with two disks in a high-IO environment, read performance could be as much as doubled. With N disks in the mirror, it could improve up to N-fold. <tag>Linux MD RAID-1 (mirroring) write performance:</tag> Must wait for the write to occur to all of the disks in the mirror. </descrip> </itemize> <item> <itemize> <item><bf>Q</bf>: Are linear MDs expandable? Can a new hard-drive/partition be added, and the size of the existing file system expanded? <item><bf>A</bf>: Miguel de Icaza <<htmlurl url="mailto:miguel@luthien.nuclecu.unam.mx" name="miguel@luthien.nuclecu.unam.mx">> writes: <quote> I changed the ext2fs code to be aware of multiple devices instead of the regular one-device-per-file-system assumption.
So, when you want to extend a file system, you run a utility program that makes the appropriate changes on the new device (your extra partition) and then you just tell the system to extend the fs using the specified device. You can extend a file system with new devices while the system is running; there is no need to bring the system down (and whenever I get some extra time, you will be able to remove devices from the ext2 volume set, again without even having to go to single-user mode or any hack like that). </quote> <quote> You can get the patch for 2.1.x kernels from my web page: <tscreen> <url url="http://www.nuclecu.unam.mx/~miguel/ext2-volume"> </tscreen> </quote> </itemize> <item> <itemize> <item><bf>Q</bf>: Where can I put the <tt>md</tt> commands in the startup scripts, so that everything will start automatically at boot time? <item><bf>A</bf>: Rod Wilkens <<htmlurl url="mailto:rwilkens@border.net" name="rwilkens@border.net">> writes: <quote> What I did is put ``<tt>mdadd -ar</tt>'' in the ``<tt>/etc/rc.d/rc.sysinit</tt>'' right after the kernel loads the modules, and before the ``<tt>fsck</tt>'' disk check. This way, you can put the ``<tt>/dev/md?</tt>'' device in the ``<tt>/etc/fstab</tt>''. Then I put the ``<tt>mdstop -a</tt>'' right after the ``<tt>umount -a</tt>'' unmounting the disks, in the ``<tt>/etc/rc.d/init.d/halt</tt>'' file. </quote> For RAID-5, you will want to look at the return code for <tt>mdadd</tt>, and if it failed, do a <tscreen> <verb> ckraid --fix /etc/raid5.conf </verb> </tscreen> to repair any damage. </itemize> <item> <itemize> <item><bf>Q</bf>: I have SCSI adapter brand XYZ (with or without several channels), and disk brand(s) PQR and LMN; will these work with md to create a linear/striped/mirrored personality? <item><bf>A</bf>: Yes! MD works on top of the kernel's ordinary block devices, so it does not care which adapter or disk brand is underneath. </itemize> <item> <itemize> <item><bf>Q</bf>: I was wondering if it's possible to set up striping with more than 2 devices in <tt>md0</tt>?
This is for a news server, and I have 9 drives... Needless to say, I need much more than two. Is this possible? <item><bf>A</bf>: Yes. (describe how to do this) </itemize> </enum> </p> <sect>Questions waiting for answers: <p> <enum> <item><bf>Q</bf>: <itemize> <item>What options have you used for formatting the (RAID) disks? I used: <tscreen> <verb> mke2fs -b 4096 -R stride=4 ... blah </verb> </tscreen> or is it supposed to be 64K × 4 drives: <tscreen> <verb> mke2fs -b 4096 -R stride=262000 ... blah </verb> </tscreen> Are there any other options? </itemize> <item><bf>Q</bf>: <itemize> <item>For testing the raw disk throughput: is there a character device for raw reads/raw writes instead of <tt>/dev/sdaxx</tt> that we can use to measure performance on the RAID drives? Is there a GUI-based tool to watch the disk throughput? </itemize> </enum> </p> <sect>Wish list of enhancements to MD and related s/w <p> Bradley Ward Allen <<htmlurl url="mailto:ulmo@Q.Net" name="ulmo@Q.Net">> wrote: <quote> Ideas include: <itemize> <item>Bootup parameters to tell the kernel which devices are to be MD devices (no more ``<tt>mdadd</tt>'') <item>Making MD transparent to ``<tt>mount</tt>''/``<tt>umount</tt>'' such that there is no ``<tt>mdrun</tt>'' and ``<tt>mdstop</tt>'' <item>Integrating ``<tt>ckraid</tt>'' entirely into the kernel, and letting it run as needed </itemize> (So far, all I've done is suggest getting rid of the tools and putting them into the kernel; that's how I feel about it: this is a filesystem, not a toy.)
<itemize> <item>Deal with arrays that can easily survive N disks going out simultaneously or at separate moments, where N is a whole number > 0 settable by the administrator <item>Handle kernel freezes, power outages, and other abrupt shutdowns better <item>Don't disable a whole disk if only parts of it have failed; e.g., if the sector errors are confined to less than 50% of accesses over 20 dissimilar requests, then continue, simply ignoring those sectors of that particular disk. <item>Bad sectors: <itemize> <item>A mechanism for saving which sectors are bad, somewhere on the disk. <item>If there is a generalized mechanism for marking degraded bad blocks that upper filesystem levels can recognize, use that. Program it if not. <item>Perhaps alternatively a mechanism for telling the upper layer that the size of the disk got smaller, even arranging for the upper layer to move data out of the areas being eliminated. This would help with degraded blocks as well. <item>Failing the above ideas, keeping a small (admin-settable) amount of space aside for bad blocks (distributed evenly across the disk?), and using them (nearby if possible) instead of the bad blocks when it does happen. Of course, this is inefficient. Furthermore, every time the RAID array starts, the kernel ought to log each bad sector and what is being done about it with a ``<tt>crit</tt>'' level warning, just to get the administrator to realize that his disk has a piece of dust burrowing into it (or a head with platter sickness).
</itemize> <item>Software-switchable disks: <descrip> <tag>``disable this disk''</tag> would block until the kernel has made sure that no data still needed (e.g., to complete an XOR/ECC/other error correction) remains on the disk being shut down, then release the disk from use (so it could be removed, etc.); <tag>``enable this disk''</tag> would <tt>mkraid</tt> a new disk if appropriate and then start using it for ECC/whatever operations, enlarging the RAID5 array as it goes; <tag>``resize array''</tag> would respecify the total number of disks and the number of redundant disks, and the result would often be to change the size of the array; where no data loss would result, doing this as needed would be nice, but I have a hard time figuring out how it would do that; in any case, a mode where it would block (for possibly hours (kernel ought to log something every ten seconds if so)) would be necessary; <tag>``enable this disk while saving data''</tag> which would save the data on a disk as-is and move it to the RAID5 system as needed, so that a horrific save and restore would not have to happen every time someone brings up a RAID5 system (instead, it may be simpler to save only one partition instead of two; it might even fit onto the first as a gzip'd file); finally, <tag>``reenable disk''</tag> would be an operator's hint to the OS to try out a previously failed disk (it would simply call disable then enable, I suppose). </descrip> </itemize> </quote> </p> </article>