RAID: Redundant Array of Independent Disks
This page has the following topics: RAID overview, definitions of RAID levels, and details of implementations
- RAID history/overview
- Abbreviation: New expansion, same technology
Wikipedia's article on RAID: “History” section says “The term RAID was first defined by David A. Patterson, Garth A. Gibson and Randy Katz at the University of California, Berkeley, in 1987.” They “published a paper: "A Case for Redundant Arrays of Inexpensive Disks (RAID)" in June 1988 at the SIGMOD conference.” (The cited hyperlink has been added to, and integrated directly into, the quote.)
However, economics may change. While multiple smaller drives may have once been less expensive than a “Single Large Expensive Drive” (which has been abbreviated “SLED”), a full RAID solution including an expensive hardware controller may cost more than a larger drive. The term RAID still refers to data redundancy, but is now more typically expanded using the word “independent” instead of “inexpensive”.
- RAID levels
- Most common single levels
- [#raidzero]: RAID 0: Striping/Spanning (“Zero Protection”)
Offers zero redundancy, and can be extremely destructive to data in the event of a hardware failure. If any hard drive in a RAID 0 fails, all of the data may become fairly useless. This includes any data on another hard drive which has not failed. This technology may be combined with other RAID levels (as noted in the section about “Nested RAID Levels”) to offset the greater risk of data being lost (since there are multiple points of failure when multiple drives are being used, and there is no increase in redundancy).
In fact, while other RAID levels often provide some amount of redundancy, using RAID 0 actually increases how much data is prone to experience a significant loss. Data which exceeds a certain size (which might be very small -- maybe just a few lines of text) may get distributed over multiple disks. Therefore, if a single drive fails, chances are quite high that some important data will be lost, including parts of large files as well as critical data like a file allocation table that keeps track of where pieces of files get stored.
A key advantage gained by using RAID 0 is a feature called “striping”. The advantage of using stripes is an increase in speed. RAID 0 is often recommended when there is sufficient available hardware. (Namely, this is commonly used when hardware “controller” circuitry can access multiple data storage devices simultaneously, which benefits speed.) The downsides of greater risk to data may not matter so much if redundancy is implemented differently (such as using another RAID layer in a multiple-RAID-layer setup) or when data is of little long term value, such as swap space on a machine which can afford to go down.
Another way to utilize multiple disks is to use “spanning”, by using the “JBOD” concept, which stands for “just a bunch of disks”. (Yes, the “JBOD” abbreviation includes a letter from the word “of”, but does not include the letter from the word “a”.) This basically refers to the idea that there is a “volume” (which is a group of storage space) that spans over multiple disks, and a user might even be able to increase the size of that volume very easily, just by adding another disk. This easy flexibility to utilize additional disk space is an advantage people can get by using spanning.
The difference between spanning and RAID 0's striping is that RAID 0's striping will utilize additional disks right away, which increases speed, while spanning might just add additional space after the other available space. So, while spanning offers simplicity in how volumes get used, easily providing additional data storage capacity (“disk space”), a newly added drive might not have its space be used up right away, resulting in a lack of an increase in speed.
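The difference can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this page, and it assumes a stripe unit of exactly one block, which real implementations do not necessarily use.

```python
def striped_location(block, num_disks):
    """RAID 0 striping: consecutive blocks rotate across all disks.

    Returns (disk index, block offset on that disk).
    """
    return block % num_disks, block // num_disks


def spanned_location(block, disk_sizes):
    """Spanning (JBOD): fill the first disk completely, then the next one."""
    for disk, size in enumerate(disk_sizes):
        if block < size:
            return disk, block
        block -= size
    raise ValueError("block is beyond the end of the volume")


# Eight consecutive blocks on a three-disk stripe touch every disk...
print([striped_location(b, 3)[0] for b in range(8)])
# prints [0, 1, 2, 0, 1, 2, 0, 1]

# ...while the same blocks on a span of three 100-block disks stay on disk 0.
print([spanned_location(b, [100, 100, 100])[0] for b in range(8)])
# prints [0, 0, 0, 0, 0, 0, 0, 0]
```

With striping, a long sequential read keeps all three disks busy at once. With spanning, the second disk is not touched until the first one is full.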
The author of this text has heard multiple opinions on whether spanning is considered to be a feature that is part of “RAID 0”, or if “RAID 0” only refers to striping. Until researched further, this text does not take a hard stance on this.
- RAID 1: Mirroring
Drives are used in pairs: within each pair, one (even-numbered) drive stores the exact same data as the other (odd-numbered) drive. This way, if one drive is lost, its contents may be re-built. If parallel disk access is working well, writing should not really need to take any substantially longer amount of time than writing to a single drive. Reading may actually go faster than using a single drive, because the speed of the two drives may be able to be added.
However, some of this speculation about speed is just conjecture: reality may vary based on implementation. Hardware-based RAID may involve “controller” circuitry that can use multiple drives at once, and so that may cause additional speed when the controller's circuitry can operate faster than the data storage devices. When large data storage typically used magnetic hard drives, this was a significant advantage. However, software-based RAID 1 implementations might be less prone to gaining some of the speed advantage.
- RAID1E (Mirroring, Enhanced)
Wikipedia's article on RAID: section about “Non-standard levels” says “IBM (among others) has implemented a RAID 1E (Level 1 Enhanced). It requires a minimum of 3 drives. It is similar to a RAID 1+0 array, but it can also be implemented with either an even or odd number of drives. The total available RAID storage is” half of the amount of data that can be stored on the disks.
The only particular downside notable for using RAID1E, compared to RAID1, is just that RAID1E is less commonly supported.
- Some less common RAID levels
RAID levels 2 through 4 are really not seen very often. A lot of modern, lower end hardware may support RAID levels 0, 1, and 5, and perhaps also 10 and 6, but not support RAID levels 2 through 4. Despite the relative unimportance of people knowing these RAID levels, they are covered here primarily because inquiring minds are often curious. Also, there is some benefit to learning about how any one of these infrequently used RAID levels works, because that is essentially the same as learning a lot of the details about RAID 5, so there's no huge and compelling reason to completely skip over these.
Wikipedia's article on RAID: section titled “Overview” says, “Most” RAID levels “use simple XOR, but RAID 6 uses two separate parities”. Also, Wikipedia's article on RAID (“Standard levels” section) notes that RAID 2 uses “Hamming-code parity”.
- [#raidtwo]: RAID 2
RAID 2 stores parity a bit at a time on dedicated parity storage. So, after a bit is stored on each other drive, a calculation is done and the resulting bit is stored on that parity drive.
Wikipedia's article on RAID (“Standard levels” section) notes, “This level is of historical significance only; although it was used on some early machines” ... “as of 2014 it is not used by any commercially available system.”
- [#raidthre]: RAID 3
Wikipedia's article on RAID (“History” section) notes, “RAID 3 and RAID 4 are often confused and even used interchangeably.” Later on the same page, Wikipedia's article on RAID (“Standard levels” section) notes, “RAID 3 consists of byte-level striping with dedicated parity.” So, basically RAID 3 is like RAID 2, except it stores parity a byte at a time, rather than a bit at a time. Wikipedia's article on RAID (“Standard levels” section) notes, “Although implementations exist, RAID 3 is not commonly used in practice.”
- [#raidfour]: RAID 4
Just as RAID 3 is better than RAID 2 because handling bytes is faster than handling bits, RAID 4 is yet better than RAID 3. RAID 4 works with blocks, rather than just bytes.
A “block” is a certain number of bits. Examples of block sizes might be 4,096 bits (one half of a kilobyte), or perhaps 262,144 bits (32 kilobytes), or something quite notably larger (perhaps 4,194,304 bits... half a megabyte).
The best “block size” to use will be dependent on hardware. The basic advantage to using the ideal block size will be speed. Using an unideal block size will technically work, but may just be a bit slower than if an optimum size is chosen when the RAID is set up.
- [#raidfive]: RAID 5: Striping With Parity
The “parity” bits in RAID 5 are very useful if a single drive becomes unavailable, which causes the RAID to enter a “degraded” state. RAID 5 stores these parity bits so that RAID 5 can rebuild an array, back to full optimum online health, if a drive becomes unavailable.
For three drives, a simple and common way to implement RAID 5 is to use a “boolean logic” operator named “exclusive OR” (commonly abbreviated as “XOR”). This can be calculated simply by determining if there is any difference. If the bits are different, the result is a one. Otherwise, if there is “zero” difference between the bits, then a zero is used as parity.
You can see this in the following example.
Here is a visual example, using a quite unlikely stripe size of a byte of data, using a nibble of parity. The reason this example uses a stripe size of just a byte large is to show things on a small scale, so that we can easily see how this works conceptually. In reality, the size of each stripe may be many hundreds of bits. (When this text just said “many hundreds”, that word “many” definitely applies. A stripe size could most certainly be millions of bits.) Using a very small stripe size, to be able to see some details more visually, seemed much more appropriate.
This sample shows storing seven bytes, with the following simple short message: “Now Go!” This will look like the following:
Hexadecimal value   Decimal value   Binary value   ASCII character
4E                  78              01001110       N
6F                  111             01101111       o
77                  119             01110111       w
20                  32              00100000       (space)
47                  71              01000111       G
6F                  111             01101111       o
21                  33              00100001       !
And next is what it would look like in RAID 5, using a nibble-sized block.
Note that in the following chart, the “p” and “d” letters shown in the table would not actually be written to the disk. The point of showing the “p” and “d” letters is just to help identify whether a bit was meant to be treated as stored data, or as a “parity” bit which helps to make a note about what the other data looks like.
Drive One   Drive Two   Drive Three   Char
0d          1d          1p            N
0d          0d          0p
1d          1d          0p
1d          0d          1p
0d          1p          1d            o
1d          1p          0d
1d          0p          1d
1d          0p          1d
1p          0d          1d            w
0p          1d          1d
1p          0d          1d
0p          1d          1d
0d          0d          0p            (space)
1d          0d          1p
0d          0d          0p
0d          0d          0p
0d          1p          1d            G
0d          0p          0d
0d          1p          1d
1d          0p          1d
1p          0d          1d            o
1p          1d          0d
0p          1d          1d
0p          1d          1d
0d          0d          0p            !
1d          0d          1p
0d          0d          0p
0d          1d          1p
RAID 5 may use a XOR method to store extra data. If the XOR data (the parity) is lost, it may be re-calculated from the data. If a drive holding data is lost instead, the missing data can be re-calculated from the remaining data plus the parity bit, by determining what bit, when XOR'ed with the remaining data, would generate the available parity bit.
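As a minimal sketch of that idea (working on whole bytes rather than single bits, and using function names invented for this illustration), the same XOR routine can both create the parity and rebuild whichever block went missing:

```python
from functools import reduce


def xor_parity(blocks):
    """Return the byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks):
    """XOR of all survivors (data and parity alike) yields the missing block."""
    return xor_parity(surviving_blocks)


# Two data blocks, as they might sit on two of the three drives.
data_one, data_two = b"Now ", b"Go! "
parity = xor_parity([data_one, data_two])

# Losing any one of the three blocks is recoverable from the other two.
assert rebuild_missing([data_two, parity]) == data_one
assert rebuild_missing([data_one, parity]) == data_two
assert rebuild_missing([data_one, data_two]) == parity
```

The rebuild step does not need to know whether the lost block held data or parity, which is part of why XOR is such a convenient choice.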
- RAID 5's Stripes
The best way to think about a “RAID 0” stripe and a “RAID 5” stripe is that they are entirely different things. A “RAID 0” stripe consists of the entire “RAID 0” dataset, so multiple entire drives may be part of one “RAID 0” set. In contrast, a “RAID 5” stripe is simply an amount of data that is written while parity is written to one drive, before the parity location moves.
What makes RAID 5 notably different than RAID 4 is RAID 5's concept of striping. Basically, one drive is where the parity gets used for a number of bits, but then a different drive stores parity for a while. You can clearly see that in the above example: the third drive stored parity for a while, and then the second drive stored parity for a while. If all of the bits were visible, a person could draw some stripes that show which drive is storing the parity, and the “p” markers in that chart are meant to show that effect of “stripes”. This may result in higher speed (with at least some hardware implementations).
The point to having the parity being striped (as seen in RAID 5), rather than just being stored on one disk (as seen in RAID 4), is simply this: speed. Somehow, at least some RAID solution(s) have been able to gain an advantage of speed by placing the parity on different drives at different points.
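The rotation of the parity position can be sketched as follows. This is a toy model in the spirit of the example chart, not a description of any real controller: the function name is made up, and it assumes that parity starts on the last drive and moves one drive to the left after every eight data bits.

```python
def raid5_rows(message, num_drives=3, bits_per_rotation=8):
    """Yield one row per stripe: a list of labels such as '1d' or '0p'."""
    bits = [int(b) for byte in message.encode("ascii") for b in f"{byte:08b}"]
    data_per_row = num_drives - 1
    for start in range(0, len(bits), data_per_row):
        chunk = bits[start:start + data_per_row]
        parity = 0
        for bit in chunk:
            parity ^= bit  # XOR of every data bit in this row
        # Assumption: parity starts on the last drive and moves left
        # each time another bits_per_rotation data bits have been written.
        rotation = start // bits_per_rotation
        parity_drive = (num_drives - 1 - rotation) % num_drives
        row = [f"{bit}d" for bit in chunk]
        row.insert(parity_drive, f"{parity}p")
        yield row


for row in raid5_rows("Now"):
    print(" ".join(row))
```

For the letter “N” the parity lands on the third drive, for “o” on the second drive, and for “w” on the first drive, after which the pattern would repeat.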
The ideal “stripe size” to use will typically be based on how the RAID is implemented. If a wrong “stripe size” is used, the effect is just a lower speed. The typical way to figure out the best stripe size is simply to read documentation, which may involve documentation for “RAID controller” circuitry, and/or documentation for the data storage devices being used.
The stripe size is typically customizable in the beginning, when the RAID gets created. Once the RAID is created, that stripe size is set and is not typically adjustable (easily) without deleting the RAID, which causes the RAID to no longer be recognized (essentially losing any data that was stored on the RAID).
- RAID 5 requirements/overhead
RAID 5 needs at least three drives. The cost of overhead is often described as “1/n”, where “n” represents the number of drives used. So, in a setup using 3 data storage devices, n=3 which means that 1/3 (just over 33%) of the space is used for overhead. In a setup using four data storage devices, n=4 so 1/4 (25%) of the space is used for overhead. In a setup using five data storage devices, n=5 so 1/5 (20%) of the space is used for overhead.
As you can see, adding more drives results in less space wasted in overhead. Adding more drives may also add to speed. These are positive things. However, there are also a couple of downsides. Adding more drives does increase overall cost. Also, when there are more drives, there are more parts, and so there is probably an increased chance that at least one drive will fail, causing the entire RAID array to go into a “Degraded” state.
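The “1/n” arithmetic is simple enough to check with a short sketch. The helper names are hypothetical, and the usable-space function assumes that every drive in the array is the same size.

```python
from fractions import Fraction


def raid5_overhead(num_drives):
    """Fraction of the raw space that RAID 5 spends on parity (1/n)."""
    if num_drives < 3:
        raise ValueError("RAID 5 needs at least three drives")
    return Fraction(1, num_drives)


def raid5_usable(num_drives, drive_size):
    """Usable capacity, assuming all drives are the same size."""
    raid5_overhead(num_drives)  # re-use the three-drive check
    return (num_drives - 1) * drive_size


for n in (3, 4, 5):
    print(n, "drives:", f"{float(raid5_overhead(n)):.1%}", "overhead")
# prints:
# 3 drives: 33.3% overhead
# 4 drives: 25.0% overhead
# 5 drives: 20.0% overhead
```

For instance, four drives of 2 terabytes each would give `raid5_usable(4, 2)`, which is 6 terabytes of usable space out of 8.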
- [#raidsix]: RAID 6
“Storage Network Industry Association” (“SNIA”)'s definition of “Redundant Array of Independent Disks” (“RAID”) 6 defines RAID 6 as “Any form of RAID that can continue to execute read and write requests to all of a RAID array's virtual disks in the presence of any two concurrent disk failures.” The definition goes on (in a separate paragraph) to say, “Several methods, including dual check data computations (parity and Reed Solomon), orthogonal dual parity check data and diagonal parity have been used to implement RAID Level 6.” Wikipedia's article on RAID (section titled “Overview”) notes “RAID 6 uses two separate parities based respectively on addition and multiplication in a particular Galois field or Reed–Solomon error correction.” Wikipedia's statement is likely just mentioning what is rather common.
Wikipedia's article on RAID: Section on computing RAID 6 parity notes how two bits are used to keep track of the other bits. One of the bits can be as simple as a XOR calculation. If a single drive is lost then recovery is available and if that drive isn't the second extra bit, then recovery can be as simple and fast as using XOR logic similar to RAID 5. At least if the first bit is a simple XOR, the second bit may be more complicated, and is described in some further detail by the cited section of the cited Wikipedia article.
- Math behind RAID 6
Here is some information seen from Wikipedia's article on RAID: Section on computing RAID 6 parity.
- A verbal reading
To demonstrate this for a class, I may pull up the Wikipedia's article on RAID: Section on computing RAID 6 parity. Then, I read the first paragraph that is quoted below in the section called “Mostly text from Wikipedia”.
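Then, here is how I attempt to read the text from Wikipedia, a bit more verbally. (This might be spoken in a way that is slightly mathematically incorrect. It's just a reflection of a best attempt to verbalize things that I could do at the time this was written. In particular, each subscripted “d” is a binary digit of the chunk rather than a calculus derivative, and the “≅” symbol means “is isomorphic to” rather than “is approximately equal to”.)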
“To deal with the second syndrome, the Galois field function GF of m is introduced with m equal to two raised to the kth power, where GF of m is approximately equal to F sub/base 2 of x divided by p of x for a suitable irreducible polynomial which is p of x, of degree k. A chunk of data can be written as the derivative... is this a derivative? We have a d and some stuff written in subscript. Based on my memories of calculus, I think we're looking at calculus derivatives here. So, a chunk of data can be written as the calculus derivative of k minus one times the derivative k minus 2 and so forth, up through derivative zero in binary where each derivative of i is binary, meaning it is either zero or one.”
Okay, that's about a quarter of the way through the “math”[-type of] talk... By “a quarter”, I mean a fourth. And if what I just said is too mathematical for your comfort, you're really not going to love what is coming up next...
“This is chosen to correspond with element d sub k minus one, times x raised to the k minus oneth power, plus d sub k minus two, times x raised to the k minus twoth power, and so on, adding up to d sub one times x (raised to the first power), plus d sub zero. Let D sub zero through D sub n minus one, each an element of Galois Field of m, correspond to the stripes of data across hard drives encoded as field elements in this manner. In practice they would probably be broken into byte-sized chunks. If g is the generator of the field and we start using “circled plus” to denote addition in the field while concatenation denotes multiplication, then P and Q may be computed as follows, where n denotes the number of data disks:”
(Whew! Here we go...)
Let P equal the circled plus of i for every i where D of i equals d sub zero circled plus d sub one circled plus d sub two circled plus, and so forth, until we reach circled plus of d sub n minus one.
and let Q equal the circled plus of i for every i where g raised to the ith power times D sub i equals one, since anything raised to the zero'th power is one, times D sub zero, circled plus g, which I'm not going to bother raising to the first power, times D sub 1, circled plus g squared times D sub 2, and even though it's not printed here, next would be g cubed times D sub three, circled plus with a continuation on with that same pattern, until we circle plus g raised to the n minus oneth power times D sub n minus one.
Computer scientists are encouraged to think of circled plus as a bitwise XOR operator and g raised to the ith power as the action of a linear feedback shift register on a chunk of data. So, in this gobbledygook of a messy looking formula that was shown, the calculation of P is just the bitwise “exclusive or” of each stripe. There's a reason for this. And if you actually understood all that, you end up with the simple idea that Q is the “exclusive or” results of a shifted version of each stripe.
- Mostly text from Wikipedia
“For a Reed Solomon implementation,” the computer uses a couple of things that are each called a “syndrome”. As an example, one of these two syndromes may be “the simple XOR of the data across the stripes, as with RAID 5. A second, independent syndrome is more complicated and requires the assistance of field theory.”
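“To deal with” the second syndrome, “the Galois field $GF(m)$ is introduced with $m = 2^k$, where $GF(m) \cong F_2[x]/(p(x))$ for a suitable irreducible polynomial $p(x)$ of degree $k$. A chunk of data can be written as $d_{k-1}d_{k-2}\ldots d_0$ in base 2 where each $d_i$ is either zero or one. This is chosen to correspond with element $d_{k-1}x^{k-1} + d_{k-2}x^{k-2} + \ldots + d_1x + d_0$ in the Galois field. Let $D_0, \ldots, D_{n-1} \in GF(m)$” correspond to the stripes of data, with $g$ a generator of the field, $\oplus$ denoting addition in the field, concatenation denoting multiplication, and $n$ denoting the number of data disks (as in the verbal reading above). Then:
$$P = \bigoplus_i D_i = D_0 \oplus D_1 \oplus D_2 \oplus \ldots \oplus D_{n-1}$$
$$Q = \bigoplus_i g^i D_i = g^0 D_0 \oplus g^1 D_1 \oplus g^2 D_2 \oplus \ldots \oplus g^{n-1} D_{n-1}$$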
If you have enough mathematical training that you felt like you could understand all that, then kudos for paying good attention during some math classes. If you couldn't fathom that information even as it was presented, then just understand this: calculating RAID 6 is more computationally intensive than just seeing whether bits are the same.
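For readers who find code easier to follow than formulas, here is a small sketch of the P and Q calculation, one byte per data disk. Several things in it are assumptions chosen for illustration rather than anything stated above: the field is GF(2^8), the irreducible polynomial is x^8 + x^4 + x^3 + x^2 + 1 (written 0x11D), the generator g is 2, and all of the function names are made up. A real implementation would use lookup tables instead of these slow loops.

```python
def gf_mul(a, b, poly=0x11D):
    """Multiply two elements of GF(2^8) by shifting and XORing (LFSR style)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:      # overflowed eight bits: reduce by the polynomial
            a ^= poly
        b >>= 1
    return result


def gf_pow(a, n):
    """Raise a to the n-th power by repeated multiplication."""
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result


def gf_inv(a):
    """a^254 is the inverse of a, because a^255 == 1 for any nonzero a."""
    return gf_pow(a, 254)


def syndromes(data, g=2):
    """Return (P, Q) for a list of data bytes, one byte per data disk."""
    p = q = 0
    for i, d in enumerate(data):
        p ^= d                          # P: plain XOR, as in RAID 5
        q ^= gf_mul(gf_pow(g, i), d)    # Q: XOR of g^i times D_i
    return p, q


def recover_two(data, x, y, p, q, g=2):
    """Recover data[x] and data[y] (both lost) from P, Q and the survivors."""
    survivors = [0 if i in (x, y) else d for i, d in enumerate(data)]
    pxy, qxy = syndromes(survivors, g)
    a = p ^ pxy                         # equals Dx ^ Dy
    b = q ^ qxy                         # equals g^x*Dx ^ g^y*Dy
    gx, gy = gf_pow(g, x), gf_pow(g, y)
    dx = gf_mul(b ^ gf_mul(gy, a), gf_inv(gx ^ gy))
    return dx, a ^ dx


data = [0x4E, 0x6F, 0x77]               # "Now", one byte per data disk
p, q = syndromes(data)
# Pretend the first and third data disks both failed.
assert recover_two([None, 0x6F, None], 0, 2, p, q) == (0x4E, 0x77)
```

The two-disk recovery works because P and Q give two independent equations in the two unknown bytes, which is exactly why a second, different syndrome is needed.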
- [#mdfraid]: MDF RAID
- “Storage Network Industry Association” (“SNIA”)'s Common “Redundant Array of Independent Disks” (“RAID”) Disk Data Format (“DDF”) version 2.0 (available as a PDF file) describes this as “Multi disk Failure RAID. Similar to RAID-6, but supporting more than two physical disk failures”.
- [#raidnstd]: Nested RAID Levels
- [#raidyx]: Painfully reversed RAID acronyms
PC Guide article on multi-RAID levels describes this, starting by saying:
Naming conventions for multiple RAID levels are just horrible. The standard that most of the industry seems to use is that if RAID level X is applied first and then RAID level Y is applied over top of it, that is RAID "X+Y", also sometimes seen as "RAID XY" or "RAID X/Y".
[...] in fact the terminology that most companies use. Unfortunately, other companies reverse the terms! They might call the RAID 0 and then RAID 1 technique "RAID 1/0" or "RAID 10" (perhaps out of fear that people would think "RAID 01" and "RAID 1" were the same thing). Some designers use the terms "RAID 01" and "RAID 10" interchangeably. The result of all this confusion is that you must investigate to determine what exactly a company is implementing when you look at multiple RAID. Don't trust the label.
- [#raid0x]: RAID 0+X (RAID 0+#, RAID 0+?, RAID 0+1 through RAID 0+6, RAID 0# and similar versions without a plus sign)
Note that RAID 0X (e.g. RAID 01) is not meant to be considered the same thing as RAID X (e.g. RAID 1). The leading zero has a significant impact. For this reason it may be nicer to call it RAID 0+X so that people are less likely to want to drop a leading zero.
Note that in some cases RAID 0+1 may refer to what is more commonly referred to as RAID 10, as described by the RAID nomenclature note.
Striping first and then adding redundancy is not recommended. In some cases an intelligent RAID implementation, particularly a hardware card, may deal with failures in such an intelligent way that the difference between something like RAID10 and RAID01 may not be very noticeable. PC Guide article on multi-RAID levels notes, “Unfortunately, most controllers aren't that smart.” The article goes on to say that “in general, a controller won't swap drives between component sub-arrays unless the manufacturer of the controller specifically says it will.”
“RAID 1+0 is better than RAID 0+1” describes the process of dealing with redundancy before striping as better than the alternative order. Speedwise, the implementations are generally identical or nearly so. The big difference tends to be related to how the setup (meaning, the entire array) is affected by a lost drive.
So, to clarify, add redundancy first. This is more commonly known as RAID 10, so RAID 10 is better than RAID 01.
What this means is that the mirrors should be created first, and so those will be the inner RAID arrays, and then the stripe would come later, so the stripe would be the outer RAID array.
There are multiple reasons why...
Also, when using Mirroring first (commonly RAID 10) instead of Striping first (commonly RAID 01), more RAID components may report as functioning optimally well, and rebuilding may require work from less equipment, and rebuilding times may be shorter.
To demonstrate this, we consider what would happen if eight drives were set up, with four drives being in a RAID 01 and four drives being in a RAID 10. A random drive in the first setup goes offline, and a random drive in the second setup goes offline. (In the following example, these are drives 2 and 6. However, there is nothing special about being the second drive in an array. The examples used the same drive just for an easier side-by-side comparison, but the essential results would be the same no matter which drive stopped working.)
Stripe First (usually RAID 0+1):
Mirror-3c [Degraded]
    Stripe-1a [Offline]
        Drive 1: [OK]
        Drive 2: [Bad]
    Stripe-2b [Online]
        Drive 3: [OK]
        Drive 4: [OK]

Mirror First (usually RAID 1+0):
Stripe-6f [Online]
    Mirror-4d [Degraded]
        Drive-5: [OK]
        Drive-6: [Bad]
    Mirror-5e [Online]
        Drive-7: [OK]
        Drive-8: [OK]

Description of Current Status:
- RAID 0 Stripes require all components to be operational, so Stripe-1a is entirely offline.
- A mirror is degraded if one of its components goes offline, so the entire Mirror-3c is marked as degraded (due to Stripe-1a being offline)
- A mirror is degraded if one of its components goes offline, so the Mirror-4d is marked as degraded.
- RAID 0 Stripes require all components to be operational, and both mirrors do function, so Stripe-6f is in Online (Healthy) status.
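The two rules used in that description can be written down as a small sketch. The function names and the three status strings are just conventions picked for this illustration, with a bad drive being treated as an “Offline” component.

```python
def stripe_status(children):
    """A RAID 0 stripe goes offline as soon as any component is offline."""
    return "Offline" if "Offline" in children else "Online"


def mirror_status(children):
    """A mirror keeps working while at least one component still works."""
    working = [c for c in children if c != "Offline"]
    if not working:
        return "Offline"
    return "Online" if len(working) == len(children) else "Degraded"


OK, BAD = "Online", "Offline"

# Stripe first (usually RAID 0+1), with drive 2 bad.
stripe_1a = stripe_status([OK, BAD])
stripe_2b = stripe_status([OK, OK])
mirror_3c = mirror_status([stripe_1a, stripe_2b])
print(stripe_1a, stripe_2b, mirror_3c)   # prints: Offline Online Degraded

# Mirror first (usually RAID 1+0), with drive 6 bad.
mirror_4d = mirror_status([OK, BAD])
mirror_5e = mirror_status([OK, OK])
stripe_6f = stripe_status([mirror_4d, mirror_5e])
print(mirror_4d, mirror_5e, stripe_6f)   # prints: Degraded Online Online
```

Note that a degraded mirror still counts as a working component of the outer stripe, which is why the mirror-first setup reports its outermost array as Online.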
Risk Analysis: Until everything is brought back to optimum health...
- If drive 1 also failed, then no components would become less healthy, because Stripe-1a is already in the worst state, being offline. So, the second drive failure would just make Stripe-1a require more work to be brought online.
- If drive 3 or 4 also failed, then Stripe-2b would go offline, which would bring Mirror-3c offline, causing a catastrophic total failure of the entire mirror.
- So, if a second drive failed, there'd be a 2/3 chance that the second drive failure results in a catastrophic total failure of all the data.
- If drive 5 also failed, then Mirror-4d would go into an Offline state. That would bring down Stripe-6f, which would result in a total catastrophic failure of the entire stripe.
- If a second drive did fail, and that second drive was in the other mirror (e.g., either Drive 7 or Drive 8), then Mirror-5e would go into a Degraded state. Stripe-6f would still see two working components, and would be unaffected, reporting that it is in an Online state.
- So, if a second drive failed, there'd be a 1/3 chance that the second drive failure results in a catastrophic total failure of all the data.
Rebuilding the mirror...
Once a bad drive is replaced, bringing things back to optimum/online health could be done simply by re-building the mirror. A simple rebuild of the entire mirror can be achieved by copying all the data from one of Mirror-3c's members, which is the working stripe (Stripe-2b), and copying that to another one of Mirror-3c's members, which is the non-working stripe (Stripe-1a). In the above example, that would involve copying data from Drive 3 to overwrite the identical data seen in Drive 1. This unnecessary work adds “wear and tear” usage of the drives, and could extend the time required for a re-build, and reduce RAID responsiveness during the re-build.
Getting everything back to full health just involves Mirror-4d rebuilding, which requires reading from one drive (Drive 5) to restore data to a single other drive (Drive 6). Therefore, full optimum health only requires two drives. This may be able to happen more quickly than if all drives were being used in a process of “re-building”.
Further commentary...
As acknowledged by the PC Guide article on multi-RAID levels, the hard drives from the broken stripe could be used by a controller. In that case, Drive 1 might not need to have data from Drive 3 copied. However, this would require a controller that had a design that was advanced enough to notice this opportunity, so that faster approach might not be an option with some RAID 1 setups.
As already noted, if drive 5 were to fail, that would bring down the entire Stripe-6f. Mirror-5e would seem to be unaffected, but since it just contains partial data from a RAID 0 stripe, that preserved data may be rather useless on its own. (Recovery might not be easy, and only be unacceptably partial.) However, keeping Mirror-5e online means that only Mirror-4d will need to be restored from backup, and then everything will work (without needing to restore Mirror-5e from backup).
As long as the degraded mirror keeps providing data, some speed advantage from load balancing the mirrors may still be available (even though one of the mirrors may be operating a bit slower). Fixing the issue involves simply re-building the degraded mirror, so the re-building process may only require writing a bunch of data to just one drive.
The above section on “Health” touched upon the concept of “Risk”, but also spent a bit more time looking at the health state of each array (in a nested array setup). This section takes a closer look at just the topic of “Risk”, and looks at an 8-drive setup to demonstrate this.
To keep things simple, this example treats all online drives as if they are at equal risk of going down. (Since this simplistic example didn't introduce data from which to make any other calculations, there is no calculable higher risk for any one drive compared to any other.)
Stripe First (usually RAID 0+1):
Mirror-9c [Degraded]
    Stripe-7a [Offline]
        Drive-1: [OK]
        Drive-2: [OK]
        Drive-3: [Bad]
        Drive-4: [OK]
    Stripe-8b [Online]
        Drive-5: [OK]
        Drive-6: [OK]
        Drive-7: [OK]
        Drive-8: [OK]

Mirror First (usually RAID 1+0):
Stripe-12f [Online]
    Mirror-10d [Online]
        Drive-9: [OK]
        Drive-10: [OK]
        Drive-11: [OK]
        Drive-12: [OK]
    Mirror-11e [Degraded]
        Drive-13: [OK]
        Drive-14: [Bad]
        Drive-15: [OK]
        Drive-16: [OK]

Description of Current Status:
- If a random drive fails, there is a 4/7 chance (just over 57%) that the next drive to fail will be either Drive 5, 6, 7, or 8. This would cause Stripe-8b to go offline, bringing down the entire Mirror-9c which would be a catastrophic total failure.
- If a random drive fails, there is a 1/7 chance (just under 14.3%) that the next drive to fail would be the one needed to restore Drive-14 (presumably Drive-13), sending Mirror-11e into an Offline state, bringing down the entire Stripe-12f which would be a catastrophic total failure.
Either way could result in a catastrophic total failure. However, do you prefer odds of 4/7 of such a tragedy, or just odds of 1/7?
- A 6-drive example
In RAID 01, if you have a 6-drive array, and one drive flops, and then another drive flops, the chances that the second drive is in the same stripe are 40% (2 out of the remaining 5 drives). So there's a 40% chance that another lost drive won't interfere with easy recovery, and a 60% chance that the other stripe will go down.
If RAID 10 is used with six drives and one drive flops, the chances that the second drive flopping will cause issues (by being in the same mirror) are 20% (one drive out of the remaining five).
How this works can be seen a bit more easily by looking at the 8-drive example shown above. With six drives, the chance of a smooth recovery offered by RAID01 is 40%, while RAID10's chances are 80%.
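These odds can be double-checked by brute force. The sketch below tries every ordered pair of drive failures and counts how many pairs destroy the array. The function names are invented for this illustration, every drive is treated as equally likely to fail, and the mirror-first case assumes two-drive mirrors (which is what the 1/7 figure for eight drives implies).

```python
from fractions import Fraction


def split(num_drives, group_size):
    """Number the drives from 0 and cut them into equal inner arrays."""
    return [list(range(i, i + group_size))
            for i in range(0, num_drives, group_size)]


def fatal_second_failure(groups, mirror_first):
    """Chance that a second random drive failure destroys the whole array.

    groups is a list of inner arrays, each a list of drive numbers.
    mirror_first=True  means inner mirrors, outer stripe (RAID 1+0).
    mirror_first=False means inner stripes, outer mirror (RAID 0+1).
    """
    drives = [d for group in groups for d in group]
    fatal = total = 0
    for first in drives:
        for second in drives:
            if second == first:
                continue
            total += 1
            failed = {first, second}
            if mirror_first:
                # Fatal when some inner mirror has lost every member.
                dead = any(set(g) <= failed for g in groups)
            else:
                # Fatal when every inner stripe has lost at least one member.
                dead = all(set(g) & failed for g in groups)
            fatal += dead
    return Fraction(fatal, total)


for n in (4, 6, 8):
    stripe_first = fatal_second_failure(split(n, n // 2), mirror_first=False)
    mirror_first = fatal_second_failure(split(n, 2), mirror_first=True)
    print(n, "drives:", stripe_first, "versus", mirror_first)
# prints:
# 4 drives: 2/3 versus 1/3
# 6 drives: 3/5 versus 1/5
# 8 drives: 4/7 versus 1/7
```

The gap widens as drives are added: the stripe-first risk creeps upward toward one half and beyond, while the mirror-first risk keeps shrinking.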
- RAID 10
- A forum post discussing RAID says some vendors allow an odd number of RAID 10 disks. Wikipedia's article on RAID: section about “Non-standard levels” notes that Linux's MD's RAID10 “can have any number of drives, including odd numbers.”
- Supporting software
A key feature desired in a RAID driver is the ability to detect problems with the RAID array, such as when a drive has dropped out. Another key feature is the ability to respond to the event, such as beginning a rebuild process on a drive. For hardware RAID cards which have an embedded noise-making alarm that can be set off when a problem with the RAID is detected, another desirable feature for the supporting RAID software is to turn off the noise once someone has started to take care of the problem (such as investigating why the noise has been made).
Some RAID software has been known to show only minimal information when an event has occurred. Using more up to date software would help provide further details to determine which drive a problem was detected with. In some cases updated software may possibly even provide different details, such as more details, about past events (in addition to future events).
The OpenBSD team has made some commentary about RAID software with the release of OpenBSD 3.8. See the commentary (on the left hand side) at OpenBSD Songs/Lyrics (song for OpenBSD 3.8: “Hackers of the Lost RAID”).
- Comparing pure software-based solutions to pure hardware-based solutions
- Why happy hardware RAID is preferable over sad software
Software RAID has some advantages over hardware RAID, but before choosing a purely software-based solution, there are some things to know about. One is that support for a purely software-based solution is typically not available until after the operating system loads support for the software solution, and so the operating system may need to be loaded before any content from software RAID is accessible. A lot of data may be on hard drives or partitions that are handled by software RAID, but the operating system typically cannot be. Even the operating system may typically be stored on a disk that is part of a hardware-based RAID implementation.
How soon the software RAID support starts can be one limitation. There is another aspect to this consideration: if, and not just when. If the software RAID is not supported by the operating system being used (possibly due to a change/upgrade of the operating system), then even the data may not be quite so accessible. A specific software RAID implementation may not be supported by some operating systems. However, no special software support is needed for a hardware RAID option to provide the most basic operations. This is because the hardware RAID is usually handled by the hard drive controller to such an extent that even an unsupporting operating system will work. Granted, any sort of RAID maintenance may require using interactive code from an Option ROM (or perhaps the BIOS setup routine). In this case, changes cannot be done while the preferred operating system is running. However, unpleasant options can often be less limiting than non-options.
Another disadvantage to software RAID may be speed: hardware RAID implementations may be custom designed to support parallel drive access in superior ways so that overhead on other parts of the system, such as the CPU, may be less than what is experienced with software RAID solutions.
- Why super software RAID is preferable over horrible hardware
A chief disadvantage of hardware-based RAID may be cost. Specialized hardware may come at a price for initial investment.
Another disadvantage is incompatibility between implementations. Although the definitions of RAID 0 and RAID 1 are pretty standardized (as well as RAID 5, although the size of the stripes may vary), RAID volumes may start with implementation-specific headers which lead to compatibility issues. Therefore, the volumes of one RAID implementation may not work with a different RAID implementation, and with hardware-based RAID this means that a requirement for compatible hardware may have a real impact. For example, if using hardware-based RAID support that is built into a motherboard, and the motherboard stops working (which may be for reasons unrelated to the RAID support), choosing a different model of motherboard may not allow the old data to be easily accessed if the new motherboard doesn't read the RAID format that was used by the old motherboard. However, trying to use the old model of motherboard may often be undesirable: if the older motherboard model is discontinued and isn't as easily available, the effort required to obtain such a compatible motherboard may be unpleasantly significant. Adding to the unpleasantness, if a rare old motherboard is found, spending money on a motherboard that uses older technology may not be a nice solution (even if it does seem to be the only feasible solution).
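As a rough illustration of why a varying stripe size matters, the following minimal Python sketch maps a logical block number to a member disk and an offset on that disk. The function name locate_block and the plain round-robin layout are assumptions made for illustration only; real implementations also write their own headers, which this sketch ignores.

```python
def locate_block(logical_block, num_disks, blocks_per_stripe_unit):
    """Map a logical block to (disk index, block offset on that disk).

    Assumes a plain round-robin striped layout with no metadata header.
    """
    stripe_unit = logical_block // blocks_per_stripe_unit  # which stripe unit overall
    within_unit = logical_block % blocks_per_stripe_unit   # position inside that unit
    disk = stripe_unit % num_disks                          # units rotate across the disks
    row = stripe_unit // num_disks                          # complete rows before this unit
    return disk, row * blocks_per_stripe_unit + within_unit


# The same logical block is found in different places when only the
# stripe unit size differs, so one layout cannot be read as the other.
print(locate_block(10, num_disks=2, blocks_per_stripe_unit=4))
print(locate_block(10, num_disks=2, blocks_per_stripe_unit=8))
```

With two disks, logical block 10 sits on the first disk when the stripe unit is 4 blocks and on the second disk when it is 8 blocks, even though both layouts are valid striping.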
- [#obvsadptc]: RAID hardware: Some OS support
- Adaptec Issues
OpenBSD/i386 platform FAQ: archived by the Wayback Machine @ archive.org used to have quite a bit more data about various supported hardware, while OpenBSD 5.5's version of that page removed a lot of those details. With the older versions, the “RAID and Cache Controllers” section had the following note about the level of (non-)support from Adaptec.
Note: In the past years Adaptec has lied to us repeatedly about forthcoming documentation which would have allowed us to stabilize, improve and manage RAID support for these (rather buggy) raid controllers.
As a result, we do not recommend the Adaptec cards for use.
Even after that comment was removed from the website, problems with Adaptec cards have also been noted in the OpenBSD FAQ on the “aac” driver used for Adaptec's FSA controllers, which has stated, “these RAID controllers seem to be very buggy”, which is all the more reason that “documentation is critical for a useful driver.” However, “Adaptec has refused to provide useful and accurate documentation about their FSA-based” ... “RAID controllers.” Although a driver (named “aac”) has been able to support such equipment, “this is a known-flawed driver. Maybe it works with some variations of hardware sufficiently well to be usable, but we don't recommend betting your data on it.” (Related news/articles: Undeadly post, Slashdot article, JustSkins forum post)
In contrast, OpenBSD 3.8 song: “Hackers of the Lost RAID” commentary recommended LSI/AMI RAID cards because they worked. This commentary referenced Message about OpenBSD 3.8 RAID management.
- [#sftwraid]: Software implementations
- Operating system overview
Before discussing each of the implementations, here's a quick overview of support, sorted by operating system.
- For OpenBSD: softraid and bioctl seem to be the newer preference. Older options include ccd and RAIDframe, which the white paper on softraid has scathing comments about.
- FreeBSD Handbook page about RAID references ccd, and the FreeBSD Handbook page about RAID: section about Vinum Volume Manager discusses Vinum. Vinum supports RAID 0, 1, and 5. Web page describes Vinum/RAIDFrame on FreeBSD.
- NetBSD Features: section about RAID refers to RAIDframe and links to a page about RAID. That page says: "Some of the options, like RAID 6 and parity logging, are still in a highly developmental stage, and are not suitable for even experimental use."
- For Linux: the favorite seems to be mdadm. raidtools was used before mdadm was made. An upcoming/current option may be FlexRAID.
- Microsoft Windows
Some support may be built in. Alternatively, an upcoming/current option may be FlexRAID.
As for the built in support, some versions provide support for some types of fancy disk layouts. Other versions provide support for more types of fancy disk layouts. The term used for Microsoft's implementation of fancy disk layouts is a “Dynamic Disk”. See Dynamic Disks of Microsoft Windows (Server/Pro) and “Dynamic Storage” Microsoft Windows software RAID features.
- Details by implementation
- [#softraid]: softraid
The name “softraid” refers to a specific software product that implements software-based RAID.
- Historical Change
OpenBSD 4.7 -> 4.8 Upgrade Guide: section about Softraid metadata change discusses a change in the Softraid metadata format. Backwards compatibility was provided, “but at the cost of being unable to use some of the upcoming future features”.
This fortunate reality contrasted with less fortunate predictions announced during the software development. For example, see an announcement of softraid changes and OpenBSD 4.7 upgrade guide's advance warning of requirement for softraid volume rebuild. One of the anonymous posts says “softraid is still under pretty heavy development”. Perhaps also see: Article about trying to use the SoftRAID driver.
- Softraid may not yet be bootable, as noted by: news article about softraid.
In general (see the OpenBSD man page on softraid for details, like the exception for Sparc hardware noted in OpenBSD 4.6's man page), OpenBSD's softraid requires the usage of some OpenBSD disklabel entries that specify sections of one or more disks as being of an fstype of RAID. For example, /dev/wd1a may be an IDE/ATA device's first disklabel entry. Then the bioctl command is used, as shown by OpenBSD man page on softraid: “Examples” section, to create a device (which in the example uses the “sd” SCSI disk driver, unit zero, which is called sd0). (The sd driver also creates matching raw device files, in this case named /dev/rsd0 followed by a letter, as noted by OpenBSD man page for sd: “Files” section. The last letter of the device name specifies the disklabel entry: with OpenBSD the disklabel entry c: (a device name ending in c) represents the entire disk, similar to d: (a device name ending in d) on some other BSD.) This “drive” created by bioctl will then need to be treated like a regular drive (requiring fdisk, disklabel, and newfs).
- Responding to a degraded array
Nabble page says to use something like:
OpenBSD FAQ 14 (“Disk Setup”): section (14.13) called “RAID options for OpenBSD” says: “OpenBSD also includes RAIDframe (raid(4), requires a custom kernel), and ccd(4) as historic ways of implementing RAID, but at this point OpenBSD does not suggest implementing either as a RAID solution for new installs or reinstalls.”
This may be like RAID 0, not others with redundancy???
- [#raidfram]: RAIDframe
The basics: A driver called raid, and software called raidctl, provide support for RAID levels 0, 1, 4, 5, and combinations of those, and have been included with various BSD distributions: NetBSD 1.4, FreeBSD 5-current, OpenBSD 2.5, and newer versions. A “NetBSD and RAIDframe” page by Greg Oster says “RAIDframe handles a large number of different RAID levels and configuration options including RAID 0, 1, 4, 5, 6, hot spares, parity logging, and a number of other goodies. At this point, unfortunately, only a subset of these have been extensively tested in a NetBSD environment. Some of the options, like RAID 6 and parity logging, are still in a highly developmental stage, and are not suitable for even experimental use.” The NetBSD man page for the RAIDframe disk driver doesn't even mention support for RAID 6, and neither does FreeBSD 5.2.1's man page for the raid driver, OpenBSD's page for the raid driver, or the raidctl man page for any of these operating systems. For each of these three operating systems, the raidctl man page has a “BUGS” section that says “Hot-spare removal is currently not available.” Each such page also has a “WARNINGS” section that describes when parity recomputations must be made.
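The parity used by RAID levels 4 and 5 can be sketched with a bytewise XOR. The following minimal Python sketch is not RAIDframe code, and the helper names xor_blocks and rebuild_missing are hypothetical. It shows why parity that may no longer match the data has to be recomputed: a rebuild simply XORs the surviving blocks with the parity, so stale parity would produce wrong data.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the bytewise XOR of equally sized byte blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks, parity):
    """Recover the one missing data block from the survivors plus the parity."""
    return xor_blocks(list(surviving_blocks) + [parity])


# Three data blocks, as if stored on three data disks (illustrative values).
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Pretend the second disk failed and rebuild its block from the rest.
recovered = rebuild_missing([data[0], data[2]], parity)
print(recovered == data[1])
```

If a data block were changed without updating the parity, the same rebuild would return a block that does not match what was lost.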
RAIDframe seems to have lost some favor: OpenBSD FAQ 14 (“Disk Setup”): section called “RAID options for OpenBSD” (OpenBSD FAQ 14.13) says: “OpenBSD also includes RAIDframe (raid(4), requires a custom kernel), and ccd(4) as historic ways of implementing RAID, but at this point OpenBSD does not suggest implementing either as a RAID solution for new installs or reinstalls.” FreeBSD's Handbook: page on RAID doesn't discuss RAIDframe at all (but discusses the other options, ccd and vinum), and the online FreeBSD man pages don't seem to have info on this driver for versions of FreeBSD newer than 5.2.1. A forum post in a thread about RAIDFrame (and the earlier post in that thread) show key developers not having time for this anymore, and Scott Long, who was behind the FreeBSD port, said about RAIDframe in FreeBSD: “Unfortunately, while it was made to work pretty well on 4.x, it has never been viable on 5.x; it never survived the introduction of GEOM and removal of the old disk layer.” However, NetBSD's features: section on RAID still points users to Greg Oster's page on “NetBSD and RAIDframe”.
Other information about using RAIDFrame:
Scott Long brought RAIDframe to FreeBSD and said it is based on “the NetBSD RAIDframe port by Greg Oster”. Parallel Data Laboratory at Carnegie Mellon University has PDL @ CMU's page about RAIDFrame. OpenBSD 2.5's man page for raidctl shows that version supported RAIDs 0, 1, 4, and 5, and these are the only RAID levels mentioned with OpenBSD 4.6's man page.
“RAIDing OpenBSD” with RAIDFrame: Information which is mostly old, but shows how this was done with OpenBSD 3.6, including installing OpenBSD and then later making sure that the necessary kernel support for RAIDFrame was added.
- [#mdadm]: mdadm
- Operating system support
- Operating systems using the Linux kernel
(Some early work on this section has been started. Notes will be reviewed and so much more substantial notes may be added soon.)
- [#msdynstr]: “Dynamic Storage” Software RAID features from Microsoft Windows (Server/Pro) (using Dynamic Disks)
- Operating System/Hardware Support
This has come to be treated as a premium feature of the more expensive varieties of Microsoft Windows. Microsoft KB Q314343 says, “Dynamic storage is supported in Windows 2000 and Windows XP Professional.” “Dynamic disks are not supported on portable computers or on Windows XP Home Edition-based computers.”
Page about Vista's support says, “Windows Vista Ultimate and Windows Vista Enterprise editions support spanning and striping dynamic disks, but not mirroring.”
- Mirrored and RAID 5
If an operating system supports simple/spanned/striped volumes, it may be able to manage remote mirrored and RAID 5 volumes. However, local mirrored and RAID 5 volumes in Microsoft Windows are basically only supported by the operating systems that are marketed as “server” operating systems.
Microsoft KB Q314343 says, “you can use a Windows XP Professional-based computer to create a mirrored or RAID-5 volume on remote computers that are running Windows 2000 Server, Windows 2000 Advanced Server, or Windows 2000 Datacenter Server. You must have administrative privileges on the remote computer to do this.” The key phrase there is that any such volume is “on remote computers”. Windows XP Pro is not designed to create such a volume itself, although it can issue commands for a remote system to create such a volume.
Page about Vista's support says, “Windows Vista Ultimate and Windows Vista Enterprise editions support spanning and striping dynamic disks, but not mirroring. (Windows Server 2008 supports mirroring.)”
- Disk layout
- A disk must be using Microsoft's “Dynamic Disk” format in order to use the software RAID features of Server/Pro releases of Microsoft Windows. (See the section about the Dynamic Disks of Microsoft Windows for details about putting a disk into this sort of format.)
- Using dynamic disks
For now, this section does not have extensive documentation on this topic. Perhaps see one or more of the following resources:
- Multi-volume disks in Win NT
- Microsoft KB Q314343 describes an option, which was “multidisk volumes that are created by using Windows NT 4.0 or earlier, such as volume sets, stripe sets, mirror sets, and stripe sets with parity. Windows XP does not support these multidisk basic volumes. Any volume sets, stripe sets, mirror sets, or stripe sets with parity must be backed up and deleted or converted to dynamic disks before you install Windows XP Professional.” (Presumably this would also be needed before installing any newer version of Microsoft Windows.)
At the time of this writing, this guide has little information about ready solutions to deploy FlexRAID. Some of the features do sound interesting. However, no strong recommendations about how to use this software, or even whether to use this software, are being made here. The first step may be to determine whether a ready solution appears to be available. Wikipedia's Talk/Discussion page about the site's article on FlexRAID has a comment indicating that registration on a forum is required.
Perhaps see: Beginner's Guides on the FlexRAID Wiki. For installation help, see: Beginner's Guide to Installing FlexRAID in Windows (from August 30, 2011), and for Linux, Beginner's Guide to Installing FlexRAID on Zentyal Server (September 3, 2011).
- RAID is a copy of data sufficient for backup purposes
This is widely recognized to be false. (Further info may arrive here.)
- Things need to be identical.
The claim is that things need to be identical. This may include identical hardware: identical drives as well as identical controllers. Firmware versions might need to be identical. Partition layouts on different drives may need to be identical.
This is possibly true. Having drives that are very similar can help with speed. Also, they will be more likely to have the same geometry detected, which can be important. As an example, Microsoft KB Q167045: Reasons why Windows NT does not boot from a shadow mirror drive lists several requirements for Windows NT's “Fault Tolerant” technology, including: “Both the primary drive and shadow drive MUST be identical in make, model, and in many cases firmware revision. This is to ensure that the drive geometry is identical and is being translated identically.”
Q167045 also cites requirements for identical controllers, translation options on controllers, operating system partitions, and any partitions before the operating system partitions. There are some other requirements as well: “Failing to meet ANY of the above requirements may prevent booting into Windows NT from the shadow drive.”
It is entirely possible that different implementations may have different requirements. (Note that much of the quoted material there is related to Windows NT's “fault tolerant” implementation(s), which is quite old.)
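A check along the lines of the quoted make/model/firmware requirement can be sketched as follows. This is a minimal Python sketch under stated assumptions: the dictionary keys, the function name mismatched_fields, and the drive values are all hypothetical, and how the values would be obtained from real hardware is not shown.

```python
# Fields that the quoted requirement says should match (names are assumptions).
REQUIRED_MATCHES = ("make", "model", "firmware")


def mismatched_fields(primary, shadow, fields=REQUIRED_MATCHES):
    """Return the field names whose values differ between two drive records."""
    return [name for name in fields if primary.get(name) != shadow.get(name)]


# Made-up drive records for illustration only.
primary_drive = {"make": "ExampleCo", "model": "X100", "firmware": "1.02"}
shadow_drive = {"make": "ExampleCo", "model": "X100", "firmware": "1.04"}

print(mismatched_fields(primary_drive, shadow_drive))
```

An empty list would mean the two records agree on every listed field; whether a reported mismatch actually matters depends on the RAID implementation in use.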
This will likely be merged into the newer content above.
- Hardware based
Hardware based implementations are generally largely transparent to the underlying software. Key management, such as setting up the RAID array, is typically done through interaction with the code in an Option ROM. This code gets started by the system's standard bootup process, such as from a BIOS, and runs before the operating system is loaded. Specialized software can be installed that recognizes the RAID card and can also be used to perform some management tasks. For instance, acknowledging an error (silencing any unneeded audible alarm) and deciding to manually start a re-build (perhaps after a replacement drive has been inserted) may be done with such software. (The next logical step in such a scenario would typically be to re-enable the audible alarm so that people become aware of another serious problem when (if) one is detected.)
Be aware of compatibility lock-in. For example, if ROMB is used and a motherboard goes bad, perhaps due to some part completely unrelated to disk storage, the existing data needs to be read by hardware that can use the RAID array as it was set up. In some cases, even extremely similar replacement parts (whether motherboards or other RAID controllers) might not be sufficiently compatible if those replacement parts aren't exactly the same make and model. This is particularly important to keep in mind for motherboards since new models of motherboards are constantly being designed to support and take advantage of improving technology. If a problem develops in a few years, finding a compatible old replacement motherboard may not be easy.
- RAID on motherboard (ROMB)
- Dedicated cards
- Software based
The /proc/mdstat file system object may provide some detail. (This can be seen by displaying the contents of that file.) A command used to interact with the RAID array is mdadm. For example, after seeing the names of the md devices in /proc/mdstat, one may be able to see more details by asking mdadm for the details of a named device. Further options provided by the mdadm command may be seen in that command's built-in help.
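Detecting a dropped drive from this kind of status text can be sketched in Python. The sample text below is invented for illustration and only modeled on a common layout of /proc/mdstat, where a status map such as [UU] marks working members and an underscore marks a missing one; the exact format may vary between kernel versions, and the function name degraded_arrays is hypothetical.

```python
import re

# Invented sample in the style of /proc/mdstat (not captured from a real system).
SAMPLE = """\
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]

md1 : active raid1 sdb2[1]
      208640 blocks [2/1] [_U]

unused devices: <none>
"""


def degraded_arrays(mdstat_text):
    """Return names of md devices whose status map shows a missing member."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        name = re.match(r"(md\w+)\s*:", line)
        if name:
            current = name.group(1)          # start of a new array entry
            continue
        status = re.search(r"\[([U_]+)\]", line)
        if current and status and "_" in status.group(1):
            degraded.append(current)         # at least one member is missing
    return degraded


print(degraded_arrays(SAMPLE))
```

On a Linux system using md, the same function could be given the text read from /proc/mdstat instead of the sample.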
- File system based
- FlexRAID has been designed to support existing file systems and protect the data with RAID technologies.