Redundant array of independent disks




ONE



RAID (redundant array of independent disks) is a way of storing the same data in different places (thus, redundantly) on multiple hard disks. By placing data on multiple disks, I/O operations can overlap in a balanced way, improving performance. Although adding disks lowers the array's overall mean time between failures (MTBF), storing data redundantly increases fault-tolerance.

A RAID appears to the operating system to be a single logical hard disk. RAID employs the technique of striping, which involves partitioning each drive's storage space into units ranging from a sector (512 bytes) up to several megabytes. The stripes of all the disks are interleaved and addressed in order.

In a single-user system where large records, such as medical or other scientific images, are stored, the stripes are typically set up to be small (perhaps 512 bytes) so that a single record spans all disks and can be accessed quickly by reading all disks at the same time.

In a multi-user system, better performance requires establishing a stripe wide enough to hold the typical or maximum size record. This allows overlapped disk I/O across drives.
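
As a rough illustration of how interleaved stripes are addressed, the following Python sketch maps a logical byte offset onto a disk and an offset within that disk. The stripe size and disk count are arbitrary example values, not a recommendation.

    # Illustrative sketch: mapping a logical byte offset onto a striped array.
    # STRIPE_SIZE and NUM_DISKS are arbitrary example values.
    STRIPE_SIZE = 512      # bytes per stripe unit
    NUM_DISKS = 4          # disks in the array

    def locate(logical_offset: int) -> tuple[int, int]:
        """Return (disk index, byte offset on that disk) for a logical byte offset."""
        stripe_number = logical_offset // STRIPE_SIZE     # which stripe unit overall
        disk = stripe_number % NUM_DISKS                  # stripes are interleaved round-robin
        offset_on_disk = (stripe_number // NUM_DISKS) * STRIPE_SIZE + logical_offset % STRIPE_SIZE
        return disk, offset_on_disk

    if __name__ == "__main__":
        for addr in (0, 512, 1024, 2048, 5000):
            print(addr, "->", locate(addr))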

There are at least nine types of RAID plus a non-redundant array (RAID-0):

• RAID-0. This technique has striping but no redundancy of data. It offers the best performance but no fault-tolerance.

• RAID-1. This type is also known as disk mirroring and consists of at least two drives that duplicate the storage of data. There is no striping. Read performance is improved since both disks can be read at the same time. Write performance is the same as for single disk storage. RAID-1 provides the best performance and the best fault-tolerance in a multi-user system.

• RAID-2. This type uses striping across disks with some disks storing error checking and correcting (ECC) information. It has no advantage over RAID-3.

• RAID-3. This type uses striping and dedicates one drive to storing parity information. The embedded error checking (ECC) information is used to detect errors. Data recovery is accomplished by calculating the exclusive OR (XOR) of the information recorded on the other drives. Since an I/O operation addresses all drives at the same time, RAID-3 cannot overlap I/O. For this reason, RAID-3 is best for single-user systems with long record applications.

• RAID-4. This type uses large stripes, which means you can read records from any single drive. This allows you to take advantage of overlapped I/O for read operations. Since all write operations have to update the parity drive, no I/O overlapping is possible. RAID-4 offers no advantage over RAID-5.

• RAID-5. This type includes a rotating parity array, thus addressing the write limitation in RAID-4. Thus, all read and write operations can be overlapped. RAID-5 stores parity information but not redundant data (but parity information can be used to reconstruct data). RAID-5 requires at least three and usually five disks for the array. It's best for multi-user systems in which performance is not critical or which do few write operations.

• RAID-6. This type is similar to RAID-5 but includes a second parity scheme that is distributed across different drives and thus offers extremely high fault- and drive-failure tolerance. There are few or no commercial examples currently.

• RAID-7. This type includes a real-time embedded operating system as a controller, caching via a high-speed bus, and other characteristics of a stand-alone computer. One vendor offers this system.

• RAID-10. This type offers an array of stripes in which each stripe is a RAID-1 array of drives. This offers higher performance than RAID-1 but at much higher cost.

• RAID-53. This type offers an array of stripes in which each stripe is a RAID-3 array of disks. This offers higher performance than RAID-3 but at much higher cost.

TWO

... 

If you have a large organization with many employees and/or customers connected to your system and your system is widely dispersed geographically, then you will probably want to be up 24/7/365. There is a correlation between reliability and cost: The more reliability you require, the higher the cost. Level 1 RAID (see below), for example, provides the highest reliability, but it also carries the highest cost.

THREE



Understand RAID

A redundant array of inexpensive disks (RAID) provides a way to organize multiple physical disks to appear as one logical disk, to increase availability and performance. RAID relies on redundancy or parity information to reconstruct and retrieve data from failed disk drives. Although theoretically there are nine primary RAID levels (0, 1, 2, 3, 4, 5, 6, 7, and S), only RAID levels 0, 1, 5, 7, and S (and variations, such as 0+1, 0+S, and so on) are commercially implemented at most Oracle sites. When selecting a vendor, it is essential to know what levels of RAID the vendor supports. Some vendors do not support RAID 0 by itself. Vendors offer their own proprietary RAID implementations. Most vendors adhere to the ANSI- and ISO-standard SCSI protocols.

RAID 5, S, and 7 cost less than RAID 1 or 0+1, since there is a 100-percent disk overhead in the case of the latter. However, the investment in this overhead provides insurance against a variety of disk-drive failures. For example, RAID 1 and 0+1 offer protection against multiple points of failure in a disk array (except when both halves of a mirrored pair fail simultaneously), whereas RAID 5/S/7 will not continue to function when more than one disk crashes at the same time. Also, with RAID 5/S/7, a single disk failure degrades performance substantially and increases your exposure to a full-fledged outage.
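
As a quick worked example of that overhead difference (the disk count and capacity below are arbitrary example values):

    # Rough capacity comparison for a hypothetical array of 10 disks of 100 GB each.
    disks, size_gb = 10, 100

    raid_1_usable = disks * size_gb / 2     # mirroring: 100-percent overhead, half the raw space
    raid_5_usable = (disks - 1) * size_gb   # distributed parity: one disk's worth goes to parity

    print(f"raw capacity:      {disks * size_gb} GB")
    print(f"RAID 1/0+1 usable: {raid_1_usable:.0f} GB")
    print(f"RAID 5 usable:     {raid_5_usable} GB")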

To drive home the point, here's my rule of thumb for RAID: "RAID 5/S/7 can be used without much performance degradation for read-intensive applications such as decision-support systems (DSSs). For write-intensive OLTP applications, however, they are often the poor man's choice. RAID 0+1 is the optimal configuration from both availability and performance perspectives, for most Oracle applications (OLTP and DSS). Use it if you have the money."

FOUR



RAID 0

RAID level 0 could more correctly be called “AID,” because there's no redundancy about it. Data is merely divided into blocks, each one written sequentially to the next drive in the array. If there are four drives in the array, as shown in the accompanying figure, each logical I/O is broken into four physical operations.

The point of RAID 0 is performance. Theoretically, it can deliver n times the performance of a single drive, where n is the number of drives in the array. However, tuning the stripe size is important. If it is too large, many I/O operations will fit in a single stripe and take place on a single drive. If it is too small, each logical operation will be broken into too many physical operations, saturating the bus or controller to which the drives are attached.
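
The stripe-size trade-off can be sketched numerically. The Python fragment below (request and stripe sizes are arbitrary example values) counts how many physical operations one logical request generates: a very large stripe keeps the request on one drive, while a very small stripe fans it out into many operations.

    # Illustrative sketch: physical operations generated by one logical request
    # at different stripe sizes. All sizes are arbitrary example values in bytes.
    NUM_DISKS = 4
    REQUEST_SIZE = 262_144                        # a 256 KB logical request

    def physical_ops(request_size: int, stripe_size: int) -> int:
        """Number of stripe units (hence physical I/Os) the request spans."""
        return -(-request_size // stripe_size)    # ceiling division

    for stripe in (4_096, 65_536, 1_048_576):
        ops = physical_ops(REQUEST_SIZE, stripe)
        drives = min(ops, NUM_DISKS)
        print(f"stripe {stripe:>9} B -> {ops:>3} physical ops, spread over up to {drives} drives")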

Reviewers of RAID 0 products on workstations have commented that they offer little advantage with typical applications, such as word processors and spreadsheets. However, in cases where very large files must be opened or saved (on video servers, for example), they can be very beneficial.

RAID 1

RAID 1 is the simplest actual redundant array design, employing mirrored pairs of disk drives. As seen in the figure, it merely creates a duplicate of the contents of one disk drive onto another. While that fact makes RAID 1 easy to implement, it also makes it the most costly (100 percent redundancy) in terms of required disk overhead.

RAID 1's write performance is slower than that of a solo drive, since all data must be written twice. However, buffering on a controller usually hides this fact from the host computer. Reads can be faster, since it's always possible to retrieve data from whichever drive is available sooner.
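
A toy Python model of that behavior, assuming a two-drive mirror in which queue length stands in for "whichever drive is available sooner" (purely illustrative, not any controller's actual algorithm):

    # Toy RAID 1 pair: every write goes to both drives; a read is served by
    # whichever drive currently has the shorter queue. Illustrative only.
    class MirrorPair:
        def __init__(self):
            self.drives = [{}, {}]     # block number -> data, one dict per drive
            self.queue = [0, 0]        # pending operations per drive (stand-in for load)

        def write(self, block: int, data: bytes) -> None:
            for i, drive in enumerate(self.drives):   # the write cost is paid twice
                drive[block] = data
                self.queue[i] += 1

        def read(self, block: int) -> bytes:
            i = 0 if self.queue[0] <= self.queue[1] else 1   # pick the less busy drive
            self.queue[i] += 1
            return self.drives[i][block]

    pair = MirrorPair()
    pair.write(7, b"payload")
    print(pair.read(7))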

RAID 2

RAID 2 is a bit-oriented scheme for striping data. Each bit of a data word is written to a separate disk drive, in sequence. Checksum information is then computed for each word and written to physically separate error-correction drives.

Unfortunately, I/O is slow, especially for small files, because each drive must be accessed for every operation. Controller design is relatively simple, high data-transfer rates are possible for large files, and disk overhead is typically 40 percent. However, while reliable, RAID 2 is seldom considered worth bothering with today.

RAID 3

RAID 3 introduces a more efficient way of storing data while still providing error correction. It still stripes data across drives bit by bit (or byte by byte). However, error-checking now takes place by storing parity information (computed via a mathematical function known as the Exclusive OR, or XOR) on a separate parity drive (see figure).

Given that parity values are simple to compute and write, RAID 3 arrays can perform swiftly. However, any I/O operation must address all drives simultaneously. This means that, while RAID 3 delivers high data-transfer rates, it is best suited to large files such as video streams.
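
A minimal Python sketch of the XOR parity idea, with made-up byte values: the parity drive stores the XOR of the corresponding bytes on the data drives, so the contents of any single failed drive can be rebuilt by XORing everything that survives.

    # Minimal sketch of XOR parity: the parity drive holds the XOR of the data drives,
    # so a single failed drive can be reconstructed from the survivors.
    from functools import reduce

    def xor_blocks(blocks):
        """Byte-wise XOR of equal-length blocks."""
        return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

    # Made-up example contents for three data drives.
    data_drives = [b"\x11\x22\x33\x44", b"\xaa\xbb\xcc\xdd", b"\x01\x02\x03\x04"]
    parity = xor_blocks(data_drives)                 # written to the dedicated parity drive

    # Simulate losing drive 1 and rebuilding it from the other drives plus parity.
    survivors = [data_drives[0], data_drives[2], parity]
    rebuilt = xor_blocks(survivors)
    assert rebuilt == data_drives[1]
    print("rebuilt drive 1:", rebuilt.hex())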

RAID 4

RAID 4 modifies the RAID 3 concept by working with data in terms of blocks (as does RAID 0), rather than bits or bytes. This reduces processing overhead and can make for high aggregate data-transfer rates on reads. For writes, however, there is inevitable contention for the sole parity drive, making this RAID level relatively sluggish.

RAID 5

One of the most popular RAID levels, RAID 5 is again block-oriented and based on the storing of parity information. However, instead of placing parity data on a single drive, it distributes it across the entire array (see figure).

Because RAID 5 eliminates the parity-drive bottleneck, it enhances write performance. And due to the independence of all the drives in the array, read performance is tops among true RAID levels. Recovery following a disk failure is relatively slow, but reliable enough. All in all, RAID 5 achieves an excellent balance between performance, data protection, and low cost.
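
One way to visualize the distributed parity is to print which drive holds the parity block for each stripe. The rotation below is a generic illustration (real controllers use various layouts, such as left-symmetric), not any particular product's scheme.

    # Sketch of a rotating-parity layout: for each stripe, one drive holds parity (P)
    # and the rest hold data blocks. The rotation shown is illustrative only.
    NUM_DISKS = 5
    NUM_STRIPES = 6

    block = 0
    for stripe in range(NUM_STRIPES):
        parity_disk = (NUM_DISKS - 1 - stripe) % NUM_DISKS   # parity position rotates per stripe
        row = []
        for disk in range(NUM_DISKS):
            if disk == parity_disk:
                row.append("  P ")
            else:
                row.append(f"D{block:03d}")
                block += 1
        print(f"stripe {stripe}: " + " | ".join(row))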

RAID 10 and RAID 53

RAID 10 is also known as RAID 0+1 or 1+0 because it combines the elements of RAID 0 and RAID 1. It uses two sets of drives that mirror one another, as in RAID 1. Then, within these sets, data is striped across the drives (as in RAID 0) in order to speed access.

RAID 53, which should really be called RAID 30 using the above logic, combines RAID 0 and RAID 3. Again, it uses a striped array, as with RAID 0, but the segments of this array are RAID 3 arrays. High data-transfer rates and high I/O rates for small requests are both offered, but at a price.

ENHANCING PERFORMANCE

RAID controllers can become a bottleneck, especially with high-speed interconnects like Fibre Channel, because of the calculations they must perform. For example, to perform a disk-write operation to a RAID 5 array, a Read-Modify-Writeback operation must be performed. First, old data must be read from both a data drive and a parity drive. Second, that data must be XORed. Third, new data must be written to the data drive. Fourth, new data must also be XORed with the parity data, and only then can the result finally be written to the parity drive.
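
That four-step sequence boils down to the identity new parity = old parity XOR old data XOR new data. A minimal Python sketch of the update, using made-up byte values:

    # Read-modify-writeback for a single RAID 5 block update, as described above.
    def xor(a: bytes, b: bytes) -> bytes:
        return bytes(x ^ y for x, y in zip(a, b))

    old_data   = b"\x10\x20\x30\x40"   # step 1: read old data from the data drive
    old_parity = b"\x0f\x0f\x0f\x0f"   # step 1: read old parity from the parity drive
    new_data   = b"\x11\x22\x33\x44"   # the block being written

    partial    = xor(old_parity, old_data)   # step 2: XOR the old data out of the parity
    new_parity = xor(partial, new_data)      # step 4: XOR the new data in

    # Step 3 writes new_data to the data drive; the final step writes new_parity
    # to the parity drive.
    print("new parity:", new_parity.hex())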

One solution to this bottleneck has been to move the responsibility of calculating XOR data to the disk drives themselves. Seagate, IBM, and other vendors have released drives that can perform XOR calculations in parallel with other disks, without the aid of the RAID controller.

The industry is entering a period of rapid transition in I/O architectures. InfiniBand's 2001 products will couple I/O directly to host memory, offering transfer rates of up to 6Gbytes/sec, and RAID products will evolve to support such throughput. At the same time, the falling cost of controllers and drives will make RAID arrays ever more commonplace on the low end. Your next notebook computer may even offer you a choice of RAID levels.

FIVE



RAID Levels

RAID 0 — Striping. RAID 0 specifies that data is striped across two or more drives. This allows multiple drives to be used when accessing data and makes more efficient use of SCSI bandwidth. RAID 0 carries no redundancy in case of a drive failure.

RAID 1 — Mirroring. RAID 1 makes duplicate copies of data on each drive in the RAID system. It is the ultimate in redundancy.

RAID 0 + 1 (10) — RAID 10 combines mirroring and striping in a single RAID subsystem. This provides the maximum redundancy with no loss in performance. Other RAID levels require a small loss in performance to provide redundancy.

RAID 3 — This level takes a block of data and breaks it up into stripes that are recorded across two or more drives. Parity information for each data stripe is recorded on a single additional drive. RAID 3 is infrequently used in hard drive RAID systems, but it is used in tape arrays.

RAID 4 — This is similar to RAID 3 except that instead of creating parity for each stripe of data, parity is created for the entire data block. RAID 4 supports higher transaction rates. Parity is checked on each block rather than each stripe.

RAID 5 — Similar to RAID 4 in that parity is generated for each block. However, instead of residing on a single dedicated parity disk, the parity information is striped across the data disks along with the blocks themselves. Transaction rates are high, but write speed is penalized because each write must also read and update the associated parity block, and the controller must keep a data block and its parity on different disks.

RAID 5 + 3 — Also known as RAID 53. Despite the name, it combines RAID 0 and RAID 3: each of the "drives" of a RAID 0 stripe set is set up as a RAID 3 subsystem.

SIX



Fault Tolerance

Fault tolerance is the ability of a system to continue functioning when part of the system fails. Normally, the expression fault tolerance is used to describe disk subsystems, but it can also apply to other parts of the system or the entire system. Fully fault-tolerant systems use redundant disk controllers and power supplies as well as fault-tolerant disk subsystems. You can also use uninterruptible power supplies (UPSs) to safeguard against local power failure. For more information about UPS, see "Managing Uninterruptible Power Supplies" later in this chapter.

Although the data is always available and current in a fault-tolerant system, you still need to make tape backups to protect the information on your disk subsystem against destructive events such as fire, earthquakes, tornadoes, floods, and user errors. Disk fault tolerance is not an alternative to a backup strategy with offsite storage. For more information about backing up to tape, see Chapter 6 "Backing Up and Restoring Network Files."

Fault tolerance is designed to combat disk failures, power outages, and corruption of the operating system, including boot files and other system files.

Fault-tolerant disk systems are standardized and categorized in six levels known as Redundant Arrays of Inexpensive Disks (RAID) level 0 through level 5. Each level offers various mixes of performance, reliability, and cost. Disk Administrator includes RAID levels 0, 1, and 5. Only levels 1 and 5 provide fault-tolerance.

RAID strategies can be implemented using hardware or software solutions. In a hardware solution, the controller interface handles the creation and regeneration of redundant information. In Windows NT Server, this activity can be performed in the software. A hardware implementation of a RAID strategy can offer performance advantages over the software implementation included in Windows NT Server.

Understanding RAID

Disk arrays consist of multiple disk drives coordinated by a controller. Individual data files are typically written to more than one disk in a manner that, depending on the RAID level used, can improve performance and/or reliability.

However, once a disk has failed, there is no further fault tolerance until the fault is repaired; few RAID implementations can withstand two simultaneous failures. When the failed disk is replaced, the data can be regenerated using the redundant information. Data regeneration occurs without bringing in backup tapes or performing manual update operations to cover transactions that took place since the last backup. When data regeneration is complete, all data is current and again protected against disk failure. The ability to provide cost-effective high data availability is the key advantage of disk arrays.

Level 0: Stripe Sets

Stripe sets are created by combining areas of free space on 2 to 32 disks into one large logical volume. Data is divided into blocks and spread in a fixed order among all the disks in the array.

Level 0 stripe sets do not provide any fault tolerance.

RAID Level 0

Stripe sets in Windows NT write data to multiple partitions, as is done with volume sets. However, striping writes files across all disks so that data is added to all disks in the set at the same rate.

Stripe sets offer the best performance of all the Windows NT Server disk management strategies, including volume sets. However, like volume sets, they do not provide fault tolerance. If any partition in the set fails, all data is lost.

Level 1: Mirror Sets

Mirror sets provide an identical twin for a selected disk; all data written to the primary disk is also written to the shadow or mirror disk. This results in disk space utilization of only 50 percent. If one disk fails, the system uses data from the other disk. For more information about dealing with boot failures, see "Fixing a System or Boot Failure" later in this chapter.

Mirror sets protect a partition on a disk from media and, possibly, controller failure by maintaining a fully redundant copy on another disk. When a mirrored partition fails, you must break the mirror set to expose the remaining partition as a separate volume with its own drive letter. That volume then becomes the main partition, and you can create a new mirror-set relationship with unused free space of the same size or greater on another disk.

Mirror sets are created by duplicating a partition using free space on another disk. If the second partition is larger, the remaining space becomes free space. The same drive letter is used for both partitions. Any existing partition, even the system and boot partitions, can be mirrored onto another partition of the same size, or greater, on another disk using either the same or a different controller. When creating mirror sets, it is best to use disks that are the same size, model, and manufacturer.

RAID Level 1

Mirror sets have better overall read and write performance than level 5, stripe sets with parity. Another advantage of mirror sets over stripe sets with parity is that there is no loss in performance when a member of a mirror set fails. Mirror sets are more expensive in terms of dollars per megabyte because their disk space utilization is lower. But their entry cost is lower because they require only two disks, whereas stripe sets with parity require three or more disks.

The following illustration shows examples of mirror sets using the same and different controllers.

Mirror sets reduce the chance of an unrecoverable error by providing a duplicate set of data, which doubles the number of disks required and the input/output (I/O) operations when writing to the disk. However, some performance gains are achieved for reading data because of I/O load balancing of requests between the two partitions.

When you want to use the space in a mirror set for other purposes, you must first break the mirror set and then delete the partition. Breaking the mirror set does not delete the information, but it is still safer to do a backup first. You will then be ready to delete one of the partitions that made up the mirror set to regain free space.

In the case of an unrecoverable error on a partition within a mirror set, you need to break the mirror-set relationship to expose the remaining partition as an individual partition or logical drive. You can then reassign some free space on another disk to create a new mirror set. For more information about breaking mirror sets, see the Windows NT Server Resource Kit version 4.0.

For information about how to establish, break, or delete a mirror set, see "Establishing a Mirror Set" and "Breaking a Mirror Set," in Disk Administrator Help.

Level 5: Stripe Sets with Parity

Level 5 is commonly known as striping with parity. The data is striped in large blocks across all the disks in the array. Level 5 differs from dedicated-parity schemes in that it writes the parity across all the disks. The data redundancy is provided by the parity information. The data and parity information are arranged on the disk array so that the two are always on different disks.

RAID Level 5 Configuration

Stripe sets with parity have better read performance than mirror sets. However, when a member is missing, such as when a disk has failed, the read performance is degraded by the need to recover the data with the parity information.

Nevertheless, this strategy is recommended over mirror sets for applications that require redundancy and are primarily read-oriented. Write performance is reduced by the parity calculation. Also, a write operation requires three times as much memory as a read operation during normal operation. Moreover, when a partition fails, reading requires at least three times as much memory as would normally be used; both effects are caused by the parity calculation.
