Wednesday, April 15, 2009

Approach to RAID Data Recovery
—How to Recover a RAID 5EE Case

As we all know, in 1987, Patterson, Gibson and Katz at the University of California Berkeley, published a paper entitled "A Case for Redundant Arrays of Inexpensive Disks (RAID)". This paper described various types of disk arrays, referred to by the acronym RAID. The basic idea of RAID was to combine multiple small, inexpensive disk drives into an array of disk drives which yields performance exceeding that of a Single Large Expensive Drive (SLED). Additionally, this array of drives appears to the computer as a single logical storage unit or drive.

The Mean Time Between Failure (MTBF) of the array will be equal to the MTBF of an individual drive, divided by the number of drives in the array. Because of this, the MTBF of an array of drives would be too low for many application requirements. However, disk arrays can be made fault-tolerant by redundantly storing information in various ways.
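As a rough, back-of-the-envelope illustration of that relationship (the single-drive MTBF and the array size below are assumed example values, not figures from this article):

# Simplified model: independent, identical drives, no repair.
drive_mtbf_hours = 500_000      # assumed MTBF of one drive
drives_in_array = 8             # assumed number of drives

array_mtbf_hours = drive_mtbf_hours / drives_in_array
print(f"Array MTBF: {array_mtbf_hours:,.0f} hours "
      f"(about {array_mtbf_hours / (24 * 365):.1f} years)")
# Array MTBF: 62,500 hours (about 7.1 years)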

Five types of array architectures, RAID-1 through RAID-5, were defined by the Berkeley paper, each providing disk fault-tolerance and each offering different trade-offs in features and performance. In addition to these five redundant array architectures, it has become popular to refer to a non-redundant array of disk drives as a RAID-0 array.

Data Striping

Fundamental to RAID is "striping", a method of concatenating multiple drives into one logical storage unit. Striping involves partitioning each drive's storage space into stripes which may be as small as one sector (512 bytes) or as large as several megabytes. These stripes are then interleaved round-robin, so that the combined space is composed alternately of stripes from each drive. In effect, the storage space of the drives is shuffled like a deck of cards. The type of application environment, I/O or data intensive, determines whether large or small stripes should be used.
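To make the round-robin idea concrete, here is a minimal sketch in Python of how a logical byte offset maps to a drive and a position on that drive (the drive count and stripe size are assumed example values):

# Round-robin striping with no redundancy (RAID-0 style mapping).
NUM_DRIVES = 4
STRIPE_SIZE = 64 * 1024   # 64 KB stripes (assumed)

def locate(logical_offset):
    stripe_number = logical_offset // STRIPE_SIZE
    offset_in_stripe = logical_offset % STRIPE_SIZE
    drive = stripe_number % NUM_DRIVES            # stripes alternate across drives
    stripe_on_drive = stripe_number // NUM_DRIVES
    return drive, stripe_on_drive * STRIPE_SIZE + offset_in_stripe

print(locate(0))        # (0, 0)      first stripe sits on drive 0
print(locate(65536))    # (1, 0)      the next stripe moves to drive 1
print(locate(262144))   # (0, 65536)  after one full round, back to drive 0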

Most multi-user operating systems today, like NT, Unix and Netware, support overlapped disk I/O operations across multiple drives. However, in order to maximize throughput for the disk subsystem, the I/O load must be balanced across all the drives so that each drive can be kept busy as much as possible. In a multiple drive system without striping, the disk I/O load is never perfectly balanced. Some drives will contain data files which are frequently accessed and some drives will only rarely be accessed. In I/O intensive environments, performance is optimized by striping the drives in the array with stripes large enough so that each record potentially falls entirely within one stripe. This ensures that the data and I/O will be evenly distributed across the array, allowing each drive to work on a different I/O operation, and thus maximize the number of simultaneous I/O operations which can be performed by the array.

In data intensive environments and single-user systems which access large records, small stripes (typically one 512-byte sector in length) can be used so that each record will span across all the drives in the array, each drive storing part of the data from the record. This causes long record accesses to be performed faster, since the data transfer occurs in parallel on multiple drives. Unfortunately, small stripes rule out multiple overlapped I/O operations, since each I/O will typically involve all drives. However, operating systems like DOS which do not allow overlapped disk I/O, will not be negatively impacted. Applications such as on-demand video/audio, medical imaging and data acquisition, which utilize long record accesses, will achieve optimum performance with small stripe arrays.

A potential drawback to using small stripes is that synchronized spindle drives are required in order to keep performance from being degraded when short records are accessed. Without synchronized spindles, each drive in the array will be at different random rotational positions. Since an I/O cannot be completed until every drive has accessed its part of the record, the drive which takes the longest will determine when the I/O completes. The more drives in the array, the more the average access time for the array approaches the worst case single-drive access time. Synchronized spindles assure that every drive in the array reaches its data at the same time. The access time of the array will thus be equal to the average access time of a single drive rather than approaching the worst case access time.

The different RAID levels

RAID-0
RAID Level 0 is not redundant, hence does not truly fit the "RAID" acronym. In level 0, data is split across drives, resulting in higher data throughput. Since no redundant information is stored, performance is very good, but the failure of any disk in the array results in data loss. This level is commonly referred to as striping.

RAID-1
RAID Level 1 provides redundancy by writing all data to two or more drives. The performance of a level 1 array tends to be faster on reads and slower on writes compared to a single drive, but if either drive fails, no data is lost. This is a good entry-level redundant system, since only two drives are required; however, since one drive is used to store a duplicate of the data, the cost per megabyte is high. This level is commonly referred to as mirroring.

RAID-2
RAID Level 2, which uses Hamming error correction codes, is intended for use with drives which do not have built-in error detection. All SCSI drives support built-in error detection, so this level is of little use when using SCSI drives.

RAID-3
RAID Level 3 stripes data at a byte level across several drives, with parity stored on one drive. It is otherwise similar to level 4. Byte-level striping requires hardware support for efficient use.

RAID-4
RAID Level 4 stripes data at a block level across several drives, with parity stored on one drive. The parity information allows recovery from the failure of any single drive. The performance of a level 4 array is very good for reads (the same as level 0). Writes, however, require that parity data be updated each time. This slows small random writes, in particular, though large writes or sequential writes are fairly fast. Because only one drive in the array stores redundant data, the cost per megabyte of a level 4 array can be fairly low.

RAID-5
RAID Level 5 is similar to level 4, but distributes parity among the drives. This can speed small writes in multiprocessing systems, since the parity disk does not become a bottleneck. Because parity data must be skipped on each drive during reads, however, read performance tends to be considerably lower than that of a level 4 array. The cost per megabyte is the same as for level 4.
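The parity used by levels 3, 4 and 5 (and, further below, 5EE) is a simple XOR across the data blocks of a stripe, which is also what makes single-drive recovery possible. A minimal sketch with made-up data blocks:

# Parity is the XOR of the data blocks in a stripe; any one missing block
# can be rebuilt by XOR-ing the surviving blocks with the parity.
def xor_blocks(*blocks):
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)

d0, d1, d2 = b"AAAA", b"BBBB", b"CCCC"       # data blocks of one stripe
parity = xor_blocks(d0, d1, d2)              # stored on the parity position

rebuilt_d1 = xor_blocks(d0, d2, parity)      # the drive holding d1 has failed
assert rebuilt_d1 == d1

# The small-write penalty: updating one block also needs the old data and
# the old parity (new parity = old parity XOR old data XOR new data).
new_d0 = b"ZZZZ"
new_parity = xor_blocks(parity, d0, new_d0)
assert new_parity == xor_blocks(new_d0, d1, d2)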

RAID-5EE
Since our case study is a RAID 5EE case, we first need to understand RAID level-5EE.
(Note: This feature is not supported on all controllers.)
RAID level-5EE is similar to RAID level-5E but with a more efficient distributed spare and faster rebuild times. Like RAID level-5E, this RAID level stripes data and parity across all of the drives in the array.
RAID level-5EE offers both data protection and increased throughput. When an array is assigned RAID level-5EE, the capacity of the logical drive is reduced by the capacity of two physical drives in the array: one for parity and one for the spare.
The spare drive is part of the RAID level-5EE array. However, unlike RAID level-5E, which uses contiguous free space for the spare, a RAID level-5EE spare is interleaved with the parity blocks, as shown in the following example. This allows data to be reconstructed more quickly if a physical drive in the array fails. With such a configuration, you cannot share the spare drive with other arrays. If you want a spare drive for any other array, you must have another spare drive for those arrays.
RAID level-5EE requires a minimum of four drives and, depending upon the level of firmware and the stripe-unit size, supports a maximum of 8 or 16 drives. RAID level-5EE is also firmware-specific.

The following is an example of how a RAID level-5EE logical drive is laid out.

RAID level-5EE example
Start with four physical drives.
Create an array using all four physical drives.
Then create a logical drive within the array.
The data is striped across the drives, creating blocks in the logical drive. The storage of the data parity (denoted by *) is striped, and it shifts from drive to drive as it does in RAID level-5E. The spare drive (denoted by S) is interleaved with the parity blocks, and it also shifts from drive to drive.
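Since the original illustration is easiest to grasp as a block map, here is a small sketch that prints one plausible four-drive RAID level-5EE layout. The exact rotation of the parity (*) and spare (S) positions is firmware-specific, as noted above, so the pattern below is only an assumed example; what matters is that two block positions per stripe row are not available for data, which is why the usable capacity is reduced by the capacity of two drives.

# Prints an illustrative 4-drive RAID level-5EE block map.  The rotation rule
# here is an assumption for illustration; real controllers may differ.
NUM_DRIVES = 4
data_block = 0
for stripe in range(4):
    row = []
    parity_drive = (NUM_DRIVES - 1 - stripe) % NUM_DRIVES   # assumed rotation
    spare_drive = (parity_drive + 1) % NUM_DRIVES            # assumed rotation
    for drive in range(NUM_DRIVES):
        if drive == parity_drive:
            row.append(" * ")
        elif drive == spare_drive:
            row.append(" S ")
        else:
            row.append(f"D{data_block:02d}")
            data_block += 1
    print("  ".join(row))

# Example output (one of several valid rotations):
#  S   D00  D01   *
# D02  D03   *    S
# D04   *    S   D05
#  *    S   D06  D07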

If a physical drive fails in the array, the data from the failed drive is reconstructed. The array undergoes compaction, and the distributed spare drive becomes part of the array. The logical drive remains RAID level-5EE.

When you replace the failed drive, the data for the logical drive undergoes expansion and returns to the original striping scheme.
Advantages and disadvantages

RAID level-5EE offers the following advantages and disadvantages.
Advantages
· 100% data protection against the failure of any single drive
· Offers more physical drive storage capacity than RAID level-1 or level-1E
· Higher performance than RAID level-5
· Faster rebuild than RAID level-5E
Disadvantages
· Lower performance than RAID level-1 and level-1E
· Supports only one logical drive per array
· Cannot share a hot-spare drive with other arrays
· Not supported on all controllers

Recently, we successfully retrieved data from such a RAID 5EE array by using Data Compass from SalvationDATA.
First, image all the member drives to image files. Then run the Data Compass Controller and Program:
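Imaging is best done with a hardware imager or a dedicated imaging tool, but the idea is plain sector-by-sector copying, with unreadable areas padded so that offsets stay aligned. A minimal sketch (the device path, chunk size and zero-fill policy are assumptions for illustration, not a recommendation for badly degraded drives):

# Sector-by-sector imaging of one member drive to an image file.
SECTOR = 512
CHUNK = SECTOR * 2048            # copy 1 MB at a time

def image_drive(device_path, image_path):
    offset = 0
    with open(device_path, "rb") as src, open(image_path, "wb") as dst:
        while True:
            src.seek(offset)
            try:
                data = src.read(CHUNK)
            except OSError:
                # Unreadable area: write zeros so the image keeps the same
                # offsets as the original drive, then move on.
                data = b"\x00" * CHUNK
            if not data:
                break
            dst.write(data)
            offset += CHUNK

# image_drive("/dev/sdb", "segment0.img")   # repeat for every member drive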

In Data Compass, the RAID utility supports both “automatic mode” and “manual mode” of analysis. We will introduce how to analyze a RAID 5EE system in automatic mode first:
Select Analysis Mode
First select the analysis mode; then we only need to define the Number of Segment Disks (the original number of disks in the RAID system), the RAID Type and the RAID Manufacturer. In this example, we have 4 segment disks, the RAID type is RAID 5EE, and the manufacturer is MANUFACTURER_Standard.

Please Note: the IBM RAID controller is SNIA Compatible; all manufacturers belong to MANUFACTURER_Standard, aside from AMI, HP/COMPAQ and DYNAMIC Disk.
The next step is to import the RAID segment disks; they can be disk images or physical disks. In automatic mode, users can import the segment disks in any order, and the program will work out the actual sequence.
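Data Compass performs this analysis internally. Purely to illustrate the general idea behind working out an unknown member order, one generic approach (not the tool's actual algorithm) is to try candidate orders and score each assembled volume by whether known on-disk structures appear where they should. A very rough sketch, in which read_virtual_sector is an assumed helper that assembles a sector of the virtual volume for a given candidate order:

from itertools import permutations

def looks_like_volume(first_sector):
    # Crude check: an MBR or FAT/NTFS boot sector ends with the 0x55AA signature.
    return len(first_sector) >= 512 and first_sector[510:512] == b"\x55\xaa"

def plausible_orders(read_virtual_sector, segment_ids):
    """Yield candidate member orders whose assembled sector 0 looks valid.
    A real analysis would test many more structures, not just one sector."""
    for order in permutations(segment_ids):
        if looks_like_volume(read_virtual_sector(order, 0)):
            yield order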

Import Segment Disks
Click “Apply” after importing all the segment disks; the program will then analyze the segment disk sequence and the storage method according to this configuration and return all of the parameters.
Analysis Finishes

Remark: How to understand the returned HDD sequence
HD0, HD1, HD2 and HD3 on the left side represent the order in which the segment disks were imported (this order by itself means nothing); 3.1.2.0 on the right side represents the native sequence of the segment disks. For example, the native segment sequence in this case study is 3.1.2.0, which means the segment imported as HD3 is actually the first segment disk in the RAID system, HD1 is the 2nd, HD2 the 3rd and HD0 the 4th.

By clicking the OK button next to the “Apply” and “Stop” buttons, the program will try to open the virtually created RAID system and display the partitions and files in the DCEXP interface.
Partition of the RAID system shows up in DCEXP

Double click the needed partition, and double click ROOT to show folders and files.
Now users can recover the needed files just like recovering data from a normal HDD.

Second, we will introduce how to analyze a RAID 5EE system in manual mode:
When Manual Setting is selected, all the parameters must be set manually, and you need to know the native sequence of the segment disks (if you don’t know it, work it out by trial and error) as well as the other parameters marked in the red frame. In this case, the Number of Segment Disks is 4, the RAID Type is 5EE, the Manufacturer is MANUFACTURER_Standard, the Block Order is RIGHT_ASYNCHRONOUS, and the Stripe Size is 8 KB (a sketch of what these layout parameters mean follows the notes below).

Attention:
1. The IBM RAID controller is SNIA Compatible; all manufacturers belong to MANUFACTURER_Standard, aside from AMI, HP/COMPAQ and DYNAMIC Disk.
2. Delay is only used in HP/COMPAQ RAID systems; users do not need to set this option for RAID systems from other manufacturers.
3. For all HP/COMPAQ RAID systems and some systems from other manufacturers, users need to set the Header Size for analysis. The Header Size differs from one RAID system to another, and users normally find it by trial and error based on experience. In this case we do not need to set a Header Size, because there is no header (offset) in this RAID system.
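Purely as an illustration of what these manual-mode parameters mean, the sketch below maps blocks of the virtual volume onto the member disks for a plain RAID-5 layout using one common convention for right-asynchronous parity rotation (naming of these rotations varies between tools). RAID 5EE additionally interleaves the spare with the parity, and that exact pattern is firmware-specific, so resolving it is left to the tool; the function and parameter names here are illustrative assumptions, not Data Compass internals.

# Plain RAID-5, "right asynchronous" rotation: parity starts on disk 0 and
# moves one disk to the right on each stripe row; data fills the remaining
# disks left to right, skipping the parity disk.
STRIPE_SIZE = 8 * 1024      # 8 KB stripe size, as in this case
NUM_DISKS = 4               # member disks, indexed in their native order

def map_virtual_block(virtual_block):
    """Map one stripe-sized block of the virtual volume to (disk, row)."""
    data_per_row = NUM_DISKS - 1                # one block per row holds parity
    row = virtual_block // data_per_row
    position = virtual_block % data_per_row
    parity_disk = row % NUM_DISKS
    disk = position if position < parity_disk else position + 1
    return disk, row

def read_virtual(images, offset, length):
    """images: member image files opened in native order (index 0 = first)."""
    out = bytearray()
    while length > 0:
        block, within = divmod(offset, STRIPE_SIZE)
        disk, row = map_virtual_block(block)
        take = min(length, STRIPE_SIZE - within)
        # A non-zero Header Size would simply be added to this member offset.
        images[disk].seek(row * STRIPE_SIZE + within)
        out += images[disk].read(take)
        offset += take
        length -= take
    return bytes(out)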
After that, import the segment disks exactly in the native sequence. In this case, the third segment drive was lost.

After setting the parameters, click the "Apply" button and wait until you receive the message below:
By clicking the OK button next to the “Apply” and “Stop” buttons, the program will try to open the virtually created RAID system and display the partitions and files in the DCEXP interface.
Attention: If the parameters are set incorrectly and the program cannot acquire the partition, users need to reset the parameters and try again (manual mode is a process of trial and error).
Double click the needed partition, and double click ROOT to show folders and files.
Now users can recover the needed files just like recovering data from a normal HDD.

The above case study shows the basic steps for recovering a RAID case. Nowadays, RAID is widely used by corporations and large businesses, and storage efficiency has been raised thanks to this technology. But that does not mean our crucial data is worry-free. Data recovery is a remedial measure and a last resort. The key to data security is BACKUP; there are no other shortcuts.
