Troubleshooting RAID 1 in Solstice DiskSuite Software

I found this in my notes folder. It was compiled by a former colleague back when we supported Sun servers.

Database Replica Errors

  • Problem: State database is corrupted or unavailable
  • Cause: Disk failure, disk I/O error
  • Symptom: An error message appears at boot time if 50% or fewer of the state database replicas are available, and the system comes up in single-user mode.
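
The Symptom line describes a majority rule: a normal boot needs more than half of all state database replicas to be available. Below is a minimal arithmetic sketch of that rule, assuming a POSIX shell such as ksh or bash. The function name replica_quorum is hypothetical and is not a DiskSuite command.

```shell
# Hypothetical helper, not part of DiskSuite: applies the boot-time rule
# from the Symptom line. With 50% or fewer of the replicas available,
# the system stops in single-user mode.
# Usage: replica_quorum TOTAL AVAILABLE
replica_quorum() {
    total=$1
    available=$2
    # "More than half" without fractions: available * 2 > total
    if [ $((available * 2)) -gt "$total" ]; then
        echo "multiuser"
    else
        echo "single-user"
    fi
}

# The scenario in the steps below: 4 replicas, 2 of them on the failed disk.
replica_quorum 4 2    # prints single-user
replica_quorum 4 3    # prints multiuser
```

Comparing available * 2 with the total keeps the check in integer arithmetic, so an odd total such as 3 correctly needs 2 available replicas.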

Suggested steps to follow:

1. At the ok prompt, issue the boot command. The system will enter single-user mode because of the broken database replicas.

ok > boot
...
Hostname: host1
metainit: host1: stale databases
Insufficient metadevice database replicas located.
Use metadb to delete databases which are broken.
Ignore any "Read-only file system" error messages.
Reboot the system when finished to reload the metadevice
database.
After reboot, repair any broken database replicas which were
deleted.
Type Ctrl-d to proceed with normal startup,
(or give root password for system maintenance):
Entering System Maintenance Mode.

2. Use the metadb command to inspect the metadevice state database. Replicas that are not available are marked "unknown" and carry the M flag.

# metadb -i
flags first blk block count
a m p lu 16 1034 /dev/dsk/c0t0d0s7
a p l 1050 1034 /dev/dsk/c0t0d0s7
M p unknown unknown /dev/dsk/c0t1d0s7
M p unknown unknown

3. Delete the state database replicas on the bad disk using the -d option. At this point the root (/) file system is still read-only, so you can ignore the mddb.cf error message:

# metadb -d -f c0t1d0s7
metadb: demo: /etc/opt/SUNWmd/mddb.cf.new: Read-only file system.

Verify the deletion:

# metadb -i
flags first blk block count
a m p lu 16 1034 /dev/dsk/c0t0d0s7
a p l 1050 1034 /dev/dsk/c0t0d0s7

4. Reboot the system.

5. Use the metadb command to add the state database replicas back, then verify them. The -c 2 option creates two replicas on the slice, replacing the two that were deleted:

# metadb -a -c 2 c0t1d0s7
# metadb -i
flags first blk block count
a m p luo 16 1034 /dev/dsk/c0t0d0s7
a p luo 1050 1034 /dev/dsk/c0t0d0s7
a u 16 1034 /dev/dsk/c0t1d0s7
a u 1050 1034 /dev/dsk/c0t1d0s7
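
Reading the flags column by eye gets error-prone with many replicas. The following is a minimal sketch, assuming the metadb -i layout shown above (flags, first blk, block count, device) and a POSIX awk (on Solaris, use nawk if /usr/bin/awk is the old awk). It relies on the metadb convention that uppercase flags such as M report errors while lowercase flags only report status. The function name check_replicas is hypothetical. It counts healthy replicas and lists the slices holding unhealthy ones, so it can help both in step 2 and in the verification of step 5.

```shell
# Hypothetical helper, not part of DiskSuite.
# Reads "metadb -i" output on stdin. A replica counts as healthy when its
# flags include "a" (active) and no uppercase letter (uppercase flags
# report errors, e.g. M = problem with master blocks).
# Prints "healthy H of T", then one "failed: SLICE" line per slice that
# holds an unhealthy replica (replicas listed without a device are only counted).
check_replicas() {
    awk '
        {
            isrep = 0; active = 0; error = 0; dev = ""
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^[0-9]+$/ || $i == "unknown") {
                    isrep = 1                  # "first blk" / "block count" columns
                } else if ($i ~ /^\/dev\//) {
                    dev = $i                   # replica location
                } else if ($i ~ /^[A-Za-z]+$/) {
                    if ($i ~ /[A-Z]/) error = 1
                    if ($i ~ /a/) active = 1
                }
            }
            if (!isrep) next                   # header, blank or legend line
            total++
            if (active && !error) {
                healthy++
            } else if (dev != "" && !(dev in seen)) {
                seen[dev] = 1
                failed[++n] = dev
            }
        }
        END {
            printf "healthy %d of %d\n", healthy, total
            for (i = 1; i <= n; i++) print "failed: " failed[i]
        }'
}
# Usage on the affected host:  metadb -i | check_replicas
```

On the output from step 2 this reports 2 of 4 replicas healthy and names /dev/dsk/c0t1d0s7, the slice passed to metadb -d -f in step 3; after step 5 every replica should count as healthy. metadb -i also prints a legend of flag meanings, which the sketch skips because those lines carry no block numbers.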

Metadevice Errors

  • Problem: Submirrors are out of sync and in the "Needs maintenance" state
  • Cause: Disk problem or failure, improper shutdown, or communication problems between the two mirrored disks
  • Symptom: "Needs maintenance" errors in metastat output

Suggested steps to follow:

1. Replace the faulty disk.

2. Partition the replacement disk with the same layout as the original disk. If you also need to recover the state database replicas, follow the steps in the previous section.

3. Log in to Solaris and run the metastat command. The output will look similar to this:

# metastat

d0: Mirror
Submirror 0: d10
State: Needs maintenance
Submirror 1: d20
State: Okay
...
d10: Submirror of d0
State: Needs maintenance
Invoke: "metareplace d0 /dev/dsk/c0t3d0s0 "
Size: 47628 blocks
Stripe 0:
Device Start Block Dbase State Hot Spare
/dev/dsk/c0t3d0s0 0 No Maintenance

d20: Submirror of d0
State: Okay
Size: 47628 blocks
Stripe 0:
Device Start Block Dbase State Hot Spare
/dev/dsk/c0t2d0s0 0 No Okay
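
4. The output shows that slice c0t3d0s0 is in the Maintenance state; this is the disk that failed and was replaced. Because the new disk sits at the same target, use the metareplace -e command to re-enable the component, which also resynchronizes it from the healthy submirror:

# metareplace -e d0 c0t3d0s0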
Device /dev/dsk/c0t3d0s0 is enabled

Alternatively, to move the failed component to a new disk at a different target, give the new slice as a third argument (new-component below is a placeholder for that slice):

# metareplace d0 c0t3d0s0 new-component
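
Steps 2 and 4 of this procedure can be partly scripted. The sketch below is an illustration under stated assumptions, not a tested procedure, and both function names are hypothetical. copy_vtoc uses the common prtvtoc | fmthard idiom to copy the partition table from the surviving disk (c0t2d0 in this example) to the replacement (c0t3d0); it assumes both disks have the same geometry and that slice 2 covers the whole disk. suggest_metareplace assumes the metastat layout shown above, where the component state is the last column, and a POSIX awk (nawk on older Solaris); it only prints the metareplace -e command from step 4 and does not run it.

```shell
# Hypothetical helpers, not part of DiskSuite.

# Step 2: copy the partition table (VTOC) from the surviving disk to the
# replacement disk. Assumes identical geometry and slice 2 as the
# whole-disk slice. With DRY_RUN=1 the command is printed, not executed.
copy_vtoc() {
    src=$1    # surviving disk, e.g. c0t2d0
    dst=$2    # replacement disk, e.g. c0t3d0
    cmd="prtvtoc /dev/rdsk/${src}s2 | fmthard -s - /dev/rdsk/${dst}s2"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$cmd"
    else
        eval "$cmd"
    fi
}

# Step 4: read "metastat" output on stdin and print a "metareplace -e"
# command for every component whose state is Maintenance.
suggest_metareplace() {
    awk '
        # "d10: Submirror of d0" -> remember the parent mirror (d0)
        $2 == "Submirror" && $3 == "of" { mirror = $4 }
        # "/dev/dsk/c0t3d0s0 0 No Maintenance" -> component needing repair
        $1 ~ /^\/dev\/dsk\// && $NF == "Maintenance" && mirror != "" {
            comp = $1
            sub(/^\/dev\/dsk\//, "", comp)     # keep only the cXtYdZsN name
            print "metareplace -e " mirror " " comp
        }'
}

# Usage on the affected host:
#   DRY_RUN=1 copy_vtoc c0t2d0 c0t3d0     # review, then run without DRY_RUN
#   metastat | suggest_metareplace
```

Running the DRY_RUN form first lets you confirm the source and target before fmthard overwrites the replacement disk's label, and printing rather than executing metareplace leaves the final decision between -e and a move to a new target with you.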
