How to rebuild Intel Raid (isw) on Linux

For years, I’ve run many small servers on the popular Intel Matrix Storage RAID (ICH/isw) in a RAID-1 configuration. It has worked perfectly, with no issues on either Windows or Linux. But one thing has always bugged me: what do I do when a drive fails (and one will)? How does isw handle it?

On Windows this is simple: you launch the Matrix Storage software and click rebuild (if it isn’t already rebuilding automagically). But how do you do this on a Linux server, which has no Matrix Storage software? After hours of Googling, I came across the command “dmraid -R”. But that didn’t work in my test environment.

So I spent a whole afternoon figuring this out. This is what I found.

DMRaid Works. Sort of

dmraid is the Linux tool for the popular onboard (BIOS) RAID setups. Your RAID can be from Intel, Nvidia, Promise or a few other vendors it supports. Intel is the most common one, and that’s the one I have on all my Intel servers. Your implementation may be different, but this post should still help you.

My test setup was a simple ICH6R machine with two 160 GB Seagate hard drives. I booted the machine, went into the Intel RAID setup, and created a 20 GB mirror volume called “System”. I then installed 32-bit CentOS 5.5 on this machine and went to work.

Initial results

The first thing I did was find out what I had. Running “dmraid -s” gave me

[root@nasri ~]# dmraid -s
*** Group superset isw_djhffiddde
--> Active Subset
name   : isw_djhffiddde_System
size   : 41942528
stride : 256
type   : mirror
status : ok
subsets: 0
devs   : 2
spares : 0

Then running “dmraid -r” gave me

[root@nasri ~]# dmraid -r
/dev/sda: isw, "isw_djhffiddde", GROUP, ok, 312581806 sectors, data@ 0
/dev/sdb: isw, "isw_djhffiddde", GROUP, ok, 312581806 sectors, data@ 0

This tells me that my mirror set is running, has two drives attached, and all is happy.
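If you want to check this from a script instead of by eye, a minimal sketch could look like the following. It assumes the “dmraid -s” output keeps the “key : value” layout shown above; the function names and the SAMPLE text are made up for illustration and are not part of dmraid.

```python
# Field names as they appear in the `dmraid -s` output shown above.
FIELDS = {"name", "size", "stride", "type", "status", "subsets", "devs", "spares"}

# Sample text copied from the healthy output above.
SAMPLE = """\
*** Group superset isw_djhffiddde
--> Active Subset
name   : isw_djhffiddde_System
size   : 41942528
stride : 256
type   : mirror
status : ok
subsets: 0
devs   : 2
spares : 0
"""


def parse_dmraid_sets(output):
    """Turn `dmraid -s` text into a list of dicts, one per RAID set."""
    sets = []
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        key, value = key.strip(), value.strip()
        # Skip headers, ERROR lines and anything else that is not a known field.
        if not sep or key not in FIELDS:
            continue
        if key == "name":
            sets.append({})  # every set starts with its "name" line
        if sets:
            sets[-1][key] = value
    return sets


def unhealthy(sets):
    """Names of all sets whose status is anything other than "ok"."""
    return [s["name"] for s in sets if s.get("status") != "ok"]


if __name__ == "__main__":
    print(unhealthy(parse_dmraid_sets(SAMPLE)))
```

On a real machine you would feed it the text captured from running “dmraid -s” as root rather than the pasted sample.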

Broken results

I then turned the machine off, yanked a drive, inserted a different drive, and turned it back on. After fiddling with the BIOS for a few minutes (my machine wanted to boot from the newly installed drive, not the RAID), I got back in, and this is what I saw

[root@nasri ~]# dmraid -s
ERROR: isw: wrong number of devices in RAID set "isw_djhffiddde_System" [1/2] on /dev/sda
*** Group superset isw_djhffiddde
--> *Inconsistent* Active Subset
name   : isw_djhffiddde_System
size   : 41942528
stride : 256
type   : mirror
status : inconsistent
subsets: 0
devs   : 1
spares : 0

and

[root@nasri ~]# dmraid -r
/dev/sda: isw, "isw_djhffiddde", GROUP, ok, 312581806 sectors, data@ 0

So, dmraid tells me that the RAID is broken and inconsistent. Great. That’s what I want to see when a disk fails in one of my RAID sets. According to the man page and Google, to repair it you use “dmraid -R <raid id> /dev/<device>”

So, here goes.

[root@nasri ~]# dmraid -R isw_djhffiddde_System /dev/sdb
ERROR: isw: wrong number of devices in RAID set "isw_djhffiddde_System" [1/2] on /dev/sda
isw: drive to rebuild: /dev/sdb

RAID set "isw_djhffiddde_System" already active
device "isw_djhffiddde_System" is now registered with dmeventd for monitoring
Error: Unable to write to descriptor!
Error: Unable to execute set command!
Error: Unable to write to descriptor!
Error: Unable to execute set command!

Hrm. Errors. I don’t like errors. What happened? To be honest, I’ll never know, but it did not seem to be working. dmraid thinks it’s working, but I can’t see it: I can’t hear any grumblings from the drive, nor can I see the LEDs flash. dmraid tells me the following:

[root@nasri ~]# dmraid -s
*** Group superset isw_djhffiddde
--> Active Subset
name   : isw_djhffiddde_System
size   : 41942528
stride : 256
type   : mirror
status : nosync
subsets: 0
devs   : 2
spares : 0

OK, so it’s not inconsistent now, but it is “nosync”, and I could not figure out what that means. I should look at the source code, but I can’t be bothered.

Alright, so it appears that it’s not working.

Plan B

To figure out if it was doing anything, I turned the machine off, removed the new drive, and put in a Western Digital Raptor: something that makes sounds. I booted up, and dmraid still showed the same thing, an inconsistent RAID set. Now I added the new WD Raptor to this set.

[root@nasri ~]# dmraid -R isw_djhffiddde_System /dev/sdb
ERROR: isw: wrong number of devices in RAID set "isw_djhffiddde_System" [1/2] on /dev/sda
isw: drive to rebuild: /dev/sdb

RAID set "isw_djhffiddde_System" already active
device "isw_djhffiddde_System" is now registered with dmeventd for monitoring

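Oh wow, much better. On top of that, I could hear the grumblings of the WD, and I could see LED activity. So, it works!

I also found a command to monitor the progress. It’s called “dmsetup status”. The number to watch is the ratio on the mirror line (928/1280 in the first run below), which counts synced regions out of the total.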

[root@nasri ~]# dmsetup status
isw_djhffiddde_Systemp2: 0 41720805 linear
isw_djhffiddde_Systemp1: 0 208782 linear
isw_djhffiddde_System: 0 41942776 mirror 2 8:16 8:0 928/1280 1 AA 1 core
VolGroup00-LogVol01: 0 4128768 linear
VolGroup00-LogVol00: 0 37552128 linear

[root@nasri ~]# dmsetup status
isw_djhffiddde_Systemp2: 0 41720805 linear
isw_djhffiddde_Systemp1: 0 208782 linear
isw_djhffiddde_System: 0 41942776 mirror 2 8:16 8:0 936/1280 1 AA 1 core
VolGroup00-LogVol01: 0 4128768 linear
VolGroup00-LogVol00: 0 37552128 linear

[root@nasri ~]# dmsetup status
isw_djhffiddde_Systemp2: 0 41720805 linear
isw_djhffiddde_Systemp1: 0 208782 linear
isw_djhffiddde_System: 0 41942776 mirror 2 8:16 8:0 1280/1280 1 AA 1 core
VolGroup00-LogVol01: 0 4128768 linear
VolGroup00-LogVol00: 0 37552128 linear

And finally

[root@nasri ~]# dmraid -r
/dev/sdb: isw, "isw_djhffiddde", GROUP, ok, 312581806 sectors, data@ 0
/dev/sda: isw, "isw_djhffiddde", GROUP, ok, 72303838 sectors, data@ 0
[root@nasri ~]# dmraid -s
*** Group superset isw_djhffiddde
--> Active Subset
name   : isw_djhffiddde_System
size   : 41942528
stride : 256
type   : mirror
status : ok
subsets: 0
devs   : 2
spares : 0

So, this is why it “sort of” works. It didn’t work with another Seagate drive, but it worked with a different drive. After that, I yanked the good 80 GB drive from this set, plugged in a 750 GB Seagate, and was able to mirror back to that without a problem. Maybe the initial problem was just my drives.

Conclusion

To fix a broken RAID-1 on Intel RAID, use “dmraid -R <raidid> <dev>”, then watch “dmsetup status” and wait for the ratio on the mirror line to reach 1 (for example 1280/1280).
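Watching that ratio by hand gets old quickly, so here is a rough sketch of pulling it out of the “dmsetup status” text. It assumes mirror lines are laid out as in the output above (mirror count, then that many devices, then synced/total); the function name and the sample string are only for illustration.

```python
def mirror_sync_progress(status_text):
    """Return {device_name: (synced, total)} for each mirror line."""
    progress = {}
    for line in status_text.splitlines():
        name, sep, rest = line.partition(": ")
        fields = rest.split()
        # fields: start, length, target type, then target-specific values
        if not sep or len(fields) < 4 or fields[2] != "mirror":
            continue
        mirror_count = int(fields[3])
        ratio_index = 4 + mirror_count  # skip the list of mirror devices
        if len(fields) <= ratio_index or "/" not in fields[ratio_index]:
            continue
        synced, total = fields[ratio_index].split("/")
        progress[name] = (int(synced), int(total))
    return progress


if __name__ == "__main__":
    # Two lines copied from the first "dmsetup status" run above.
    sample = (
        "isw_djhffiddde_Systemp1: 0 208782 linear\n"
        "isw_djhffiddde_System: 0 41942776 mirror 2 8:16 8:0 928/1280 1 AA 1 core\n"
    )
    for name, (synced, total) in mirror_sync_progress(sample).items():
        print(f"{name}: {synced}/{total} regions in sync")
```

The rebuild is finished once synced equals total for the mirror device.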

How to install the SNMP service on Microsoft Hyper-V R2

Another quick post/reminder to myself. I’ve been experimenting with the idea of using Cacti to monitor the performance of my Hyper-V servers, so I needed SNMP on my Hyper-V machines. However, there is no UI to add that feature on the core installs. So, to install SNMP on Hyper-V R2, use the following command line

start /w ocsetup SNMP-SC

That’s it!

Adaptec 3805 – It’s rubbish

An update on my previous posting about the Adaptec 3805 and my troubles with getting compatible drives.

I’ve been running a RAID-5 on the 3805 using 4 Samsung SpinPoint F3s for about 2 weeks, and two days ago it started to give problems. The fourth disk in the array just dropped out, with no visible SMART issues or physical defects.

So, after waking up to this news, I added it back into the array (probably not a great idea, but usually it’s fine). The moment the full initialization completed, something caused one of my virtual machines running from that RAID array to stop functioning (it was a mail server). Another machine on there, with much less activity, kept running without a problem, but the mail server’s virtual machine was all but destroyed. Luckily, I have backups.

Not only did it knock off one of my virtual machines, it also kicked a different disk out of the array and started complaining again. So I broke down and ordered 2 ES.2 disks from Newegg, which are on the HCL for this controller. However, 8 hours later, the controller barfed completely and started giving the host timeouts on the logical drive, even with 3 perfectly usable drives. The host became very unstable; I had to shut down the virtual machines and reset the machine (thank god for lights-out control).

Upon reboot, the logical RAID-5 array was unusable and had to be forced online. I copied the two files that I absolutely had to have, and finally destroyed the array. When I go up to the data center again, I’m removing this card and burying it. I’m going back to my trusty Highpoint RR2224, which I’ve had for over 5 years now without a single glitch.

Done.

Adaptec 3805 Compatibility Issues with Western Digital Blacks

Hello, a quick post / announcement that the Adaptec 3805 SAS RAID controller has compatibility issues with Western Digital Caviar Black drives. I don’t know if it’s an issue with all capacities of the Black series, but the ones I had were the 1 TB ones, model number WDC-WD1001FALS-0.

After doing some research, it appears that the WD Blacks are NOT on the HCL for the Adaptec 3805 controller, but it took me some time to find this. I hope this post helps anyone who was going to go down this route.

Since I’d already gotten the WD Blacks, I ended up using Samsung SpinPoint F3s from some of my servers. The WD Blacks work fine on Intel ICH RAID. I think next time, I’ll stick with these Samsungs. Cheap, good warranty, and FAST.

Now, in general, why can’t Western Digital make standards-compliant drives? I could easily blame Adaptec as well, but in this case I think it has to be WD’s fault. They are known for making ATA drives that don’t work the same way as everyone else’s. How hard can it be? Even Samsung got it right.

Update: see my second post on this.

Enabling xp_cmdshell

A quick post on how to enable xp_cmdshell on SQL Server 2005/2008/2008 R2

EXEC sp_configure 'show advanced options', 1
GO
RECONFIGURE
GO
EXEC sp_configure 'xp_cmdshell', 1
GO
RECONFIGURE
GO
EXEC sp_configure 'show advanced options', 0
GO
RECONFIGURE
GO

Done.

HyperV Time issues with CentOS

Quick reminder to myself on how to fix the time speed-up issue with CentOS 5.4 on Hyper-V

nano /boot/grub/grub.conf

append

divider=10 clocksource=acpi_pm

to the end of the kernel line for the current kernel (or all of them, actually)
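If you have more than a couple of machines to fix, the edit can be scripted. The sketch below works on the text of grub.conf and appends the two options to every line that starts with “kernel”, skipping options that are already there. The function name is made up, and the grub.conf snippet in the example is a hypothetical one, not copied from a real machine; back up the real file before writing anything to it.

```python
def append_kernel_args(grub_conf_text, extra_args):
    """Append each missing argument to every `kernel` line of grub.conf text."""
    out = []
    for line in grub_conf_text.splitlines():
        words = line.split()
        if words and words[0] == "kernel":
            # Only add what is not on the line yet, so re-running is harmless.
            missing = [arg for arg in extra_args if arg not in words]
            if missing:
                line = line.rstrip() + " " + " ".join(missing)
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Hypothetical grub.conf fragment, for illustration only.
    example = (
        "title CentOS (example)\n"
        "\troot (hd0,0)\n"
        "\tkernel /vmlinuz-example ro root=/dev/VolGroup00/LogVol00\n"
        "\tinitrd /initrd-example.img\n"
    )
    print(append_kernel_args(example, ["divider=10", "clocksource=acpi_pm"]), end="")
```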

VSSAdmin List Writers is Empty

Hello,

Quick post to remind myself how to fix Volume Shadow Copy having no writers. It turns out this is a common problem after using the NewSID utility on templated servers/computers. The fix is relatively simple.

1. Stop the Microsoft Software Shadow Copy Provider and Volume Shadow Copy services.
2. Export the contents of the HKLM\Software\Microsoft\EventSystem key to a .reg file (as a backup).
3. Delete the HKLM\Software\Microsoft\EventSystem\{26c409cc-ae86-11d1-b616-00805fc79216}\Subscriptions key. (Just delete the Subscriptions subkey; leave the EventClasses key.)
4. Restart the server.
5. Run the “VSSADMIN LIST WRITERS” command.

Thanks to Rhys Winter for his post on TechNet.

Windows7 LSASS crashing system

Microsoft’s Windows 7 packs a lot of power, performance and stability for the vast majority of us. Unfortunately, one glitch in this virtually perfect operating system left me tearing my hair out. It appears that in one particular environment, Windows 7 Ultimate x64 will crash and burn all the time for no apparent reason. Luckily, David Weisz has found the cause and published a solution.

Let me explain the scenario where this occurs. I have an Asus M50V laptop, a stellar laptop for business and gaming use. It has 4 GB of RAM, and it’s my daily workstation for practically everything. Since I’m an IT professional, I run my own Windows domain within my enterprise, along with Exchange and all the other goodies, mainly as a showpiece of what a good network deployment can do for a customer. Hence, my laptop is joined to a Windows 2003 AD, just like all my previous laptops and workstations.

After installing Windows 7, I suspected a glitch between the motherboard firmware and power management: 7 times out of 10, when I resumed my laptop from sleep and logged in, I’d get the LSASS crash and the usual “Critical error, system will reboot in one minute” message, making me scramble to close all my documents.

However, it turns out to be a small environmental issue with the type of domain controllers I have and how the laptop authenticates to them. Hence David Weisz’s solution works. The fix is a simple registry change so that the LSA service authenticates to the Windows servers in a way they are compatible with. That change is

Key:   HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Lsa\Kerberos\Parameters
Type:   REG_DWORD
Name:   DefaultEncryptionType
Data:    23 (decimal) or 0x17 (hexadecimal)

And bingo, no more crashes, and now I can resume my dance of saying Windows 7 is pretty much perfect in every way.
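If you need to push this to more than one machine, one option is to generate a .reg file and import it. The sketch below only builds the file’s text for a single DWORD value, using the key, name and data listed above; the helper name is made up for illustration, and saving and importing the file is left to you.

```python
def reg_dword_file(key_path, value_name, value):
    """Build the text of a .reg file that sets one REG_DWORD value."""
    return (
        "Windows Registry Editor Version 5.00\r\n"
        "\r\n"
        f"[{key_path}]\r\n"
        # .reg files store DWORD data as eight hex digits.
        f'"{value_name}"=dword:{value:08x}\r\n'
    )


if __name__ == "__main__":
    text = reg_dword_file(
        r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Lsa\Kerberos\Parameters",
        "DefaultEncryptionType",
        23,  # decimal 23 is 0x17
    )
    print(text, end="")
```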

My advice for Windows 7

To put it simply, it is my opinion that everyone should eventually be using Windows 7, as it is a remarkable operating system with few issues. I’ve been using Windows 7 since January 2009 and have had very few issues, if any at all, during that time. I should also add that I had been using Windows Vista since late 2006, on a laptop that was designed for Windows XP.

Here is a simple guide that should tell you in a nutshell what you should do

IF

You are running Windows Vista now – Windows 7 is recommended to you.

You are running Windows XP or Windows 2000 – Windows 7 can be considered after a consultation.

You are looking to buy a new computer – Make sure it comes preloaded with Windows 7. If you are a business customer and have a server, make sure the computer comes with Windows 7 Professional or Ultimate.

How to install

In all cases, I do not recommend a Windows 7 upgrade. It is always in my interest to keep customers and the public in a safe environment, and because of this I only recommend a clean install of Windows 7, not an upgrade.

To install Windows 7, back up all of your data to an external device, and then install Windows 7 by formatting the computer (select a custom install, delete the partitions and install to the unpartitioned area).

Once installed, reinstall your office applications and other applications, and restore your data back to your My Documents / Favorites.

How to buy Windows 7

Windows 7 comes in a variety of editions. Since you already own Windows, you should look for the upgrade packages only. You do not need to buy the full version to do a clean install of Windows. If you are unsure which edition to get, please contact me.

I’m on Windows XP, what do I do?

If you are on Windows XP, then there is no rush to move to Windows 7. We should make sure that all of your existing applications can run in Windows 7 before taking the plunge. Like Vista, Windows 7 works differently from Windows XP, leaving a lot of applications and devices unusable on Windows 7 without updates or upgrades. Some devices, such as old scanners, will not work in Windows 7 at all.

If you are running a relatively new computer (newer than 2005) and have at least 1 GB of RAM, then running Windows 7 should be fine, but perform a clean install as described above.

If you have one of the newer netbooks (the ultra-portable computers with 10-inch screens), then running Windows 7 should not be a problem, as long as you have enough disk space to install Windows 7 and 1 GB of RAM.

If you are unsure, do not install Windows 7, and contact me instead.

NTBackup on Windows Server 2003 x64 and SQL Server 2000

Just came across this gem of a KB from Microsoft, which happens to be pretty recent as well. I had a customer with an x64 Win2k3 installation where NTBackup would just not run. It hung while preparing the volume shadow copy, never got any further, and simply died. This KB outlines the exact issue and fixes it. I’m not sure whether the patch itself is needed; it may be that the registry key alone is enough.

http://support.microsoft.com/kb/913360