Monday, May 11, 2009

SQL 2008 MSCS cluster verification error

This error confused me for a while, and Google didn't come up with many hits, so I thought I'd write something up to help others who may hit this seemingly rare issue. I ran into the problem on my 2nd or 3rd SQL 2008 x64 install on a Windows 2008 x64 cluster, and all of them were running on the same hardware (Dell R900). The only thing I didn't do this time was the actual install of Windows. I figured, how differently can someone else install Windows?

SQL 2008 x64 Enterprise installing on Windows 2008 x64 Enterprise Microsoft Cluster Server.

During SQL install verification, it fails a cluster check even though you've run a Cluster Verification on your 2008 x64 cluster.



Rule "Microsoft Cluster Service (MSCS) cluster verification errors" failed.

The cluster either has not been verified or there are errors or failures in the verification report. Refer to KB953748 or SQL Server Books Online for more information.

As of this writing, KB953748 is still a Microsoft-internal KB and, according to the Premier Support tech I talked to, it doesn't have much info to help either. After spending hours on the phone with them, I gave up for the night and decided to do some digging myself the next day. After reading everything I could find online, I saw one guy hint at using procmon to monitor the install process and see what goes wrong during verification. That led to the resolution (i.e. tricking the installer).



First, look in your C:\Windows\Cluster\Reports folder. You should see at least 2-3 files if you ran a full Cluster Verification prior to SQL installation: Cluster.log, ValidateStorage.log, and Validation Data for Node Set xxxxxxxxxxx. The Validation Data for Node Set file is the one the installer parses to check whether the cluster has been verified. What we need to do is run AND PASS (warnings are OK, failures are not) a cluster verification to generate these files, then run procmon against the installer, find the Validation Data for Node Set file name it is actually looking for, and rename the existing file to match. I also had to delete all the other files in the Reports folder before the SQL install would accept the renamed Validation Data for Node Set file.



Filter procmon to show only setup100.exe after the install starts the verification, and look for the two highlighted entries. The second entry should say failed or something to that effect, and you can copy the file name it is looking for from there. Rename the file in the Reports folder, delete or move all the other files, and run the verification again; it should work for you. I had to do the same thing on the 2nd node because it looked for a different file name, but I used the same verification file.
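For anyone who would rather script the rename-and-clean-up step, here is a minimal Python sketch of what I did by hand. It assumes you already got the expected file name out of procmon; the function name, the backup folder and the file names in the usage note are all made up for illustration. It moves the other files to a backup folder rather than deleting them.

```python
import shutil
from pathlib import Path

REPORT_PREFIX = "Validation Data for Node Set"


def stage_validation_report(reports_dir, expected_name, backup_dir):
    """Rename the newest validation report to expected_name and move
    every other file in reports_dir into backup_dir.

    expected_name is the file name seen in procmon (hypothetical here).
    """
    reports = Path(reports_dir)
    backup = Path(backup_dir)
    backup.mkdir(parents=True, exist_ok=True)

    candidates = [
        p for p in reports.iterdir()
        if p.is_file() and p.name.startswith(REPORT_PREFIX)
    ]
    if not candidates:
        raise FileNotFoundError(
            "No validation report found; run the cluster validation first."
        )

    # Use the most recently written report as the source.
    source = max(candidates, key=lambda p: p.stat().st_mtime)
    target = reports / expected_name

    # Keep a copy under the original name before touching anything.
    shutil.copy2(source, backup / source.name)
    if source != target:
        source.replace(target)

    # Move everything else out of the way (subfolders are left alone).
    for p in list(reports.iterdir()):
        if p.is_file() and p != target:
            shutil.move(str(p), str(backup / p.name))
    return target
```

Usage would look something like `stage_validation_report(r"C:\Windows\Cluster\Reports", "<name copied from procmon>", r"C:\Temp\ReportsBackup")`, run once per node with whatever name that node's installer asked for.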

I also ran into this error on the second node: "The current SKU is invalid" (http://support.microsoft.com/kb/957459). The simplest fix is to delete the DefaultSetup.ini that contains your embedded serial number and enter the same serial manually during install. I run my installations from network drives after copying them from the ISO. Hope this helps!


Edit: it looks like Microsoft may have something that works as well. I did try to skip verification while I was on the phone with them, but the commands we used might have been different. Try this first to see if it works; it's much easier.

At a command prompt, change to the hard disk drive and to the folder that contains SQL Server Setup (Setup.exe). Then, type one of the following commands to skip the validation rule:

* For an integrated failover setup, run the following command on each node that is being added:
Setup /SkipRules=Cluster_VerifyForErrors /Action=InstallFailoverCluster

* For an advanced or enterprise installation, run the following command:
Setup /SkipRules=Cluster_VerifyForErrors /Action=CompleteFailoverCluster
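If you launch setup from a script, a small Python helper like the one below can build those two command lines so the rule name doesn't get mistyped. This is only a sketch: the helper name is my own, and it only knows the two actions quoted above.

```python
SKIP_RULE = "Cluster_VerifyForErrors"

# Only the two actions quoted from Microsoft's text above.
ALLOWED_ACTIONS = ("InstallFailoverCluster", "CompleteFailoverCluster")


def build_skip_rule_command(setup_exe, action):
    """Return the Setup command line as an argument list."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    return [setup_exe, f"/SkipRules={SKIP_RULE}", f"/Action={action}"]
```

The returned list can be passed straight to `subprocess.run(...)` from the folder that contains Setup.exe.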

Tuesday, February 19, 2008

Netbackup 5.0 Windows cluster migration

If you are in the same boat as me, with a Netbackup 5.0 Windows Master Server running on old hardware, you will need to migrate to new hardware before you upgrade to the latest version (in this case Netbackup 6.5.1). Here is how I performed the migration relatively painlessly.

First you need the new cluster; see my previous post about installing Netbackup offline. I highly recommend keeping the same Master Server name. If you have a cluster like mine, you can have a new cluster name and IP, but the virtual Master Server name and IP should remain the same. That is where installing Netbackup offline makes sense. Just use a cheap switch and put in host file entries for everything after you build the cluster online. Make sure your new cluster has every piece of software it needs; I forgot Perl and some Perl modules on mine. You should also work with Symantec support on running NBCC to make sure there are no catalog issues before the migration.
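Typing host file entries by hand on every node is an easy place to make a typo, so here is a small Python sketch that renders them from one list. The function name, the addresses and the host names are made up for illustration; on Windows the result would go into C:\Windows\System32\drivers\etc\hosts.

```python
def render_hosts_entries(entries):
    """Render (ip, names) pairs as hosts-file lines.

    names may be a single string or a list of aliases.
    """
    lines = []
    for ip, names in entries:
        if isinstance(names, str):
            names = [names]
        joined = " ".join(names)
        lines.append(f"{ip}\t{joined}")
    return "\n".join(lines) + "\n"


# Hypothetical addresses and names, for illustration only.
example = render_hosts_entries([
    ("192.168.0.10", ["nbmaster", "nbmaster.example.local"]),
    ("192.168.0.11", "newcluster-node1"),
    ("192.168.0.12", "newcluster-node2"),
])
```

Generating the block once and pasting the same text on every node keeps the entries identical across the cluster.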

Step 1: Old cluster - Do a catalog backup after you stop all running backups.

Step 2: Old cluster - Shut off the Master Server Virtual IP; this should cause Netbackup to go offline as well. Also shut off any add-ons, like NBAR or Aptare. Leave the shared disk running. Kill any Netbackup processes that may still be running.

Step 3: New cluster - Shut off all Netbackup services including the virtual name and IP, leave the shared disk running.

Step 4: Old cluster - Copy the various databases and custom scripts to the new server over the network. Make sure you turn off the virus scanner, as it can slow the copy down significantly. You need to know which ones to copy: basically your shared data, which is the image database and the volmgr database. You can see exactly which ones apply in your environment by looking at your catalog backup setup. You only need the stuff from your master server and nothing from the media server. An alternate option, and the one Symantec tells you to use, is to recover from a catalog backup. I've not had good luck with it, as it doesn't put everything in the right place, even when I had set up the new cluster exactly the same. Of course, if you didn't keep the directory structure the same, it won't be in the right place. Copying manually is just much faster and easier, since you don't have to run a restore from the command line, know which tape you need, blah blah. Since this isn't a disaster recovery scenario, just copy the data over.

Step 5: New cluster - Bring all Netbackup services up. The only one that should go offline is the device manager, if you have a dedicated Master Server with no attached storage or tape drives.

Step 6: New cluster - Go into "Host Properties" in the Netbackup Administration Console, select Master Server, and double-click your Master Server. Put all your settings back as they were before (they are registry settings). You did write them down, right? You will need a restart once you make that change.

Step 7: New cluster - Check whether your scratch pool is still there. Mine was still there but no longer marked as a scratch pool, so jobs failed when they ran out of tape. I guess that is some sort of registry setting that did not get transferred over. Simply check the box marked Scratch pool and you are set.

Step 8: New cluster - Go into your cluster administrator and set up registry replication. You will want the SOFTWARE\Veritas keys replicated, as they hold all your Master Server settings. That way, when you fail over, your 2nd node will have all the right settings.

Step 9: New cluster - Double-check host file entries, the server list and any custom scripts you may have. At this point Netbackup should be up and running, and since you kept the same Master Server name you will be able to run backups once again.
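The manual copy in Step 4 is the part I would script if I did this again. Below is a minimal Python sketch (3.8 or later) that copies a directory tree and then compares file names and sizes on both sides. The function names are made up, and the paths in the usage note are placeholders; use whatever your catalog backup setup lists. Note that it compares sizes only, not checksums.

```python
import shutil
from pathlib import Path


def snapshot(root):
    """Map each file's relative path to its size in bytes."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): p.stat().st_size
        for p in root.rglob("*")
        if p.is_file()
    }


def copy_and_verify(src, dst):
    """Copy src into dst, then return the relative paths that are
    missing or have a different size on the destination side."""
    shutil.copytree(src, dst, dirs_exist_ok=True)
    want = snapshot(src)
    got = snapshot(dst)
    return sorted(name for name, size in want.items() if got.get(name) != size)
```

Something like `copy_and_verify(r"<old shared disk>\<catalog dir>", r"\\newnode\<share>\<catalog dir>")` should return an empty list when everything made it across. Run it with Netbackup stopped on both sides (Steps 2 and 3) so the files are not changing underneath you.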

That was about it, hope your migration goes smoothly. If I saved you thousands in consulting fees, please send a check to .....

I should have some experiences on the Netbackup 6.5.1 Windows Cluster upgrade in 2-3 weeks time, hope that goes smoothly as well.

Saturday, February 16, 2008

Netbackup 5.0 cluster install work around

Following the previous post, in order to install Netbackup 5.0 into the MSCS 2003 with a different default drive we had to change the default program files location in the registry. Microsoft apparently does not support it, but they tell you how to anyway. http://support.microsoft.com/kb/933700's title even says so, which is kind of hilarious.

I will change it back afterwards and see if it still works. I hope this will not have to happen again when we upgrade to 6.5.

Steps to change the ProgramFilesDir registry value to use the default location for the Program Files folder
Warning Serious problems might occur if you modify the registry incorrectly by using Registry Editor or by using another method. These problems might require that you reinstall the operating system. Microsoft cannot guarantee that these problems can be solved. Modify the registry at your own risk.

To change the location of the Program Files folder back to the default location, follow these steps:
1. Click Start, click Run, type regedit , and then click OK.
2. Locate and then click the following registry subkey:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion
3. In the details pane, right-click the ProgramFilesDir registry value, and then click Modify.
4. In the Value data box, type the default location for the Program Files folder, and then click OK.

Note: The default location of the Program Files folder is systemdrive\Program Files. For example, if Windows is installed on drive C, type C:\Program Files in the Value data box.
5. Exit Registry Editor.
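If you would rather not click through regedit on each node, the same change can be made with the built-in reg.exe tool. Here is a small Python sketch that only builds the command lines; the helper names are mine, and I'm assuming the value is a plain REG_SZ string, so check with the query command first. The same warning about editing the registry applies.

```python
# Key and value name from the KB steps above, in reg.exe notation.
KEY = r"HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion"
VALUE = "ProgramFilesDir"


def build_query_command():
    """Command that shows the current ProgramFilesDir value."""
    return ["reg", "query", KEY, "/v", VALUE]


def build_set_command(program_files_dir):
    """Command that sets ProgramFilesDir (assumes a REG_SZ value)."""
    # Very light sanity check: expect something like C:\Program Files.
    if len(program_files_dir) < 3 or program_files_dir[1:3] != ":\\":
        raise ValueError("expected an absolute path such as C:\\Program Files")
    return ["reg", "add", KEY, "/v", VALUE, "/t", "REG_SZ",
            "/d", program_files_dir, "/f"]
```

Run the query command before and after (for example with `subprocess.run(build_query_command())` from an elevated prompt) and write down the original value so you can put it back.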

Saturday, February 9, 2008

Netbackup 5.0 Cluster Server move overly difficult

Here is the scenario: a Windows 2000 MSCS cluster running a Netbackup 5.0MP6 master server, and I want to move to Windows 2003 MSCS or VCS 5 running Netbackup 6.5.1 on new hardware while keeping the same master server name. My original plan was to install Windows 2003 on new hardware, install VCS 5, install Netbackup 5, migrate over, then upgrade after a couple of weeks. The hardest part of that plan is keeping the same name while production is still running.

Problem is, Netbackup 5 is only supported on VCS 2.0. No one wants to downgrade to VCS 2.0 in order to upgrade later to 5.0; that is such a huge jump. So I scrapped that idea and went back to the 2003 MSCS solution. I set up an MSCS cluster, added host file entries, plugged all the public NICs into a switch and did a cluster install of Netbackup. When I finally got it to work, it turned out Netbackup won't install to the drive I tell it to...

The shared data drive is fine and installs there without trouble, but the base install keeps going to the C: drive even though I changed the location and tried twice. The finish screen even says that I may need to change permissions on the drive I told it to install to. At this point I'm at a loss; this migration/upgrade has been far more difficult than expected. Some may say Netbackup is job security! I'd say it's a source of high blood pressure.

Why Sun Microsystems sales are crap?

They wonder why; part of the problem is the support. StorageTek might be the market leader in physical tape libraries, but I believe they are going away sooner rather than later, mainly because their technical support is crap. I've had nothing but average to bad experiences with them. This last time was in the middle of the night and they were reluctant to come out. I got sick of waiting and finally fixed it myself, then agreed to have them come out the next day, but they never did. They gave excuse after excuse and I finally gave up. Would I recommend them? If you don't already have StorageTek products, I'd say no; buy 2 VTLs instead and duplicate offsite.