Monday, May 11, 2009

SQL 2008 MSCS cluster verification error

This error confused me for a while and Google didn't come up with many hits, so I thought I'd write something up to help others who may hit this seemingly rare issue. I ran into the problem on my 2nd or 3rd SQL 2008 x64 install on a Windows 2008 x64 cluster, all of them running on the same hardware (Dell R900). The only thing I didn't do this time was the actual install of Windows. I figured, how differently can someone else install Windows?...

Environment: SQL 2008 x64 Enterprise being installed on a Windows 2008 x64 Enterprise Microsoft Cluster Server (MSCS) cluster.

During the SQL install's verification step, setup fails a cluster check even though you've already run a Cluster Verification on your 2008 x64 cluster.



Rule "Microsoft Cluster Service (MSCS) cluster verification errors" failed.

The cluster either has not been verified or there are errors or failures in the verification report. Refer to KB953748 or SQL Server Books Online for more information.



As of this writing, KB953748 is still a Microsoft-internal KB and, according to the Premier Support tech I talked to, it doesn't have much info to help either. After spending hours on the phone with them, I gave up for the night and decided to do some digging myself the next day. After reading all I could find online, I came across one guy who hinted at using procmon to monitor the install process and see what goes wrong during verification. That led to the resolution (i.e. tricking the installer).



First, look in your C:\Windows\Cluster\Reports folder. You should see at least 2-3 files if you ran a full Cluster Verification prior to the SQL installation: Cluster.log, ValidateStorage.log, and Validation Data for Node Set xxxxxxxxxxx. The Validation Data for Node Set file is the one the installer parses to check whether the cluster has been verified. What we need to do is run AND PASS a cluster verification (warnings are OK, failures are not) to generate these files, then run procmon on the installer, find the Validation Data for Node Set file name it is actually looking for, and rename the existing file to that name. I also had to delete all the other files in the Reports folder before the SQL install would accept the renamed Validation Data for Node Set file.
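If you'd rather script the rename-and-clean-up step than do it by hand, here is a minimal Python sketch of the idea. It is only an illustration and has not been run against a real cluster. The function name, the `expected_name` argument (the file name you copied out of procmon) and the backup folder are all hypothetical. It copies the report to the expected name and moves everything else, including the original report, into the backup folder instead of deleting it, which should have the same effect as the rename but leaves a way back. It assumes the backup folder is empty or new.

```python
import shutil
from pathlib import Path

# Prefix taken from the post; the part after it differs per cluster/node.
REPORT_PREFIX = "validation data for node set"


def stage_validation_report(reports_dir, expected_name, backup_dir):
    """Copy the validation report to the name setup expects, then move
    every other file out of the Reports folder into backup_dir.

    expected_name is the file name seen in procmon (hypothetical input).
    Returns the path of the staged report.
    """
    reports = Path(reports_dir)
    backup = Path(backup_dir)
    backup.mkdir(parents=True, exist_ok=True)

    candidates = [p for p in reports.iterdir()
                  if p.is_file() and p.name.lower().startswith(REPORT_PREFIX)]
    if not candidates:
        raise FileNotFoundError(
            "no validation report found; run cluster validation first")

    # Use the most recently written report as the source.
    source = max(candidates, key=lambda p: p.stat().st_mtime)
    target = reports / expected_name
    if source != target:
        shutil.copy2(source, target)

    # Move the remaining files aside rather than deleting them.
    for p in list(reports.iterdir()):
        if p.is_file() and p != target:
            shutil.move(str(p), str(backup / p.name))
    return target


# Example call (the backup path is made up; paste the real name from procmon):
# stage_validation_report(r"C:\Windows\Cluster\Reports",
#                         "<file name copied from procmon>",
#                         r"C:\Temp\ReportsBackup")
```

Run it once per node with the name that node's installer asks for, since the expected name can differ between nodes.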



Filter procmon to show only setup100.exe after the install starts the verification, and look for the two highlighted entries. The second entry should say failed or something to that effect, and you can copy the file name it is looking for from there. Rename the file in the Reports folder, delete or move all the other files, and run the verification again; it should work for you. I had to do the same thing on the 2nd node, since it looked for a different file name, but I used the same verification file.
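If the procmon capture is large, it can be easier to save it as CSV and pull out the candidates with a few lines of Python. The sketch below is an assumption-laden illustration, not part of the original fix: it assumes the export uses procmon's usual column headers ("Process Name", "Path", "Result") and that a successful open is reported as SUCCESS. The function name is made up. It lists every path under the Reports folder that setup100.exe touched without success, so you still need to eyeball the result and pick the Validation Data for Node Set entry.

```python
import csv
import io


def missing_report_paths(csv_text, process="setup100.exe",
                         folder="\\cluster\\reports\\"):
    """Return paths under the Reports folder that the given process
    touched with any result other than SUCCESS, in first-seen order."""
    paths = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Keep only rows from the installer process.
        if row.get("Process Name", "").lower() != process:
            continue
        path = row.get("Path", "")
        # Keep only files inside the cluster Reports folder.
        if folder not in path.lower():
            continue
        # Successful operations are not interesting here.
        if row.get("Result", "") == "SUCCESS":
            continue
        if path not in paths:
            paths.append(path)
    return paths


# Example (the file name is hypothetical; utf-8-sig strips a BOM if present):
# with open("setup-capture.csv", encoding="utf-8-sig") as f:
#     for p in missing_report_paths(f.read()):
#         print(p)
```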

I also ran into another error on the second node, "The current SKU is invalid" (http://support.microsoft.com/kb/957459). The simplest fix is to delete the DefaultSetup.ini that contains your embedded serial number and enter the same serial manually during the install. I run my installations from network drives after copying them from the ISO. Hope this helps!


Edit: it looks like Microsoft may have something that works as well. I did try to skip verification while on the phone with them, but the commands we used might have been different. Try this first to see if it works; it's much easier.

At a command prompt, change to the hard disk drive and to the folder that contains SQL Server Setup (Setup.exe). Then, type one of the following commands to skip the validation rule:

* For an integrated failover setup, run the following command on each node that is being added:
Setup /SkipRules=Cluster_VerifyForErrors /Action=InstallFailoverCluster

* For an advanced or enterprise installation, run the following command:
Setup /SkipRules=Cluster_VerifyForErrors /Action=CompleteFailoverCluster

7 comments:

Anonymous said...

This is exactly what I was looking for, very helpful! Thanks.

Anonymous said...

Thanks for help

Anonymous said...

I noticed the same thing in Perfmon today. As you did, I copied the validation report to the filename that the SQL installer was looking for, and it succeeded. I'm guessing this is because one of my nodes has a hostname in lowercase letters, and some tools convert this to upper case and some don't. This could be a problem if the random-looking part of the filename is dependent on the hostname.

Anonymous said...

Thank you very much. It helped me. I just renamed ValidateStorage.log file and SQL validation is successful.

Unknown said...

Yes Thank you very much.

This happened to us on a SQL cluster reinstall. We had uninstalled SQL, unclustered, and moved the nodes to another domain. Upon SQL reinstall we ran into this issue. The SQL install ran perfectly the first time, so even on the same install of Windows it's possible to get this error.

THANK YOU for taking the time to write this up for your fellow geeks.

Aditya Samant said...

You sir are awesome! Thanks for posting this.

Unknown said...

May God bless you Brother.