Thursday, January 17, 2008

VCS 5.0 not starting in Windows 2003

I'm setting up a 2 node Windows 2003 cluster using VSF HA 5.0, and while testing I shut off all nodes and turned only 1 back on. I couldn't connect to the cluster with the VCS Cluster Manager. If you do the same, lose all nodes, or the cluster somehow crashes completely, here is how you can solve the problem.

Look in your Event Viewer and you will see an error in the Application log with source Had, or possibly the same message in a popup: VCS CRITICAL V-16-1-11306 Did not receive cluster membership, manual intervention may be needed for seeding. Run the following command in a command prompt.

gabconfig -x

This seeds the control port manually, even though not all nodes are present. If gabconfig is not in your server path, it is under "\Veritas\comms\gab". Then connect as normal through your Cluster Manager.
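If you want to check whether seeding is actually the problem before forcing it, `gabconfig -a` lists the GAB port memberships. Here is a rough Python sketch of that check. The sample text is my assumption of what the output looks like (the generation numbers are made up), and the helper names are hypothetical, so compare against what your own node prints before trusting it.

```python
import re

# Assumed shape of `gabconfig -a` output on a node that has NOT seeded:
# the header is printed but no port lines follow.
SAMPLE_UNSEEDED = """\
GAB Port Memberships
===============================================================
"""

# Assumed shape after seeding on a single node (node ID 0).
# Port a is GAB itself, port h is the VCS engine (had).
SAMPLE_SEEDED = """\
GAB Port Memberships
===============================================================
Port a gen a36e0003 membership 0
Port h gen fd570002 membership 0
"""

PORT_LINE = re.compile(
    r"^Port\s+(\w)\s+gen\s+\w+\s+membership\s+(\S+)", re.MULTILINE
)


def seeded_ports(gabconfig_output):
    """Map each GAB port letter to its membership string."""
    return {port: members for port, members in PORT_LINE.findall(gabconfig_output)}


def needs_manual_seed(gabconfig_output):
    """True when port a shows no membership, i.e. GAB has not seeded."""
    return "a" not in seeded_ports(gabconfig_output)


if __name__ == "__main__":
    for name, text in (("unseeded", SAMPLE_UNSEEDED), ("seeded", SAMPLE_SEEDED)):
        print(name, "-> run gabconfig -x?", needs_manual_seed(text))
```

In practice you would feed it the real output of `gabconfig -a` instead of the samples, and only run `gabconfig -x` when port a has no membership and you are sure the other nodes are really down.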

Sunday, January 13, 2008

Netbackup 6.5 LiveUpdate using HTTP does not work either

Following the last post, I set up a web server on the Windows master server as the repository for the update, to try to update another Windows client. I updated all the settings on the client and the master server. It still had the same issue: LiveUpdate could not find the files it was looking for at the address. If anyone out there has got Netbackup LiveUpdate working, please let me know how it is set up! For now, 6.5.1 will have to get there manually.

Update 1/17/2008

The Netbackup LiveUpdate 6.5.1 package isn't out yet; the current update release is only for manual installation. Set it up all you want, but it won't work without this LiveUpdate package.

Tuesday, January 8, 2008

Netbackup 6.5 LiveUpdate using UNC share does not work

*UPDATE* Refer to this post for LiveUpdate. It has been confirmed that there is no update package for Netbackup LiveUpdate at this time.

Netbackup 6.5 is a fairly new major release, and 6.5.1 has been out for about a month now. I've been trying to use Netbackup 6.5 LiveUpdate from a Windows master server, with a LAN share (UNC) on the same box, to update another Windows client that is already running 6.5 to 6.5.1. You set everything up, you run the LiveUpdate policy, and the job quits with a status 77 and the following message in the Activity Monitor.

Info nbliveup(pid=somenumber) EXIT STATUS 77
execution of the specified system command returned a nonzero status(77)

According to everything I've read and everything I've tried, this method is broken. Following the install guides and the Symantec class books (yes, I went to class, and no, it was not worth it), my Netbackup LiveUpdate environment is as follows.

Friday, January 4, 2008

EMC CDL

So, our DL700 (CX700 with a VTL engine v2.1) has been dog slow and I've been getting media errors. Several cases later, EMC recommended replacing 6 BCCs (we have ATA disk, so no LCCs) and 2 sets of cables. Also, since we've been getting ghost single fan fault errors on the disk library, we had 2 power supplies changed as well. All the hardware changes went well and the library was back on two hours later with the latest frumon code.

The problem then became that one of the links in the VTL engine wouldn't come back up. I checked our Netbackup servers, I checked the host HBAs, and I even tracked down the admins of the switches our links go through and had them reset the ports. No luck at all. After 2 reboots, I called EMC in Australia. One more hour of WebEx and sending SP collects, X-Rays, and logs to them later, they sent out a new VTL engine. Then 2 more hours of waiting for the parts to arrive, and 30 more minutes of installing it, since the VTL wouldn't fit because of a slightly warped bezel. At that point I was ready to use a hammer and kick it in. There is nothing like being at work for 16 hours working on a problem.

The next day I had EMC on the phone again, since everything kept trespassing to SPA. They found the problem in the VTL paths, which had somehow all reset to the same one. A path optimizer script did nothing, so everything was manually repathed. So, we are back at the point where we started. Let's see what has been fixed.

Slowness: still slow, same speeds.
Media errors: Netbackup still gets them on the VTL and still freezes tapes.
Ghost Single Fan Fault error: still there every few hours, like clockwork.

EMC support had the following to say: our test lab CDL runs at about the same speeds (5 disk RAID 3 running at 30 MB/sec? wow!), you shouldn't get any more media errors, and we have no idea about the single fan fault, maybe reboot the SPs. I can't afford any more downtime since our duplications are already days behind. Lesson learned? Buy Hitachi if you can afford it!