Failover Cluster Troubleshooting for Beginners / Amateurs

This topic provides information about the following issues:
  • Basic troubleshooting steps.
  • Recovering from a failover cluster failure.
  • Resolving the most common failover clustering problems.
Problem: Incorrect use of command-prompt syntax to install SQL Server
Issue 1: It is difficult to diagnose Setup issues when using the /qn switch from the command prompt, as the /qn switch suppresses all Setup dialog boxes and error messages. If the /qn switch is specified, all Setup messages, including error messages, are written to Setup log files.
Resolution 1: Use the /qb switch instead of the /qn switch. If you use the /qb switch, the basic UI in each step will be displayed, including error messages.
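For scripted installs that assemble the Setup command line as a list of arguments, the swap can be sketched as follows. This is a minimal illustration only: the function name prefer_basic_ui and the sample argument list are assumptions, and only the /qn and /qb switches come from the text above.

```python
def prefer_basic_ui(setup_args):
    """Return a copy of the Setup arguments with /qn replaced by /qb.

    /qn hides all dialog boxes and error messages; /qb shows the basic UI,
    so errors stay visible while troubleshooting.
    """
    return ["/qb" if arg.lower() == "/qn" else arg for arg in setup_args]


if __name__ == "__main__":
    # Hypothetical argument list; "setup.exe" stands in for the real Setup command.
    print(prefer_basic_ui(["setup.exe", "/qn"]))
```

Once the installation problem is diagnosed, the original /qn switch can be restored for unattended runs.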
Problem: SQL Server cannot log on to the network after it migrates to another node
Issue 1: SQL Server service accounts are unable to contact a domain controller.
Resolution 1: Check your event logs for signs of networking issues such as adapter failures or DNS problems. Verify that you can ping your domain controller.
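Alongside a ping test, a quick name-resolution check can help separate DNS problems from adapter failures. The sketch below uses Python's standard socket module. The function name can_resolve and the injectable resolver parameter are illustrative assumptions, and a successful lookup only shows that DNS answers, not that the domain controller is reachable.

```python
import socket


def can_resolve(host, resolver=socket.getaddrinfo):
    """Return (True, addresses) if the host name resolves, else (False, error text)."""
    try:
        results = resolver(host, None)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        return False, str(exc)
    # Each result is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    addresses = sorted({info[4][0] for info in results})
    return True, addresses
```

Run it on the node that now owns the SQL Server group, passing the name of your domain controller. A (False, ...) result points toward DNS rather than the service account.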
Issue 2: SQL Server service account passwords are not identical on all cluster nodes, or the node does not restart a SQL Server service that has migrated from a failed node.
Resolution 2: Change the SQL Server service account passwords using SQL Server Configuration Manager, which updates the passwords on all nodes automatically. If you change a password on one node by any other means, you must also change it on all other nodes.
Problem: SQL Server cannot access the cluster disks
Issue 1: Firmware or drivers are not updated on all nodes.
Resolution 1: Verify that all nodes are using the correct firmware versions and the same driver versions.
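If you record firmware and driver versions per node (by hand or from an inventory tool), a small comparison makes mismatches easy to spot. This is a sketch under stated assumptions: the function name, the inventory layout, and the component names in the usage note are hypothetical, not part of any SQL Server or Windows tool.

```python
def find_version_mismatches(inventory):
    """Map each component to its per-node versions when the nodes disagree.

    inventory: {node_name: {component_name: version_string}}
    A component missing on a node is reported with version None.
    """
    components = set()
    for versions in inventory.values():
        components.update(versions)

    mismatches = {}
    for component in sorted(components):
        per_node = {node: versions.get(component) for node, versions in inventory.items()}
        if len(set(per_node.values())) > 1:
            mismatches[component] = per_node
    return mismatches
```

For example, an inventory in which Node A reports storage driver version "1.2" and Node B reports "1.3" returns that driver with both versions listed. An empty result means every recorded component matches across nodes.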
Issue 2: A node cannot recover cluster disks that have migrated from a failed node on a shared cluster disk with a different drive letter.
Resolution 2: Disk drive letters for the cluster disks must be the same on both servers. If they are not, review your original installation of the operating system and Microsoft Cluster Service (MSCS).
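The drive-letter requirement can be checked the same way once you have noted the letters each node assigns to the cluster disks. The sketch below assumes a hypothetical mapping of disk name to drive letter for each of two nodes; the function and the disk names are illustrative only.

```python
def drive_letter_conflicts(node_a, node_b):
    """Return {disk: (letter_on_a, letter_on_b)} for disks whose letters differ.

    node_a, node_b: {cluster_disk_name: drive_letter}. Letters are compared
    case-insensitively and a trailing colon is ignored; a disk missing on one
    node is reported with None on that side.
    """
    def normalize(letter):
        return None if letter is None else letter.rstrip(":").upper()

    conflicts = {}
    for disk in sorted(set(node_a) | set(node_b)):
        a, b = normalize(node_a.get(disk)), normalize(node_b.get(disk))
        if a != b:
            conflicts[disk] = (a, b)
    return conflicts
```

Any disk that appears in the result needs its letter aligned before a failover can bring it online on the other node.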
Problem: Failure of a SQL Server service causes failover
Resolution: To prevent the failure of specific services from causing the SQL Server group to fail over, configure those services using Cluster Administrator in Windows, as follows:
  • Clear the Affect the Group check box on the Advanced tab of the Full Text Properties dialog box. However, if SQL Server causes a failover, the full-text search service restarts.
Problem: SQL Server does not start automatically
Resolution: Use Cluster Administrator in MSCS to automatically start a failover cluster. The SQL Server service should be set to start manually; the Cluster Administrator should be configured in MSCS to start the SQL Server service.
Problem: The Network Name is offline and you cannot connect to SQL Server using TCP/IP
Issue 1: DNS is failing while the cluster resource is set to require DNS.
Resolution 1: Correct the DNS problems.
Issue 2: A duplicate name is on the network.
Resolution 2: Use NBTSTAT to find the duplicate name and then correct the issue.
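When the name table is long, a short script can pick out the conflicting entries. The sketch below parses text captured from nbtstat -n. It assumes the usual Name / Type / Status column layout, in which a duplicate name shows a Status of Conflict; the function name and the sample row in the comment are illustrative, so check the parsing against real output on your nodes.

```python
def names_in_conflict(nbtstat_output):
    """Return NetBIOS names whose Status column reads 'Conflict'."""
    conflicts = []
    for line in nbtstat_output.splitlines():
        fields = line.split()
        # A name-table row is assumed to look like: VIRTSQL <20> UNIQUE Conflict
        if len(fields) >= 4 and fields[-1].lower() == "conflict":
            if fields[0] not in conflicts:
                conflicts.append(fields[0])
    return conflicts
```

On a Windows node the input could come from subprocess.run(["nbtstat", "-n"], capture_output=True, text=True).stdout. Names that contain spaces would need more careful parsing than this sketch does.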
Issue 3: SQL Server is not connecting using Named Pipes.
Resolution 3: To connect using Named Pipes, create an alias using the SQL Server Configuration Manager to connect to the appropriate computer. For example, if you have a cluster with two nodes (Node A and Node B), and a failover cluster instance (Virtsql) with a default instance, you can connect to the server that has the Network Name resource offline using the following steps:
  1. Determine on which node the group containing the instance of SQL Server is running by using the Cluster Administrator. For this example, it is Node A.
  2. Start the SQL Server service on that computer using net start.
  3. Start SQL Server Configuration Manager on Node A. View the pipe name on which the server is listening. It should be similar to \\.\$$\VIRTSQL\pipe\sql\query.
  4. On the client computer, start the SQL Server Configuration Manager.
  5. Create an alias SQLTEST1 to connect through Named Pipes to this pipe name. To do this, enter Node A as the server name and edit the pipe name to be \\.\pipe\$$\VIRTSQL\sql\query.
  6. Connect to this instance using the alias SQLTEST1 as the server name.
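If you build aliases for several failover cluster instances, the pipe name from step 5 can be generated rather than typed. This is a minimal sketch that assumes the pipe name has the form \\.\pipe\$$\<network name>\sql\query, as in the VIRTSQL example; the function name is hypothetical.

```python
def cluster_pipe_name(network_name):
    r"""Build a pipe name of the form \\.\pipe\$$\<network_name>\sql\query."""
    return "\\\\.\\pipe\\$$\\" + network_name + "\\sql\\query"


if __name__ == "__main__":
    # Prints the pipe name used for the alias in the VIRTSQL example.
    print(cluster_pipe_name("VIRTSQL"))
```

Compare the generated string with the pipe name shown in SQL Server Configuration Manager before creating the alias.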
Problem: SQL Server Setup fails on a cluster with error 11001
Issue: An orphan registry key in [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL.X\Cluster]
Resolution: Make sure the MSSQL.X registry hive is not currently in use, and then delete the cluster key.
Problem: Cluster Setup Error: “The installer has insufficient privileges to access this directory: <drive>\Microsoft SQL Server. The installation cannot continue. Log on as an administrator or contact your system administrator”
Issue: This error is caused by a SCSI shared drive that is not partitioned properly.
Resolution: Re-create a single partition on the shared disk using the following steps:
  1. Delete the disk resource from the cluster.
  2. Delete all partitions on the disk.
  3. Verify in the disk properties that the disk is a basic disk.
  4. Create one partition on the shared disk, format the disk, and assign a drive letter to the disk.
  5. Add the disk to the cluster using Cluster Administrator (cluadmin).
  6. Run SQL Server Setup.
Problem: Applications fail to enlist SQL Server resources in a distributed transaction
Issue: Because the Microsoft Distributed Transaction Coordinator (MS DTC) is not completely configured in Windows, applications may fail to enlist SQL Server resources in a distributed transaction. This problem can affect linked servers, distributed queries, and remote stored procedures that use distributed transactions.
Resolution: To prevent such problems, you must fully enable MS DTC services on the servers where SQL Server is installed and MS DTC is configured.
To fully enable MS DTC, use the following steps:
  1. In Control Panel, open Administrative Tools, and then open Computer Management.
  2. In the left pane of Computer Management, expand Services and Applications, and then click Services.
  3. In the right pane of Computer Management, right-click Distributed Transaction Coordinator, and select Properties.
  4. In the Distributed Transaction Coordinator window, click the General tab, and then click Stop to stop the service.
  5. In the Distributed Transaction Coordinator window, click the Log On tab, and set the logon account to NT AUTHORITY\NetworkService.
  6. Click Apply and OK to close the Distributed Transaction Coordinator window. Close the Computer Management window. Close the Administrative Tools window.
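For administrators who prefer to script steps 4 and 5, the sketch below builds the equivalent sc.exe command lines and, by default, only prints them. These commands are an assumption, not part of the procedure above: MSDTC is the service name of the Distributed Transaction Coordinator, and the function names are hypothetical. Review the output and confirm the options with sc config /? before running anything on a cluster node.

```python
import subprocess


def msdtc_logon_commands(account="NT AUTHORITY\\NetworkService"):
    """Return sc.exe command lines that stop MSDTC and set its logon account.

    Assumption: these mirror steps 4 and 5 above; they are not taken from the source.
    """
    return [
        ["sc", "stop", "MSDTC"],
        # sc.exe expects each option name and its value as separate arguments.
        ["sc", "config", "MSDTC", "obj=", account, "password=", ""],
    ]


def apply_commands(commands, dry_run=True):
    """Print each command; run it only when dry_run is False (Windows, elevated prompt)."""
    for command in commands:
        print(subprocess.list2cmdline(command))
        if not dry_run:
            subprocess.run(command, check=True)


if __name__ == "__main__":
    apply_commands(msdtc_logon_commands())  # dry run: prints the commands only
```

Keeping dry_run=True as the default means the script cannot change a service until someone deliberately opts in.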

Documented by: MADHAN MOHAN
