Archive

Posts Tagged ‘CSV’

[RHS] Error 5023 from ResourceControl for resource Cluster Disk

August 18, 2011 1 comment

This week I got a call from one of my teammates (Nashaat Sorial) telling me that he was facing a problem with his Hyper-V cluster.

He runs three Hyper-V hosts on Windows Server 2008 R2 SP1 running multiple applications. One of the hosts keeps failing, and he cannot live migrate any VM to the other hosts until it is restarted.

We went through all the well-known workarounds for such cases:

1. If you have antivirus software, disable it.

2. Check http://technet.microsoft.com/en-us/library/cc773533%28WS.10%29.aspx

3. If you have NIC teaming, break it and check again.

4. Check http://support.microsoft.com/kb/981618

5. Make sure that "File and Printer Sharing" and "Client for Microsoft Networks" are enabled on the heartbeat network cards on all nodes: http://support.microsoft.com/kb/2008795

So we had to dig deeper into the cluster logs. You can generate the cluster logs as described here:

http://blogs.msdn.com/b/clustering/archive/2008/09/24/8962934.aspx
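For reference, on Windows Server 2008 R2 you can also dump the cluster log straight from PowerShell. A minimal sketch (the destination folder is just an example):

# Generate cluster.log files from every node into C:\ClusterLogs.
Import-Module FailoverClusters
Get-ClusterLog -Destination C:\ClusterLogs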

We found some interesting data and errors in the logs, such as:

In Windows Server 2008, the Physical Disk resource type private property that stores the disk signature changed from "Signature" to "DiskSignature". The DiskSignature property not being populated was resulting in the resource failing to come online.

ERROR_CLUSTER_GROUP_MOVING(5908) because 'Virtual Machine Configuration R-Web2003' is owned by node 2, not 1.
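If you suspect the same disk signature issue, you can inspect the private properties of the disk resource directly. A quick sketch ("Cluster Disk 1" is a placeholder resource name):

# Show the private properties (including DiskSignature) of a disk resource.
Import-Module FailoverClusters
Get-ClusterResource "Cluster Disk 1" | Get-ClusterParameter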

Hmm… so it looked like something at the hardware level. Searching for HP blade errors turned up more useful information:

http://cb-net.co.uk/index.php?option=com_content&view=article&id=84:hp-bl460-asr-hpqilo2-issues&catid=3:windows-server-2003&Itemid=3

 

The HP Integrated Management Log shows 'ASR Detected by System ROM' along with the following events in the System event log on an affected machine:

Event Type:    Warning
Event Source:    hpqilo2
Event Category:    None
Event ID:    57
Failed GET SENSOR READING, sensor 16

Event Type:    Warning
Event Source:    hpqilo2
Event Category:    None
Event ID:    57
NetFN 0x4, command 0x2D timed out

The solution for this was to perform the following:

  • Install the latest ILO Firmware Update v. 1.81
  • Install the HP iLO Management Channel Interface Driver v. 1.15.0.0
  • Install the HP ProLiant iLO2 Management Controller Driver v. 1.12.0.0

All of these drivers can be downloaded from the following location by selecting your operating system:

http://h20000.www2.hp.com/bizsupport/TechSupport/DriverDownload.jsp?prodNameId=3288156?=en&cc=us&prodTypeId=3709945&prodSeriesId=1842750&taskId=135

My CSV is full… So what can I do?!

October 22, 2010 1 comment

One of my customers faced this problem: he created a Hyper-V cluster with a CSV volume and started placing VMs on it. After a while the CSV started running out of space, and he planned to expand the CSV volume using SAN tools. Is LUN expansion at the SAN level supported by Microsoft? What are the risks of doing that?

After some help from the Microsoft support group, I got this answer:

According to:

 

Volume expansion is also a regular requirement since data growth is not often considered up-front. Cluster disks can be extended without rebooting if the controller supports dynamic LUN expansion. This feature allows for the physical expansion to be implemented without disruption and users can use tools (diskpart) provided by Microsoft to allow for the change to be seamlessly applied at the logical level as well.

 

Storage Topologies

http://technet.microsoft.com/en-us/library/cc738154(WS.10).aspx
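For what it's worth, the diskpart sequence referred to above looks roughly like this, run after the LUN has been grown on the SAN (the volume number is an example; read the best-practices link further down before touching production):

DISKPART> list volume
DISKPART> select volume 3
DISKPART> extend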

 

We can see that LUN expansion is supported by the Microsoft operating system.

 

Also, from the following information:

 

The ability to add storage while the system is operating enables better storage efficiency as it does not require any downtime for storage management. The storage can be expanded either by adding an additional LUN or by expanding the existing LUN on the CLARiiON array.

 

Optimized Storage Solution for Enterprise Scale Hyper-V Deployments

http://download.microsoft.com/download/D/E/6/DE619C29-BBCA-468F-960C-93B8113F612B/EMC_Sanbolic_MS_POC-Final.pdf

 

We can see that the storage can be expanded either by adding an additional LUN or by expanding the existing LUN.

 

Regarding the risks, as far as I know, there should be no risk if we follow the proper steps to expand the LUN.

 

Here is a link for your reference:

 

Best Practices – SAN LUN Creation and Size Expansion for SAP Architectures

http://blogs.technet.com/b/lobapps/archive/2010/09/14/lun-creation-and-size-expansion.aspx

 

 

Considerations for Backing Up Virtual Machines on CSV with the System VSS Provider

July 10, 2010 1 comment

If your SAN vendor does not have hardware VSS providers, you can use software snapshots to back up your virtual machines.

We recommend that virtual machines deployed on CSV be backed up serially.

There are two aspects to serialization of backup jobs in a CSV environment:

  • Serializing virtual machine backups on a per node basis.
  • Serializing virtual machine backups on a per CSV LUN basis.

Enabling Per Node Serialization

Create the following registry key on the DPM server:

Key: HKLM\Software\Microsoft\Microsoft Data Protection Manager\2.0\Configuration\MaxAllowedParallelBackups
Value: Microsoft Hyper-V
Data: 1
Type: DWORD

This ensures that only one backup job will run at a time on a Hyper-V host.
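A minimal sketch of creating that value from an elevated PowerShell prompt on the DPM server (the path is copied from above):

# Create the key and the "Microsoft Hyper-V" DWORD value set to 1.
$key = "HKLM:\Software\Microsoft\Microsoft Data Protection Manager\2.0\Configuration\MaxAllowedParallelBackups"
New-Item -Path $key -Force | Out-Null
New-ItemProperty -Path $key -Name "Microsoft Hyper-V" -Value 1 -PropertyType DWord -Force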

Enabling Per CSV LUN Serialization

This form of serialization limits the number of virtual machine backups happening on a single CSV LUN. This is done by creating a DataSourceGroups.xml file and placing it on the DPM server at %PROGRAMFILES%\Microsoft DPM\DPM\Config. This file provides DPM with information about how the CSV virtual machines are distributed across the various CSV LUNs so that it can serialize the backups that happen per CSV LUN.

A DSConfig.ps1 script (below) creates the DataSourceGroups.xml file by listing all the virtual machines running on CSV in groups. Each group has the list of all virtual machines hosted on one CSV LUN. DPM permits only one backup from one such group at a time.

# DSConfig.ps1

$infoText = "This script will generate the DatasourceGroups.xml file in the current path. Once this file is created merge it with the same file name under %programfiles%\Microsoft DPM\DPM\Config directory on the DPM server. Read the documentation for more details."
echo $infoText

$header = "<?xml version=`"1.0`" encoding=`"utf-16`"?> `n <DatasourceGroup xmlns:xsi=`"http://www.w3.org/2001/XMLSchema-instance`" xmlns:xsd=`"http://www.w3.org/2001/XMLSchema`" xmlns=`"http://schemas.microsoft.com/2003/dls/GroupDatasourceByDisk.xsd`">"
$footer = "</DatasourceGroup>"

import-module -name FailoverClusters

$dir = [guid]::NewGuid()
md $dir

$cluster = get-Cluster
$FQDN = $cluster.Name + "." + $cluster.Domain
$res = get-clusterresource | where-object { $_.ResourceType.Name -eq "Virtual Machine Configuration"}
foreach ($r in $res)
{
    $VmObj = Get-ClusterParameter -inputobject $r | where {$_.Name -eq "VmStoreRootPath"} # Identifies the CSV volume on which the VM is hosted.
    $VmName = Get-ClusterParameter -inputobject $r | where {$_.Name -eq "VmId"}
    $vol = $VmObj.Value.Split("\")[2] # $vol holds the Volume<number> of the CSV on which the VM resides.
    $line = "<Datasource DatasourceName=`"" + $VmName.Value +"`"" + " ProtectedServerName=`"" + $r.OwnerGroup.Name + "."+ $FQDN +"`"" + " WriterId=`"66841cd4-6ded-4f4b-8f17-fd23f8ddc3de`" />"
    echo $line >> $dir\$vol # File VolumeX will contain entries for all VMs hosted on CSV VolumeX.
}

echo $header > DataSourceGroups.xml
$filelist = dir $dir\Volume*
$GroupEndString = "</Group>"
foreach ($file in $filelist)
{
   $GroupBeginString = "<Group GroupName=`"" + $file.Name + "-" + $FQDN + "`">" # Group name is kept VolumeX itself
   echo $GroupBeginString >> DataSourceGroups.xml
   type $file >> DataSourceGroups.xml # Consolidating groups pertaining to all the volumes. 
   echo $GroupEndString >> DataSourceGroups.xml
}

Remove-Item -Force -Recurse $dir

echo $footer >> DataSourceGroups.xml

Procedure to Create the DataSourceGroups.xml File and Serialize the Backup Jobs

  1. Generate the DataSourceGroups.xml file by running the DSConfig.ps1 script on any one node of a cluster containing CSV. For more information about how to generate the file, see Procedure to Generate the DataSourceGroups.xml File on a CSV Cluster.
  2. Repeat step 1 for every cluster that is protected by a DPM server.
  3. Merge all such DataSourceGroups.xml files into a single file on the DPM server. You can skip this step and copy the file directly to %PROGRAMFILES%\Microsoft DPM\DPM\Config if the DPM server is protecting only one cluster. For more information about merging the files, see Procedure to Merge the DataSourceGroups.xml Files from All CSV Clusters.
  4. If a protection group has already been created for the virtual machines, perform the steps in the Modify Protection Group Wizard. If a protection group has not been created, create a new protection group and the job serialization described above will take effect.

The DataSourceGroups.xml file needs to be updated only when virtual machines are added, deleted, or modified in the cluster and protection is configured for them.

Regenerate the DataSourceGroups.xml file from the CSV cluster and update the DataSourceGroups.xml file by replacing the existing groups for that cluster with the new groups.

Procedure to Generate the DataSourceGroups.xml File on a CSV Cluster

  1. Copy the DSConfig.ps1 file to any one node of a CSV cluster.
  2. Run the script from an elevated Windows PowerShell prompt (for example, C:\MyFolder> .\DSConfig.ps1) and locate the DataSourceGroups.xml file generated in the same folder.
  3. This script will generate the DataSourceGroups.xml file in the current path. After this file is created, copy it to the %programfiles%\Microsoft DPM\DPM\Config directory on the DPM server.
  4. You can verify the groupings by opening the XML file that is generated. The following is the expected format:
    <?xml version="1.0" encoding="utf-16"?>
    <DatasourceGroup xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns="http://schemas.microsoft.com/2003/dls/GroupDatasourceByDisk.xsd">
    <Group GroupName="Group1">
    <Datasource DatasourceName="EA24071A-7B7B-42CF-AB1D-BBAE49F50632" ProtectedServerName="SCVMM VM-Vol7-03 Resources.CSVSCALE.SCALEDPM01.LAB" WriterId="66841cd4-6ded-4f4b-8f17-fd23f8ddc3de" />
    </Group>
    </DatasourceGroup>
    

Procedure to Merge the DataSourceGroups.xml Files from All CSV Clusters

Note: You can skip this step if the DPM server is protecting only one CSV cluster. The generated DataSourceGroups.xml file can be used directly on the DPM server.
  1. Copy any one of the DataSourceGroups.xml files that was generated to the DPM server under the location %Programfiles%\Microsoft DPM\DPM\Config.
  2. Open the file to edit it.
    <?xml version="1.0" encoding="utf-16"?>
    <DatasourceGroup xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns="http://schemas.microsoft.com/2003/dls/GroupDatasourceByDisk.xsd">
    <Group GroupName="Group1">
    <Datasource DatasourceName="EA24071A-7B7B-42CF-AB1D-BBAE49F50632" ProtectedServerName="SCVMM VM-Vol7-03 Resources.CSVSCALE.SCALEDPM01.LAB" WriterId="66841cd4-6ded-4f4b-8f17-fd23f8ddc3de" />
    </Group>
    </DatasourceGroup>
  3. Copy the <Group> tags from all the DataSourceGroups.xml files generated, and add that text between the <DataSourceGroup> tags. The DataSourceGroups.xml file will now contain one header tag, one <DataSourceGroup> tag, and the <Group> tags from all CSV clusters.
  4. Close the DataSourceGroups.xml file on the DPM server. It is now ready to use.

Cluster shared volume is no longer available on this node (Event ID: 5120)

July 9, 2010 1 comment

After building a new cluster, I faced a problem: I got these errors on the second node (the passive one), and when I tried to access the volume, Explorer hung.

Event ID: 5120
Source: Microsoft-Windows-FailoverCluster
Level: Error
Description: Cluster shared volume 'volume_name' is no longer available on this node because of 'STATUS_BAD_NETWORK_PATH(c00000be)'. All I/O will temporarily be queued until a path to the volume is re-established.

Event ID generated: 5142
Source: Microsoft-Windows-FailoverCluster
Description: Cluster Shared Volume ‘Volume_name’ (‘Cluster Disk #’) is no longer accessible from this cluster node because of error ‘ERROR_TIMEOUT(1460)’. Please troubleshoot this node’s connectivity to the storage device and network connectivity

I found all the details in Microsoft KB 2008795.

Cause:

When a CSV volume is accessed from a passive (non-coordinator) node, the disk I/O is routed to the owning (coordinator) node through a 'preferred' network adapter, and this requires SMB to be enabled on that adapter (see the sketch after the list below). For SMB connections to work on these network adapters, the following protocols must be enabled:

  • Client for Microsoft Networks
  • File and Printer Sharing for Microsoft Networks
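As an aside, on Windows Server 2008 R2 the cluster picks that preferred CSV (redirected I/O) network by the lowest cluster network metric. A quick sketch to see which network that is:

# The enabled cluster network with the lowest metric is the one
# CSV/redirected I/O traffic will prefer.
Import-Module FailoverClusters
Get-ClusterNetwork | Format-Table Name, Metric, AutoMetric, Role -AutoSize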

Solution

Review each cluster node and verify that the following protocols are enabled on the network adapters available for cluster use:

  • Client for Microsoft Networks
  • File and Printer Sharing for Microsoft Networks

1. Click Start, click Run, type ncpa.cpl, and then click OK.
2. Right-click the local area connection that is associated with the network adapter, and then click Properties.
3. Verify that the above protocols appear in the This connection uses the following items box. If either is missing, follow these steps:
a. Click Install, click Client, and then click Add.
b. Select the missing protocol, click OK, and then click Yes.
4. Verify that the check box that appears next to Client for Microsoft Networks is selected.

In my case, I discovered that File and Printer Sharing was not checked on the private connection.

Windows Server 2008 R2 Live Migration – “The devil may be in the networking details.”

December 10, 2009 3 comments

Source: Ask the Core Team

Windows Server 2008 R2 has been publicly available now for only a short period of time, but we are already seeing a good adoption rate for the new Live Migration functionality as well as the new Cluster Shared Volumes (CSV) feature. I personally have worked enough issues now where Live Migration is failing that I felt a short blog on what process I have followed to work through these may have some value.

It is important to mention right up front that there is information publicly available on the Microsoft TechNet site that discusses Live Migration and Cluster Shared Volumes. This content also includes some troubleshooting information. I acknowledge that a lot of people do not like to sit in front of a computer monitor and read a lot of text to try and figure out how to resolve an issue. I am one of those people. Having said that, let’s dive in.

It has been my experience thus far that issues that prevent Live Migration from succeeding have to do with proper network configuration. In this blog, I will address the main network related configuration items that need to be reviewed in order to be sure Live Migration has the best chance of succeeding. I begin with an initial set of assumptions: the R2 Hyper-V Failover Cluster has been properly configured and all validation tests have passed without failure, the highly available VM(s) have been created using cluster shared storage, and the virtual machine(s) are able to start on at least one node in the cluster.

I start off by identifying the virtual machines that will not Live Migrate between nodes in the cluster. While it should not be necessary in Windows Server 2008 R2, I recommend first running a ‘refresh’ process on each virtual machine experiencing an issue with Live Migration. I say it should not be necessary because a lot of work was done by the Product Group to more tightly integrate the Failover Cluster Management interface with Hyper-V. Beginning with R2, virtual machine configuration and management can be done using the Failover Cluster Management interface. Here is a sample of some of the actions that can be executed using the Actions Pane in Failover Cluster Manager.

[Screenshot: Actions pane in Failover Cluster Manager]

If virtual machine configuration and management is accomplished using the Failover Cluster Management interface, any configuration changes made to a virtual machine should be automatically synchronized across all nodes in the cluster. To ensure this has happened, I begin by selecting each virtual machine resource individually and executing a Refresh virtual machine configuration process as shown here –

[Screenshot: Refresh virtual machine configuration action]

The process generates a report when it completes. The desired result is shown here –

[Screenshot: refresh report completing successfully]

If the process completes with a Warning or Failure, examine the contents of the report, fix the issues that were reported, and run the process again until it completes successfully.

If the refresh process completes without Failure, try to Quick Migrate the virtual machine to each node in the cluster to see if it succeeds.

[Screenshot: Quick Migration action]
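The same test can be scripted with the FailoverClusters module. A sketch (the group and node names are examples):

# Quick migrate a clustered virtual machine to a specific node.
Import-Module FailoverClusters
Move-ClusterVirtualMachineRole -Name "SCVMM MyVM Resources" -Node Node2 -MigrationType Quick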

If a Quick Migration completes successfully, that confirms the Hyper-V Virtual Networks are configured correctly on each node and the processors in the Hyper-V servers themselves are compatible. The most common problem with the Hyper-V Virtual Network configuration is that the naming convention used is not the same on every node in the cluster. To determine this, open the Hyper-V Management snap-in, select the Virtual Network Manager in the Actions pane and examine the settings.

[Screenshot: Virtual Network Manager in the Hyper-V Management snap-in]

The information shown below (as seen in my cluster) must be the same across all the nodes in the cluster (which means each node must be checked). This includes not only spelling but ‘case’ as well (i.e. PUBLIC is not the same as Public) –

[Screenshot: virtual network names and settings]
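One way to compare the virtual network names across nodes without opening each console is the Hyper-V WMI namespace. A sketch (the node names are examples):

# List Hyper-V virtual network names on each node; the output must match
# exactly, including case, on every node.
foreach ($node in "Node1", "Node2", "Node3") {
    Get-WmiObject -ComputerName $node -Namespace root\virtualization -Class Msvm_VirtualSwitch |
        Select-Object @{n="Node";e={$node}}, ElementName
}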

It is important to be able to successfully Quick Migrate all virtual machines that cannot be Live Migrated before moving forward in this process. If the virtual machine can Quick Migrate between all nodes in the cluster, we can begin taking a closer look at the networking piece.

Start verifying the network configuration on each node in the cluster by first making sure the network card binding order is correct. In each cluster node, the Network Interface Card (NIC) supporting access to the largest routable network should be listed first. The binding order can be accessed using the Network and Sharing Center, Change adapter settings. In the Menu bar, select Advanced and from the drop down list choose Advanced Settings. An example from one of my cluster nodes is shown here where the NIC (PUBLIC-HYPERV) that has access to the largest routable network is listed first.

[Screenshot: network adapter binding order in Advanced Settings]

Note: You may also want to review all the network connections that are listed and Disable those that are not being used by either the Hyper-V server itself or the virtual machines.

On each NIC in the cluster, ensure Client for Microsoft Networks and File and Printer Sharing for Microsoft Networks are enabled (i.e. checked). This is a requirement for CSV, which relies on SMB (Server Message Block).

[Screenshot: NIC properties with Client for Microsoft Networks and File and Printer Sharing enabled]

Note: Here is where people usually get into trouble because they are familiar with clusters and have been working with them for a very long time, maybe even as far back as the NT 4.0 days. Because of that, they have developed a habit for configuring cluster networking which is basically outlined in KB 258750. This article does not apply to Windows Server 2008.

Note: If CSV is configured, all cluster nodes must reside on the same non-routable network. CSV (specifically for re-directed I/O) is not supported if cluster nodes reside on separate, routed networks.

Next, verify the local security policy and ensure NTLM security is not being restricted by a local or domain level policy. This can be determined by Start > Run > gpedit.msc > Computer Configuration > Windows Settings > Security Settings > Local Policies > Security Options. The default settings are shown here –

[Screenshot: default NTLM security policy settings]
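If you prefer checking this from a prompt instead of gpedit.msc, the NTLM restriction values live under the Lsa registry key. A sketch (on default systems these values are simply absent, which means NTLM is not restricted):

# Read the NTLM restriction values; missing values mean the defaults apply.
Get-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Control\Lsa\MSV1_0" |
    Select-Object RestrictSendingNTLMTraffic, RestrictReceivingNTLMTraffic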

In the virtual machine resource properties in the Failover Cluster Management snap-in, set the Network for Live Migration ordering such that the highest speed network that is enabled for cluster communications and is not a Public network is listed first. Here is an example from my cluster. I have three networks defined in my cluster –

[Screenshot: cluster networks in Failover Cluster Manager]

The Public network is used for client access, management of the cluster, and cluster communications. It is configured with a default gateway and has the highest metric defined in the cluster for a network the cluster is allowed to use for its own internal communications. In this example, since I am also using iSCSI, the iSCSI network has been excluded from cluster use. The corresponding listing on the virtual machine resource in the Network for live migration tab looks like this –

[Screenshot: Network for live migration tab]

Here, I have unchecked the iSCSI network as I do not want Live Migration traffic being sent over the same network that is supporting the storage connection. The Cluster network is totally dedicated to cluster communications only so I have moved that to the top as I want that to be my primary Live Migration network.

Note: Once the live migration network priorities have been set on one virtual machine, they will apply to all virtual machines in the cluster (i.e. it is a Global setting).

Once all the configuration checks have been verified and changes made on all nodes in the cluster, execute a Live Migration and see if it completes successfully.
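The live migration itself can also be kicked off from PowerShell. A sketch (the group and node names are examples):

# Live migrate the clustered virtual machine and time the operation.
Import-Module FailoverClusters
Measure-Command {
    Move-ClusterVirtualMachineRole -Name "SCVMM MyVM Resources" -Node Node1 -MigrationType Live
}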

Bonus material:

There are configurations that can be put in place that help live migrations run faster and CSV perform better. One thing that can be done is to disable NetBIOS on the NIC that supports the primary network used by CSV for redirected I/O. This should be a dedicated network and should not carry any traffic other than internal cluster communications, redirected I/O for CSV, and/or live migration traffic.

[Screenshot: disabling NetBIOS over TCP/IP on the adapter]
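Disabling NetBIOS can also be scripted via WMI rather than clicking through the adapter properties. A sketch (the IP address used to pick the adapter is an example):

# SetTcpipNetbios(2) disables NetBIOS over TCP/IP on the selected NIC.
$nic = Get-WmiObject Win32_NetworkAdapterConfiguration |
    Where-Object { $_.IPAddress -contains "10.10.10.1" }
$nic.SetTcpipNetbios(2)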

Additionally, on the same network interface supporting live migration, you can enable larger packet sizes to be transmitted between all the connected nodes in the cluster.

[Screenshot: jumbo frame setting on the network adapter]
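The heavy lifting for larger packets is done by the NIC driver's jumbo frame setting, but you can confirm and adjust the interface MTU with netsh. A sketch (the interface name and MTU are examples, and jumbo frames must be supported end to end by NICs, drivers, and switches):

# Show current MTUs, then raise the MTU on the cluster interface.
netsh interface ipv4 show subinterfaces
netsh interface ipv4 set subinterface "Cluster" mtu=9000 store=persistent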

If, after making all the changes discussed here, live migration is still not succeeding, then perhaps it is time to open a case with one of our support engineers.

Thanks again for your time, and I hope you have found this information useful. Come back again.

Additional resources:

Using Live Migration with Cluster Shared Volumes in Windows Server 2008 R2

High Availability Product Team Blog

Hyper-V and Virtualization on Microsoft TechNet

Windows Server 2008 R2 Hyper-V Forum

Windows Server 2008 R2 High Availability Forum

Chuck Timon
Senior Support Escalation Engineer
Microsoft Enterprise Platforms Support

How will Microsoft do Live Migration through Cluster Shared Volumes?

November 22, 2008 Leave a comment

About two months ago, Microsoft announced the new Hyper-V Live Migration feature in Windows Server 2008 R2. A lot of questions have been raised about how Microsoft will do that.

Actually, I had been looking into this closely because I had a lot of reservations on this point.

Now the Windows Server 2008 R2 Reviewers Guide (beta) has been published, and it includes a lot of information about this new feature.

Microsoft will use the new Cluster Shared Volumes (CSV) feature within Failover Clustering in Windows Server 2008 R2. CSV volumes enable multiple nodes in the same failover cluster to concurrently access the same logical unit number (LUN). From a VM's perspective, each VM appears to own a LUN; however, the .vhd files for each VM are stored on the same CSV volume.

This technique is similar to VMware ESX's approach to live migration, as all cluster nodes access the shared volumes using fully qualified paths.
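For reference, those fully qualified paths surface under C:\ClusterStorage on every node, so a VHD might live at C:\ClusterStorage\Volume1\MyVM.vhd from each host's point of view. A sketch of listing the CSVs with the FailoverClusters module on R2:

# Every node sees the same single namespace for CSV disks.
Import-Module FailoverClusters
Get-ClusterSharedVolume | Select-Object Name, State
dir C:\ClusterStorage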

The good news is administrators won’t have to reformat their SANs to take advantage of CSVs.

I believe that Microsoft will provide more and more functionality in Hyper-V 2, and I hope they stick to their promise that it will be just an in-place update.

 
