Channel: High Availability (Clustering) forum

Live Migrate fails with event 21502 (2019-->2016 host)


I have a 2016 functional-level cluster with a Server 2019 host added (I'm in the process of replacing the 2016 hosts with 2019).

If a VM is running on the 2019 host I can power it off, quick migrate it to the 2016 host, and power it back on, and all is good.

But a live migration in that direction always fails with the error in the title (Event ID 21502).

All I am getting in the event data is (very descriptive?!):

Live migration of 'Virtual Machine Test' failed.

Nothing else, no reason.

If the VM is running on the 2016 host I CAN live migrate it to 2019 fine! (albeit with the errors reported below, and I do NOT have VMM in use!)

vm\service\ethernet\vmethernetswitchutilities.cpp(124)\vmms.exe!00007FF7EA3C2030: (caller: 00007FF7EA40EC65) ReturnHr(138) tid(2980) 80070002 The system cannot find the file specified.
    Msg:[vm\service\ethernet\vmethernetswitchutilities.cpp(78)\vmms.exe!00007FF7EA423BE0: (caller: 00007FF7EA328FEE) Exception(7525) tid(2980) 80070002 The system cannot find the file specified.
] 
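That 80070002 comes from the virtual-switch utility code in vmms.exe, so one thing worth ruling out is a virtual switch name or configuration mismatch between the two hosts. A rough comparison from PowerShell (the host names below are placeholders, not my actual hosts):

    # Pull the virtual switch list from both hosts; a switch name referenced by the
    # VM's network adapters but missing/renamed on the destination can break migration.
    $newHost = Get-VMSwitch -ComputerName 'HV2019' | Select-Object Name, SwitchType
    $oldHost = Get-VMSwitch -ComputerName 'HV2016' | Select-Object Name, SwitchType

    # Empty output means the switch definitions line up; anything listed here is
    # present on one host but not the other (or has a different type).
    Compare-Object $newHost $oldHost -Property Name, SwitchType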

Both hosts are IDENTICAL hardware, on the same firmware level for every component!

There is NOTHING relating to the migration attempt in the local host's Hyper-V VMMS Admin/Operational logs.

In Hyper-V High Availability/Admin I get the same error but with Event ID 21111.

Seb


I am wondering if it would be easier to ditch 2019 and stick with 2016 for now.

Windows 2012 R2 Cluster issues - Guest VMs fail when one specific node hosts a CSV


I have a Windows Server 2012 R2 cluster set up with 3 nodes.

Two nodes, vm3 and vm5, have no issues acting as the owner of any role, including the CSVs, the Quorum Disk Witness, and the individual VMs.

One node, vm1, has no issues owning any of the individual VM roles, one of the CSVs (high-speed-lun), or the Quorum Disk Witness. However, if vm1 is set as the owner of LUN_1 or LUN_2, any VM that has its OS VHD(X) file hosted on those LUNs and is not owned by vm1 fails and can't be restarted.

The VMs that either

  • a) are owned by vm1 and have their OS VHD(X) files on a LUN that is owned by vm1, or
  • b) are owned by any host and have their OS VHD(X) files on "high-speed-lun", no matter which node owns "high-speed-lun",

are not affected and have no issues booting or running. It does not matter whether LUN/CSV ownership fails over automatically or I manually change the owner node to vm1; any running VM that does not fit one of the above two descriptions will immediately die and not be able to restart.

Some scenarios that will hopefully clarify the issue a bit (a PowerShell sketch for reproducing scenario 3 follows the list):

  1. vmguest1 and vmguest2 are hosted on node vm1 and their OS storage is located on LUN_2, which is owned by node vm5. This is not a problem and everything works; there are also no issues if this is reversed.
  2. vmguest1 is owned by vm1 and vmguest2 is owned by node vm3, and their OS storage is located on "high-speed-lun", which is owned by node vm1. This is not a problem and everything works.
  3. vmguest1 is owned by vm1 and vmguest2 is owned by node vm3, with both OS disks located on LUN_1, which is owned by node vm1. vmguest1 will be fine, while vmguest2 will fail to run/start.
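To trigger scenario 3 on demand and watch what the cluster reports, CSV ownership can be moved and the CSV state checked from PowerShell. This is only a sketch; 'LUN_1' is assumed to be the actual CSV resource name in Failover Cluster Manager:

    # Make vm1 the coordinator (owner) node for the LUN_1 CSV.
    Move-ClusterSharedVolume -Name 'LUN_1' -Node 'vm1'

    # StateInfo of Direct is healthy; FileSystemRedirected or BlockRedirected means
    # every node's I/O for this volume is funnelled through the coordinator node.
    Get-ClusterSharedVolumeState -Name 'LUN_1' |
        Format-Table Node, StateInfo, FileSystemRedirectedIOReason

If the volume only shows as redirected while vm1 owns it, that points at vm1's storage connectivity rather than at the VMs themselves.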

When this issue occurs, I see the following errors in Cluster Events/Event Viewer (a sketch for pulling the cluster debug log around these failures follows the list):

  • Error, Event ID 1069: "Cluster resource 'Virtual Machine vmguest1' of type 'Virtual Machine' in clustered role 'vmguest1' failed. The error code was '0x780' ('The file cannot be accessed by the system.')."
  • Error, Event ID 1205: "The Cluster service failed to bring clustered role 'vmguest1' completely online or offline. One or more resources may be in a failed state. This may impact the availability of the clustered role."
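The 1069/1205 pair only says the VM resource failed; the underlying 0x780 usually shows up with more context in the cluster debug log. A sketch for collecting it right after a failure (the destination folder and search strings are just examples):

    # Generate cluster.log files from every node covering the last 15 minutes.
    Get-ClusterLog -Destination 'C:\Temp' -UseLocalTime -TimeSpan 15

    # Look for the failing VM resource and the 0x780 error in the collected logs.
    Select-String -Path 'C:\Temp\*.log' -Pattern 'vmguest2', '0x780' |
        Select-Object -First 20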

I know this is a lot of info; I'm just trying to give as clear an outline of the issue as possible up front.

Any thoughts anyone has to help get this all cleaned up would be greatly appreciated.


In the interest of reducing questions about the cluster setup/environment, I'm going to try to get all of the potentially relevant info down in one fell swoop below.

Node info ("vm1", "vm3", "vm5") - a sketch for comparing the cluster networks and NIC teams across nodes follows this list:

  • all 3 nodes are running 2012 R2,
  • all have the same updates (verified by cluster validation),
  • 2x Xeon E5-2430L hex-core, 64 GB memory,
  • 2x onboard NICs teamed for cluster comms,
  • 2x onboard NICs teamed and assigned to the Hyper-V switch,
  • 4x NICs on individual subnets for communication with the SAN,
  • the only known physical difference between the nodes is that vm1 has its OS drive set up as a 2-disk 558 GB RAID 1, while vm3/vm5 have their OS drives set up as 4-disk 1.1 TB RAID 10,
  • all are AD-joined, with 3 DCs in 2 locations: 2 remote in the satellite office and 1 in the datacenter local to this cluster, on separate hardware. All AD tests/replication/etc. have been checked and are, to the best of my knowledge, working properly.
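Since CSV traffic rides the cluster networks when I/O is redirected, here is a quick way to confirm each node sees the same networks and team configuration (run on each node, or wrap in Invoke-Command; nothing below is specific to my setup):

    # Cluster networks and their roles (cluster-only, cluster-and-client, or disabled).
    Get-ClusterNetwork | Format-Table Name, Role, Address, State

    # The two NIC teams on this node (cluster comms team and Hyper-V switch team).
    Get-NetLbfoTeam | Format-Table Name, TeamingMode, LoadBalancingAlgorithm, Status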

Storage hardware ("dcsan") - a sketch for comparing the iSCSI/MPIO paths each node sees follows this list:

  • Dell MD3200i with dual controllers,
  • each controller has 4 NICs set up on individual subnets to match how the server NICs are configured,
  • one disk group set up as RAID 10 across 8 physical 2 TB, 7.2k RPM drives, with 7,430 GB total storage available ("Disk Group 0"),
  • one disk group set up as RAID 5 across 4 physical 600 GB, 15k RPM drives, with 1,660 GB total storage available ("Disk Group 2"),
  • MPIO is configured on each server node.
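Because only vm1 misbehaves when it is the coordinator node, it is worth checking whether vm1 actually sees the same number of iSCSI sessions and MPIO paths to the MD3200i as vm3/vm5 do. A rough check to run on each node (nothing here is specific to my LUN names):

    # One session per target portal/subnet is expected; a missing session on vm1 would
    # explain trouble only when vm1 has to handle the CSV metadata/redirected I/O.
    Get-IscsiSession | Select-Object InitiatorNodeAddress, TargetNodeAddress, NumberOfConnections

    # mpclaim ships with the Multipath I/O feature; -s -d lists MPIO disks and their paths.
    mpclaim.exe -s -d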

Dell MDSM host mappings (see screenshot, actual host names changed for security):


The LUNs are available in Storage -> Disks on each node as follows (LUN name in the screenshot above, LUN size, disk group, assigned to, disk number):

  1. High-Speed-lun (HighSpeed1, 1.6 TB, Disk Group 2, Cluster Shared Volume, 4)
  2. LUN_1 (Lun_1, 3.5 TB, Disk Group 1, Cluster Shared Volume, 3)
  3. LUN_2 (LUN_2, 3.5 TB, Disk Group 1, Cluster Shared Volume, 3)
  4. Quorum Witness (Cluster_Quorum, 520 MB, Disk Group 1, Disk Witness in Quorum, 1)

Cluster Roles:

    Approximately 20-25 guest VMs, the majority running 2012 R2, with a few running Ubuntu (14.04-18.04).


Failover Cluster Manager bug on Server 2019 after .NET 4.8 is installed - unable to type more than two characters into the IP fields


We ran into a nasty bug on Windows Server 2019 and I can't find any KB articles on it. It's really easy to replicate. 

1. Install Windows Server 2019 Standard with Desktop Experience from an ISO. 

2. Install the Failover Clustering feature.

3. Create a new cluster; on the 4th screen, add the current server name. This is what it shows:

[Screenshot: the Create Cluster wizard working correctly before .NET 4.8 is installed]

4. Install .NET 4.8 from the offline installer (KB4486153) and reboot.

5. After the reboot, go back to the same screen of the same Create Cluster wizard, and now it looks different:

[Screenshot: the Create Cluster wizard broken after .NET 4.8 is installed - unable to enter a 3-digit IP octet]

Now we are unable to type a 3-digit value into any of the IP octet fields; each field accepts a maximum of two characters.

Has anyone else encountered this? It should be really easy to reproduce. 
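In case anyone else hits this before a fix is published, creating the cluster from PowerShell bypasses the wizard's IP fields entirely. A minimal sketch (the cluster name, node name, and address below are placeholders):

    # Create a one-node cluster with a static management IP, skipping storage for now.
    New-Cluster -Name 'CLUSTER01' -Node 'NODE01' -StaticAddress '192.168.1.50' -NoStorage

The wizard itself still needs fixing, but this at least unblocks cluster creation.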
