Saturday, June 2, 2018

Cisco 5508 WLC HA Datacenter Migration Notes



Recently I needed to relocate a redundant pair of Cisco 5508 Wireless LAN Controllers (WLC) from one Datacenter to another.  The HA/SSO pair of WLCs was servicing an entire geographical region with 400+ APs associated to it.  The requirement was to have minimal to no downtime during the relocation process, so moving both controllers at the same time wasn’t an option.  The notes below are based on the migration procedure I followed to accomplish this.  I’m sure there are plenty of ways to do this, but this worked for me.  Hopefully it helps others trying to do the same…


Prerequisite Information

Licensing (Base vs. Adder)

Before starting, I would recommend doing a little research on the WLCs’ licenses.  This may determine which device gets relocated first and how.  For example, if the primary WLC in the HA pair holds the permanent license that covers the number of APs in your environment, then it might be easier to relocate that WLC first (the HA secondary unit will inherit the license, so that unit will continue to function).  Personally, I would check CCO’s licensing page to confirm which WLC serial number each license is registered to.

Also, please be aware of the differences between Base licensing vs. Capacity Adder licensing.  Base licensing is acquired upon the purchase of the device, whereas Capacity Adder licenses are purchased as an upgrade to the Base licensing, adding to the Base AP count.

My understanding of Base licensing in the context of deploying HA is that if you have two WLCs with a 50 count AP license, the end AP count after creating the HA/SSO pair is 50.  Base licenses do not aggregate in HA so if you have a requirement to support 100 APs, then a 50 AP Upgrade Adder license would be necessary.
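
To make the arithmetic concrete, here is a minimal Python sketch of the licensing model as I understand it.  The function names are hypothetical, and the code only mirrors my reading above (base counts do not aggregate, adders stack on top of the base); it is not Cisco's official licensing logic.

```python
def ha_pair_capacity(active_base: int, standby_base: int, adder: int = 0) -> int:
    """AP capacity of an HA/SSO pair under the model described above.

    Assumption: the standby unit's base license is NOT added to the total,
    because base licenses do not aggregate in HA. Only the active unit's
    base count plus any Capacity Adder count is available.
    """
    # standby_base is accepted only to make the "does not aggregate" point explicit.
    return active_base + adder


def adder_needed(required_aps: int, active_base: int) -> int:
    """Adder count needed to reach required_aps on top of the base license."""
    return max(0, required_aps - active_base)


if __name__ == "__main__":
    # Two WLCs with 50 AP base licenses still give a 50 AP pair.
    print(ha_pair_capacity(active_base=50, standby_base=50))
    # Supporting 100 APs on that pair needs a 50 AP adder.
    print(adder_needed(required_aps=100, active_base=50))
```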


Here are links to documents explaining the Base vs. Capacity Adder type licenses.



HA SKU Licensing

There is also a third license type that can be purchased specifically for use as the HA standby.  This “HA” SKU (e.g., AIR-CT5508-HA-K9) has a 0 AP count but will inherit the active WLC's license upon a failure.  This “failover” WLC will fully function and will give a 90 day grace period to repair or RMA the failed active unit.  After 90 days, the HA WLC will alert but will continue to function.  This article does a good job explaining the specifics of a WLC running an HA-SKU license.



My Setup (before migration)

  • 2 Cisco 5508 WLCs connected in HA/SSO located in DC “A”.
  • WLC1 is the Active unit and WLC2 is the Standby unit.
  • Both WLCs have 100 AP Base licenses (AIR-CT5508-100-K9).
  • Both WLCs are running version 8.3.141.0.
  • WLC1’s serial number had the upgraded “adder” license registered with Cisco.  The total AP count I had was 500.

Task
  • Relocate the 2 Cisco 5508 WLCs to DC “Z”.


Migration Process
  • In DC “A” powered off WLC2.  WLC1 continued to operate without any issues but just reported the Standby unit was down.
  • Disconnected all network cables (LAN and Redundancy Port), except the Console connection on WLC2.
  • Powered WLC2 back on.
  • Once WLC2 booted, checked redundancy state and disabled SSO.  The WLC rebooted again.
(Cisco Controller) >show redundancy summary 
            Redundancy Mode = SSO ENABLED 
                Local State = ACTIVE 
                 Peer State = UNKNOWN - Communication Down 
                       Unit = Secondary (Inherited AP License Count = 500)
                    Unit ID = C8:9C:1D:xx:xx:xx
           Redundancy State = Non Redundant
               Mobility MAC = 50:3D:E5:xx:xx:xx
            BulkSync Status = Pending


(Cisco Controller) >config redundancy mode disable 


All unsaved configuration will be saved.
And the system will be reset. Are you sure? (y/n)y


(Cisco Controller) >
Saving the configuration...

Configuration Saved!
System will now reboot!
Creating license client restartability thread

Exit Called
Switchdrvr exited!
Restarting system.

  • After the reboot, WLC2 was reconfigured to be “Primary”.
(Cisco Controller) >config redundancy unit primary 

(Cisco Controller) >show redundancy summary 
 Redundancy Mode = SSO DISABLED 
     Local State = ACTIVE 
      Peer State = N/A 
            Unit = Primary
         Unit ID = C8:9C:1D:xx:xx:xx
Redundancy State = N/A 
    Mobility MAC = 50:3D:E5:xx:xx:xx 

  • Saved the configuration one last time and powered off, then shipped WLC2 to new location (DC “Z”).
  • Once the device arrived at DC “Z”, WLC2 was racked and all physical network connections were made.
    • Note: During my WLC installation at DC “Z”, I opted to connect the Redundancy Port (RP) to a L2 infrastructure switch instead of using a back-to-back connection.  Since this DC was a remote location and I was forced to use a facility-provided onsite technician, I had everything pre-connected to switches I had control over. This way I was able to bring up the HA connection (i.e., no shut on the RP switchport) without needing to schedule the local resource, which would have been time consuming and disruptive to my schedule. Connecting the RP port to a L2 switch is supported on 7.5 or later code as explained in this document.
  • From WLC2’s console, its Management IP and Default Gateway were changed based on the assigned WLAN VLAN in DC “Z”.  Made sure the new IP was reachable on the network and able to access the WebUI etc.
  • WLC2's hostname and any other location-specific parameters were changed for its new location.
  • Checked license status again.  Since this box did not have the 500 AP count permanent license (it was registered to WLC1’s serial), this WLC’s license defaulted back to the base license of 100 APs.  This was not good considering that 400+ APs needed to be migrated.
  • I was forced to enable the 500 AP count evaluation license.  Rebooted the WLC to commit the change.
  • Performed the AP migration.
    • Changed the DHCP server's option 43 to point the APs to the new controller’s IP address (changed the HEX value as explained in this document.)

    • For all registered APs on WLC1 in DC A, any references to primary, secondary or tertiary controller’s name or IP in the HA section were removed.

    • APs were reset/rebooted.
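
For the option 43 change above, the HEX value for Cisco lightweight APs follows a type-length-value layout: type 0xf1, a one-byte length equal to 4 times the number of controller IPs, then each controller IP in hex.  Below is a minimal Python sketch that builds that string.  The function name and the example addresses are hypothetical; how the value is entered depends on your DHCP server.

```python
import ipaddress


def cisco_option43_hex(controller_ips):
    """Build the option 43 hex string (type 0xf1) for a list of WLC management IPs."""
    if not controller_ips:
        raise ValueError("at least one controller IP is required")
    # IPv4Address raises ValueError on malformed input, which is what we want here.
    addrs = [ipaddress.IPv4Address(ip) for ip in controller_ips]
    length = 4 * len(addrs)  # 4 bytes per IPv4 address
    if length > 255:
        raise ValueError("too many controllers for a one-byte length field")
    value = b"".join(addr.packed for addr in addrs)
    return "f1" + format(length, "02x") + value.hex()


if __name__ == "__main__":
    # Hypothetical example address, not the one used in this migration.
    print(cisco_option43_hex(["192.168.10.5"]))
```

With a single controller at 192.168.10.5 this yields f104c0a80a05 (f1 = type, 04 = length, c0a80a05 = the IP).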
  • Verified all 400+ APs were registered to DC Z’s WLC (formerly WLC2) and were functioning without issues.  At this point DC A’s WLC could be reconfigured and shipped.
  • Performed a backup of DC Z’s WLC configuration (configured as Primary/Standalone).
  • Copied this configuration to DC A’s WLC and rebooted (First had to disable SSO, since it was the Primary HA unit).  Made sure the new configuration took.
    • Note: This step was done because of my misstep with the licensing.  Since this WLC had the 500 count permanent AP license, this unit had to be the primary in the HA.  The plan was to configure this unit as the new primary HA (with SSO enabled) and to swap it out with the existing WLC when it arrived at DC Z.  The existing WLC would then be reconfigured as the secondary HA with SSO.
  • Configured the Redundancy Management and Peer Redundancy Management IP addresses on DC A’s WLC, enabled SSO and rebooted (Remember that DC Z’s WLC was configured as Primary/Standalone).
  • After the reload, the redundancy status was checked to ensure it was Primary with SSO enabled.
(Cisco Controller) >show redundancy summary 
            Redundancy Mode = SSO ENABLED 
                Local State = ACTIVE 
                 Peer State = UNKNOWN - Communication Down 
                       Unit = Primary
                    Unit ID = 50:3D:E5:xx:xx:xx
           Redundancy State = Non Redundant
               Mobility MAC = 50:3D:E5:xx:xx:xx
            BulkSync Status = Pending
  • Saved configuration, Powered off and shipped DC A’s WLC to DC Z.
  • Installed DC A’s WLC and made all physical network connections. All ports were kept in VLAN 1 so those ports could be enabled without the WLC being on the “network”.  Used CDP to verify that the WLC port-to-switchport mappings were correct.  Once everything checked out, the switchports were shut down.
  • On the existing DC Z’s WLC, the Redundancy Management and Peer Redundancy Management IPs were changed to the values that the "secondary" unit should have.  Saved configuration.
  • DC Z’s WLC was configured to be the secondary HA unit and SSO was enabled.  The WLC saved the configuration and rebooted.  During this reboot, its switchports were shut down to take it off the network (including the RP port connected to another L2 switch).
  • At this point, the WLCs were swapped by enabling DC A’s WLC back on the network.  Verified it was reachable on the network.  The RP port was still shut down.  I ensured that this WLC was the unit with the 500 count permanent AP license.
  • Verified that all the APs were re-registering to this WLC. This took about 30 mins.
    • Note: At this point, since the WLCs were swapped, there was only a momentary blip of downtime.  Any FlexConnect locally switched WLANs would continue to function without issue, however any centrally switched WLANs would be disrupted.
  • After DC Z’s WLC rebooted, the redundancy state was re-verified as Secondary HA with SSO.  After that checked out, this WLC was rebooted again for good measure.  While it was rebooting, its switchports were reconfigured out of VLAN1 (as it was before), the appropriate trunk and VLANs were configured, and finally enabled on the network (including the RP ports for both WLCs).
  • Watched the console on each WLC and verified the primary and secondary negotiation process and bulk configuration sync took place.
  • Verified that after the negotiation process, the redundancy states were satisfactory.  From the primary WLC, once the peer state was showing “standby hot” and bulk configuration sync complete, the HA was fully enabled.

(Cisco Controller) >show redundancy sum
            Redundancy Mode = SSO ENABLED 
                Local State = ACTIVE 
                 Peer State = STANDBY HOT 
                       Unit = Primary
                    Unit ID = 50:3D:E5:xx:xx:xx
           Redundancy State = SSO
               Mobility MAC = 50:3D:E5:xx:xx:xx
            BulkSync Status = Complete
Average Redundancy Peer Reachability Latency = 457 Micro Seconds
Average Management Gateway Reachability Latency = 953 Micro Seconds

  • Initiated SSO testing by performing a “redundancy force-switchover” via the CLI.  Performed this on the primary first to see if the secondary took over without issues.
(Cisco Controller) >redundancy force-switchover 

This will reload the active unit and force a switch of activity. Are you sure? (y/N) y

System will now restart! Creating license client restartability thread

Exit Called
Switchdrvr exited!
Restarting system.
  • CLI view from secondary WLC.
(Cisco Controller) >
Blocked: Configurations blocked as standby WLC is still booting up.
         You will be notified once configurations are Unblocked

Unblocked: Configurations are allowed now...

(Cisco Controller) >show redundancy summary 

            Redundancy Mode = SSO ENABLED 
                Local State = ACTIVE 
                 Peer State = STANDBY HOT 
                       Unit = Secondary (Inherited AP License Count = 500)
                    Unit ID = C8:9C:1D:xx:xx:xx
           Redundancy State = SSO
               Mobility MAC = 50:3D:E5:xx:xx:xx
            BulkSync Status = In-Progress
Average Redundancy Peer Reachability Latency = 519 Micro Seconds
Average Management Gateway Reachability Latency = 750 Micro Seconds

  • Performed the same “redundancy force-switchover” on the secondary WLC to test HA on that unit and to return the Active role to the primary unit.
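
The redundancy checks in the steps above can also be scripted against captured console output.  Here is a minimal Python sketch that parses "show redundancy summary" text and reports whether the pair looks fully formed, using the same criteria I checked by eye (SSO enabled, peer in STANDBY HOT, bulk sync complete).  The function names are hypothetical, and the parser assumes the "Key = Value" layout shown in the outputs above.

```python
def parse_redundancy_summary(text):
    """Turn 'Key = Value' lines from 'show redundancy summary' into a dict."""
    fields = {}
    for line in text.splitlines():
        if "=" not in line:
            continue  # skip prompts and blank lines
        # Split on the first '=' only, so values such as
        # 'Secondary (Inherited AP License Count = 500)' stay intact.
        key, _, value = line.partition("=")
        fields[key.strip()] = value.strip()
    return fields


def ha_is_formed(fields):
    """True when the active unit reports a fully synced SSO pair."""
    return (
        fields.get("Redundancy Mode") == "SSO ENABLED"
        and fields.get("Peer State") == "STANDBY HOT"
        and fields.get("BulkSync Status") == "Complete"
    )


SAMPLE = """
(Cisco Controller) >show redundancy summary
            Redundancy Mode = SSO ENABLED
                Local State = ACTIVE
                 Peer State = STANDBY HOT
                       Unit = Primary
           Redundancy State = SSO
            BulkSync Status = Complete
"""

if __name__ == "__main__":
    print(ha_is_formed(parse_redundancy_summary(SAMPLE)))
```

Right after a switchover the BulkSync Status shows In-Progress, so this check reports the pair as not yet formed until the sync completes.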

Lessons Learned
  • Again, my lack of research into the WLC licensing made this process a little more complicated than it needed to be.  If I had just relocated the WLC with the permanent license first, I wouldn't have needed to "swap" the primary WLCs when joining the HA.  I could have simply configured the 2nd relocated WLC as secondary and joined the HA without any downtime.
  • If the WLC's software needs to be upgraded, this might be a good time to do it.  However, research must be done to ensure that the new version doesn't conflict with existing APs etc. (i.e., make sure the new software version supports all of your APs).

