
Saturday, June 2, 2018

Cisco 5508 WLC HA Datacenter Migration Notes



Recently I had a need to relocate a redundant pair of Cisco 5508 Wireless LAN Controllers (WLC) from one datacenter to another.  The HA/SSO pair of WLCs was servicing an entire geographical region with 400+ APs associated to them.  The requirement was to have minimal to no downtime during the relocation process, so moving both controllers at the same time wasn’t an option.  The notes below are based on the migration procedure I used to accomplish this.  I’m sure there are plenty of other ways to do this, but this worked for me.  Hopefully it can aid others trying to do the same…


Prerequisite Information

Licensing (Base vs. Adder)

Before starting, I would recommend doing a little research on the WLCs’ licenses.  This may determine how and which device gets relocated first.  For example, if the primary WLC in the HA pair has the permanent license to cover the number of APs in your environment, then it might be easier to relocate that WLC first (the HA secondary unit will inherit the license, so that unit will continue to function).  Personally, I would check CCO’s licensing page to verify which of the WLCs’ serial numbers the license is registered to.

Also, please be aware of the differences between Base licensing vs. Capacity Adder licensing.  Base licensing is acquired upon the purchase of the device, whereas Capacity Adder licenses are purchased as an upgrade to the Base licensing, adding to the Base AP count.

My understanding of Base licensing in the context of deploying HA is that if you have two WLCs each with a 50 AP count license, the end AP count after creating the HA/SSO pair is 50.  Base licenses do not aggregate in HA, so if you have a requirement to support 100 APs, then a 50 AP Capacity Adder license would be necessary.


Here are links to documents explaining the Base vs. Capacity Adder type licenses.



HA SKU Licensing

There is also a third license type that can be purchased with a specific intent to only be used as HA.  It has a specific “HA” license (e.g., AIR-CT5508-HA-K9) that has a 0 AP count but will inherit the active WLC's license upon a failure.  This “failover” WLC will fully function and will give a 90 day grace period to repair or RMA the failed active unit.  After 90 days, the HA WLC will alert but will continue to function.  This article does a good job explaining the specifics of a WLC running an HA-SKU license.



My Setup (before migration)

  • 2 Cisco 5508 WLCs connected in HA/SSO located in DC “A”.
  • WLC1 is the Active unit and WLC2 is the Standby unit.
  • Both WLCs have 100 AP Base licenses (AIR-CT5508-100-K9).
  • Both WLCs are running version 8.3.141.0.
  • WLC1’s serial number had the upgraded “adder” license registered with Cisco.  The total AP count I had was 500.

Task
  • Relocate the 2 Cisco 5508 WLCs to DC “Z”.


Migration Process
  • In DC “A” powered off WLC2.  WLC1 continued to operate without any issues but just reported the Standby unit was down.
  • Disconnected all network cables (LAN and Redundancy Port), except the Console connection on WLC2.
  • Powered WLC2 back on.
  • Once WLC2 booted, checked redundancy state and disabled SSO.  The WLC rebooted again.
(Cisco Controller) >show redundancy summary 
            Redundancy Mode = SSO ENABLED 
                Local State = ACTIVE 
                 Peer State = UNKNOWN - Communication Down 
                       Unit = Secondary (Inherited AP License Count = 500)
                    Unit ID = C8:9C:1D:xx:xx:xx
           Redundancy State = Non Redundant
               Mobility MAC = 50:3D:E5:xx:xx:xx
            BulkSync Status = Pending


(Cisco Controller) >config redundancy mode disable 


All unsaved configuration will be saved.
And the system will be reset. Are you sure? (y/n)y


(Cisco Controller) >
Saving the configuration...

Configuration Saved!
System will now reboot!
Creating license client restartability thread

Exit Called
Switchdrvr exited!
Restarting system.

  • After the reboot, WLC2 was reconfigured to be “Primary”.
(Cisco Controller) >config redundancy unit primary 

(Cisco Controller) >show redundancy summary 
 Redundancy Mode = SSO DISABLED 
     Local State = ACTIVE 
      Peer State = N/A 
            Unit = Primary
         Unit ID = C8:9C:1D:xx:xx:xx
Redundancy State = N/A 
    Mobility MAC = 50:3D:E5:xx:xx:xx 

  • Saved the configuration one last time and powered off, then shipped WLC2 to new location (DC “Z”).
  • Once the device arrived at DC “Z”, WLC2 was racked and all physical network connections were made.
    • Note: During my WLC installation at DC “Z”, I opted to connect the Redundancy Port (RP) to a L2 infrastructure switch instead of using a back-to-back connection.  Since this DC was a remote location and I was forced to use a facility-provided onsite technician, I had everything pre-connected to switches I had control over.  This way I was able to perform the HA connection (i.e., no shut the RP port) without needing to schedule the local resource, which would have been time consuming and disruptive to my schedule.  Connecting the RP port to a L2 switch is supported on 7.5 or later code as explained in this document.
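A minimal sketch of that pre-staged switch config, assuming a dedicated VLAN carrying only RP traffic (the VLAN number and interface names below are illustrative, not from my actual setup):

```
! Dedicated VLAN for WLC Redundancy Port traffic (illustrative values)
vlan 999
 name WLC-RP
!
interface GigabitEthernet1/0/10
 description WLC1 Redundancy Port
 switchport mode access
 switchport access vlan 999
 shutdown
```

The port stays administratively down until you are ready to form the HA pair; a remote-hands "no shutdown" (or your own via the switch) is then all that's needed.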
  • From WLC2’s console, its Management IP and Default Gateway were changed based on the assigned WLAN VLAN in DC “Z”.  Made sure the new IP was reachable on the network and able to access the WebUI etc.
  • WLC2's hostname and any other location-specific parameters were changed for its new location.
  • Checked license status again.  Since this box did not have the 500 AP count permanent license (it was registered to WLC1’s serial), this WLC’s license defaulted back to the base license of 100 APs.  This was not good considering that 400+ APs needed to be migrated.
  • I was forced to enable the 500 AP count evaluation license.  Rebooted the WLC to commit the change.
  • Performed the AP migration.
    • Changed the DHCP server's option 43 to inform the APs of the new controller’s IP address (changed the HEX value as explained in this document.)
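For reference, the option 43 value for Cisco lightweight APs is a simple TLV: type 0xf1, a length of 4 bytes per controller, then each controller management IP as four hex bytes.  A quick sketch of the math (the controller IPs below are placeholders, not my actual addresses):

```python
# Sketch: build the DHCP option 43 hex string for Cisco lightweight APs.
# TLV format: type 0xf1, length = 4 * (number of controllers), then each
# IPv4 address packed as four hex bytes.
import ipaddress

def option43_hex(controller_ips):
    """Return the option 43 value as a hex string (type f1 TLV)."""
    packed = b"".join(ipaddress.IPv4Address(ip).packed for ip in controller_ips)
    return "f1" + format(len(packed), "02x") + packed.hex()

print(option43_hex(["192.0.2.10"]))                # f104c000020a
print(option43_hex(["192.0.2.10", "192.0.2.11"]))  # f108c000020ac000020b
```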

    • For all registered APs on WLC1 in DC A, any references to primary, secondary or tertiary controller’s name or IP in the HA section were removed.

    • APs were reset/rebooted.
  • Verified all 400+ APs were registered to DC Z’s WLC (was WLC2) and APs were functioning without issues.  At this point DC A’s WLC can be reconfigured and shipped.
  • Performed a backup of DC Z’s WLC configuration (configured as Primary/Standalone).
  • Copied this configuration to DC A’s WLC and rebooted (first had to disable SSO, since it was the Primary HA unit).  Made sure the new configuration took effect.
    • Note: This step was done because of my misstep with the licensing.  Since this WLC had the 500 count permanent AP license, this unit had to be the primary in the HA.  The plan was to configure this unit as the new primary HA (with SSO enabled) and to swap it out with the existing WLC when it arrived at DC Z.  The existing WLC would then be reconfigured as the secondary HA with SSO.
  • Configured the Redundancy Management and Peer Redundancy Management IP addresses on DC A’s WLC, enabled SSO and rebooted (Remember that DC Z’s WLC was configured as Primary/Standalone).
  • After the reload, the redundancy status was checked to ensure it was Primary with SSO enabled.
(Cisco Controller) >show redundancy summary 
            Redundancy Mode = SSO ENABLED 
                Local State = ACTIVE 
                 Peer State = UNKNOWN - Communication Down 
                       Unit = Primary
                    Unit ID = 50:3D:E5:xx:xx:xx
           Redundancy State = Non Redundant
               Mobility MAC = 50:3D:E5:xx:xx:xx
            BulkSync Status = Pending
  • Saved configuration, Powered off and shipped DC A’s WLC to DC Z.
  • Installed DC A’s WLC and made all physical network connections.  All ports were kept in VLAN 1 so those ports could be enabled without being on the “network”.  Used CDP to verify the WLC's port-to-switchport mappings were correct.  Once everything checked out, the switchports were shut down.
  • From the existing DC Z’s WLC, the Redundancy Management and Peer Redundancy Management IP were changed to the IP that the "secondary" unit should have.  Saved configuration.
  • DC Z’s WLC was configured to be the secondary HA and SSO was enabled.  The WLC saved the configuration and rebooted.  During this reboot, the switchports were shut down to isolate it from the network (including the RP port connected to the other L2 switch).
  • At this point, the WLCs were swapped by enabling DC A’s WLC back on the network.  Verified it was reachable on the network.  The RP port was still shutdown.  I ensured that this WLC was the unit with the 500 count permanent AP license.
  • Verified that all the APs were re-registering to this WLC. This took about 30 mins.
    • Note: At this point, since the WLCs were swapped, there was only a momentary blip of downtime.  Any FlexConnect locally switched WLANs would continue to function without issue, however any centrally switched WLANs would be disrupted.
  • After DC Z’s WLC rebooted, the redundancy state was re-verified as Secondary HA with SSO.  After that checked out, this WLC was rebooted again for good measure.  While it was rebooting, its switchports were reconfigured out of VLAN 1 (as they were before), the appropriate trunk and VLANs were configured, and finally enabled on the network (including the RP ports for both WLCs).
  • Watched the console on each WLC and verified the primary and secondary negotiation process and bulk configuration sync took place.
  • Verified that after the negotiation process, the redundancy states were satisfactory.  From the primary WLC, once the peer state was showing “standby hot” and bulk configuration sync complete, the HA was fully enabled.

(Cisco Controller) >show redundancy sum
            Redundancy Mode = SSO ENABLED 
                Local State = ACTIVE 
                 Peer State = STANDBY HOT 
                       Unit = Primary
                    Unit ID = 50:3D:E5:xx:xx:xx
           Redundancy State = SSO
               Mobility MAC = 50:3D:E5:xx:xx:xx
            BulkSync Status = Complete
Average Redundancy Peer Reachability Latency = 457 Micro Seconds
Average Management Gateway Reachability Latency = 953 Micro Seconds

  • Initiated SSO testing by performing a “redundancy force-switchover” via the CLI.  Performed this on the primary first to see if the secondary took over without issues.
(Cisco Controller) >redundancy force-switchover 

This will reload the active unit and force a switch of activity. Are you sure? (y/N) y

System will now restart! Creating license client restartability thread

Exit Called
Switchdrvr exited!
Restarting system.
  • CLI view from secondary WLC.
(Cisco Controller) >
Blocked: Configurations blocked as standby WLC is still booting up.
         You will be notified once configurations are Unblocked

Unblocked: Configurations are allowed now...

(Cisco Controller) >show redundancy summary 

            Redundancy Mode = SSO ENABLED 
                Local State = ACTIVE 
                 Peer State = STANDBY HOT 
                       Unit = Secondary (Inherited AP License Count = 500)
                    Unit ID = C8:9C:1D:xx:xx:xx
           Redundancy State = SSO
               Mobility MAC = 50:3D:E5:xx:xx:xx
            BulkSync Status = In-Progress
Average Redundancy Peer Reachability Latency = 519 Micro Seconds
Average Management Gateway Reachability Latency = 750 Micro Seconds

  • Performed the same “redundancy force-switchover” on the secondary WLC to test HA on that unit and to return the units to their original roles.

Lessons Learned
  • Again, my lack of research on the WLC licensing made this process a little more complicated than it needed to be.  If I had just relocated the WLC with the permanent license first, I wouldn't have needed to "swap" the primary WLCs when joining the HA.  I could have simply configured the 2nd relocated WLC as secondary and joined the HA without any downtime.
  • If the WLC's software needs to be upgraded, this might be a good time to do this.  However research must be done to ensure that the new version doesn't conflict with existing APs etc. (i.e., make sure the new software version supports all of your APs).


Saturday, July 29, 2017

Tunnel Path MTU Discovery in a DMVPN Network: Use with Caution

Everyone knows one of the main issues in managing a DMVPN network is dealing with fragmentation.  Typically, when calculating the tunnel MTU and MSS, we are under the assumption that we are working with a network transport with a normal MTU of 1500.  However, I’ve been seeing more cases of Internet services being delivered to customers with a lower than normal MTU.

Recently, I discovered a couple of sites in Europe where an ISP delivered a DSL service with a backend MTU of 1444.  This MTU was not disclosed to us and the MTU between the router and service provider edge device was set at 1500, giving the appearance of a normal working MTU.  This site performed normally for a good amount of time until the “tunnel path-mtu discovery” command (or tunnel PMTUD) was enabled on the tunnel interface (we discovered later it was added accidentally during a maintenance period). This is when I was alerted about a debilitating performance issue affecting that site.

At first I thought that this tunnel PMTUD feature couldn't possibly have caused such an issue.  However, after researching and testing it, it made more sense why things broke.  I wanted to share the experience in this post so others wouldn't get burned by it too.

As a small disclaimer, the following is based on my own experience, so I’m not saying this is a bad feature or to never use it.  But my advice is to use it with caution and enable it after fully understanding how the feature can affect your network.


Lab Environment

  • Headend Router: Cisco 3945 (ISRG2) with IOS version 15.5(3)M2
  • Branch Router: Cisco 2951 (ISRG2) with IOS version 15.5(3)M2
  • Internet R01: Cisco 1801 with IOS version 12.4(15)T4
  • Internet S01: Cisco 3560 with IOS version 12.2(55)SE5

Diagram & Topology



Technology Overview

When looking at what the Tunnel PMTUD feature is doing, it basically does two main things:

  • Copy DF bit from the original IP packet to new GRE header
  • Router listens for ICMP Unreachables with Fragmentation Needed and Don't Fragment Set (Type 3 Code 4)



On the other hand, when Tunnel PMTUD is not configured, the default behavior of GRE is to not copy the DF bit from the original IP header.  This allows fragmentation to occur when the packet encounters a path that has an MTU lower than 1500.
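For context, enabling the feature is a single interface command.  A minimal sketch from my lab setup (the interface number and MTU match the scenarios below; everything else is left at defaults):

```
interface Tunnel1
 ip mtu 1400
 tunnel path-mtu-discovery
```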


So with these two bits of information in mind, let’s run through a few different scenarios.  This will show Tunnel PMTUD operation in greater detail.  All scenarios will run through a network with a lower than normal MTU of 1300 in its path (see lab diagram).


  1. Traffic sent from Headend LAN Switch to Branch over the DMVPN tunnel with Tunnel PMTUD disabled.
  2. Traffic sent from Headend LAN Switch to Branch over the DMVPN tunnel with Tunnel PMTUD enabled and working.
  3. Traffic sent from Headend LAN Switch to Branch over the DMVPN tunnel with Tunnel PMTUD enabled and not working (ICMP unreachable blocked).

Scenario 1 (Tunnel PMTUD Disabled)

In this scenario, we do not have tunnel PMTUD configured.  The tunnel MTU is set at 1400 and the test traffic uses a 1400 byte packet sent from Headend to Branch.

  • Ping with size of 1400 and DF bit set from LAN Switch to Branch network.  Packet makes it to destination without issue.
LAN_SWITCH#ping 10.100.100.254 size 1400 df

Type escape sequence to abort.
Sending 5, 1400-byte ICMP Echos to 10.100.100.254, timeout is 2 seconds:
Packet sent with the DF bit set
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/5/9 ms

  • Headend router’s tunnel MTU is 1400 and shows no fragmentation.
HEADEND#sh ip traffic | i Frag|frag
  Frags: 0 reassembled, 0 timeouts, 0 couldn't reassemble
         0 fragmented, 0 fragments, 0 couldn't fragment






  • Packet reaches router INTERNET_R01 and is fragmented to pass the MTU path of 1300.  One fragment has a length of 1300 and the other 200.  Note: Output below shows only 1 of 5 packets for brevity.
INTERNET_R01#
* Jul 27 16:12:17.770: IP: tableid=0, s=1.1.1.1 (Vlan100), d=3.3.3.1 (FastEthernet0), routed via FIB
* Jul 27 16:12:17.770: IP: s=1.1.1.1 (Vlan100), d=3.3.3.1 (FastEthernet0), g=3.3.3.1, len 1480, forward, proto=50
* Jul 27 16:12:17.770: IP: s=1.1.1.1 (Vlan100), d=3.3.3.1 (FastEthernet0), len 1300, sending fragment
* Jul 27 16:12:17.770:     IP Fragment, Ident = 40355, fragment offset = 0, proto=50
* Jul 27 16:12:17.770: IP: s=1.1.1.1 (Vlan100), d=3.3.3.1 (FastEthernet0), len 200, sending last fragment
* Jul 27 16:12:17.770:     IP Fragment, Ident = 40355, fragment offset = 1280


  • IP traffic statistics show 5 packets have been fragmented.
INTERNET_R01#sh ip traffic | in Frag|frag
  Frags: 0 reassembled, 0 timeouts, 0 couldn't reassemble
         5 fragmented, 10 fragments, 0 couldn't fragment


  • Branch router receives packets.
BRANCH#
Jul 27 16:20:11.634: ICMP: echo reply sent, src 10.100.100.254, dst 10.1.1.253, topology BASE, dscp 0 topoid 0
Jul 27 16:20:11.638: ICMP: echo reply sent, src 10.100.100.254, dst 10.1.1.253, topology BASE, dscp 0 topoid 0
Jul 27 16:20:11.642: ICMP: echo reply sent, src 10.100.100.254, dst 10.1.1.253, topology BASE, dscp 0 topoid 0
Jul 27 16:20:11.650: ICMP: echo reply sent, src 10.100.100.254, dst 10.1.1.253, topology BASE, dscp 0 topoid 0
Jul 27 16:20:11.654: ICMP: echo reply sent, src 10.100.100.254, dst 10.1.1.253, topology BASE, dscp 0 topoid 0


  • Branch router reassembles fragmented packets.

BRANCH#sh ip traffic | i Frag|frag
  Frags: 5 reassembled, 0 timeouts, 0 couldn't reassemble
         0 fragmented, 0 fragments, 0 couldn't fragment



Scenario 2 (Tunnel PMTUD Enabled)

In this scenario, tunnel PMTUD is configured.  The tunnel MTU is set at 1400 and the test traffic uses a 1400 byte packet sent from Headend to Branch.

  • Ping with size of 1400 and DF bit set from LAN Switch to Branch network.  This fails.
LAN_SWITCH#p 10.100.100.254 si 1400 df

Type escape sequence to abort.
Sending 5, 1400-byte ICMP Echos to 10.100.100.254, timeout is 2 seconds:
Packet sent with the DF bit set
.M.M.
Success rate is 0 percent (0/5)


  • With Tunnel PMTUD enabled, the router is doing the following:
    • Router sends an ICMP unreachable to itself to adjust MTU.
    • Tunnel PMTUD process calculates the new MTU based on the configured policy and current MTU of 1400 and drops it down to 1334.
    • Router sends an ICMP unreachable to the source telling it the new MTU is 1334.
HEADEND#
Jul 27 06:09:22.507: ICMP: dst (3.3.3.1) frag. needed and DF set unreachable sent to 1.1.1.1
Jul 27 06:09:22.507: ICMP: dst (1.1.1.1) frag. needed and DF set unreachable rcv from 1.1.1.1 mtu:1362
Jul 27 06:09:22.507: Tunnel1: dest 3.3.3.1, received frag needed (mtu 1362), adjusting soft state MTU from 0 to 1334
Jul 27 06:09:22.507: Tunnel1: tunnel endpoint for transport dest 3.3.3.1, change MTU from 0 to 1334
Jul 27 06:09:24.510: ICMP: dst (10.100.100.254) frag. needed and DF set unreachable sent to 10.1.1.253mtu:1334
Jul 27 06:09:26.514: ICMP: dst (10.100.100.254) frag. needed and DF set unreachable sent to 10.1.1.253mtu:1334


  • When trying to figure out why the tunnel PMTUD process is using a value of 1334, the IPSec Overhead Calculator was a very useful tool.  We can now see why a payload size of 1334, after encryption and GRE, yields a packet size of 1400.
  • This raises the question of why PMTUD needs to alter the MTU at this point.  It seems inefficient to do so because we are clearly sending data that will fit the tunnel MTU of 1400.  I could understand it if we sent a 1500 byte packet.
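For those curious, the arithmetic works out to a fixed 66 bytes of overhead at these packet sizes.  The breakdown below is my own reconstruction, assuming GRE over ESP in transport mode with AES-CBC and HMAC-SHA1-96 (the lab's actual crypto policy isn't shown here, so treat the split as an assumption):

```python
# Reconstructed GRE + IPsec overhead, assuming ESP transport mode with
# AES-CBC and HMAC-SHA1-96 -- these cipher assumptions are mine, not
# taken from the lab config.
OVERHEAD = {
    "GRE delivery IP header": 20,
    "GRE header": 4,
    "ESP header (SPI + sequence)": 8,
    "AES-CBC IV": 16,
    "ESP padding + trailer": 6,   # pad (4) + pad-length (1) + next-header (1) at these sizes
    "HMAC-SHA1-96 ICV": 12,
}
total = sum(OVERHEAD.values())
print(total)         # 66
print(1334 + total)  # 1400 -> fits the tunnel MTU of 1400
print(1222 + total)  # 1288 -> fits under the 1300 MTU link
```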


  • Headend’s Tunnel interface sets new MTU of 1334 via the PMTUD feature.
HEADEND#sh int tun1 | in Path        
  Path MTU Discovery, ager 10 mins, min MTU 92
  Path destination 3.3.3.1: MTU 1334, expires 00:09:23

  • Headend’s IP statistics shows that it dropped the packet because it couldn’t fragment it.
HEADEND#sh ip traffic | i Frag|frag
  Frags: 0 reassembled, 0 timeouts, 0 couldn't reassemble
         0 fragmented, 0 fragments, 2 couldn't fragment

  • The LAN Switch re-attempts to send traffic to the Branch site with the new MTU of 1334.  It still fails because we have an MTU of 1300 somewhere in the path.
LAN_SWITCH#ping 10.100.100.254 size 1334 df

Type escape sequence to abort.
Sending 5, 1334-byte ICMP Echos to 10.100.100.254, timeout is 2 seconds:
Packet sent with the DF bit set
.M.M.
Success rate is 0 percent (0/5)

  • The Internet router receives a 1400 byte packet and sends an ICMP unreachable to the source because the packet can’t pass the interface with an MTU of 1300.
INTERNET_R01#
* Jul 27 22:59:31.764: IP: tableid=0, s=1.1.1.1 (Vlan100), d=3.3.3.1 (FastEthernet0), routed via FIB
* Jul 27 22:59:31.764: IP: s=1.1.1.1 (Vlan100), d=3.3.3.1 (FastEthernet0), g=3.3.3.1, len 1400, forward, proto=50
* Jul 27 22:59:31.764: ICMP: dst (3.3.3.1) frag. needed and DF set unreachable sent to 1.1.1.1

  • The Internet router drops the packet because it couldn’t fragment it.
INTERNET_R01#sh ip traffic | i Frag|frag
  Frags: 0 reassembled, 0 timeouts, 0 couldn't reassemble
         0 fragmented, 0 fragments, 1 couldn't fragment

  • This time the Headend’s Tunnel PMTUD process is doing the following:
    • Router receives an ICMP unreachable from the Internet router with the MTU of 1300.
    • Router sends an ICMP unreachable to itself to adjust the MTU.
    • Tunnel PMTUD process calculates the new MTU based on an MTU of 1250 and drops it down to 1222.  I’m unsure how it got the 1250 value.
    • Router sends a new ICMP unreachable to the source telling it to drop the MTU to 1222.
HEADEND#
Jul 27 23:24:01.752: ICMP: dst (1.1.1.1) frag. needed and DF set unreachable rcv from 1.0.0.1 mtu:1300
Jul 27 23:24:03.752: ICMP: dst (3.3.3.1) frag. needed and DF set unreachable sent to 1.1.1.1
Jul 27 23:24:03.752: ICMP: dst (1.1.1.1) frag. needed and DF set unreachable rcv from 1.1.1.1 mtu:1250
Jul 27 23:24:03.752: Tunnel1: dest 3.3.3.1, received frag needed (mtu 1250), adjusting soft state MTU from 0 to 1222
Jul 27 23:24:03.752: Tunnel1: tunnel endpoint for transport dest 3.3.3.1, change MTU from 0 to 1222
Jul 27 23:24:05.757: ICMP: dst (10.100.100.254) frag. needed and DF set unreachable sent to 10.1.1.253mtu:1222
Jul 27 23:24:07.761: ICMP: dst (10.100.100.254) frag. needed and DF set unreachable sent to 10.1.1.253mtu:1222

  • Headend’s Tunnel interface sets new MTU of 1222 via the PMTUD feature.

HEADEND#sh int tun1 | In Pa       
  Path MTU Discovery, ager 10 mins, min MTU 92
  Path destination 3.3.3.1: MTU 1222, expires 00:01:06

  • When we use the IPSec Overhead Calculator with a payload size of 1222, after encryption and GRE, the packet size is 1288.  Now this will fit over the 1300 MTU link.



Note: The Tunnel PMTUD process must know the exact overhead calculations to be able to set the correct MTU.  As an example, I set the payload size 1 byte higher and the total is now bigger than 1300.



  • The LAN Switch re-attempts to send traffic to the Branch site with the new MTU of 1222.  This time it succeeds.
LAN_SWITCH#p 10.100.100.254 size 1222 df

Type escape sequence to abort.
Sending 5, 1222-byte ICMP Echos to 10.100.100.254, timeout is 2 seconds:
Packet sent with the DF bit set
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/4/9 ms

  •  Internet router with MTU 1300 receives the post encrypted packet of 1288 and forwards it (showing only 1 of 5 packets).
INTERNET_R01#
* Jul 27 23:59:53.800: IP: tableid=0, s=1.1.1.1 (Vlan100), d=3.3.3.1 (FastEthernet0), routed via FIB
* Jul 27 23:59:53.800: IP: s=1.1.1.1 (Vlan100), d=3.3.3.1 (FastEthernet0), g=3.3.3.1, len 1288, forward

  •  Branch router receives packet and replies.
BRANCH#
Jul 27 23:55:37.862: ICMP: echo reply sent, src 10.100.100.254, dst 10.1.1.253, topology BASE, dscp 0 topoid 0
Jul 27 23:55:37.866: ICMP: echo reply sent, src 10.100.100.254, dst 10.1.1.253, topology BASE, dscp 0 topoid 0
Jul 27 23:55:37.870: ICMP: echo reply sent, src 10.100.100.254, dst 10.1.1.253, topology BASE, dscp 0 topoid 0
Jul 27 23:55:37.874: ICMP: echo reply sent, src 10.100.100.254, dst 10.1.1.253, topology BASE, dscp 0 topoid 0
Jul 27 23:55:37.882: ICMP: echo reply sent, src 10.100.100.254, dst 10.1.1.253, topology BASE, dscp 0 topoid 0



Scenario 3 (Tunnel PMTUD Enabled, ICMP Unreachable Blocked)

In this scenario, tunnel PMTUD is configured but ICMP unreachables are purposely blocked in the simulated Internet infrastructure.  The tunnel MTU is set at 1400 and the test traffic uses a 1400 byte packet sent from Headend to Branch.

To simulate a situation where the ICMP unreachables are blocked or lost within a network, an ACL is added to prevent any ICMP unreachables from reaching the Headend router from the Internet router.


ip access-list extended BLOCK_UNREACHABLES
 deny   icmp any any unreachable log
 permit ip any any


INET_SWITCH#conf t
Enter configuration commands, one per line.  End with CNTL/Z.
INET_SWITCH(config)#int f0/5                                
INET_SWITCH(config-if)#ip access-group BLOCK_UNREACHABLES in
INET_SWITCH(config-if)#end



  • With the block in place, let's see what happens now.  We ping with a size of 1400 and the DF bit set from the LAN Switch to the Branch network.  It fails.
LAN_SWITCH#ping 10.100.100.254 size 1400 df

Type escape sequence to abort.
Sending 5, 1400-byte ICMP Echos to 10.100.100.254, timeout is 2 seconds:
Packet sent with the DF bit set
.M.M.
Success rate is 0 percent (0/5)

  • With Tunnel PMTUD still enabled, the router is doing the following:
    • Router sends an ICMP unreachable to itself to adjust MTU.
    • Tunnel PMTUD process calculates the new MTU based on the configured policy and current MTU of 1400 and drops it down to 1334.
    • Router sends an ICMP unreachable to the source telling it the new MTU is 1334.
HEADEND#
Jul 27 18:28:21.312: ICMP: dst (3.3.3.1) frag. needed and DF set unreachable sent to 1.1.1.1
Jul 27 18:28:21.312: ICMP: dst (1.1.1.1) frag. needed and DF set unreachable rcv from 1.1.1.1 mtu:1362
Jul 27 18:28:21.312: Tunnel1: dest 3.3.3.1, received frag needed (mtu 1362), adjusting soft state MTU from 0 to 1334
Jul 27 18:28:21.312: Tunnel1: tunnel endpoint for transport dest 3.3.3.1, change MTU from 0 to 1334
Jul 27 18:28:23.317: ICMP: dst (10.100.100.254) frag. needed and DF set unreachable sent to 10.1.1.253mtu:1334
Jul 27 18:28:25.322: ICMP: dst (10.100.100.254) frag. needed and DF set unreachable sent to 10.1.1.253mtu:1334

  • Headend’s Tunnel interface sets new MTU of 1334 via the PMTUD feature.
HEADEND#sh int tun1 | in Path      
  Path MTU Discovery, ager 10 mins, min MTU 92
  Path destination 3.3.3.1: MTU 1334, expires 00:07:13

  • The LAN Switch re-attempts to send traffic to the Branch site with the new MTU of 1334.  It still fails because we have an MTU of 1300 somewhere in the path.
LAN_SWITCH#ping 10.100.100.254 si 1334 df

Type escape sequence to abort.
Sending 5, 1334-byte ICMP Echos to 10.100.100.254, timeout is 2 seconds:
Packet sent with the DF bit set
.....
Success rate is 0 percent (0/5)

  • The Internet router receives a 1400 byte packet.  An MTU 1300 path exists, so an ICMP unreachable is sent to the Headend router (1.1.1.1).
INTERNET_R01#
* Jul 27 18:21:02.661: IP: tableid=0, s=1.1.1.1 (Vlan100), d=3.3.3.1 (FastEthernet0), routed via FIB
* Jul 27 18:21:02.661: IP: s=1.1.1.1 (Vlan100), d=3.3.3.1 (FastEthernet0), g=3.3.3.1, len 1400, forward, proto=50
* Jul 27 18:21:02.661: ICMP: dst (3.3.3.1) frag. needed and DF set unreachable sent to 1.1.1.1

  •  The Internet Switch blocks the ICMP unreachable per the ACL.
INET_SWITCH#
* Jul 27 20:20:36.813: %SEC-6-IPACCESSLOGDP: list BLOCK_UNREACHABLES denied icmp 1.0.0.1 -> 1.1.1.1 (3/4), 1 packet


At this point the Headend router doesn’t see the ICMP unreachable message, so it does nothing to react.  It doesn’t know to drop the MTU down to 1300 and continues to use the MTU value of 1334 as previously set by the Tunnel PMTUD feature.  Without any further mechanism to correct the MTU, the source will continue to send packets with a size of 1334 that will be dropped, creating a serious network performance problem (due to TCP retransmits, etc.).  This scenario demonstrates firsthand that receiving the ICMP unreachable is key for this feature to work correctly.




Conclusion

Besides the fact that this tunnel PMTUD feature seems to be a bit chatty and inefficient at times, the main caveat I would like to mention here is that for this to work successfully in the practical sense, we have to rely on our Internet service providers to generate and forward ICMP unreachables.  In my opinion, that’s a tall order.  We frequently hear of ISPs blocking ICMP for security reasons, and with most common deployments of DMVPN (or any IPsec VPN network) being created over the Internet, I think it’s a bad idea to use this feature in the first place.

The best practice, in my opinion, is to lower the tunnel MTU and re-adjust the MSS for sites that have an Internet service with a lower than normal MTU.  It’s a quick and easy fix.  Otherwise, if there is a specific need for this feature, you just have to be aware of your network environment and ensure the ICMP unreachable prerequisite is met.
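For example, for the 1444-byte DSL transport mentioned earlier, something along these lines on the tunnel interface would do it (the values assume the common 100-byte allowance for GRE/IPsec overhead below the transport MTU, and MSS at IP MTU minus 40; adjust both for your actual crypto policy):

```
interface Tunnel1
 ip mtu 1344
 ip tcp adjust-mss 1304
```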