Everyone knows that one of the main issues in managing a DMVPN network is dealing with fragmentation. Typically, when calculating the tunnel MTU and MSS, we assume we are working with a network transport that has a normal MTU of 1500. However, I have been seeing more cases of Internet services being delivered to customers with a lower-than-normal MTU.
Recently, I discovered a couple of sites in Europe where an ISP delivered a DSL service with a backend MTU of 1444. This MTU was not disclosed to us, and the MTU between the router and the service provider edge device was set at 1500, giving the appearance of a normal working MTU. The site performed normally for a good amount of time until the “tunnel path-mtu-discovery” command (tunnel PMTUD) was enabled on the tunnel interface; we discovered later that it had been added accidentally during a maintenance window. That is when I was alerted to a debilitating performance issue affecting the site.
At first I thought the tunnel PMTUD feature couldn't possibly have caused such an issue. However, after researching and testing it, it became much clearer why things broke. I want to share the experience in this post so others don't get burned by it too.
As a small disclaimer, the following is based on my own experience, so I'm not saying this is a bad feature or that you should never use it. My advice is to use it with caution and enable it only after fully understanding how it can affect your network.
Lab Environment
- Headend Router: Cisco 3945 (ISRG2) with IOS version 15.5(3)M2
- Branch Router: Cisco 2951 (ISRG2) with IOS version 15.5(3)M2
- Internet R01: Cisco 1801 with IOS version 12.4(15)T4
- Internet S01: Cisco 3560 with IOS version 12.2(55)SE5
Diagram & Topology
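The original diagram image isn't reproduced here, so below is a rough sketch of the lab reconstructed from the addresses that appear in the outputs later in this post; exact interface placement is approximate.

LAN_SWITCH (10.1.1.253)
    |
HEADEND (tunnel source 1.1.1.1)
    |
INET_SWITCH (3560)
    |
INTERNET_R01 (1.0.0.1) --[1300-byte MTU link]--
    |
BRANCH (tunnel destination 3.3.3.1)
    |
Branch LAN (10.100.100.254)

The DMVPN tunnel (Tunnel1, GRE over IPsec) runs between HEADEND (1.1.1.1) and BRANCH (3.3.3.1).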
Technology Overview
Looking at what the Tunnel PMTUD feature does, it boils down to two main things:
- Copy the DF bit from the original IP packet to the new GRE header.
- Listen for ICMP unreachables with Fragmentation Needed and Don't Fragment Set (Type 3, Code 4).
On the other hand, when Tunnel PMTUD is not configured, the default GRE behavior is to not copy the DF bit from the original IP header. This allows fragmentation to occur when a packet encounters a path with an MTU lower than 1500.
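For reference, the feature is a single interface-level command. Below is a minimal sketch of a simple GRE tunnel with it enabled; the interface name and tunnel destination follow the lab outputs, while the source interface is illustrative. The optional age-timer and min-mtu knobs are left at defaults here, which match the "ager 10 mins, min MTU 92" seen in the show output later.

interface Tunnel1
 ip mtu 1400
 tunnel source GigabitEthernet0/0   ! illustrative WAN interface
 tunnel destination 3.3.3.1
 tunnel path-mtu-discovery
!
! to return to default GRE behavior (DF bit not copied):
! no tunnel path-mtu-discovery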
With these two bits of information in mind, let's run through a few different scenarios that show Tunnel PMTUD operation in greater detail. All scenarios run through a network with a lower-than-normal MTU of 1300 in its path (see lab diagram).
- Traffic sent from Headend LAN Switch to Branch over the DMVPN tunnel with Tunnel PMTUD disabled.
- Traffic sent from Headend LAN Switch to Branch over the DMVPN tunnel with Tunnel PMTUD enabled and working.
- Traffic sent from Headend LAN Switch to Branch over the DMVPN tunnel with Tunnel PMTUD enabled and not working (ICMP unreachable blocked).
Scenario 1 (Tunnel PMTUD Disabled)
In this scenario, tunnel PMTUD is not configured. The tunnel MTU is set at 1400 and the test traffic is a 1400-byte packet sent from Headend to Branch.
- Ping with size of 1400 and DF bit set from LAN Switch to Branch network. Packet makes it to destination without issue.
LAN_SWITCH#ping 10.100.100.254 size 1400 df
Type escape sequence to abort.
Sending 5, 1400-byte ICMP Echos to 10.100.100.254, timeout is 2 seconds:
Packet sent with the DF bit set
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/5/9 ms
- Headend router’s tunnel MTU is 1400 and shows no fragmentation.
HEADEND#sh ip traffic | i Frag|frag
  Frags: 0 reassembled, 0 timeouts, 0 couldn't reassemble
         0 fragmented, 0 fragments, 0 couldn't fragment
- The Headend adds encryption and a new GRE header with no DF bit. Based on the Cisco IPSec Overhead Calculator tool, the packet size is now 1480 bytes.
- The packet reaches router INTERNET_R01 and is fragmented to pass the 1300-MTU path: one fragment with a length of 1300 and another with 200. Note: the output below shows only 1 of the 5 packets for brevity.
INTERNET_R01#
* Jul 27 16:12:17.770: IP: tableid=0, s=1.1.1.1 (Vlan100), d=3.3.3.1 (FastEthernet0), routed via FIB
* Jul 27 16:12:17.770: IP: s=1.1.1.1 (Vlan100), d=3.3.3.1 (FastEthernet0), g=3.3.3.1, len 1480, forward, proto=50
* Jul 27 16:12:17.770: IP: s=1.1.1.1 (Vlan100), d=3.3.3.1 (FastEthernet0), len 1300, sending fragment
* Jul 27 16:12:17.770: IP Fragment, Ident = 40355, fragment offset = 0, proto=50
* Jul 27 16:12:17.770: IP: s=1.1.1.1 (Vlan100), d=3.3.3.1 (FastEthernet0), len 200, sending last fragment
* Jul 27 16:12:17.770: IP Fragment, Ident = 40355, fragment offset = 1280
- The IP traffic statistics show that 5 packets have been fragmented (into 10 fragments).
INTERNET_R01#sh ip traffic | in Frag|frag
  Frags: 0 reassembled, 0 timeouts, 0 couldn't reassemble
         5 fragmented, 10 fragments, 0 couldn't fragment
- Branch router receives packets.
BRANCH#
Jul 27 16:20:11.634: ICMP: echo reply sent, src 10.100.100.254, dst 10.1.1.253, topology BASE, dscp 0 topoid 0
Jul 27 16:20:11.638: ICMP: echo reply sent, src 10.100.100.254, dst 10.1.1.253, topology BASE, dscp 0 topoid 0
Jul 27 16:20:11.642: ICMP: echo reply sent, src 10.100.100.254, dst 10.1.1.253, topology BASE, dscp 0 topoid 0
Jul 27 16:20:11.650: ICMP: echo reply sent, src 10.100.100.254, dst 10.1.1.253, topology BASE, dscp 0 topoid 0
Jul 27 16:20:11.654: ICMP: echo reply sent, src 10.100.100.254, dst 10.1.1.253, topology BASE, dscp 0 topoid 0
- Branch router reassembles fragmented packets.
BRANCH#sh ip traffic | i Frag|frag
  Frags: 5 reassembled, 0 timeouts, 0 couldn't reassemble
         0 fragmented, 0 fragments, 0 couldn't fragment
Scenario 2 (Tunnel PMTUD Enabled)
In this scenario, tunnel PMTUD is configured. The tunnel MTU is set at 1400 and the test traffic is a 1400-byte packet sent from Headend to Branch.
- Ping with size of 1400 and DF bit set from LAN Switch to Branch network. This fails.
LAN_SWITCH#p 10.100.100.254 si 1400 df
Type escape sequence to abort.
Sending 5, 1400-byte ICMP Echos to 10.100.100.254, timeout is 2 seconds:
Packet sent with the DF bit set
.M.M.
Success rate is 0 percent (0/5)
- With Tunnel PMTUD enabled, the router does the following:
- The router sends an ICMP unreachable to itself to adjust the MTU.
- The Tunnel PMTUD process calculates the new MTU based on the configured policy and the current MTU of 1400, and drops it down to 1334.
- The router sends an ICMP unreachable to the source telling it the new MTU is 1334.
HEADEND#
Jul 27 06:09:22.507: ICMP: dst (3.3.3.1) frag. needed and DF set unreachable sent to 1.1.1.1
Jul 27 06:09:22.507: ICMP: dst (1.1.1.1) frag. needed and DF set unreachable rcv from 1.1.1.1 mtu:1362
Jul 27 06:09:22.507: Tunnel1: dest 3.3.3.1, received frag needed (mtu 1362), adjusting soft state MTU from 0 to 1334
Jul 27 06:09:22.507: Tunnel1: tunnel endpoint for transport dest 3.3.3.1, change MTU from 0 to 1334
Jul 27 06:09:24.510: ICMP: dst (10.100.100.254) frag. needed and DF set unreachable sent to 10.1.1.253 mtu:1334
Jul 27 06:09:26.514: ICMP: dst (10.100.100.254) frag. needed and DF set unreachable sent to 10.1.1.253 mtu:1334
- When trying to figure out why the tunnel PMTUD process uses a value of 1334, the IPSec Overhead Calculator was a very useful tool. It shows why a payload size of 1334, after encryption and GRE, yields a packet size of 1400 (see the breakdown after this list).
- This raises the question of why PMTUD needs to alter the MTU at this point. It seems inefficient, because we are clearly sending data that already fits the tunnel MTU of 1400. I could understand it if we had sent a 1500-byte packet.
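Putting the packet sizes from this post side by side makes the relationship visible. Note that the overhead is not constant: ESP pads the encrypted payload to the cipher's block size, so per-packet overhead depends on the payload length (the exact numbers depend on the transform set in use).

1334-byte payload + 66 bytes GRE/IPsec overhead = 1400 bytes on the wire
1400-byte payload + 80 bytes GRE/IPsec overhead = 1480 bytes (Scenario 1)
1222-byte payload + 66 bytes GRE/IPsec overhead = 1288 bytes (later in this scenario)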
- The Headend's Tunnel interface sets a new MTU of 1334 via the PMTUD feature.
HEADEND#sh int tun1 | in Path
  Path MTU Discovery, ager 10 mins, min MTU 92
   Path destination 3.3.3.1: MTU 1334, expires 00:09:23
- The Headend's IP statistics show that it dropped the packets because it couldn't fragment them.
HEADEND#sh ip traffic | i Frag|frag
  Frags: 0 reassembled, 0 timeouts, 0 couldn't reassemble
         0 fragmented, 0 fragments, 2 couldn't fragment
- The LAN Switch re-attempts to send traffic to the Branch site with the new MTU of 1334. It still fails because there is an MTU of 1300 somewhere in the path.
LAN_SWITCH#ping 10.100.100.254 size 1334 df
Type escape sequence to abort.
Sending 5, 1334-byte ICMP Echos to 10.100.100.254, timeout is 2 seconds:
Packet sent with the DF bit set
.M.M.
Success rate is 0 percent (0/5)
- The Internet router receives a 1400-byte packet and sends an ICMP unreachable to the source because the packet can't pass the interface with an MTU of 1300.
INTERNET_R01#
* Jul 27 22:59:31.764: IP: tableid=0, s=1.1.1.1 (Vlan100), d=3.3.3.1 (FastEthernet0), routed via FIB
* Jul 27 22:59:31.764: IP: s=1.1.1.1 (Vlan100), d=3.3.3.1 (FastEthernet0), g=3.3.3.1, len 1400, forward, proto=50
* Jul 27 22:59:31.764: ICMP: dst (3.3.3.1) frag. needed and DF set unreachable sent to 1.1.1.1
- The Internet router drops the packet because it couldn't fragment it.
INTERNET_R01#sh ip traffic | i Frag|frag
  Frags: 0 reassembled, 0 timeouts, 0 couldn't reassemble
         0 fragmented, 0 fragments, 1 couldn't fragment
- This time the Headend's Tunnel PMTUD process does the following:
- The router receives an ICMP unreachable from the Internet router with an MTU of 1300.
- The router sends an ICMP unreachable to itself to adjust the MTU.
- The Tunnel PMTUD process calculates the new MTU based on an MTU of 1250 and drops it down to 1222. I'm unsure how it arrived at the 1250 value.
- The router sends a new ICMP unreachable to the source telling it to drop the MTU to 1222.
HEADEND#
Jul 27 23:24:01.752: ICMP: dst (1.1.1.1) frag. needed and DF set unreachable rcv from 1.0.0.1 mtu:1300
Jul 27 23:24:03.752: ICMP: dst (3.3.3.1) frag. needed and DF set unreachable sent to 1.1.1.1
Jul 27 23:24:03.752: ICMP: dst (1.1.1.1) frag. needed and DF set unreachable rcv from 1.1.1.1 mtu:1250
Jul 27 23:24:03.752: Tunnel1: dest 3.3.3.1, received frag needed (mtu 1250), adjusting soft state MTU from 0 to 1222
Jul 27 23:24:03.752: Tunnel1: tunnel endpoint for transport dest 3.3.3.1, change MTU from 0 to 1222
Jul 27 23:24:05.757: ICMP: dst (10.100.100.254) frag. needed and DF set unreachable sent to 10.1.1.253 mtu:1222
Jul 27 23:24:07.761: ICMP: dst (10.100.100.254) frag. needed and DF set unreachable sent to 10.1.1.253 mtu:1222
- The Headend's Tunnel interface sets a new MTU of 1222 via the PMTUD feature.
HEADEND#sh int tun1 | In Pa
  Path MTU Discovery, ager 10 mins, min MTU 92
   Path destination 3.3.3.1: MTU 1222, expires 00:01:06
- When we use the IPSec Overhead Calculator with a payload size of 1222, the packet size after encryption and GRE is 1288, which fits over the 1300-MTU link. Note: The Tunnel PMTUD process must know the exact overhead calculations to be able to set the correct MTU. As an example, when I set the payload size 1 byte higher, the total came out bigger than 1300.
- The LAN Switch re-attempts to send traffic to the Branch site with the new MTU of 1222. This time it succeeds.
LAN_SWITCH#p 10.100.100.254 size 1222 df
Type escape sequence to abort.
Sending 5, 1222-byte ICMP Echos to 10.100.100.254, timeout is 2 seconds:
Packet sent with the DF bit set
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/4/9 ms
INTERNET_R01#
* Jul 27 23:59:53.800: IP: tableid=0, s=1.1.1.1 (Vlan100), d=3.3.3.1 (FastEthernet0), routed via FIB
* Jul 27 23:59:53.800: IP: s=1.1.1.1 (Vlan100), d=3.3.3.1 (FastEthernet0), g=3.3.3.1, len 1288, forward
BRANCH#
Jul 27 23:55:37.862: ICMP: echo reply sent, src 10.100.100.254, dst 10.1.1.253, topology BASE, dscp 0 topoid 0
Jul 27 23:55:37.866: ICMP: echo reply sent, src 10.100.100.254, dst 10.1.1.253, topology BASE, dscp 0 topoid 0
Jul 27 23:55:37.870: ICMP: echo reply sent, src 10.100.100.254, dst 10.1.1.253, topology BASE, dscp 0 topoid 0
Jul 27 23:55:37.874: ICMP: echo reply sent, src 10.100.100.254, dst 10.1.1.253, topology BASE, dscp 0 topoid 0
Jul 27 23:55:37.882: ICMP: echo reply sent, src 10.100.100.254, dst 10.1.1.253, topology BASE, dscp 0 topoid 0
Scenario 3 (Tunnel PMTUD Enabled, ICMP Unreachable Blocked)
In this scenario, tunnel PMTUD is configured, but ICMP unreachables are purposely blocked in the simulated Internet infrastructure. The tunnel MTU is set at 1400 and the test traffic is a 1400-byte packet sent from Headend to Branch.
To simulate a situation where the ICMP unreachables are blocked or lost within a network, an ACL is added to prevent any ICMP
unreachables from reaching the Headend router from the Internet router.
ip access-list extended BLOCK_UNREACHABLES
 deny icmp any any unreachable log
 permit ip any any
INET_SWITCH#conf t
Enter configuration commands, one per line.  End with CNTL/Z.
INET_SWITCH(config)#int f0/5
INET_SWITCH(config-if)#ip access-group BLOCK_UNREACHABLES in
INET_SWITCH(config-if)#end
- With the block in place, let's see what happens now. We ping with a size of 1400 and the DF bit set from the LAN Switch to the Branch network. It fails.
LAN_SWITCH#ping 10.100.100.254 size 1400 df
Type escape sequence to abort.
Sending 5, 1400-byte ICMP Echos to 10.100.100.254, timeout is 2 seconds:
Packet sent with the DF bit set
.M.M.
Success rate is 0 percent (0/5)
- With Tunnel PMTUD still enabled, the router does the following:
- The router sends an ICMP unreachable to itself to adjust the MTU.
- The Tunnel PMTUD process calculates the new MTU based on the configured policy and the current MTU of 1400, and drops it down to 1334.
- The router sends an ICMP unreachable to the source telling it the new MTU is 1334.
HEADEND#
Jul 27 18:28:21.312: ICMP: dst (3.3.3.1) frag. needed and DF set unreachable sent to 1.1.1.1
Jul 27 18:28:21.312: ICMP: dst (1.1.1.1) frag. needed and DF set unreachable rcv from 1.1.1.1 mtu:1362
Jul 27 18:28:21.312: Tunnel1: dest 3.3.3.1, received frag needed (mtu 1362), adjusting soft state MTU from 0 to 1334
Jul 27 18:28:21.312: Tunnel1: tunnel endpoint for transport dest 3.3.3.1, change MTU from 0 to 1334
Jul 27 18:28:23.317: ICMP: dst (10.100.100.254) frag. needed and DF set unreachable sent to 10.1.1.253 mtu:1334
Jul 27 18:28:25.322: ICMP: dst (10.100.100.254) frag. needed and DF set unreachable sent to 10.1.1.253 mtu:1334
- The Headend's Tunnel interface sets a new MTU of 1334 via the PMTUD feature.
HEADEND#sh int tun1 | in Path
  Path MTU Discovery, ager 10 mins, min MTU 92
   Path destination 3.3.3.1: MTU 1334, expires 00:07:13
- The LAN Switch re-attempts to send traffic to the Branch site with the new MTU of 1334. It still fails because there is an MTU of 1300 somewhere in the path.
LAN_SWITCH#ping 10.100.100.254 si 1334 df
Type escape sequence to abort.
Sending 5, 1334-byte ICMP Echos to 10.100.100.254, timeout is 2 seconds:
Packet sent with the DF bit set
.....
Success rate is 0 percent (0/5)
- The Internet router receives a 1400-byte packet. An MTU-1300 path exists, so an ICMP unreachable is sent toward the Headend router (1.1.1.1).
INTERNET_R01#
* Jul 27 18:21:02.661: IP: tableid=0, s=1.1.1.1 (Vlan100), d=3.3.3.1 (FastEthernet0), routed via FIB
* Jul 27 18:21:02.661: IP: s=1.1.1.1 (Vlan100), d=3.3.3.1 (FastEthernet0), g=3.3.3.1, len 1400, forward, proto=50
* Jul 27 18:21:02.661: ICMP: dst (3.3.3.1) frag. needed and DF set unreachable sent to 1.1.1.1
INET_SWITCH#
* Jul 27 20:20:36.813: %SEC-6-IPACCESSLOGDP: list BLOCK_UNREACHABLES denied icmp 1.0.0.1 -> 1.1.1.1 (3/4), 1 packet
At this point the Headend router never sees the ICMP unreachable message, so it does nothing to react. It doesn't know to drop the MTU down to 1300 and continues to use the MTU value of 1334 previously set by the Tunnel PMTUD feature. Without any further mechanism to correct the MTU, the source will continue to send packets with a size of 1334, and they will be dropped, creating a serious network performance problem (due to TCP retransmits, etc.). This scenario demonstrates firsthand that receiving the ICMP unreachables is key for this feature to work correctly.
Conclusion
Besides the fact that the tunnel PMTUD feature seems a bit chatty and inefficient at times, the main caveat I would like to mention is that for it to work in a practical sense, we have to rely on our Internet service providers to generate and forward ICMP unreachables. In my opinion, that's a tall order. We frequently hear of ISPs blocking ICMP for security reasons, and with most common deployments of DMVPN (or any IPsec VPN network) being built over the Internet, I think it's a bad idea to use the feature in the first place.
The best practice, in my opinion, is to lower the tunnel MTU and re-adjust the MSS for sites that have an Internet service with a lower-than-normal MTU. It's a quick and easy fix (see the sketch below). Otherwise, if there is a specific need for this feature, you just have to be aware of your network environment and ensure the ICMP unreachable prerequisite is met.
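As a minimal sketch of that static fix, using this lab's numbers (a 1222-byte tunnel payload was the largest that fit the 1300-byte path): the exact values depend on your transport MTU and IPsec transform set, so treat these as placeholders.

interface Tunnel1
 ip mtu 1222                ! largest payload that fit the 1300-byte path in this lab
 ip tcp adjust-mss 1182     ! ip mtu minus 40 bytes (20 IP + 20 TCP)
 no tunnel path-mtu-discovery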