This test was my attempt to observe and document EVPN’s all-active load-balancing behavior under normal operating conditions. By examining the BGP routes and Wireshark traces, I wanted to understand in detail how MPLS labels are exchanged and used in an EVPN network to achieve that load balancing.
Test Plan
- Simulate traffic flow from CE_R27 to CE_R28
- Observe interface statistics to determine traffic routing
- Observe BGP routing and label exchange
- Perform targeted packet captures on PE links
- Examine packet captures for label usage
Test Traffic
Continuous ping from host CE_R27 to CE_R28.
CE_R27#ping 172.16.50.2 rep 2147483647
Type escape sequence to abort.
Sending 2147483647, 100-byte ICMP Echos to 172.16.50.2, timeout is 2 seconds:
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
.. snip ..
Traffic Observation
CE_R27’s Traffic Distribution
CE_R27’s G1 and G2 Ether-channeled interface. The input rate was double the output.
CE_R27#sh int po50 | in rate
Queueing strategy: fifo
5 minute input rate 50000 bits/sec, 55 packets/sec
5 minute output rate 25000 bits/sec, 28 packets/sec
CE_R27’s G1 Ethernet interface to PE_MXR01. This traffic flow selected G1 as the egress
interface and sent 28 packets per second (pps).
The input rate was identical to the output, so this would suggest these
were the legitimate replies.
CE_R27#sh int g1 | in rate
Queueing strategy: fifo
5 minute input rate 25000 bits/sec, 28 packets/sec
5 minute output rate 25000 bits/sec, 28 packets/sec
CE_R27’s G2 Ethernet interface to PE_MXR03. This interface was receiving 28 pps of extra
traffic.
CE_R27#sh int g2 | in rate
Queueing strategy: fifo
5 minute input rate 25000 bits/sec, 28 packets/sec
5 minute output rate 0 bits/sec, 0 packets/sec
PE_MXR01’s Traffic Distribution
PE_MXR01’s AC interface to CE_R27 (Gig1). This AC interface’s input/output rates were in line with those seen on the CE.
admin@PE_MXR01> show interfaces ge-0/0/2.500 detail | grep pps
Input packets: 2613229 28 pps
Output packets: 696154 28 pps
PE_MXR01’s MPLS interface to P_R03. The majority of the traffic went via P_R03 in the MPLS core.
admin@PE_MXR01> show interfaces ge-0/0/1.44 detail | grep pps
Input packets: 698531 28 pps
Output packets: 2615863 28 pps
PE_MXR01 MPLS interface to P_R01. No traffic out to the P1 core router.
admin@PE_MXR01> show interfaces ge-0/0/1.45 detail | grep pps
Input packets: 2 0 pps
Output packets: 0 0 pps
PE_MXR02’s Traffic Distribution
PE_MXR02’s AC interface to CE_R28. The destination AC interface saw 28 pps in both directions. This would indicate that the end host received and replied to only 28 pps of traffic, nothing more.
admin@PE_MXR02> show interfaces ge-0/0/2.500 detail |grep pps
Input packets: 2265249 28 pps
Output packets: 2485439 28 pps
PE_MXR02’s MPLS interface to P_R03. At this point in the network, the output rate doubled. Since CE_R28 sent only 28 pps of replies into its AC interface, this would suggest that this PE duplicated the traffic.
admin@PE_MXR02> show interfaces ge-0/0/1.46 detail |grep pps
Input packets: 2513443 28 pps
Output packets: 3069218 56 pps
PE_MXR02’s MPLS interface to P_R04. No traffic to P4.
admin@PE_MXR02> show interfaces ge-0/0/1.47 detail |grep pps
Input packets: 23965 0 pps
Output packets: 0 0 pps
PE_MXR03’s Traffic Distribution
PE_MXR03’s AC interface to CE_R27 (Gig2). The redundant AC interface to CE_R27 only saw
outbound traffic. This was the 28 pps of
extra traffic that was sent from PE_MXR02.
admin@PE_MXR03> show interfaces ge-0/0/3.500 detail |grep pps
Input packets: 0 0 pps
Output packets: 170575 28 pps
PE_MXR03’s MPLS interface to P_R02. Again, this was the 28 pps of extra traffic
received from PE_MXR02.
admin@PE_MXR03> show interfaces ge-0/0/1.48 detail |grep pps
Input packets: 2412297 28 pps
Output packets: 493 0 pps
PE_MXR03’s MPLS interface to P_R01. No traffic to P1.
admin@PE_MXR03> show interfaces ge-0/0/1.49 detail |grep pps
Input packets: 3 0 pps
Output packets: 0 0 pps
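The duplication inferred from the counters above can be checked with simple arithmetic. The following is a minimal sketch, not part of the original test: the function names and the 1.5 threshold are my own assumptions, and the rates are copied from the interface outputs in this section.

```python
# Rates in packets per second, copied from the interface counters above.
RATES_PPS = {
    "PE_MXR02 AC in (replies from CE_R28)": 28,
    "PE_MXR02 core out (to P_R03)": 56,
    "PE_MXR01 AC out (to CE_R27 G1)": 28,
    "PE_MXR03 AC out (to CE_R27 G2)": 28,
}


def replication_factor(ingress_pps: int, egress_pps: int) -> float:
    """Ratio of packets leaving a PE to packets entering it for one flow."""
    if ingress_pps <= 0:
        raise ValueError("ingress rate must be positive")
    return egress_pps / ingress_pps


def looks_duplicated(ingress_pps: int, egress_pps: int, threshold: float = 1.5) -> bool:
    """Flag a PE whose egress rate is well above its ingress rate.

    The threshold is a hypothetical value chosen for illustration only.
    """
    return replication_factor(ingress_pps, egress_pps) >= threshold


factor = replication_factor(
    RATES_PPS["PE_MXR02 AC in (replies from CE_R28)"],
    RATES_PPS["PE_MXR02 core out (to P_R03)"],
)
delivered = (
    RATES_PPS["PE_MXR01 AC out (to CE_R27 G1)"]
    + RATES_PPS["PE_MXR03 AC out (to CE_R27 G2)"]
)
print(f"PE_MXR02 replication factor: {factor}")
print(f"Replies delivered to CE_R27 across both links: {delivered} pps")
```

With load balancing of a single known-unicast flow, the factor would be expected to stay near 1, since each reply should leave PE_MXR02 only once.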
Wireshark Traffic Analysis
Capture 1 – Link between PE_MXR01 and P_R03
Capture 1, Frame 2 was the ICMP request from CE_R27 to CE_R28 as seen between PE_MXR01 and P_R03. This request was sent to CE_R28's MAC address of 00:0c:29:49:aa:8c. The Type 2 route for this destination MAC carried an all-zero ESI, which indicated that the destination was single-homed and that a Type 1 aliasing route lookup was therefore unnecessary. As seen in the capture, the label stack (bottom label 299776, top label 328) was consistent with the route lookup.
admin@PE_MXR01> show route table bgp.evpn.0 extensive
bgp.evpn.0: 8 destinations, 8 routes (8 active, 0 holddown, 0 hidden)
.. snip..
2:112.112.112.112:50::500::00:0c:29:49:aa:8c/304 MAC/IP (1 entry, 0 announced)
*BGP Preference: 170/-101
Route Distinguisher: 112.112.112.112:50
Next hop type: Indirect, Next hop index: 0
Address: 0xb7a31f0
Next-hop reference count: 6
Source: 112.112.112.112
Protocol next hop: 112.112.112.112
Indirect next hop: 0x2 no-forward INH Session ID: 0x0
State: <Active Int Ext>
Local AS: 2345 Peer AS: 2345
Age: 4:59 Metric2: 1
Validation State: unverified
Task: BGP_2345.112.112.112.112
AS path: I
Communities: target:2345:50
Import Accepted
Route Label: 299776
ESI: 00:00:00:00:00:00:00:00:00:00
Localpref: 100
Router ID: 112.112.112.112
Secondary Tables: EVPN_CUSTOMER_G_ELAN_500.evpn.0
Indirect next hops: 1
Protocol next hop: 112.112.112.112 Metric: 1
Indirect next hop: 0x2 no-forward INH Session ID: 0x0
Indirect path forwarding next hops: 1
Next hop type: Router
Next hop: 10.1.1.81 via ge-0/0/1.44
Session Id: 0x0
112.112.112.112/32 Originating RIB: inet.3
Metric: 1 Node path count: 1
Forwarding nexthops: 1
Nexthop: 10.1.1.81 via ge-0/0/1.44
admin@PE_MXR01> show route table bgp.evpn.0
bgp.evpn.0: 8 destinations, 8 routes (8 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both
.. snip ..
2:112.112.112.112:50::500::00:0c:29:49:aa:8c/304 MAC/IP
*[BGP/170] 00:17:37, localpref 100, from 112.112.112.112
AS path: I, validation-state: unverified
> to 10.1.1.81 via ge-0/0/1.44, Push 328
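To make the two-label stack seen in Capture 1 concrete, here is a small sketch of how an MPLS label stack entry is laid out on the wire: a 32-bit word with a 20-bit label, 3-bit traffic class, 1-bit bottom-of-stack flag and 8-bit TTL. The function names are my own, and the TTL and traffic-class values are assumptions, since I did not record them from the capture.

```python
import struct


def encode_entry(label: int, tc: int = 0, bottom: bool = False, ttl: int = 64) -> bytes:
    """Pack one MPLS label stack entry into 4 bytes (network byte order)."""
    if not 0 <= label < 2**20:
        raise ValueError("label must fit in 20 bits")
    word = (label << 12) | ((tc & 0x7) << 9) | (int(bottom) << 8) | (ttl & 0xFF)
    return struct.pack("!I", word)


def decode_stack(data: bytes) -> list:
    """Decode entries from the top of the stack until the bottom-of-stack bit is set."""
    entries = []
    offset = 0
    while True:
        (word,) = struct.unpack_from("!I", data, offset)
        entry = {
            "label": word >> 12,
            "tc": (word >> 9) & 0x7,
            "bottom": bool((word >> 8) & 0x1),
            "ttl": word & 0xFF,
        }
        entries.append(entry)
        offset += 4
        if entry["bottom"]:
            return entries


# Top (transport) label 328 towards PE_MXR02, bottom (service) label 299776
# from the Type 2 route. TTL 64 is an assumed value, not taken from the capture.
stack = encode_entry(328) + encode_entry(299776, bottom=True)
for entry in decode_stack(stack):
    print(entry)
```

The top label only gets the frame across the core to the egress PE, while the bottom label tells that PE which EVPN instance and forwarding behavior to apply.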
Capture 2 – Link between P_R03 and PE_MXR02
Capture 2, Frame 1 was the ICMP request from CE_R27 to CE_R28 as seen from P_R03 to PE_MXR02. This request was sent to CE_R28's MAC address of 00:0c:29:49:aa:8c. As P_R03 forwarded the packet to PE_MXR02, it popped the top label of 328. PE_MXR02 looked up the VPN label of 299776, associated it with the EVPN instance, and delivered the packet to CE_R28. At this point, everything looked normal.
admin@PE_MXR02> show route table mpls.0 label 299776
mpls.0: 52 destinations, 52 routes (52 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both
299776 *[EVPN/7] 00:18:26, routing-instance EVPN_CUSTOMER_G_ELAN_500, route-type Ingress-MAC, vlan-id 500
to table EVPN_CUSTOMER_G_ELAN_500.evpn-mac.0
Capture 2, Frame 2 was the ICMP reply from CE_R28 to CE_R27 as seen from PE_MXR02 to P_R03. According to Wireshark’s analysis (arrows), this was Frame 1’s corresponding reply. The Type 2 route for the destination MAC carried a non-zero ESI, which indicated that the destination host was multi-homed. As a result, the labels from the Type 1 aliasing routes should have been used to load balance the return traffic (label 300640 to PE_MXR01 and label 300512 to PE_MXR03).
admin@PE_MXR02> show route table bgp.evpn.0 extensive
bgp.evpn.0: 9 destinations, 9 routes (9 active, 0 holddown, 0 hidden)
.. snip ..
1:111.111.111.111:50::112233445566778899::0/192 AD/EVI (1 entry, 0 announced)
*BGP Preference: 170/-101
Route Distinguisher: 111.111.111.111:50
Next hop type: Indirect, Next hop index: 0
Address: 0xb7a5950
Next-hop reference count: 10
Source: 111.111.111.111
Protocol next hop: 111.111.111.111
Indirect next hop: 0x2 no-forward INH Session ID: 0x0
State: <Active Int Ext>
Local AS: 2345 Peer AS: 2345
Age: 20:54:43 Metric2: 1
Validation State: unverified
Task: BGP_2345.111.111.111.111
AS path: I
Communities: target:2345:50
Import Accepted
Route Label: 300640
Localpref: 100
Router ID: 111.111.111.111
Secondary Tables: EVPN_CUSTOMER_G_ELAN_500.evpn.0
Indirect next hops: 1
Protocol next hop: 111.111.111.111 Metric: 1
Indirect next hop: 0x2 no-forward INH Session ID: 0x0
Indirect path forwarding next hops: 1
Next hop type: Router
Next hop: 10.1.1.89 via ge-0/0/1.46
Session Id: 0x0
111.111.111.111/32 Originating RIB: inet.3
Metric: 1 Node path count: 1
Forwarding nexthops: 1
Nexthop: 10.1.1.89 via ge-0/0/1.46
1:113.113.113.113:50::112233445566778899::0/192 AD/EVI (1 entry, 0 announced)
*BGP Preference: 170/-101
Route Distinguisher: 113.113.113.113:50
Next hop type: Indirect, Next hop index: 0
Address: 0xb7a7630
Next-hop reference count: 8
Source: 113.113.113.113
Protocol next hop: 113.113.113.113
Indirect next hop: 0x2 no-forward INH Session ID: 0x0
State: <Active Int Ext>
Local AS: 2345 Peer AS: 2345
Age: 21:12:15 Metric2: 1
Validation State: unverified
Task: BGP_2345.113.113.113.113
AS path: I
Communities: target:2345:50
Import Accepted
Route Label: 300512
Localpref: 100
Router ID: 113.113.113.113
Secondary Tables: EVPN_CUSTOMER_G_ELAN_500.evpn.0
Indirect next hops: 1
Protocol next hop: 113.113.113.113 Metric: 1
Indirect next hop: 0x2 no-forward INH Session ID: 0x0
Indirect path forwarding next hops: 1
Next hop type: Router
Next hop: 10.1.1.89 via ge-0/0/1.46
Session Id: 0x0
113.113.113.113/32 Originating RIB: inet.3
Metric: 1 Node path count: 1
Forwarding nexthops: 1
Nexthop: 10.1.1.89 via ge-0/0/1.46
2:111.111.111.111:50::500::00:1e:e5:c8:0f:f1/304 MAC/IP (1 entry, 0 announced)
*BGP Preference: 170/-101
Route Distinguisher: 111.111.111.111:50
Next hop type: Indirect, Next hop index: 0
Address: 0xb7a5950
Next-hop reference count: 10
Source: 111.111.111.111
Protocol next hop: 111.111.111.111
Indirect next hop: 0x2 no-forward INH Session ID: 0x0
State: <Active Int Ext>
Local AS: 2345 Peer AS: 2345
Age: 3 Metric2: 1
Validation State: unverified
Task: BGP_2345.111.111.111.111
AS path: I
Communities: target:2345:50
Import Accepted
Route Label: 300640
ESI: 00:11:22:33:44:55:66:77:88:99
Localpref: 100
Router ID: 111.111.111.111
Secondary Tables: EVPN_CUSTOMER_G_ELAN_500.evpn.0
Indirect next hops: 1
Protocol next hop: 111.111.111.111 Metric: 1
Indirect next hop: 0x2 no-forward INH Session ID: 0x0
Indirect path forwarding next hops: 1
Next hop type: Router
Next hop: 10.1.1.89 via ge-0/0/1.46
Session Id: 0x0
111.111.111.111/32 Originating RIB: inet.3
Metric: 1 Node path count: 1
Forwarding nexthops: 1
Nexthop: 10.1.1.89 via ge-0/0/1.46
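The aliasing behavior I expected from these routes can be modeled roughly as follows. This is only a simplified sketch of my understanding, not how Junos actually implements it: the class and function names are hypothetical, and the CRC32 flow hash is an arbitrary stand-in for whatever hashing a real PE uses.

```python
import zlib
from dataclasses import dataclass

ZERO_ESI = "00:00:00:00:00:00:00:00:00:00"


@dataclass(frozen=True)
class MacRoute:
    """Type 2 MAC/IP route as seen by the remote PE."""
    mac: str
    pe: str
    label: int
    esi: str


@dataclass(frozen=True)
class AdPerEviRoute:
    """Type 1 Ethernet A-D per-EVI (aliasing) route."""
    esi: str
    pe: str
    label: int


def candidate_next_hops(mac_route, ad_routes):
    """Return the (PE, label) pairs a known-unicast frame may be sent to."""
    if mac_route.esi == ZERO_ESI:
        # Single-homed destination: only the Type 2 route's own label applies.
        return [(mac_route.pe, mac_route.label)]
    aliased = [(r.pe, r.label) for r in ad_routes if r.esi == mac_route.esi]
    return aliased or [(mac_route.pe, mac_route.label)]


def pick_next_hop(flow_key, candidates):
    """Pick exactly one candidate per flow (assumed hash, for illustration)."""
    return candidates[zlib.crc32(flow_key.encode()) % len(candidates)]


ESI = "00:11:22:33:44:55:66:77:88:99"
mac_route = MacRoute("00:1e:e5:c8:0f:f1", "111.111.111.111", 300640, ESI)
ad_routes = [
    AdPerEviRoute(ESI, "111.111.111.111", 300640),
    AdPerEviRoute(ESI, "113.113.113.113", 300512),
]

candidates = candidate_next_hops(mac_route, ad_routes)
print(candidates)
print(pick_next_hop("172.16.50.2->CE_R27", candidates))
```

The key point of the model is that each frame of a flow goes to one of the two PEs with its aliasing label, never to both at once.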
However, the Wireshark capture displayed PE_MXR02's frame using a label stack based on PE_MXR03’s Type 3 Ingress-IM route label (bottom 300528, top 330). Neither the Type 2 label nor the Type 1 aliasing label was used. This was not in line with the expected behavior for EVPN aliasing.
admin@PE_MXR02> show route table bgp.evpn.0 extensive
bgp.evpn.0: 9 destinations, 9 routes (9 active, 0 holddown, 0 hidden)
.. snip ..
3:113.113.113.113:50::500::113.113.113.113/248 IM (1 entry, 0 announced)
*BGP Preference: 170/-101
Route Distinguisher: 113.113.113.113:50
PMSI: Flags 0x0: Label 300528: Type INGRESS-REPLICATION 113.113.113.113
Next hop type: Indirect, Next hop index: 0
Address: 0xb7a6af0
Next-hop reference count: 8
Source: 113.113.113.113
Protocol next hop: 113.113.113.113
Indirect next hop: 0x2 no-forward INH Session ID: 0x0
State: <Active Int Ext>
Local AS: 2345 Peer AS: 2345
Age: 34:37 Metric2: 1
Validation State: unverified
Task: BGP_2345.113.113.113.113
AS path: I
Communities: target:2345:50
Import Accepted
Localpref: 100
Router ID: 113.113.113.113
Secondary Tables: EVPN_CUSTOMER_G_ELAN_500.evpn.0
Indirect next hops: 1
Protocol next hop: 113.113.113.113 Metric: 1
Indirect next hop: 0x2 no-forward INH Session ID: 0x0
Indirect path forwarding next hops: 1
Next hop type: Router
Next hop: 10.1.1.89 via ge-0/0/1.46
Session Id: 0x0
113.113.113.113/32 Originating RIB: inet.3
Metric: 1 Node path count: 1
Forwarding nexthops: 1
Nexthop: 10.1.1.89 via ge-0/0/1.46
admin@PE_MXR02> show route table bgp.evpn.0
bgp.evpn.0: 9 destinations, 9 routes (9 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both
.. snip ..
3:113.113.113.113:50::500::113.113.113.113/248 IM
*[BGP/170] 00:32:50, localpref 100, from 113.113.113.113
AS path: I, validation-state: unverified
> to 10.1.1.89 via ge-0/0/1.46, Push 330
Furthermore, the EVPN instance output from PE_MXR02 confirmed that both the Type 1 and Type 2 labels had been received from their respective PEs. Why this PE ignored these labels when forwarding traffic was unknown.
admin@PE_MXR02> show evpn instance extensive
Instance: EVPN_CUSTOMER_G_ELAN_500
Route Distinguisher: 112.112.112.112:50
Per-instance MAC route label: 299776
MAC database status                   Local  Remote
MAC advertisements:                       1       1
MAC+IP advertisements:                    0       0
Default gateway MAC advertisements:       0       0
Number of local interfaces: 1 (1 up)
Interface name  ESI                            Mode          Status  AC-Role
ge-0/0/2.500    00:00:00:00:00:00:00:00:00:00  single-homed  Up      Root
Number of IRB interfaces: 0 (0 up)
Number of bridge domains: 2
VLAN  Domain ID  Intfs / up  IRB intf  Mode      MAC sync  IM route label  SG sync   IM core nexthop
500              1    1                Extended  Enabled   299840          Disabled
501              1    1                Extended  Enabled   299856          Disabled
Number of neighbors: 2
Address          MAC  MAC+IP  AD  IM  ES  Leaf-label
111.111.111.111  1    0       2   2   0
113.113.113.113  0    0       2   2   0
Number of ethernet segments: 1
ESI: 00:11:22:33:44:55:66:77:88:99
Status: Unresolved
Number of remote PEs connected: 2
Remote PE        MAC label  Aliasing label  Mode
113.113.113.113  300512     300512          all-active
111.111.111.111  300640     300640          all-active
Instance: __default_evpn__
Route Distinguisher: 112.112.112.112:0
Number of bridge domains: 0
Number of neighbors: 0
Capture 2, Frame 3 was not a reply to any request that had been sent. It appeared to be a copy of the reply, replicated by PE_MXR02. Compared with the legitimate reply, this frame’s label stack again used Ingress-IM labels (bottom 303216, top 329), this time to forward towards PE_MXR01.
Although PE_MXR02 appeared to “load-balance” the reply traffic (i.e., Frame 2 going towards PE_MXR03 and Frame 3 towards PE_MXR01), it clearly did not do so using the aliasing method. It simply replicated this traffic and sent it via another path. The strange thing was that even though the ICMP test traffic was known unicast and the PE had received valid labels (from the Type 2 MAC and Type 1 EAD per-EVI routes), the PE still ignored these labels.
Perhaps PE_MXR02 misclassified this frame (as well as Frame 2) as BUM traffic and therefore used the IM labels for ingress replication? I’m unsure why this would happen. If the frame had been classified as BUM, I would have expected this PE to also replicate it and send it to the other host connected off PE_MXR03 (i.e., CE_R29). However, this was not seen in any of the captures.
admin@PE_MXR02> show route table bgp.evpn.0 extensive
bgp.evpn.0: 9 destinations, 9 routes (9 active, 0 holddown, 0 hidden)
.. snip ..
3:111.111.111.111:50::500::111.111.111.111/248 IM (1 entry, 0 announced)
*BGP Preference: 170/-101
Route Distinguisher: 111.111.111.111:50
PMSI: Flags 0x0: Label 303216: Type INGRESS-REPLICATION 111.111.111.111
Next hop type: Indirect, Next hop index: 0
Address: 0xb7a6790
Next-hop reference count: 10
Source: 111.111.111.111
Protocol next hop: 111.111.111.111
Indirect next hop: 0x2 no-forward INH Session ID: 0x0
State: <Active Int Ext>
Local AS: 2345 Peer AS: 2345
Age: 34:34 Metric2: 1
Validation State: unverified
Task: BGP_2345.111.111.111.111
AS path: I
Communities: target:2345:50
Import Accepted
Localpref: 100
Router ID: 111.111.111.111
Secondary Tables: EVPN_CUSTOMER_G_ELAN_500.evpn.0
Indirect next hops: 1
Protocol next hop: 111.111.111.111 Metric: 1
Indirect next hop: 0x2 no-forward INH Session ID: 0x0
Indirect path forwarding next hops: 1
Next hop type: Router
Next hop: 10.1.1.89 via ge-0/0/1.46
Session Id: 0x0
111.111.111.111/32 Originating RIB: inet.3
Metric: 1 Node path count: 1
Forwarding nexthops: 1
Nexthop: 10.1.1.89 via ge-0/0/1.46
admin@PE_MXR02> show route table bgp.evpn.0
bgp.evpn.0: 9 destinations, 9 routes (9 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both
.. snip ..
3:111.111.111.111:50::500::111.111.111.111/248 IM
*[BGP/170] 00:32:47, localpref 100, from 111.111.111.111
AS path: I, validation-state: unverified
> to 10.1.1.89 via ge-0/0/1.46, Push 329
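One way to reason about the BUM hypothesis is to model the two forwarding modes side by side. The sketch below is purely illustrative and assumes a very simplified PE: the function name and table layout are hypothetical, and the labels are taken from the Type 2 and Type 3 routes shown above. It says nothing about why a real PE would treat a known MAC as unknown.

```python
from typing import Dict, List, Tuple

Copy = Tuple[str, int]  # (remote PE loopback, service label)


def copies_to_send(
    dst_mac: str,
    unicast_table: Dict[str, Copy],
    im_labels: Dict[str, int],
) -> List[Copy]:
    """Return the copies a PE would send into the core for one frame.

    Known unicast: a single copy using the unicast service label.
    Otherwise: ingress replication, one copy per remote PE using the
    label from that PE's Type 3 (IM) route.
    """
    if dst_mac in unicast_table:
        return [unicast_table[dst_mac]]
    return sorted(im_labels.items())


# IM labels advertised to PE_MXR02 in the Type 3 routes above.
IM_LABELS = {"111.111.111.111": 303216, "113.113.113.113": 300528}
# The multi-homed MAC from the Type 2 route shown earlier.
DST_MAC = "00:1e:e5:c8:0f:f1"

known = copies_to_send(DST_MAC, {DST_MAC: ("111.111.111.111", 300640)}, IM_LABELS)
flooded = copies_to_send(DST_MAC, {}, IM_LABELS)
print("known unicast:", known)
print("treated as BUM:", flooded)
```

In this model, the "treated as BUM" case produces one copy per remote PE with the IM labels 303216 and 300528, which is the same pair of service labels seen on Frames 3 and 2 in Capture 2.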
Conclusion
From a 40,000-foot view, the traffic appeared to be load balanced as expected. However, after I examined the traffic more closely, this turned out not to be the case. At this point I’m unsure why this occurred. From my understanding of all-active operation, this was not normal behavior. It could well be a simple misconfiguration on my part, but I couldn’t find much information on the Internet about this particular issue. I will keep searching for the answer though…