Sunday, February 18, 2018

EVPN All-Active Multi-Homing - Load Balancing Traffic Analysis

The test below was my attempt to observe and document EVPN’s all-active load-balancing behavior under normal operating conditions.  By examining the BGP routes and Wireshark traces, I wanted to understand in detail how MPLS labels were exchanged and used in an EVPN network to achieve load balancing.

Test Plan
  • Simulate traffic flow from CE_R27 to CE_R28
  • Observe interface statistics to determine traffic routing
  • Observe BGP routing and label exchange
  • Perform targeted packet captures on PE links
  • Examine packet captures for label usage



Test Traffic

Continuous ping from host CE_R27 to CE_R28.

CE_R27#ping 172.16.50.2 rep 2147483647
Type escape sequence to abort.
Sending 2147483647, 100-byte ICMP Echos to 172.16.50.2, timeout is 2 seconds:
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
.. snip ..


Traffic Observation

CE_R27’s Traffic Distribution

CE_R27’s G1 and G2 Ether-channeled interface (Po50).  The input rate was roughly double the output rate.

CE_R27#sh int po50 | in rate
  Queueing strategy: fifo
  5 minute input rate 50000 bits/sec, 55 packets/sec
  5 minute output rate 25000 bits/sec, 28 packets/sec

CE_R27’s G1 Ethernet interface to PE_MXR01.  This traffic flow selected G1 as the egress interface and sent 28 packets per second (pps).  The input rate matched the output rate, which suggested these were the legitimate replies.

CE_R27#sh int g1 | in rate 
  Queueing strategy: fifo
  5 minute input rate 25000 bits/sec, 28 packets/sec
  5 minute output rate 25000 bits/sec, 28 packets/sec

CE_R27’s G2 Ethernet interface to PE_MXR03.  This interface was receiving 28 pps of extra traffic.

CE_R27#sh int g2 | in rate
  Queueing strategy: fifo
  5 minute input rate 25000 bits/sec, 28 packets/sec
  5 minute output rate 0 bits/sec, 0 packets/sec
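
As a rough cross-check of the CE_R27 figures above, the per-link rates can be tallied in a short Python sketch.  The parse_rates helper and the variable names are my own illustration (not part of any Cisco tooling), and the sample text is copied from the outputs above.  The 1 pps gap between Po50 (55 pps) and the member-link sum (56 pps) is presumably down to rounding in the 5-minute averages; that is an assumption, not something I verified.

```python
import re

# Matches the IOS "5 minute input/output rate" lines shown above.
RATE_RE = re.compile(
    r"5 minute (input|output) rate (\d+) bits/sec, (\d+) packets/sec"
)

def parse_rates(text):
    """Return {'input': pps, 'output': pps} parsed from IOS rate lines."""
    return {direction: int(pps) for direction, _bps, pps in RATE_RE.findall(text)}

po50 = parse_rates("""
  5 minute input rate 50000 bits/sec, 55 packets/sec
  5 minute output rate 25000 bits/sec, 28 packets/sec
""")
g1 = parse_rates("""
  5 minute input rate 25000 bits/sec, 28 packets/sec
  5 minute output rate 25000 bits/sec, 28 packets/sec
""")
g2 = parse_rates("""
  5 minute input rate 25000 bits/sec, 28 packets/sec
  5 minute output rate 0 bits/sec, 0 packets/sec
""")

# Sum the two member links and compare inbound against outbound.
member_in = g1["input"] + g2["input"]
member_out = g1["output"] + g2["output"]
extra_in = member_in - member_out

print(f"Po50 in/out: {po50['input']}/{po50['output']} pps")
print(f"Members in/out: {member_in}/{member_out} pps, extra inbound: {extra_in} pps")
```

The extra inbound rate equals G2's input rate, which is what pointed me at the link to PE_MXR03.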


PE_MXR01’s Traffic Distribution

PE_MXR01’s AC interface to CE_R27 (Gig1).  This AC interface’s input/output rates were in line with the CE.

admin@PE_MXR01> show interfaces ge-0/0/2.500 detail | grep pps
     Input  packets:              2613229                   28 pps
     Output packets:               696154                   28 pps

PE_MXR01’s MPLS interface to P_R03.  Essentially all of the traffic went via P_R03 in the MPLS core.

admin@PE_MXR01> show interfaces ge-0/0/1.44 detail | grep pps    
     Input  packets:               698531                   28 pps
     Output packets:              2615863                   28 pps

PE_MXR01’s MPLS interface to P_R01.  No traffic out to the P_R01 core router.

admin@PE_MXR01> show interfaces ge-0/0/1.45 detail | grep pps   
     Input  packets:                    2                    0 pps
     Output packets:                    0                    0 pps


PE_MXR02’s Traffic Distribution

PE_MXR02’s AC interface to CE_R28.  The destination AC interface was seeing 28 pps in both directions.  This indicated that the end host received and replied to only 28 pps of traffic, nothing more.

admin@PE_MXR02> show interfaces ge-0/0/2.500 detail |grep pps
     Input  packets:              2265249                   28 pps
     Output packets:              2485439                   28 pps

PE_MXR02’s MPLS interface to P_R03.  At this point in the network, the output rate doubled.  Since the AC interface towards CE_R28 saw only 28 pps in each direction, this suggested that this PE duplicated the reply traffic.

admin@PE_MXR02> show interfaces ge-0/0/1.46 detail |grep pps   
     Input  packets:              2513443                   28 pps
     Output packets:              3069218                   56 pps

PE_MXR02’s MPLS interface to P_R04.  No traffic to P4.

admin@PE_MXR02> show interfaces ge-0/0/1.47 detail |grep pps    
     Input  packets:                23965                    0 pps
     Output packets:                    0                    0 pps
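
The duplication at PE_MXR02 can be expressed as a simple ratio of core-facing output to AC input.  Below is a minimal Python sketch using the three PE_MXR02 outputs above; parse_pps and the variable names are hypothetical helpers of mine, not a Junos feature.

```python
import re

# Matches the Junos "Input/Output packets: <count> <rate> pps" lines shown above.
PPS_RE = re.compile(r"(Input|Output)\s+packets:\s+\d+\s+(\d+) pps")

def parse_pps(text):
    """Return {'input': pps, 'output': pps} parsed from Junos detail output."""
    return {direction.lower(): int(pps) for direction, pps in PPS_RE.findall(text)}

ac = parse_pps("""
     Input  packets:              2265249                   28 pps
     Output packets:              2485439                   28 pps
""")
core_p_r03 = parse_pps("""
     Input  packets:              2513443                   28 pps
     Output packets:              3069218                   56 pps
""")
core_p_r04 = parse_pps("""
     Input  packets:                23965                    0 pps
     Output packets:                    0                    0 pps
""")

# Replies enter on the AC and leave on the core-facing links.
core_out = core_p_r03["output"] + core_p_r04["output"]
copies_per_reply = core_out / ac["input"]

print(f"Core output {core_out} pps for {ac['input']} pps of replies "
      f"({copies_per_reply:.1f} copies per reply)")
```

A ratio of 1.0 is what I would expect for known unicast traffic; anything above that hints at replication.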



PE_MXR03’s Traffic Distribution

PE_MXR03’s AC interface to CE_R27 (Gig2).  The redundant AC interface to CE_R27 only saw outbound traffic.  This was the 28 pps of extra traffic that was sent from PE_MXR02.

admin@PE_MXR03> show interfaces ge-0/0/3.500 detail |grep pps
     Input  packets:                    0                    0 pps
     Output packets:               170575                   28 pps

PE_MXR03’s MPLS interface to P_R02.  Again, this was the 28 pps of extra traffic received from PE_MXR02.

admin@PE_MXR03> show interfaces ge-0/0/1.48 detail |grep pps    
     Input  packets:              2412297                   28 pps
     Output packets:                  493                    0 pps

PE_MXR03’s MPLS interface to P_R01.  No traffic to P1.

admin@PE_MXR03> show interfaces ge-0/0/1.49 detail |grep pps   
     Input  packets:                    3                    0 pps
     Output packets:                    0                    0 pps



Wireshark Traffic Analysis

Capture 1 – Link between PE_MXR01 and P_R03

Capture 1, Frame 2 was the ICMP request from CE_R27 to CE_R28 as seen between PE_MXR01 and P_R03.  This request was sent to CE_R28's MAC address of 00:0c:29:49:aa:8c.  




The Type 2 route lookup for this destination MAC returned an all-zero ESI, which indicated that the destination was single-homed, so a Type 1 aliasing route lookup wasn't necessary.  The label stack seen in the capture (bottom label 299776, top label 328) was consistent with the route lookup.

admin@PE_MXR01> show route table bgp.evpn.0 extensive

bgp.evpn.0: 8 destinations, 8 routes (8 active, 0 holddown, 0 hidden)

.. snip..

2:112.112.112.112:50::500::00:0c:29:49:aa:8c/304 MAC/IP (1 entry, 0 announced)
        *BGP    Preference: 170/-101
                Route Distinguisher: 112.112.112.112:50
                Next hop type: Indirect, Next hop index: 0
                Address: 0xb7a31f0
                Next-hop reference count: 6
                Source: 112.112.112.112
                Protocol next hop: 112.112.112.112
                Indirect next hop: 0x2 no-forward INH Session ID: 0x0
                State: <Active Int Ext>
                Local AS:  2345 Peer AS:  2345
                Age: 4:59       Metric2: 1
                Validation State: unverified
                Task: BGP_2345.112.112.112.112
                AS path: I             
                Communities: target:2345:50
                Import Accepted        
                Route Label: 299776    
                ESI: 00:00:00:00:00:00:00:00:00:00
                Localpref: 100         
                Router ID: 112.112.112.112
                Secondary Tables: EVPN_CUSTOMER_G_ELAN_500.evpn.0
                Indirect next hops: 1  
                        Protocol next hop: 112.112.112.112 Metric: 1
                        Indirect next hop: 0x2 no-forward INH Session ID: 0x0
                        Indirect path forwarding next hops: 1
                                Next hop type: Router
                                Next hop: 10.1.1.81 via ge-0/0/1.44
                                Session Id: 0x0
                        112.112.112.112/32 Originating RIB: inet.3
                          Metric: 1                       Node path count: 1
                          Forwarding nexthops: 1
                                Nexthop: 10.1.1.81 via ge-0/0/1.44


admin@PE_MXR01> show route table bgp.evpn.0

bgp.evpn.0: 8 destinations, 8 routes (8 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

.. snip ..

2:112.112.112.112:50::500::00:0c:29:49:aa:8c/304 MAC/IP       
                   *[BGP/170] 00:17:37, localpref 100, from 112.112.112.112
                      AS path: I, validation-state: unverified
                    > to 10.1.1.81 via ge-0/0/1.44, Push 328
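
For reference, each label in the stack is a 32-bit entry laid out per RFC 3032: a 20-bit label, a 3-bit traffic class, a 1-bit bottom-of-stack flag and an 8-bit TTL.  The Python sketch below rebuilds and decodes the stack from Capture 1.  The function names are mine, and the TTL and traffic-class values are placeholders (I did not record them from the capture).

```python
import struct

def encode_lse(label, tc=0, bottom=False, ttl=64):
    """Pack one MPLS label stack entry (RFC 3032 layout) into 4 bytes.

    ttl=64 and tc=0 are placeholder values, not taken from the capture.
    """
    word = (label << 12) | (tc << 9) | (int(bottom) << 8) | ttl
    return struct.pack("!I", word)

def decode_stack(data):
    """Decode label stack entries until the bottom-of-stack bit is set."""
    entries = []
    offset = 0
    while offset + 4 <= len(data):
        (word,) = struct.unpack_from("!I", data, offset)
        entry = {
            "label": word >> 12,
            "tc": (word >> 9) & 0x7,
            "bottom": bool((word >> 8) & 0x1),
            "ttl": word & 0xFF,
        }
        entries.append(entry)
        offset += 4
        if entry["bottom"]:
            break
    return entries

# Top (transport) label 328, bottom (EVPN service) label 299776.
stack = encode_lse(328) + encode_lse(299776, bottom=True)
labels = [entry["label"] for entry in decode_stack(stack)]
print(labels)
```

The entry with the bottom-of-stack bit set is the EVPN service label; everything above it is transport.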



Capture 2 – Link between P_R03 and PE_MXR02

Capture 2, Frame 1 was the ICMP request from CE_R27 to CE_R28 as seen from P_R03 to PE_MXR02.  This request was sent to CE_R28's MAC address of 00:0c:29:49:aa:8c.  





As P_R03 forwarded the packet to PE_MXR02, it popped the top label of 328 (penultimate hop popping).  The VPN label of 299776 was looked up and associated with the EVPN instance, and the packet was then delivered to CE_R28.  At this point, everything looked normal.

admin@PE_MXR02> show route table mpls.0 label 299776

mpls.0: 52 destinations, 52 routes (52 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

299776             *[EVPN/7] 00:18:26, routing-instance EVPN_CUSTOMER_G_ELAN_500, route-type Ingress-MAC, vlan-id 500
                      to table EVPN_CUSTOMER_G_ELAN_500.evpn-mac.0



Capture 2, Frame 2 was the ICMP reply from CE_R28 to CE_R27 as seen from PE_MXR02 to P_R03.  According to Wireshark’s analysis (arrows), this was Frame 1’s corresponding reply. 





The Type 2 route lookup for the destination MAC returned a non-zero ESI, which indicated that the destination host was multi-homed.  As a result, the labels from the Type 1 aliasing routes should have been used to load balance the return traffic (label 300640 to PE_MXR01 and label 300512 to PE_MXR03).

admin@PE_MXR02> show route table bgp.evpn.0 extensive   

bgp.evpn.0: 9 destinations, 9 routes (9 active, 0 holddown, 0 hidden)

.. snip ..

1:111.111.111.111:50::112233445566778899::0/192 AD/EVI (1 entry, 0 announced)
        *BGP    Preference: 170/-101
                Route Distinguisher: 111.111.111.111:50
                Next hop type: Indirect, Next hop index: 0
                Address: 0xb7a5950
                Next-hop reference count: 10
                Source: 111.111.111.111
                Protocol next hop: 111.111.111.111
                Indirect next hop: 0x2 no-forward INH Session ID: 0x0
                State: <Active Int Ext>
                Local AS:  2345 Peer AS:  2345
                Age: 20:54:43   Metric2: 1
                Validation State: unverified
                Task: BGP_2345.111.111.111.111
                AS path: I
                Communities: target:2345:50
                Import Accepted
                Route Label: 300640
                Localpref: 100
                Router ID: 111.111.111.111
                Secondary Tables: EVPN_CUSTOMER_G_ELAN_500.evpn.0
                Indirect next hops: 1
                        Protocol next hop: 111.111.111.111 Metric: 1
                        Indirect next hop: 0x2 no-forward INH Session ID: 0x0
                        Indirect path forwarding next hops: 1
                                Next hop type: Router
                                Next hop: 10.1.1.89 via ge-0/0/1.46
                                Session Id: 0x0
                        111.111.111.111/32 Originating RIB: inet.3
                          Metric: 1                       Node path count: 1
                          Forwarding nexthops: 1
                                Nexthop: 10.1.1.89 via ge-0/0/1.46


1:113.113.113.113:50::112233445566778899::0/192 AD/EVI (1 entry, 0 announced)
        *BGP    Preference: 170/-101
                Route Distinguisher: 113.113.113.113:50
                Next hop type: Indirect, Next hop index: 0
                Address: 0xb7a7630
                Next-hop reference count: 8
                Source: 113.113.113.113
                Protocol next hop: 113.113.113.113
                Indirect next hop: 0x2 no-forward INH Session ID: 0x0
                State: <Active Int Ext>
                Local AS:  2345 Peer AS:  2345
                Age: 21:12:15   Metric2: 1
                Validation State: unverified
                Task: BGP_2345.113.113.113.113
                AS path: I
                Communities: target:2345:50
                Import Accepted
                Route Label: 300512
                Localpref: 100
                Router ID: 113.113.113.113
                Secondary Tables: EVPN_CUSTOMER_G_ELAN_500.evpn.0
                Indirect next hops: 1
                        Protocol next hop: 113.113.113.113 Metric: 1
                        Indirect next hop: 0x2 no-forward INH Session ID: 0x0
                        Indirect path forwarding next hops: 1
                                Next hop type: Router
                                Next hop: 10.1.1.89 via ge-0/0/1.46
                                Session Id: 0x0
                        113.113.113.113/32 Originating RIB: inet.3
                          Metric: 1                       Node path count: 1
                          Forwarding nexthops: 1
                                Nexthop: 10.1.1.89 via ge-0/0/1.46


2:111.111.111.111:50::500::00:1e:e5:c8:0f:f1/304 MAC/IP (1 entry, 0 announced)
        *BGP    Preference: 170/-101
                Route Distinguisher: 111.111.111.111:50
                Next hop type: Indirect, Next hop index: 0
                Address: 0xb7a5950
                Next-hop reference count: 10
                Source: 111.111.111.111
                Protocol next hop: 111.111.111.111
                Indirect next hop: 0x2 no-forward INH Session ID: 0x0
                State: <Active Int Ext>
                Local AS:  2345 Peer AS:  2345
                Age: 3  Metric2: 1
                Validation State: unverified
                Task: BGP_2345.111.111.111.111
                AS path: I
                Communities: target:2345:50
                Import Accepted
                Route Label: 300640
                ESI: 00:11:22:33:44:55:66:77:88:99
                Localpref: 100
                Router ID: 111.111.111.111
                Secondary Tables: EVPN_CUSTOMER_G_ELAN_500.evpn.0
                Indirect next hops: 1
                        Protocol next hop: 111.111.111.111 Metric: 1
                        Indirect next hop: 0x2 no-forward INH Session ID: 0x0
                        Indirect path forwarding next hops: 1
                                Next hop type: Router
                                Next hop: 10.1.1.89 via ge-0/0/1.46
                                Session Id: 0x0
                        111.111.111.111/32 Originating RIB: inet.3
                          Metric: 1                       Node path count: 1
                          Forwarding nexthops: 1
                                Nexthop: 10.1.1.89 via ge-0/0/1.46


However, the Wireshark capture showed PE_MXR02's frame using a label stack based on the label from PE_MXR03’s Type 3 inclusive multicast (IM) route (bottom 300528, top 330).  Neither the Type 2 label nor the Type 1 aliasing label was used.  This was not in line with the expected behavior for EVPN aliasing.

admin@PE_MXR02> show route table bgp.evpn.0 extensive   

bgp.evpn.0: 9 destinations, 9 routes (9 active, 0 holddown, 0 hidden)

.. snip ..

3:113.113.113.113:50::500::113.113.113.113/248 IM (1 entry, 0 announced)
        *BGP    Preference: 170/-101
                Route Distinguisher: 113.113.113.113:50
                PMSI: Flags 0x0: Label 300528: Type INGRESS-REPLICATION 113.113.113.113
                Next hop type: Indirect, Next hop index: 0
                Address: 0xb7a6af0
                Next-hop reference count: 8
                Source: 113.113.113.113
                Protocol next hop: 113.113.113.113
                Indirect next hop: 0x2 no-forward INH Session ID: 0x0
                State: <Active Int Ext>
                Local AS:  2345 Peer AS:  2345
                Age: 34:37      Metric2: 1
                Validation State: unverified
                Task: BGP_2345.113.113.113.113
                AS path: I
                Communities: target:2345:50
                Import Accepted
                Localpref: 100
                Router ID: 113.113.113.113
                Secondary Tables: EVPN_CUSTOMER_G_ELAN_500.evpn.0
                Indirect next hops: 1
                        Protocol next hop: 113.113.113.113 Metric: 1
                        Indirect next hop: 0x2 no-forward INH Session ID: 0x0
                        Indirect path forwarding next hops: 1
                                Next hop type: Router
                                Next hop: 10.1.1.89 via ge-0/0/1.46
                                Session Id: 0x0
                        113.113.113.113/32 Originating RIB: inet.3
                          Metric: 1                       Node path count: 1
                          Forwarding nexthops: 1
                                Nexthop: 10.1.1.89 via ge-0/0/1.46


admin@PE_MXR02> show route table bgp.evpn.0   

bgp.evpn.0: 9 destinations, 9 routes (9 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

.. snip ..

3:113.113.113.113:50::500::113.113.113.113/248 IM           
                   *[BGP/170] 00:32:50, localpref 100, from 113.113.113.113
                      AS path: I, validation-state: unverified
                    > to 10.1.1.89 via ge-0/0/1.46, Push 330
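
One way to identify where a captured service label came from is a reverse lookup against the labels advertised to PE_MXR02.  The Python sketch below does that with the routes shown so far; the table is hand-copied from the outputs above and the function name is my own.

```python
# (route type, advertising PE, label) as received by PE_MXR02 in bgp.evpn.0.
advertised = [
    ("Type 1 AD/EVI", "111.111.111.111", 300640),
    ("Type 1 AD/EVI", "113.113.113.113", 300512),
    ("Type 2 MAC/IP", "111.111.111.111", 300640),
    ("Type 3 IM", "113.113.113.113", 300528),
]

def routes_for_label(label):
    """Return every (route type, PE) that advertised the given label."""
    return [(route_type, pe) for route_type, pe, lbl in advertised if lbl == label]

# Bottom label seen in Capture 2, Frame 2.
print(routes_for_label(300528))
# Labels that should have been candidates for the reply.
print(routes_for_label(300640))
print(routes_for_label(300512))
```

Only the Type 3 IM route from PE_MXR03 matches the bottom label in the capture.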


Furthermore, the EVPN instance output from PE_MXR02 confirmed that both Type 1 and Type 2 labels were received from their respective PEs.  Why this PE ignored these labels when forwarding traffic was unknown.

admin@PE_MXR02> show evpn instance extensive           
Instance: EVPN_CUSTOMER_G_ELAN_500
  Route Distinguisher: 112.112.112.112:50
  Per-instance MAC route label: 299776
  MAC database status                     Local  Remote
    MAC advertisements:                       1       1
    MAC+IP advertisements:                    0       0
    Default gateway MAC advertisements:       0       0
  Number of local interfaces: 1 (1 up)
    Interface name  ESI                            Mode             Status     AC-Role
    ge-0/0/2.500    00:00:00:00:00:00:00:00:00:00  single-homed     Up         Root
  Number of IRB interfaces: 0 (0 up)
  Number of bridge domains: 2
    VLAN  Domain ID   Intfs / up    IRB intf   Mode             MAC sync  IM route label  SG sync  IM core nexthop
    500                  1    1                Extended         Enabled   299840          Disabled
    501                  1    1                Extended         Enabled   299856          Disabled
  Number of neighbors: 2
    Address               MAC    MAC+IP        AD        IM        ES Leaf-label
    111.111.111.111         1         0         2         2         0
    113.113.113.113         0         0         2         2         0
  Number of ethernet segments: 1
    ESI: 00:11:22:33:44:55:66:77:88:99
      Status: Unresolved
      Number of remote PEs connected: 2
        Remote PE        MAC label  Aliasing label  Mode
        113.113.113.113  300512     300512          all-active
        111.111.111.111  300640     300640          all-active

Instance: __default_evpn__
  Route Distinguisher: 112.112.112.112:0
  Number of bridge domains: 0
  Number of neighbors: 0



Capture 2, Frame 3 did not correspond to any ICMP request.  This reply appeared to have been replicated by PE_MXR02.  It differed from the legitimate reply in that its label stack, again built from Type 3 IM labels (bottom 303216, top 329), forwarded it towards PE_MXR01.

Although PE_MXR02 appeared to “load-balance” the reply traffic (i.e., Frame 2 going towards PE_MXR03 and Frame 3 towards PE_MXR01), it clearly did not do so using the aliasing method.  It simply replicated the traffic and sent the copy via another path.  The strange thing was that the PE ignored the valid labels it had received (from the Type 2 MAC route and the Type 1 Ethernet A-D per-EVI routes), even though the ICMP test traffic was known unicast.

Perhaps PE_MXR02 misclassified this frame (as well as Frame 2) as BUM traffic and therefore used the IM labels for ingress replication?  I’m unsure why this would happen.  If the frame had been classified as BUM, I would have expected this PE to also replicate it to the other host connected off PE_MXR03 (i.e., CE_R29).  However, this was not seen in any of the captures.




admin@PE_MXR02> show route table bgp.evpn.0 extensive   

bgp.evpn.0: 9 destinations, 9 routes (9 active, 0 holddown, 0 hidden)

.. snip ..

3:111.111.111.111:50::500::111.111.111.111/248 IM (1 entry, 0 announced)
        *BGP    Preference: 170/-101
                Route Distinguisher: 111.111.111.111:50
                PMSI: Flags 0x0: Label 303216: Type INGRESS-REPLICATION 111.111.111.111
                Next hop type: Indirect, Next hop index: 0
                Address: 0xb7a6790
                Next-hop reference count: 10
                Source: 111.111.111.111
                Protocol next hop: 111.111.111.111
                Indirect next hop: 0x2 no-forward INH Session ID: 0x0
                State: <Active Int Ext>
                Local AS:  2345 Peer AS:  2345
                Age: 34:34      Metric2: 1
                Validation State: unverified
                Task: BGP_2345.111.111.111.111
                AS path: I
                Communities: target:2345:50
                Import Accepted
                Localpref: 100
                Router ID: 111.111.111.111
                Secondary Tables: EVPN_CUSTOMER_G_ELAN_500.evpn.0
                Indirect next hops: 1
                        Protocol next hop: 111.111.111.111 Metric: 1
                        Indirect next hop: 0x2 no-forward INH Session ID: 0x0
                        Indirect path forwarding next hops: 1
                                Next hop type: Router
                                Next hop: 10.1.1.89 via ge-0/0/1.46
                                Session Id: 0x0
                        111.111.111.111/32 Originating RIB: inet.3
                          Metric: 1                       Node path count: 1
                          Forwarding nexthops: 1
                                Nexthop: 10.1.1.89 via ge-0/0/1.46


admin@PE_MXR02> show route table bgp.evpn.0   

bgp.evpn.0: 9 destinations, 9 routes (9 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

.. snip ..

3:111.111.111.111:50::500::111.111.111.111/248 IM           
                   *[BGP/170] 00:32:47, localpref 100, from 111.111.111.111
                      AS path: I, validation-state: unverified
                    > to 10.1.1.89 via ge-0/0/1.46, Push 329
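
To test the BUM hypothesis against the captures, the Python sketch below builds the copies that ingress replication would produce from the two Type 3 IM routes on PE_MXR02 and compares them with the label stacks seen in Frames 2 and 3.  The names are hypothetical, and this only shows that the observed stacks match what replication would generate; it says nothing about why the PE would choose that path.

```python
# Type 3 IM routes as seen on PE_MXR02: remote PE -> PMSI label and transport label.
im_routes = {
    "111.111.111.111": {"im_label": 303216, "transport_label": 329},
    "113.113.113.113": {"im_label": 300528, "transport_label": 330},
}

def ingress_replicate(frame_id):
    """Return one copy per remote PE, each with [transport, IM] labels."""
    copies = []
    for pe, labels in sorted(im_routes.items()):
        copies.append({
            "frame": frame_id,
            "to": pe,
            "stack": [labels["transport_label"], labels["im_label"]],
        })
    return copies

# Label stacks [top, bottom] observed in Capture 2.
observed = {"Frame 2": [330, 300528], "Frame 3": [329, 303216]}

copies = ingress_replicate("icmp-reply")
expected_stacks = [copy["stack"] for copy in copies]
matches = sorted(name for name, stack in observed.items() if stack in expected_stacks)

for copy in copies:
    print(copy["to"], copy["stack"])
print("Observed frames matching replication:", matches)
```

Both observed frames match, which is consistent with the replication theory, though it still does not explain the missing copy towards CE_R29.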


Conclusion

From a 40,000-foot view, the traffic appeared to be load balanced as expected.  However, after I examined the traffic more closely, this was not the case.  At this point I’m unsure why this occurred.  From my understanding of all-active operation, this was not normal behavior.  It could well be a simple misconfiguration on my part, but I couldn’t find much information on the Internet about this particular issue.  I will keep searching for the answer though…