Sunday, December 27, 2015

Designing & Implementing BGP Communities for Policy Control

As cloud-based services such as AWS and Microsoft Azure became increasingly popular in the enterprise, it was only a matter of time before we started to do some major integration with these service providers.  In fact, part of our IT strategy was to use these services wherever we could.  To adapt to these changes, we needed new tools to help us manage the network differently.

One subject that was really important to us was the traffic engineering aspect.  We realized that peering with these cloud providers would require a lot of route management due to the additional prefixes we would learn from them.  It was essential for us to be able to apply policy to cloud traffic differently from internal traffic without the need for custom configurations and manual intervention.  This was when we looked into creating BGP communities.

During the research and design phase, I didn’t find a lot of information on BGP communities as they apply to an enterprise network, so I figured that I would write about my experience.  I’ll try to explain some of the methods that were used in developing these communities and how they were implemented in the network.


Community Overview

The document below played an instrumental role in my understanding of BGP communities as a network scaling technique.  It presented a wealth of information on community design as used in large service provider networks.  It’s a great document and I highly recommend reading it.


As another technique, I would also recommend looking at the published community lists of various service providers to gather some good ideas (a couple of examples are below).



To highlight some key ideas that were used in my implementation, some slides were taken from the NANOG document.


Community Types

Grouping communities into two distinct categories is an essential design technique.  Doing so can easily display a community’s intent on the network.




Informational Communities

Informational communities relay information about a route, such as how the route was learned or where it is located geographically.  This type of community can be designed to be very flexible and can be encoded to convey a lot of information.




Action Communities

Action communities are intended to invoke a specified policy change.  They can be used to control routes in many ways, such as route advertisement, suppression or BGP attribute modification.  These communities can be very powerful when designed correctly.





Per-ASN Communities

The Per-ASN technique also has many uses.  The ASN field can name a target ASN within your network where an action applies (e.g., 65000:xx - apply xx action to ASN 65000), or the value field can name a target peer ASN (e.g., 65001:4436, where 65001 can represent a 1x prepend toward target ASN 4436).



Community Design

Here’s a detailed look into the BGP communities that were used in my environment.  Please be aware that this was designed specifically for my network and is by no means a standard, so you must develop your own requirements and leverage communities to address your own network management challenges.  This post is only meant as a practical use case for communities in an enterprise network.

Based on my research, there seemed to be two overarching design approaches.  One was to simply encode as much information as possible in each string.  For example, using community 65000:ABBCD, we can encode 4 pieces of information: A can represent one piece of information, BB another, C another and D another.  This method can reduce the number of communities to manage, but it can introduce some limitations and operational complexities, such as:
  • Communities become increasingly difficult to decipher.
  • Amending a community becomes more difficult.
  • A requirement to write sophisticated matching regex.
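
To make the trade-off concrete, here is a minimal Python sketch of what decoding a packed 65000:ABBCD value involves. The field names (a, bb, c, d) are placeholders of my own, since no meaning is assigned to the positions here; every consumer of the community would need to know this positional layout.

```python
def decode_packed(community):
    """Split a packed 'ASN:ABBCD' community into its four positional fields."""
    asn, value = community.split(":")
    if len(value) != 5 or not value.isdigit():
        raise ValueError(f"expected a 5-digit value, got {value!r}")
    return {
        "asn": int(asn),
        "a": int(value[0]),     # first field, 1 digit (placeholder name)
        "bb": int(value[1:3]),  # second field, 2 digits (placeholder name)
        "c": int(value[3]),     # third field, 1 digit (placeholder name)
        "d": int(value[4]),     # fourth field, 1 digit (placeholder name)
    }

print(decode_packed("65000:31245"))
```

Changing the width of any one field shifts every position after it, which is one reason amending a packed scheme gets harder over time.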
The other approach was to create individual self-contained communities, each intended to convey a single piece of information (or action).  In this case, a community of 65000:xxx could represent one piece of information whereas 65000:yyy could represent another.  The point of this method was to keep each community simple and to use more of them to convey the same amount of information.  Obviously there would be more communities to manage, but it would be less complicated for the following reasons:
  • Easier to build matching community filters & policies. No need for complex regex.
  • Easier to read and decipher. Can easily tell informational from action communities.
The design approach I used was the latter.  I felt it was easier to create multiple “simple” communities rather than fewer complex ones.  An example can be seen below (IP addresses and ASNs were masked for security purposes).

Router#sh ip bgp 1.1.0.0/16
BGP routing table entry for 1.1.0.0/16, version 1050925
Paths: (1 available, best #1, table default)
Multipath: eBGP
  Advertised to update-groups:
     101        119        120        191        209      
  Refresh Epoch 1
  65000 1234, (received & used)
    192.168.0.2 from 192.168.0.2 (192.168.0.2)
      Origin incomplete, metric 0, localpref 100, valid, internal, best
      Community: 65005:887 65534:996 65534:998 65534:999 65534:50001 65534:50051 65534:50100 65534:50803
      rx pathid: 0, tx pathid: 0x0

In the Community line of the output above, 8 community strings are attached to a BGP route.  Here you can easily distinguish the 4 action communities (ASN:xxx) from the 4 informational communities (ASN:xxxxx).
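
The same split can be done mechanically. The following is a small Python sketch, not part of the original tooling, that groups the communities from the output above purely by the length of the value part (5 digits for informational, fewer for action). The function name is my own.

```python
import re

# The Community line from the router output above.
COMMUNITY_LINE = (
    "Community: 65005:887 65534:996 65534:998 65534:999 "
    "65534:50001 65534:50051 65534:50100 65534:50803"
)

def classify(line):
    """Group communities by the digit count of their value part."""
    groups = {"action": [], "informational": []}
    for asn, value in re.findall(r"(\d+):(\d+)", line):
        # 5-digit values are informational; shorter values are actions.
        kind = "informational" if len(value) == 5 else "action"
        groups[kind].append(f"{asn}:{value}")
    return groups

groups = classify(COMMUNITY_LINE)
print(groups["action"])
print(groups["informational"])
```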

But in any case, both methods will work just fine.  You can use a combination of the two if you wish.  Again, there is no standard, so you just have to find a happy medium that addresses your needs.  Whatever you choose, make sure the design can accommodate future growth!  Reserve some ranges within each community type, because you never know what other requirements will come your way.


Informational Communities
  • ASN value will always be 65534
    • There wasn’t any need for targeted ASN informational communities
  • Community value will always be 5 digits in length
    • To easily differentiate between informational vs. action
    • For Example: 65534:xxxxx
  • All informational communities will start with 5 to indicate they are informational only, followed by 4 digits for the community definition
    • For example: 65534:5xxxx
    • Reserves the other strings for future use (e.g., 65534:4xxxx, 65534:3xxxx etc.)
  • Provides information such as:
    • POP Identifier
      • Location ID
    • Region
      • Geographic region, such as North America, Europe etc.
    • Route Source
      • Primary, Secondary or Tertiary network path
    • Route Relationship
      • Peering type, such as Internal, External, Public or Private etc. 
    • Route Type
      • Internal prefix (Branch, DC etc.)
      • External prefix (Cloud, Internet etc.)
    • Vendor Type
      • Define 3rd party vendors such as AWS or MS Azure etc.
    • Service Type
      • Define service on a per vendor basis, such as AWS S3 vs. MS Azure DB

Action Communities

  • No Target ASN
    • ASN value will be 65534 to denote an action applied to the whole network
    • For example: 65534:xxxx will invoke an action at all peers
  • Target ASN
    • ASN value will be the actual ASN being targeted
    • For example, if ASN 65000 needs to be targeted for an action, a community with 65000:xxxx needs to be applied to a route
  • Community value will be a 2 - 4 digit string
    • To easily ID the community as action only
    • For example: 65534:xx or 65000:xxx
    • 4 digit string reserved for future use (i.e., 65534:xxxx) 
  • Action Type
    • Export Control
      • Export, Announce, Redistribute
      • No Export, Discard, Blackhole
    • BGP Attribute Manipulation
      • Prepend
      • Local Preference

Informational Community: POP ID

POP Code (P) - 65534:50PPP

Value      Description
100-139    POP Definition
140-198    Future Use
199        Reserved

(Used value range 100 – 139 for global POP definition. For example, 100 can represent RegionA POP1 and 110 as RegionB POP1 etc.)


Informational Community: Region ID

Continent Code (C) + ISO 3166 Country Code (c) - 65534:5Cccc

Value    Continent/SubContinent Code
1        North America
2        Europe
3        Asia
4        Australia
5        South America
6        Africa
7        Middle East
8        Reserved for future use
9        Reserved for future use


Informational Community: Route Source

Internal Route Source (S) - 65534:5000S

Value    Description
1        Primary
2        Secondary
3        Tertiary
4        Unknown/Other

(A value of 1 would indicate the route was sourced from the primary path, whereas 2 would be from backup path.  This can be used to easily identify if a circuit or routing issue has occurred.)
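
As an illustration of that last point, here is a minimal Python sketch that flags prefixes whose route source tag is anything other than Primary. The input structure (a dictionary of prefix to community list) and the sample prefixes are hypothetical; in practice the data would come from whatever you use to collect BGP tables.

```python
# Route source tags from the table above (65534:5000S).
ROUTE_SOURCE = {
    "65534:50001": "Primary",
    "65534:50002": "Secondary",
    "65534:50003": "Tertiary",
    "65534:50004": "Unknown/Other",
}

def routes_off_primary(rib):
    """Map each prefix to its route source when that source is not Primary."""
    flagged = {}
    for prefix, communities in rib.items():
        for community in communities:
            source = ROUTE_SOURCE.get(community)
            if source is not None and source != "Primary":
                flagged[prefix] = source
    return flagged

# Hypothetical sample data: prefix -> communities on the best path.
sample_rib = {
    "10.1.0.0/16": ["65534:50001", "65534:50100"],
    "10.2.0.0/16": ["65534:50002", "65534:50110"],
}
print(routes_off_primary(sample_rib))
```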


Informational Community: Route Relationship

Route Peering Relationship (R) - 65534:5000R

Value    Description              Example
5        Internal Peering         Peering to Internal resources (DC, Branch etc.), shares internal prefixes
6        Internal Private Peer    Peering to Cloud providers, shares internal only prefixes
7        Internal Public Peer     Peering to Cloud providers, shares internal & public prefixes
8        External Private Peer    Peering to Cloud providers, shares only public prefixes
9        External Public Peer     Peering to Internet providers, shares only public prefixes


Informational Community: Route Type

Internal Route Type (T) - 65534:500TT

Value    Description
10       Default route
11-87    Route classification
88-98    Reserved for future use
99       Reserved for unknown route type

(Used value range 11 – 87 to define the different routes types. For example, 11 for core routes and 12 for datacenter routes etc.)


Informational Community: Vendor Type

Vendor Route Type (V) - 65534:50VVV

Value      Description
800        Unknown Vendor
801-889    Reserved for Vendor classification
890-899    Reserved for future use

(Used value range 801 – 889 to define the different vendors. For example, 801 for VendorA and 802 for VendorB etc.)


Informational Community: Service Type

Service Type (s) - 65534:50sss

Value      Description
900        Unknown service
901-989    Service classification
990-999    Reserved for future use

(Used value range 901 – 989 to define the different service types per vendor. For example, 901 for VoIP services for VendorA and 910 for DB services for VendorB etc.)
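
Because each informational type owns its own value range, a single lookup can decode any of them. The sketch below is one way to express the tables above in Python; the category names are my own labels, reserved sub-ranges are reported under the type that owns them, and the country code is assumed to be the 3-digit numeric form of ISO 3166.

```python
CONTINENTS = {1: "North America", 2: "Europe", 3: "Asia", 4: "Australia",
              5: "South America", 6: "Africa", 7: "Middle East"}

# (low, high, category) ranges for values of the form 65534:50xxx.
RANGES = [
    (1, 4, "route_source"),
    (5, 9, "route_relationship"),
    (10, 99, "route_type"),
    (100, 199, "pop"),
    (800, 899, "vendor"),
    (900, 999, "service"),
]

def decode_informational(community):
    """Return (category, detail) for a 65534:5xxxx community."""
    asn, value = community.split(":")
    if asn != "65534" or len(value) != 5 or not value.isdigit() or value[0] != "5":
        raise ValueError(f"not an informational community: {community}")
    continent, code = int(value[1]), int(value[2:])
    if continent != 0:
        # 65534:5Cccc -> continent code plus 3-digit country code
        return "region", (CONTINENTS.get(continent, "Reserved"), code)
    for low, high, category in RANGES:
        if low <= code <= high:
            return category, code
    return "unassigned", code

for c in ["65534:50001", "65534:50051", "65534:50100", "65534:50803"]:
    print(c, decode_informational(c))
```

The four communities in the loop are the informational ones from the router output shown earlier.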


Action Community: Export

All Peer Export (E) - 65534:EEE

Value      Description
880-885    Reserved for future use
886-889    Protocol Redistribution

Target Peer Export (E) - Target ASN:EEE

Value      Description
880-885    Reserved for future use
886-889    Protocol Redistribution

(Note: eBGP exporting was enabled by default so this export community was used to enable redistribution between BGP and other protocols, such as EIGRP or OSPF at various interconnection points.)

(Used value range 886 – 889 to target specific protocol redistribution points in the network. For example, 889 for BGP to EIGRP redistribution between interconnect A to B and 888 for interconnect A to C etc.)


Action Community: No Export

No Export All (N) - 65534:NNN

Value      Description
999-996    No Export at all peers
995-990    Reserved for future use


Target No Export (N) - Target ASN:NNN

Value      Description
999-996    Target No Export
995-990    Reserved for future use

(Used value range 999 – 996 to specify the various eBGP peers that should not export routes. For example, 999 defined no export to peerA and 998 to peerB etc.)  
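
A rough Python sketch of the export decision at a single router follows. It assumes my reading of the scheme above: the value selects the eBGP peer, and the ASN field is either 65534 (applies network-wide) or the ASN of the router that should act. The peer names and their value assignments are hypothetical.

```python
NETWORK_WIDE = 65534

# Hypothetical assignments of no-export values to eBGP peers.
NO_EXPORT_VALUE = {"peerA": 999, "peerB": 998}

def may_export(communities, local_asn, peer):
    """Return False when a no-export community applies to this peer here."""
    blocked = NO_EXPORT_VALUE[peer]
    for community in communities:
        asn, value = (int(part) for part in community.split(":"))
        if value == blocked and asn in (NETWORK_WIDE, local_asn):
            return False
    return True

print(may_export(["65534:999"], 65000, "peerA"))  # False
print(may_export(["65534:999"], 65000, "peerB"))  # True
```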


Action Community: Prepend

Target Prepend (P) - Target ASN:PP

Value    Description
10       Reserved for future use
11-19    Prepend 1x out various peers
20       Reserved for future use
21-29    Prepend 2x out various peers
30       Reserved for future use
31-39    Prepend 3x out various peers
40       Reserved for future use
41-49    Prepend 4x out various peers

(Used each value range (e.g., 11 – 19) to specify the various eBGP peers that should prepend. For example, 11 can be assigned to prepend 1x out peerA and 12 can be used to prepend 1x out peerB and so on.)
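
In this layout the tens digit carries the prepend count and the ones digit selects the peer. The following Python sketch shows that decoding under two assumptions of mine: the peers are numbered 1 to 9 ("peer slots"), and the router acts only on communities whose ASN field matches its own ASN.

```python
def parse_prepend(value):
    """Split a PP value into (prepend_count, peer_slot), or None if not a prepend."""
    count, slot = divmod(value, 10)
    if 1 <= count <= 4 and slot != 0:  # x0 values are reserved
        return count, slot
    return None

def export_path(as_path, communities, local_asn, peer_slot):
    """Return the AS path with any requested extra prepends for this peer."""
    for community in communities:
        asn, value = (int(part) for part in community.split(":"))
        if asn != local_asn:
            continue  # the action targets a different ASN
        parsed = parse_prepend(value)
        if parsed is not None and parsed[1] == peer_slot:
            return [local_asn] * parsed[0] + as_path
    return as_path

# 65000:21 -> prepend 2x toward the peer in slot 1
print(export_path([1234], ["65000:21", "65534:50001"], 65000, 1))
```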


Action Community: Local Preference

Local Preference (L) - Target ASN:LL or ASN:LLL

Value    Description
50       Set Local Preference to 50
90       Set Local Preference to 90
150      Set Local Preference to 150
200      Set Local Preference to 200
250      Set Local Preference to 250
300      Set Local Preference to 300
350      Set Local Preference to 350

(The value number is equal to the Local preference being set.)
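
Since the value maps directly to the attribute, the policy logic is a membership check. Here is a minimal Python sketch, assuming the router acts only on communities carrying its own ASN and falls back to a default of 100 (the local preference seen in the router output earlier) when no community matches.

```python
# Values defined in the table above.
ALLOWED_LOCAL_PREF = {50, 90, 150, 200, 250, 300, 350}

def local_pref_for(communities, local_asn, default=100):
    """Return the local preference requested for this ASN, else the default."""
    for community in communities:
        asn, value = (int(part) for part in community.split(":"))
        if asn == local_asn and value in ALLOWED_LOCAL_PREF:
            return value
    return default

print(local_pref_for(["65000:200", "65534:50001"], 65000))  # 200
print(local_pref_for(["65001:200"], 65000))                 # 100
```

Restricting matches to a fixed set keeps other 2 and 3 digit action values, such as 887 or 996, from being misread as a local preference.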


Action Community: Blackhole

Blackhole All (B) - 65534:BBB

Value    Description
666      Blackhole/Discard everywhere


Selective Blackhole (b) – Target ASN:bbb

Value      Description
660-665    Blackhole/Discard at various peers

(Used value range 660 – 665 to target the various eBGP peers that should discard routes.  For example, 660 to discard at peerA and 661 to discard at peerB etc.)  
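
With all of the action ranges defined, a value can be mapped back to its action family with a handful of range checks. The Python sketch below mirrors the action tables above; the family names are my own labels, not identifiers from any router configuration.

```python
LOCAL_PREF_VALUES = {50, 90, 150, 200, 250, 300, 350}

def action_type(value):
    """Name the action family that a community value belongs to."""
    if 11 <= value <= 49:
        # 20, 30 and 40 are reserved; the rest are prepend values.
        return "reserved" if value % 10 == 0 else "prepend"
    if value in LOCAL_PREF_VALUES:
        return "local_preference"
    if 660 <= value <= 665:
        return "selective_blackhole"
    if value == 666:
        return "blackhole"
    if 886 <= value <= 889:
        return "export"
    if 996 <= value <= 999:
        return "no_export"
    if value == 10 or 880 <= value <= 885 or 990 <= value <= 995:
        return "reserved"
    return "unassigned"

# Action values from the router output shown earlier.
for value in (887, 996, 998, 999):
    print(value, action_type(value))
```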


Implementation

Once all the community definitions have been made, the filters and policies can be created. When embarking on this task, my advice is to:
  • Use easy-to-understand names and descriptions for community filters and policies. Don't make the implementation more complex than it needs to be. Keep things user friendly, because in most cases you're not the only one who will need to support this.
  • Keep community filters as simple as possible.  Avoid using complicated RegEx for matching.
  • Document and publish all your communities. Make sure they are easy to find.
  • Refrain from making too many changes after the communities have been defined.  Once the policies are written and implemented in production, they will prove to be difficult or next to impossible to change.
  • Make sure to test everything in a lab!  I can't stress this enough.  Poorly written policies or out of sequence entries can inadvertently suppress routes or can easily create sub-optimal routing situations.
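
To show why entry order matters, here is a toy Python model of an ordered policy. It assumes first-match-wins evaluation with an implicit deny at the end, as in a Cisco route-map; the policy tuples are a simplification of my own and not real router syntax.

```python
def evaluate(policy, communities):
    """Return the action of the lowest-sequence entry that matches."""
    for _seq, action, match in sorted(policy, key=lambda entry: entry[0]):
        # An entry with match=None matches every route.
        if match is None or match in communities:
            return action
    return "deny"  # nothing matched: implicit deny

route = {"65534:999", "65534:50001"}

# Intended order: suppress tagged routes first, then permit the rest.
in_order = [(10, "deny", "65534:999"), (20, "permit", None)]
# Same entries with the sequence numbers swapped.
out_of_order = [(10, "permit", None), (20, "deny", "65534:999")]

print(evaluate(in_order, route))      # deny
print(evaluate(out_of_order, route))  # permit
```

In the second policy the catch-all permit matches first, so the deny entry never fires and the tagged route is still exported. A lab run catches this kind of mistake before production does.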
When determining policy points on the network, first consider where your network services live, such as the datacenters.  Apply policies to these interconnects to maximize the communities’ effects, then work your way to other areas of the network if needed.  In my use case, the policies were applied at the various service & regional boundaries.  For example:
  • Core Interconnects
  • Internet Transit Provider Interconnects
  • Cloud Provider Interconnects
  • Datacenter Interconnects
  • Branch Network Interconnects
However, your mileage may vary. Since every network is different, you will need to evaluate your own requirements and build to suit your own needs.