How would you like to do IP Multicast without PIM or RP’s? Seriously, let’s use Shortest Path Bridging and make it easy!

 

Why do we need to do this? What’s wrong with today’s network?

Anyone who has deployed or managed a large PIM multicast environment will relate to the response to this question. PIM works on the assumption of an overlay protocol model. PIM stands for Protocol Independent Multicast, which means that it can utilize any IP routing table to establish a reverse path forwarding tree. These routes can be created with any independent unicast routing protocol such as RIP or OSPF, or even be static routes or combinations thereof. In essence, there is an overlay of the different protocols to establish a pseudo-state within the network for the forwarding of multicast data. As any network engineer who has worked with large PIM deployments will attest, they are sensitive beasts that do not lend themselves well to topology changes or expansions of the network delivery system. The key word in all of this is the term ‘state’. If it is lost, then the tree truncates and the distribution service for that length of the tree is effectively lost. Consequently, changes need to be done carefully and be well tested and planned. And this is all due to the fact that the state of IP multicast services is effectively built upon a foundation of sand.

The first major point to realize is that most of today’s Ethernet switching technology still operates with the same basic theory of operation as the original IEEE 802.1d bridges. Sure, there have been enhancements of VLAN’s and tagged trunking that allow us to slice a multi-port bridge (which is what an Ethernet switch really is) up into virtual broadcast domains and extend those domains outside of the switch and between other switches. But, by and large the original operational process is based on ‘learning’. The concept of a learning bridge is shown in the simple illustration below. As a port on a bridge receives an Ethernet frame it remembers the source MAC address as well as the port it came in on. If the destination MAC address is known it will forward out to the port that it is last known to be on. As shown in the example below source MAC “A” is received on port 1. As the destination MAC “B” is known to be on port 2, the bridge will forward accordingly.

 

Figure 1. Known Forwarding

But MAC “A” also sends out a frame to destination MAC “C”. Since MAC “C” is unknown to the bridge, it will flood the frame to all other ports. As a result of the flooding, MAC “C” responds and is found to be on port 3. The bridge records the information into its forwarding information base and forwards the frame accordingly from that point on. Hence, this method of bridging is known as ‘flood based learning’. As one can readily see, it is a critical function for normal local area network behavior. No one argues the value or even the necessity of learning in the bridged or switched environment. The problem is that the example above was circa 1990.

 Figure 2. Unknown Flooding
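To make the flood based learning behavior of Figures 1 and 2 concrete, here is a minimal Python sketch. It is a toy model only: the class name LearningBridge and its methods are made up for illustration, and it ignores VLAN's, address aging and everything else a real switch does.

```python
class LearningBridge:
    """Toy model of a flood-and-learn Ethernet bridge (no VLANs, no aging)."""

    def __init__(self, ports):
        self.ports = set(ports)
        self.fib = {}  # MAC address -> port it was last seen on

    def receive(self, in_port, src_mac, dst_mac):
        """Learn the source, then return the set of ports to forward out of."""
        self.fib[src_mac] = in_port            # learning step
        out_port = self.fib.get(dst_mac)
        if out_port is None:
            return self.ports - {in_port}      # unknown destination: flood
        if out_port == in_port:
            return set()                       # destination sits on the arrival port
        return {out_port}                      # known destination: forward


bridge = LearningBridge(ports=[1, 2, 3])
bridge.receive(2, "B", "A")           # B is learned on port 2
print(bridge.receive(1, "A", "B"))    # known forwarding, as in Figure 1: {2}
print(bridge.receive(1, "A", "C"))    # unknown flooding, as in Figure 2: {2, 3}
print(bridge.receive(3, "C", "A"))    # C responds and is learned on port 3: {1}
print(bridge.receive(1, "A", "C"))    # C is now known: {3}
```

Note that the only 'state' here is a table of where each address was last seen. Nothing in it describes the topology or the path a service takes.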

As the figure below shows, adding in Virtual LAN’s and multi-port high speed switches makes things much more complex. The reality of it is that as the networking core grows larger, the switches in the middle get busier and busier. The forwarding tables need to be larger and larger, to the point where end to end VLAN’s are no longer tractable, so layer 3 boundaries via IP routing are introduced to segment the network domains. In the end, little MAC “A” is just one of the tens of thousands of addresses that traverse the core. In essence, there is no ‘state’ for MAC “A” (or any other MAC address for that matter).

 Figure 3. Unknown Flooding in a Routed VLAN topology

Additionally, recall that multicast is a destination address paradigm. IP multicast groups translate to destination MAC addresses at the Ethernet forwarding level. Because a group is only ever a destination address, it has to be resolved to the unicast address of an actual source. This is not a straightforward process. It involves the overlay of services on top of the Ethernet forwarding environment. These services provide for the resolution of the source as well as the build of a reverse path forwarding environment and the joining of that path to any pre-existing distribution tree. In essence these overlay services embed a sort of ‘state’ to the multicast forwarding service. These overlays are also very dependent on timers for the operating protocols and the fine tuning of these timers according to established best practice to maintain the state of the service. When this state is lost or becomes ambiguous however, nasty things happen to the multicast service. This is the primary reason why multicast is so problematic in today’s typical enterprise environment.
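The translation from an IP multicast group to a destination MAC address can be sketched in a few lines of Python. For IPv4 the standard mapping copies the low 23 bits of the group address behind the fixed 01:00:5e prefix. The function name multicast_mac is made up for this illustration.

```python
import ipaddress


def multicast_mac(group: str) -> str:
    """Map an IPv4 multicast group to its Ethernet destination MAC address."""
    addr = ipaddress.IPv4Address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not an IPv4 multicast address")
    low23 = int(addr) & 0x7FFFFF            # keep only the low 23 bits
    mac = (0x01005E << 24) | low23          # fixed 01:00:5e prefix, 25th bit is 0
    return ":".join(f"{(mac >> shift) & 0xFF:02x}" for shift in range(40, -8, -8))


print(multicast_mac("239.1.1.1"))     # 01:00:5e:01:01:01
print(multicast_mac("224.129.1.1"))   # 01:00:5e:01:01:01 (same MAC, different group)
```

Because only 23 of the 28 significant group bits survive, 32 different groups share every MAC address. The MAC by itself says nothing about who the source is, which is why the overlay services described above are needed.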

The protocol most often used to establish the unicast routing tables for IP is OSPF (Open Shortest Path First), with OSPFv2 serving IPv4 and OSPFv3 serving IPv6. OSPF runs over Ethernet and establishes end to end forwarding paths on top of the stateless frame based flood and learn environment below. On top of this, PIM (Protocol Independent Multicast) is run to establish the actual multicast forwarding service. Source resolution is provided by a function known as an ‘RP’ or Rendezvous Point. This is an established service that registers sources for multicast and provides the ‘well known’ point within the PIM domain to establish source resolution. As a result, in PIM sparse mode the first join to a multicast group from a given edge router is always via the RP. Once the edge router begins to receive packets it is able to discern the actual unicast IP address of the sending source. With this information the edge PIM router or the designated router (DR) will then build a reverse path forward back to the source or the closest topological leg of an existing distribution tree. At the L2 edge, end stations signal their interest in a given service by a protocol known as Internet Group Management Protocol or simply IGMP. In addition, most L2 switches can be aware of this protocol and forward selectively to interested receivers without flooding to all ports in a given VLAN. This process is known as IGMP snooping. In PIM sparse mode, the version of IGMP typically used is IGMPv2, which is non-source specific. (This is *,G mode, where * means that the source address is not known.) Once the source is resolved by the RP the state changes to S,G, where the source is now known. All of this is shown in the diagram below.

 

Figure 4. Protocol Independent Multicast Overlay Model
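The dependence of the multicast tree on the unicast routing table can be illustrated with a small reverse path forwarding (RPF) check in Python. This is a sketch under stated assumptions, not how any particular router implements it: the routing table is a plain dictionary, and the prefixes and interface names (eth0, eth1, eth2) are made up.

```python
import ipaddress


def rpf_interface(routes, source):
    """Return the interface of the longest-prefix unicast route towards source."""
    src = ipaddress.ip_address(source)
    best = None
    for prefix, ifname in routes.items():
        net = ipaddress.ip_network(prefix)
        if src in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, ifname)
    return best[1] if best else None


def rpf_check(routes, source, arrival_if):
    """Accept a multicast packet only if it arrived on the interface that the
    unicast routing table would use to reach its source."""
    return rpf_interface(routes, source) == arrival_if


# Hypothetical unicast routing table: prefix -> outgoing interface.
routes = {"0.0.0.0/0": "eth0", "10.10.0.0/16": "eth2", "10.10.10.0/24": "eth1"}

print(rpf_check(routes, "10.10.10.5", "eth1"))   # True: stream is accepted
routes["10.10.10.0/24"] = "eth2"                 # the unicast route moves
print(rpf_check(routes, "10.10.10.5", "eth1"))   # False: same stream now fails RPF
```

The second result is the point of the exercise. Nothing about the multicast stream changed, yet the service breaks until the overlay rebuilds its state on top of the new unicast route.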

As can be readily seen, this is a complex mix of technologies to establish the single service offering. As a result large multicast environments tend to be touchy and require a comparatively large operational budget and staff to keep running. Large changes to network topology can wreak havoc with IP multicast environments. As a result such changes need to be thought through and carefully planned out. Not all changes are planned however. Network outages force topological changes that can often adversely affect the stability of the IP multicast service. The reason for this is the degree of protocol overlay and the need for correlation of the exact state of the network. As an example, a flapping unicast route could adversely affect an end to end multicast service. Additionally, this problem could be caused at the switch element level by a faulty link, port or module. Mutual dependencies in these types of solutions lend themselves to difficult troubleshooting and diagnostics. This translates to longer mean time to repair and overall higher operational expense.

 

 There must be a better way…

As we noted previously, IP multicast is all about state. Yet at the lowest forwarding element level the operational aspects are stateless. It seems that a valid path forward is to evolve this lowest level to become more stateful and deterministic in the manner in which traffic is handled. In essence, the control plane of Ethernet Switching needs to evolve.

Control Plane Evolution

IEEE has established a set of standards that allows for the evolution of the Ethernet switching control plane into a much more stateful and deterministic model. There are three main innovations that enable this evolution.

Link State Topology Awareness – IS-IS

Universal Forwarding Label – The B-MAC

Provisioned Service Paths – Individual Service Identifiers

This is all achieved by introducing link state protocol (IS-IS) to Ethernet switching as well as the concept of provisioned service paths. These innovations, when combined with a MAC encapsulation method known as MAC in MAC (IEEE 802.1ah) allow for a radical change to the Ethernet switching control plane without abandoning its native dichotomy of control and data forwarding within the network element itself. This means that the switch remains an autonomous forwarding element, able to make its own decisions as to how to forward data most effectively. Yet, at the same time the new stateful nature of the control plane allows for very deterministic control of the data forwarding environment. The end result is a vast simplification of the Ethernet control plane that yields a very stateful and deterministic environment. This environment can then optionally be equipped with a provisioning server infrastructure that provides an API environment between the switching network and any applications that require resources from it. As applications communicate their requirements through the API, the server instructs the network on how to provision paths and resources. Yet importantly, if the network experiences failures, the switch elements know how to behave and have no need to communicate back to the provisioning server. They will automatically find the best path to facilitate any existing sessions and will use this modified topology for any new considerations.  In this model the best of both worlds is found. There is deterministic control of network services, but the network elements remain in control of how to forward data and react to changes in network topology.

 Figure 5. Stateful topology with the use of IS-IS

This technology is known as Shortest Path Bridging, the IEEE standard 802.1aq. As its name implies, it is an Ethernet switching technology that switches by the shortest available path between two end points. The analogy here is to the IP link state routing protocols OSPFv2 for IPv4 and OSPFv3 for IPv6. In link state protocols each node advertises its state as well as any extended reachability. By these updates, each node gains a complete perspective of the network topology. Each element then runs the Dijkstra shortest path algorithm to identify the shortest loop free path to every point within the network.
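For readers who have not seen it, the shortest path computation that every node runs over its copy of the topology can be sketched as follows. The four switch topology and its link costs are invented for the example. The sketch also leaves out the deterministic tie-breaking between equal cost paths that a real 802.1aq implementation needs, so that all switches agree on the same tree.

```python
import heapq


def shortest_paths(graph, root):
    """Dijkstra: return (distance, parent) maps for every node reachable from root."""
    dist = {root: 0}
    parent = {root: None}
    heap = [(0, root)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue                          # stale heap entry, already improved
        for nbr, cost in graph[node].items():
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                parent[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    return dist, parent


def path_to(parent, node):
    """Walk the parent pointers back to the root and return the path root -> node."""
    hops = []
    while node is not None:
        hops.append(node)
        node = parent[node]
    return hops[::-1]


# Hypothetical topology: node -> {neighbor: link cost}
graph = {
    "sw1": {"sw2": 10, "sw3": 10},
    "sw2": {"sw1": 10, "sw4": 10},
    "sw3": {"sw1": 10, "sw4": 20},
    "sw4": {"sw2": 10, "sw3": 20},
}
dist, parent = shortest_paths(graph, "sw1")
print(dist["sw4"], path_to(parent, "sw4"))    # 20 ['sw1', 'sw2', 'sw4']
```

The parent map is a loop free tree rooted at the node that ran the computation, which is exactly the property Spanning Tree was needed for in the flood and learn world.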

When one looks at the stateless methods of Ethernet forwarding and the need for antiquated protocols such as Spanning Tree, one cannot help but see link state as a path of promise. The problem is that OSPFv2 and OSPFv3 are ‘monolithic’ routing protocols, meaning that they were designed exclusively to route IP. IEEE knew this of course and found a very good link state protocol that was open and extensible. That protocol is IS-IS (Intermediate System to Intermediate System) from the OSI suite. One of the first areas of interest is that IS-IS establishes adjacencies with Hello’s carried directly at L2, NOT inside L3 IP packets as OSPF does. The second is that it uses extensible type, length, value (TLV) fields to move information between switch elements, such as topology, provisioned paths or even L3 network reachability. In other words, the switches are ‘topology aware’. Once we have this stateful topology of Ethernet switches, we can determine what network path data are to take for different application services.
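The extensibility that TLV's give IS-IS is easy to see in code. The sketch below encodes and decodes a sequence of TLV's with a one byte type and a one byte length, which is the general shape IS-IS uses. The type codes 200 and 201 and their meanings are purely hypothetical and are not the values assigned to SPB.

```python
import struct


def encode_tlv(tlv_type: int, value: bytes) -> bytes:
    """Encode one TLV with a 1-byte type and a 1-byte length."""
    if not 0 <= tlv_type <= 255:
        raise ValueError("type must fit in one byte")
    if len(value) > 255:
        raise ValueError("value must be at most 255 bytes")
    return struct.pack("!BB", tlv_type, len(value)) + value


def decode_tlvs(data: bytes):
    """Decode a byte string into a list of (type, value) pairs."""
    out, offset = [], 0
    while offset < len(data):
        if offset + 2 > len(data):
            raise ValueError("truncated TLV header")
        tlv_type, length = struct.unpack_from("!BB", data, offset)
        offset += 2
        value = data[offset:offset + length]
        if len(value) != length:
            raise ValueError("truncated TLV value")
        out.append((tlv_type, value))
        offset += length
    return out


# Hypothetical type codes, for illustration only.
TLV_NODE_NAME, TLV_SERVICE_ID = 200, 201
update = (encode_tlv(TLV_NODE_NAME, b"sw1")
          + encode_tlv(TLV_SERVICE_ID, (16_000_001).to_bytes(3, "big")))

for tlv_type, value in decode_tlvs(update):
    if tlv_type == TLV_SERVICE_ID:
        print("service id", int.from_bytes(value, "big"))
    # A receiver that does not understand a type simply skips over it.
```

Because every TLV carries its own length, a node can step over types it does not know. That is what lets new kinds of information, such as provisioned paths, ride on the same protocol without breaking older nodes.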

The next step IEEE had to deal with was implementing a universal labelling scheme for the network that provides all of the information that a switch element needs to forward the data. Fortunately, there was a pre-existing standard, IEEE 802.1ah (MAC-in-MAC) that provides just this type of functionality. The standard was initially established as a provider/customer demarcation for metro Ethernet managed service offerings. The standard works on the concept of encapsulation of the outer edge (customer) Ethernet frame (C-MAC) into an inner core (provider) frame (B-MAC) that is transported and then stripped off on the other end of the inner core to yield a totally transparent end to end service. This process is shown in the illustration below.

 

Figure 6. The use of 802.1ah B-MAC as a universal forwarding label in conjunction with IS-IS
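As a rough illustration of the encapsulation, the Python sketch below wraps a customer frame in an outer header made of backbone destination and source addresses, a B-TAG and an I-TAG. The field layout and the two EtherType constants reflect my reading of 802.1ah and should be checked against the standard before being relied upon. The priority and flag bits are simply left at zero, and the function names are made up.

```python
import struct

B_TAG_TPID = 0x88A8   # assumed EtherType of the backbone VLAN tag
I_TAG_TPID = 0x88E7   # assumed EtherType of the service instance tag
HEADER_LEN = 22       # 6 + 6 + 4 + 6 bytes in this simplified layout


def mac_bytes(mac: str) -> bytes:
    raw = bytes.fromhex(mac.replace(":", ""))
    if len(raw) != 6:
        raise ValueError(f"bad MAC address: {mac}")
    return raw


def encapsulate(customer_frame: bytes, b_da: str, b_sa: str,
                b_vid: int, i_sid: int) -> bytes:
    """Prepend a simplified MAC-in-MAC header to an untouched customer frame."""
    if not 0 <= b_vid < 2**12:
        raise ValueError("B-VID is a 12-bit value")
    if not 0 <= i_sid < 2**24:
        raise ValueError("I-SID is a 24-bit value")
    b_tag = struct.pack("!HH", B_TAG_TPID, b_vid)   # priority bits left at 0
    i_tag = struct.pack("!HI", I_TAG_TPID, i_sid)   # flag bits left at 0
    return mac_bytes(b_da) + mac_bytes(b_sa) + b_tag + i_tag + customer_frame


def decapsulate(frame: bytes):
    """Return (b_vid, i_sid, customer_frame) from a frame built by encapsulate()."""
    b_tpid, b_tci = struct.unpack_from("!HH", frame, 12)
    i_tpid, i_tci = struct.unpack_from("!HI", frame, 16)
    if b_tpid != B_TAG_TPID or i_tpid != I_TAG_TPID:
        raise ValueError("not a MAC-in-MAC frame in the layout assumed here")
    return b_tci & 0x0FFF, i_tci & 0xFFFFFF, frame[HEADER_LEN:]


customer = bytes(64)   # stand-in for a complete customer Ethernet frame
outer = encapsulate(customer, "02:00:00:00:00:01", "02:00:00:00:00:02",
                    b_vid=4051, i_sid=10_100)
print(decapsulate(outer)[:2])   # (4051, 10100)
```

The core only ever inspects the first 22 bytes in this layout. The customer frame, including its C-MAC addresses, travels through as opaque payload and comes out unchanged at the far edge.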

The benefits of this model are the immense scalability and optimization that happen in the network core. Once a data frame is encapsulated, it can be transported anywhere within the SPB domain without the need to learn. This is accomplished by combining 802.1ah and IS-IS together with another modification and extension of virtualization. We will cover this next.

Recall that IS-IS allows for the establishment of adjacencies at the L2 Hello level and that information moves through these updates by the use of type, length, values or TLV’s. As we pointed out earlier, some of these TLV’s are used for network reachability of those adjacencies. Well, these adjacencies are all based on the B-MAC’s of the SPB switches within the domain. Only those addresses are populated into the forwarding information databases at the establishment of adjacency and the running of the Dijkstra algorithm to establish loop-free shortest paths to every point on the network. As a result, the core Link State Database (LSDB) is very small and is only updated at new adjacencies such as new interfaces or switches. The important point is that it is NOT updated with end system MAC addresses. As a result, a core can support tens of thousands of outer C-MAC’s while only requiring a hundred or so B-MAC’s in the network core. The end result is that any switch in the SPB network can look at the B-MAC frame and know exactly what to do with it without the need to flood and learn or reference some higher level fabric controller.

There is one last thing required however. Remember that we still need to learn MAC’s. At the edge of the SPB network we need to assume that there are normal IEEE 802.3 switches and end systems that need to be supported. So how does one end system establish connectivity across the SPB domain without flooding? This is where the concept of constrained multicast comes in. The simplest way to discuss constrained multicast is based on the concept of provisioned service paths. These provisioned paths or I-SID’s (Individual Service Identifiers) are similar to VLAN’s in that they contain a broadcast domain, but they operate differently as they are based on subsets of the Dijkstra forwarding trees mentioned previously. As the example below shows, now when a station wishes to communicate with another end system, it simply sends out an ARP request. That ARP request is then forwarded out to all required points for the associated I-SID.

 

Figure 7. The ‘Constrained Multicast’ Model using 802.1ah and IS-IS
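One way to picture constrained multicast is as a pruning of the shortest path tree: of all the branches rooted at the ingress switch, only those that lead to members of the I-SID carry the frame. The sketch below assumes the tree is already available as a child to parent map, and the switch names and I-SID membership are invented for the example.

```python
def constrained_tree(parent, members):
    """Return the set of (upstream, downstream) links of a source-rooted tree
    that are needed to reach the given members, and nothing else."""
    links = set()
    for node in members:
        while parent.get(node) is not None:
            link = (parent[node], node)
            if link in links:
                break                 # the rest of the way to the root is already in
            links.add(link)
            node = parent[node]
    return links


# Hypothetical shortest path tree rooted at BEB-1 (child -> parent).
parent = {
    "BEB-1": None,
    "BCB-1": "BEB-1", "BCB-2": "BEB-1",
    "BEB-2": "BCB-1", "BEB-3": "BCB-1",
    "BEB-4": "BCB-2",
}
# Hypothetical membership: only BEB-2 and BEB-3 have this I-SID provisioned.
for link in sorted(constrained_tree(parent, {"BEB-2", "BEB-3"})):
    print(link)
```

In this example BCB-2 and BEB-4 never see the ARP request, since no member of the service lies behind them. The frame is still delivered as a broadcast within the service, but the core has nothing to flood and nothing to learn.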

The end system on the other side receives the request and then responds, establishing a unicast session over the same shortest path. As a result, the normal Ethernet ‘flood and learn’ process can still be facilitated on the outside of the SPB domain without the need to flood and learn in the core. This vastly simplifies the network core, allows for deterministic forwarding behavior and provides the ability to separate virtual network services. The reason for this is shown in the diagram below with a little better detail on the B-MAC for SPB and the legacy standards that it builds upon. As can be seen, the concept of the I-SID is a pseudo evolution of the parent Q tag in the 802.1Q-in-Q standard. The I-SID value is contained within the actual B-MAC and consequently tells a core switch everything it needs to know, including whether or not it needs to replicate the frame for constrained multicast functionality. Note that the two most difficult problems of multicast distribution are solved. The first is source resolution and the second is the RPF build.

 

Figure 8. IEEE 802.1ah and its relation to other ‘Q’ standards

Once these technologies were merged together into a cohesive standard framework known as IEEE 802.1aq Shortest Path Bridging (MACinMAC) or SPBm, the result was a very stateful and scalable switching infrastructure that lends itself very well to the building and distribution of multicast services. In addition, SPB can offer many other different types of services ranging from full IP routing to private IP VPN services, all provisioned at the edge as a series of managed services across the network core. With these layer three services comes the need for the distribution of multicast services across the L3 boundaries. This is true L3 IP multicast routing. Interestingly, SPBm provides some unique approaches to solving the problem. Again, let us take note that the two most important problems have already been solved.

The figure below shows an SPBm network that is providing multicast distribution between two IP subnets. One of the subnets is an L2 VSN (an I-SID that is associated with VLAN’s). The other subnet is a peripheral network that is reachable by IP shortcuts via IS-IS. Note that as a stream becomes active in the network, the BEB that has the source dynamically allocates an I-SID to the multicast stream and that information becomes known via the distribution of IS-IS TLV’s. At the edge of the network the Backbone Edge Bridges (BEB’s) are running IGMP snooping out to the L2 Ethernet edge. The edge SPB BEB in effect becomes the querier for the L2 edge. As receivers signal their interest in a given IP multicast group they are handled by the BEB to which they are connected, which looks in the IS-IS LSDB (Link State Database) for entries that advertise the multicast stream within the context of the VSN to which the receiver belongs. Once the BEB advertising the stream and the I-SID are found in the LSDB, the BEB connected to the receiver uses standard IS-IS SPB TLV’s to receive traffic for the stream. The dynamically assigned I-SID values start at 16,000,001 and work up. Provisioned services use values less than 16,000,000. In the case of the L3 traversal, the I-SID is dynamically extended to provide for the build of the L3 multicast distribution tree. 802.1aq supports up to 16,777,215 I-SID’s.

Figure 9. IP Multicast with SPB/IS-IS using IP Shortcuts and L2 VSN
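The sender and receiver sides of this exchange can be modeled in a few lines. This is a conceptual sketch only. A plain dictionary stands in for the IS-IS LSDB, the class and function names are made up, the VSN name and the sender address 10.10.10.10 are hypothetical, and handing out I-SID's from a simple per-BEB counter is an assumption about the allocation policy rather than something the standard or any product is known to do. Only the numeric ranges come from the text above.

```python
from itertools import count

MAX_ISID = 2**24 - 1               # 16,777,215: the I-SID is a 24-bit value
DYNAMIC_ISID_START = 16_000_001    # dynamic multicast I-SID's start here


class SenderBeb:
    """Toy BEB that advertises a dynamic I-SID when a local stream goes active."""

    def __init__(self, name, lsdb):
        self.name = name
        self.lsdb = lsdb                           # stand-in for the IS-IS LSDB
        self._next_isid = count(DYNAMIC_ISID_START)

    def stream_active(self, vsn, source, group):
        isid = next(self._next_isid)
        if isid > MAX_ISID:
            raise RuntimeError("dynamic I-SID range exhausted")
        # In a real network this would be flooded to every node in IS-IS TLV's.
        self.lsdb[(vsn, group, source)] = (self.name, isid)
        return isid


def find_streams(lsdb, vsn, group):
    """Receiver side: on an IGMP join, look up who advertises the group
    within the receiver's VSN. No RP is consulted."""
    return [(source, beb, isid)
            for (v, g, source), (beb, isid) in lsdb.items()
            if v == vsn and g == group]


lsdb = {}
beb1 = SenderBeb("BEB-1", lsdb)
beb1.stream_active("vsn-100", "10.10.10.10", "239.1.1.1")

print(find_streams(lsdb, "vsn-100", "239.1.1.1"))
print(find_streams(lsdb, "vsn-200", "239.1.1.1"))   # other VSN's see nothing
```

The lookup returns the source, the advertising BEB and the I-SID in one step. Source resolution is therefore a by-product of the link state database rather than a separate service that has to be kept alive.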

As the diagram above shows, for an end station to receive multicast from the source, it merely uses this dynamic I-SID to extend the service to end stations 10.10.10.11 and 10.10.10.12, which are members of the same subnet over the L2 VSN. Conversely, receiver 10.10.11.10 will use the same dynamic I-SID built using the information provided by IS-IS to establish the end to end reverse forwarding path. In this model, IP multicast becomes much more stateful and integrated into the switch forwarding element. This results in a far greater build out capacity for the multicast service. It also provides for a much more agile multicast environment when dealing with topology changes and network outages. Switch element failures are handled with ease because the layered mutual dependence model has been removed. If a failure occurs within the core or edge of the network, the service is able to heal seamlessly due to the fact that the information required to preserve service is already known by all of the elements involved. Due to the fact that the complete SPBm domain is topology aware, each switch member knows what it has to do in order to maintain established service. As long as a path exists between the two end points, Shortest Path Bridging will use it to maintain service. This is the result of true integration of link state routing into the Ethernet forwarding control plane.

What goes on behind closed doors…

In addition to providing constrained and L3 multicast, SPB also provides for the ability to deliver ‘ship in the night’ IP VPN environments. With SPBm’s native capabilities it becomes very easy to extend multicast distribution into these environments as well. Normally, multicast distribution within an IP VPN environment is notoriously complex, dealing with yet more overlays of technology. Within SPBm networks however the task is comparatively simple. As the diagram below illustrates, an L3 VSN (IP VPN) is nothing more than a set of VRF’s that are associated with a common I-SID. Here we run IGMP on the routed interfaces that connect to the edge VLAN’s. Note that IGMP snooping is not used here as the local BEB interface will be a router. IGMP, SPB and IS-IS perform as before and the dynamic I-SID simply uses the established Dijkstra path to provide the multicast service between the VRF’s. Important to note though is that this service is invisible to the rest of the IP forwarding environment. It is a dark network that has no routes in and no routes out. Such networks are useful for video surveillance networks that require absolute separation from the rest of the networking environment. Note though that some services may be required from the outside world. This can be accommodated by policy based routing.

 

Figure 10. IP Multicast with SPB/IS-IS using L3 VPN

As the figure illustrates, the users within the L3 VSN have access to subnets 10.10.120.0/24, 10.10.130.0/24, 10.10.140.0/24 and 10.10.150.0/24 within the network. This is useful for services that require complete secure isolation, such as IP multicast based video surveillance. The end result is a very secure closed system multicast environment that would be very difficult to build with legacy technology approaches.

I can see clearly now…

Going back to figure 4 that illustrates the legacy PIM overlay approach, we see that there are several demarcations of technology that tend to obscure the end to end service path. This creates complexities in troubleshooting and overall operations and maintenance. Note that at the edge we are dealing with L2 Ethernet switching and IGMP snooping, then we hop across the DR to the world of OSPF unicast routing. Over this and at the same demarcation we have the PIM protocol. Each demarcation and layer introduces another level of obscurity where the service has to be ‘traced and mapped’ into each technology domain. As a result, intermittent multicast problems can go on for quite some time until the right forensics are gathered to resolve the root cause of the problem.

With SPB, many if not all of these demarcations and overlays are eliminated. As a result, something that is somewhat of a Holy Grail in networking occurs. This is called ‘services transparency’. The end to end network path for a given service can be readily established and diagnosed without referring to protocol demarcations and ‘stitch points’. As previously shown, IP multicast services are a primary beneficiary of this network evolution. The elimination of protocol overlays provides for a stateful data forwarding model at the level where it makes the most sense: at the data forwarding element itself.

Network diagnostics become vastly simplified as a result. Measuring end to end latency and connectivity becomes a very straightforward endeavor. Additionally, diagnosing the multicast service path, something that is notoriously nasty with PIM, becomes very straightforward and even predictable. Tools such as IEEE 802.1ag and ITU Y.1731 provide diagnostics on network paths as well as end to end and nodal latencies, and all of this can be established end to end along the service path without any technology demarcations.

In Summary

IEEE 802.1aq Shortest Path Bridging is proving itself to be much more than a next generation data center mesh protocol. As previous articles have shown, the extensive reach of the technology lends itself well to metro and regional distribution as well as true wide area. Additional capabilities added to SPB, such as the ability to deliver true L3 IP multicast without the use of a multicast routing overlay such as PIM, clearly demonstrate the extensibility of the protocol as well as its extremely practical implementation uses. The convergence of the routing intelligence directly into the switch forwarding logic results in an environment which can provide for extremely fast (sub-second) stateful convergence, which is of definite benefit to the IP multicast service model. As such, IP multicast environments can benefit from enhanced state, which in turn results in increased performance and scale.

End to end services transparency provides for a clear diagnostic environment that eliminates the complexities of protocol overlay models. This drastic simplification of the protocol architecture results in the ability for direct end to end visibility of IP multicast services for the first time.

So when someone asks “IP Multicast without PIM? No more RP’s?” You can respond with “With Shortest Path Bridging, of course!”

I would also urge you to follow the blog site of my esteemed colleague, Paul Unbehagen, Chair and Author of the IEEE 802.1aq “Shortest Path Bridging” Standard. You can find it at:

http://paul.unbehagen.net/

 

For more information please feel free to visit http://www.avaya.com/networking

Also please visit our VENA video on YouTube that provides further detail and insight. You can find this at: http://www.youtube.com/watch?v=ZSbycaOvy5I

 
