The evolution of E-911

November 2, 2012

NG911 and the evolution of ESInet


If you live in North America and have ever been in a road accident or had a house fire, then you are one of the fortunate ones who had the convenience and assurance of 911 services. I am old enough to remember how these types of things were handled prior to 911. Phones (dial phones!) had dozens of stickers for Police, Fire and Ambulance. If there were no stickers, then one had to resort to a local phone book that hopefully had an emergency services section. To think of how many lives have been saved by this simple three-digit number is simply mind-boggling. Yet to a large degree we all now take this service for granted and assume it will just work as it always has, regardless of the calling point. We also seem to implicitly assume that all of the next generation capabilities and intelligence that are available today can automatically be utilized within its framework. This article is intended to provide a brief history of 911 services and how they have evolved up to the current era of E911. It will also talk about the upcoming challenges for extending the service into a true multi-tenant, multi-service framework that can leverage the latest technology offerings. In short, we are talking about the advent of Next Generation 911 (NG911) Emergency Services infrastructure.

Conceptually, 911 is very simple. As the figure below illustrates, a person reporting an emergency calls the three-digit number. The original intent was to provide the public with a single point of contact for all emergencies. Prior to 911, you would have a number for Police, a number for Fire, and a number for Medical; and to make matters worse, each jurisdiction would have its own unique number. That would be a dozen numbers to remember for your town and 3 of your neighboring towns. Building on this, “E”911 (Enhanced 911) was born to deliver even more functionality. In addition to providing a single ubiquitous number regardless of where you were located, it provided ‘selective routing’, or automatic routing based on the originating number’s documented location in the telephone company database. It also provided some new intelligence on the wire, called Automatic Number Identification or ANI. You are probably more familiar with its street name, ‘Caller ID’.

Figure 1. Traditional 911 PSAP

This, however, was back in the days of land line phone services. Land lines are a shrinking share of connections in this age of mobile communications. Originally embodied by the advent of cellular phones, the industry has evolved to facilitate both local and wide area wireless technologies as well as PDAs, tablets and yes, still cell phones. The problem is that the original 911 model became increasingly broken and in need of an update to handle this new mobile phenomenon. Think of it: if I am driving down Interstate 90 in Boston and I call 911, how do they know that I am in Boston and not in Ontario, New York, where my billing address is? At first, there was no such capability and some folks lost their lives due to the longer response times incurred. For a while, the first thing the 911 contact needed to do was validate and confirm the location, on the assumption that for a mobile caller there was no other way to establish it. Fortunately, this led to the evolution of Phase 1 Cellular E911 services, which allowed for the correlation of cellular 911 calls to a particular antenna face on a tower. Each cellular carrier has 3 antennas on each tower, each of which provides service to a 120 degree arc of the compass. When a call is received in a particular sector, it is routed to the PSAP (Public Safety Answering Point) that has the primary coverage of that sector. PSAPs can also transfer calls between themselves, so if a call is misrouted once in a while, it can easily be warm handed off to the proper authority. There are several technologies that allow for this and they are summarized briefly in the illustration below.

Figure 2. Methods for mobile device location
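
The Phase 1 sector routing described above can be sketched in a few lines of Python. This is only an illustration, not how any carrier actually implements it: the tower name, the PSAP names, the lookup table and the convention that sector 0 starts at a bearing of 0 degrees are all assumptions made for the example.

```python
# Hypothetical sketch of Phase 1 sector-based routing; every name is illustrative.
SECTOR_WIDTH_DEG = 120  # three antenna faces per tower, each covering a 120 degree arc

# Assumed table: (tower id, sector index) -> PSAP with primary coverage of that sector.
PSAP_BY_SECTOR = {
    ("tower-17", 0): "PSAP-North",
    ("tower-17", 1): "PSAP-East",
    ("tower-17", 2): "PSAP-West",
}


def sector_for_bearing(bearing_deg: float) -> int:
    """Map a compass bearing from the tower to one of its three sectors (0, 1 or 2)."""
    return int((bearing_deg % 360) // SECTOR_WIDTH_DEG)


def route_call(tower_id: str, bearing_deg: float, default_psap: str = "PSAP-Default") -> str:
    """Return the PSAP that has primary coverage of the sector the call arrived in."""
    sector = sector_for_bearing(bearing_deg)
    # Unknown towers or sectors fall back to a default PSAP, which can then
    # hand the call off to the proper authority.
    return PSAP_BY_SECTOR.get((tower_id, sector), default_psap)
```

A call arriving at a bearing of 130 degrees from the hypothetical tower-17 lands in sector 1 and would be routed to PSAP-East. The fallback mirrors the hand-off between PSAPs mentioned above.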

As one can readily imagine, a wireless provider can tell which cell your device is operating in when the call to 911 is made. This can be a fairly vast geography, however. The actual size varies depending on the technology but typically can be a radius of 10 to 20 miles. Accuracy is gained by leveraging different radio antenna sources in a method known as triangulation, where a closer fix can be obtained by using multiple signal points of reference. Lately, the additional GPS capabilities in the newer Droids and iPhones lend an accuracy of meters.
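
To make the idea of combining multiple signal reference points concrete, here is a minimal Python sketch. Note that it uses distance-based trilateration, a close cousin of the triangulation named above, because it is easy to show with plain arithmetic. It assumes three towers at known flat (x, y) coordinates and a range estimate to each one; real networks work from noisy timing or signal measurements, so treat this purely as an illustration.

```python
import math


def trilaterate(towers, distances):
    """Estimate (x, y) from three tower positions and the distance to each.

    Subtracting the first circle equation from the other two leaves a
    2x2 linear system, solved here with Cramer's rule.
    """
    (x1, y1), (x2, y2), (x3, y3) = towers
    r1, r2, r3 = distances

    a1, b1 = 2 * (x2 - x1), 2 * (y2 - y1)
    c1 = r1**2 - r2**2 + x2**2 - x1**2 + y2**2 - y1**2
    a2, b2 = 2 * (x3 - x1), 2 * (y3 - y1)
    c2 = r1**2 - r3**2 + x3**2 - x1**2 + y3**2 - y1**2

    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-12:
        raise ValueError("towers are collinear; the position is ambiguous")

    x = (c1 * b2 - c2 * b1) / det
    y = (a1 * c2 - a2 * c1) / det
    return x, y


# Illustrative data: three towers and a caller whose true position is (3, 4).
towers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
caller = (3.0, 4.0)
distances = [math.dist(caller, t) for t in towers]
estimate = trilaterate(towers, distances)
```

With exact distances the estimate recovers the caller's position. With measurement error it would only be approximate, which is why additional reference points and GPS improve the fix.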

Another evolution is Assisted GPS or A-GPS. A-GPS works by merging GPS and network related technologies to increase the accuracy and decrease the ‘fix time’ needed to determine location. A-GPS uses network resources to locate and use the satellites when signal conditions are poor due to signal weakness or interference. The typical A-GPS device will have not only GPS hardware but Internet access as well. Most modern smartphones and PDAs fit this capability mix. As a result, a mobile user’s location can be determined with a great degree of accuracy.


But that was then…

Recent (within the decade) large scale emergencies, both man-made and otherwise, have taught us a few things about events of this proportion. First, infrastructure is damaged and along with it communications elements. At times, communications can be lost altogether for extended periods of time. Second, events of this scale require coordinated logistics between multiple organizations and their resources. When we put these two things together we see a real issue, in that coordinated logistics requires reliable communications! Events like NYC 9/11, Katrina and even the BP Oil Spill crisis have shown that no single agency can address all of the needs that require response. In short, the ability to communicate effectively is paramount to effective large scale emergency response, particularly in events of widespread geographic proportions.

Nonetheless, these events serve to remind us that they can render useless much of the technology we take for granted today. Additionally, the traditional E911 network is closed in architecture and very regional in the way it is deployed. This makes wide scale geographic coordination of information and resources very difficult. Emergencies that cross PSAP boundaries will often require additional impromptu, ad hoc communications that often lack context or clarity.

Now let’s add in the new abilities that technology brings to the table. Big Data analytics is my personal and professional favorite. In emergency situations, information is essential, but too much information without context will tend to slow down the emergency response. Contextual, prioritized information and its timely delivery have been shown to increase both the timeliness and the accuracy of the response. A later example will clarify. The major point here is that E911, which was architected to handle the mobile emergency call, is still effective for that purpose but not for these upcoming challenges. NG911 is intended for the ‘other side’ of the equation: the agencies and services (fire, hazmat, medical response) that will require detailed and reliable communications and information to deal most effectively with the situation at hand.

All of this means that the supporting network must be capable of multi-service and multi-tenancy. We will cover these two terms in the next few paragraphs. Both are part of the normal service provider nomenclature. Multi-service is the ability of the network to deliver appropriate service level assurance for the proper operation of end to end applications. The categories most often thought of are voice, video and data, but they can be more granular to include data for certain application types so that some applications can be prioritized over others. Multi-tenancy is the ability to support multiple user, service or even application groups and keep the resources that they use totally separate from one another. At the same time, there may be applications that do have the requirement to cross tenant boundaries, such as IP voice or email, but these will be constrained to cross over a security demarcation where such rules can be enforced. Rule number one of multi-tenancy is that tenant A should never see tenant B’s traffic, or vice versa, unless provisioned to do so as per above. Also, tenant A should never be able to impinge on the resources allocated to tenant B, again unless otherwise provisioned. These are not easy bars to reach with traditional networking technology and practices. Typically, in order to do this at the scale required, we need a complex mix of technologies such as those shown in the diagram below. MPLS IP VPN services have really been the only technology that has been up to par to meet these requirements. Unfortunately, this means that many state and local governments are forced either to depend on a 3rd party public service provider or to implement MPLS directly themselves. Those that do find that the technology is expensive and complex, and that it requires an inordinately high staff count to implement and maintain properly.
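
Rule number one above can be expressed as a small policy check. The sketch below is a hypothetical model written for this article, not the configuration syntax of any product: the class name, the tenant names and the service labels are all made up. It shows the default-deny behaviour between tenants and the explicit provisioning needed for a service to cross the demarcation.

```python
from dataclasses import dataclass, field


@dataclass
class TenantPolicy:
    """Hypothetical cross-tenant policy: deny by default, allow only what is provisioned."""

    # Each entry is (pair of tenants, service) that may cross the demarcation.
    allowed: set = field(default_factory=set)

    def provision(self, tenant_a: str, tenant_b: str, service: str) -> None:
        # A frozenset makes the rule symmetric: A<->B is the same rule as B<->A.
        self.allowed.add((frozenset((tenant_a, tenant_b)), service))

    def may_forward(self, src_tenant: str, dst_tenant: str, service: str) -> bool:
        if src_tenant == dst_tenant:
            return True  # traffic inside one tenant is always permitted
        return (frozenset((src_tenant, dst_tenant)), service) in self.allowed


policy = TenantPolicy()
policy.provision("police", "fire", "voip")  # a shared service, explicitly provisioned
```

In this model, police and fire can exchange VoIP traffic in either direction, while any other service between them, and any traffic to a tenant with no provisioned rule, is refused.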

Recently, however, another technology has been ratified by the IEEE, known as ‘Shortest Path Bridging’ (SPB) or IEEE 802.1aq. This standard provides for a radical evolution of the Ethernet forwarding control plane that allows for both multi-tenancy and multi-service capabilities without the complexities of legacy approaches. Previous articles have discussed both the methods and services that allow for these capabilities. As a result, we will not go into these areas in any depth here. To summarize, this is all achieved by introducing a link state protocol (IS-IS) to Ethernet switching as well as the concept of provisioned service paths. These innovations, when combined with a MAC encapsulation method known as MAC in MAC (IEEE 802.1ah) that serves as a universal forwarding label, allow for a radical change to the Ethernet switching control plane without abandoning its native dichotomy of control and data forwarding within the network element itself. This means that the switch remains an autonomous forwarding element, able to make its own decisions as to how to forward data most efficiently and effectively. Yet, at the same time, the new stateful nature of the 802.1aq control plane allows for very deterministic control of the data forwarding environment. The end result is a vast simplification of the Ethernet control plane that yields a very stateful and deterministic environment.
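
As a rough illustration of the MAC in MAC idea, the sketch below wraps a customer frame in a backbone header. The field set follows the general shape of IEEE 802.1ah (backbone destination and source MAC, a backbone VLAN ID and a 24-bit service identifier, the I-SID), but the class names, the string placeholders used for MAC addresses and the helper functions are assumptions made for readability. It is not a wire-format encoder.

```python
from dataclasses import dataclass

I_SID_MAX = 2**24 - 1  # the I-SID is a 24-bit service identifier
VLAN_ID_MAX = 4094     # usable 12-bit VLAN IDs run from 1 to 4094


@dataclass(frozen=True)
class CustomerFrame:
    dst_mac: str
    src_mac: str
    payload: bytes


@dataclass(frozen=True)
class BackboneFrame:
    b_dst_mac: str        # backbone MAC of the egress node
    b_src_mac: str        # backbone MAC of the ingress node
    b_vid: int            # backbone VLAN ID
    i_sid: int            # service instance the customer frame belongs to
    inner: CustomerFrame  # the original frame, carried untouched


def encapsulate(frame, ingress_node, egress_node, b_vid, i_sid):
    """Wrap a customer frame at the edge of the backbone."""
    if not 0 <= i_sid <= I_SID_MAX:
        raise ValueError("I-SID must fit in 24 bits")
    if not 1 <= b_vid <= VLAN_ID_MAX:
        raise ValueError("backbone VLAN ID must be between 1 and 4094")
    return BackboneFrame(egress_node, ingress_node, b_vid, i_sid, frame)


def decapsulate(backbone_frame):
    """Strip the backbone header at the egress node and recover the original frame."""
    return backbone_frame.inner
```

The point of the sketch is that core nodes only ever look at the outer header, so customer addresses and tenant traffic stay opaque to the backbone, and the service identifier keeps one tenant's frames apart from another's.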

The figure below shows a comparison between MPLS and SPB. Note that there is a vast simplification in the number of protocol state machines required in order to support a given service. This simplification not only results in ease of use but also drastically increases the reachable scale for Ethernet as well. This is important for ESInets as the number of agencies and entities that will require access will increase as time moves on and NG911 technology evolves.

Figure 3. A comparison of MPLS to SPB


Ships in the Night, but I may want to jump ship if required…

As we look closer at the concepts of multi-tenancy for emergency services, we see that the requirements can be fairly dynamic. As an example, during normal working operations entities may be quite separate from one another. Normal day to day operations might not require a lot of cross communications. There may be some common services in use, such as email or Voice over IP, as is often the case with State and Local Government, but by and large each agency’s applications and traffic are separate.

During emergencies, however, this normal pattern may not apply. Certain entities may need to be in very tight logistical coordination and as a result have to communicate in a very seamless fashion with applications that may straddle agency boundaries. A good example is a hazardous chemical spill. In a typical scenario you will have a large number of agencies or entities involved in the response. For instance, there will obviously be the police to cordon off the area and maintain a ‘do not cross’ line. You will also have the Fire Dept. with particular HazMat teams that are matched according to experience. You may also have several area hospitals that are alerted and set up with triage teams to handle the exposed victims, as well as ambulance services to provide transport. Obviously, the teams selected should have previous experience with such events and preferably even with the particular substance involved. The ability to match experience to requirements is a key element of a successful response. This is where data analytics plays a key role. Another key element is to enable these teams to communicate effectively and with as much context and supporting data as possible, but it has to be filtered so as not to overload response personnel with superfluous information.

The figure below illustrates some of the potential that SPB could bring to the table to address these requirements. As shown, each entity in question has its own isolated L3 IP VPN environment that provides for normal day to day operations. When an emergency occurs, however, a new L3 IP VPN environment can be created for the event response teams. Members of these teams will be selected and provided with enhanced credentials to access this new IP VPN environment. Note that these teams will have bi-directional communication capabilities. Both normal day to day services such as email and dedicated or special services for the emergency response can be provided to this team. Additionally, as they use these dedicated services they are isolated from the other VPN environments from both a service and a resource perspective. This is important, as the applications that are being used during the emergency response might be high bandwidth, such as video, or insistent, such as east/west flows within the data centers to support outbound data for field application use. In either case they will most definitely be critical and require an absolute guarantee of service reliability.

Figure 4. A hazardous material spill emergency

As the figure above shows, this new L3 IP VPN environment will exist for as long as required by the emergency response teams and can even exist for as long after the event as necessary for forensics and/or audit investigations. Further, if additional entities are discovered to be required during the course of the event or for investigations afterwards, it becomes very easy and straightforward to extend the L3 IP VPN to include these new members without the need to do major rearchitecting of the service. As shown below, investigatory units from both the police and fire are required after the event has transpired. At each agency, new memberships to the special L3 IP VPN environment are added and the personnel that are assigned to the investigation units are provided access via centralized or distributed access controls. These virtual service networks are then added to the L3 IP VPN environment to facilitate their ability to communicate with the wider team. Note also that certain critical real-time elements such as 911 dispatch, ambulance and emergency triage are no longer required in the post event L3 VPN, so they are effectively dropped from the membership but can easily be added again if required. The main point in all of this is that unlike MPLS, which has very complicated and somewhat rigid provisioning practices that prohibit such dynamic behavior, SPB, due to its vast simplification of the protocol substrate, allows for quick re-provisioning of the network environment without the complexity. Indeed, the whole solution approach has a profound consequence: the problem is largely reduced to the practice and federation of identity management.

Figure 5. Post event forensics L3 VPN
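
The membership changes described in this section, from the response phase to the post event forensics phase, amount to adding and removing members of a single named service. The Python sketch below models only that bookkeeping. The class, the unit names and the reachability rule are hypothetical and say nothing about how a real SPB fabric or identity system is provisioned.

```python
class EventVpn:
    """Hypothetical model of a per-event L3 VPN: a name plus a mutable member set."""

    def __init__(self, name, members=()):
        self.name = name
        self.members = set(members)

    def add(self, unit):
        self.members.add(unit)

    def drop(self, unit):
        # discard() keeps this safe to call for a unit that was never a member
        self.members.discard(unit)

    def can_reach(self, unit_a, unit_b):
        """Two units can communicate over the event VPN only if both are members."""
        return unit_a in self.members and unit_b in self.members


# Response phase: the teams handling the spill (names are illustrative).
vpn = EventVpn("hazmat-spill", ["police", "fire-hazmat", "911-dispatch", "ambulance", "triage"])

# Post event phase: real-time units leave, investigation units join.
for unit in ("911-dispatch", "ambulance", "triage"):
    vpn.drop(unit)
for unit in ("police-investigation", "fire-investigation"):
    vpn.add(unit)
```

After the transition the investigation units can reach the police and HazMat members, while the dropped real-time units are no longer part of the service and can be re-added with a single call if needed.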


When the world is falling apart…

As we have learned from various wide scale emergencies, both man-made and otherwise, such as NYC 9/11, Katrina and more recently Sandy, there is often significant infrastructure damage that occurs with a disaster event. Such damage can be a critical impediment to the responding emergency teams. Often complex logistical data is provided by response data centers that correlate and filter information out to the field teams. Failures in the response center or in the network path between the field teams and the response center can cause a major set of logistical complications and possibly cost additional lives.

In one of my previous articles, titled “Data Storage: The Foundation and potential Achilles Heel of Cloud Computing”, I illustrated the critical importance of the data footprint and the requirement for mobile virtual machines to have access to these data stores regardless of location. Also, many applications are composite instances that are the result of several server exchanges on the data center back end. This is further complicated by the fact that, in order to provide a truly resilient data fabric, multiple data centers are required at geographically dispersed locations. As a result, these data stores need to be replicated and updated on a very consistent basis, sometimes up to a full data journaling or copy on write requirement. Additionally, virtual machines need to be migrated or, at the very worst, wholesale site recovery needs to be initiated. As this occurs, mapping to data stores must be preserved, including all required network paths. Also, as the migration of the VM, cluster or whole data center occurs, users will require adequate communication paths to continue seamlessly in the use of the applications they need to do their jobs. The figure below illustrates these critical relationships and the communication paths required to facilitate them.

Figure 6. Required Services and Communication Paths

Interestingly, SPB provides an excellent solution in that the networking technology is ‘topology aware’. As such, its convergence time is extremely fast, in the range of hundreds of milliseconds. This includes not only layer 2 services like VLANs but layer 3 services such as IP VPNs and IP multicast as well. As major outages occur within the fabric, each individual SPB node will natively make the forwarding decision based on its shortest path knowledge of the network. If a path exists, SPB will use it. As the diagram below shows, several major outages can occur at multiple points in the end to end topology, but if the mesh fabric is engineered correctly, there will always be an alternate route available for use. As a result, whole regions of the network can fail without a failure of the network as a whole. Redundant links can be wireline (optical) or wireless, such as microwave. As long as they provide point to point communications links for the SPB nodes and allow the protocol to establish adjacencies, they are candidate technologies for transport linkage.

Figure 7. Shortest Path Resiliency

As is shown above, both data centers and users have valid communication paths available despite the fact that a good portion of the network is down. This is an important trait for reliable communication infrastructures, particularly those that are used during emergencies. Note that through all of this the NG911 service keeps running as normal, with no disruption or outage of call services.
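
The 'if a path exists, SPB will use it' behaviour can be illustrated with an ordinary shortest path computation over the set of links that are currently up. The sketch below uses Dijkstra's algorithm on a tiny made-up topology. It is a teaching model only: node names and link costs are assumptions, and it ignores how IS-IS actually floods link state and how SPB breaks ties between equal cost paths.

```python
import heapq


def shortest_path(links, src, dst):
    """Return the lowest-cost path from src to dst, or None if no path exists.

    links maps an (a, b) node pair to a cost; every link is treated as bidirectional.
    """
    adjacency = {}
    for (a, b), cost in links.items():
        adjacency.setdefault(a, []).append((b, cost))
        adjacency.setdefault(b, []).append((a, cost))

    best = {src: 0}
    previous = {}
    heap = [(0, src)]
    while heap:
        dist, node = heapq.heappop(heap)
        if node == dst:
            break
        if dist > best.get(node, float("inf")):
            continue  # stale heap entry
        for neighbour, cost in adjacency.get(node, []):
            candidate = dist + cost
            if candidate < best.get(neighbour, float("inf")):
                best[neighbour] = candidate
                previous[neighbour] = node
                heapq.heappush(heap, (candidate, neighbour))

    if dst not in best:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(previous[path[-1]])
    return path[::-1]


# Illustrative mesh: a short route via A and a longer alternate via C and D.
links = {
    ("field", "A"): 1, ("A", "dc"): 1,
    ("field", "C"): 1, ("C", "D"): 1, ("D", "dc"): 1,
}
normal = shortest_path(links, "field", "dc")

# Simulate an outage on the A-dc link and recompute over what is left.
surviving = {pair: cost for pair, cost in links.items() if pair != ("A", "dc")}
rerouted = shortest_path(surviving, "field", "dc")
```

In this toy mesh the field team normally reaches the data center through A, and after the outage the recomputed path runs through C and D. Only when every route is cut does the function return None, which matches the point above that the mesh has to be engineered with alternate routes.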

Give me the Bull Horn please

In emergencies it is often highly desirable to broadcast alerts to all members of a given team or set of teams. This capability can not only increase the effectiveness of the field response teams but may very well save their lives. In the past this feature was delivered via LMR, or Land Mobile Radio. While such technology still has valid use and is often gatewayed at the edge for voice communications, other packet based technologies can deliver richer information, such as video and graphics like weather and radar maps or building blueprints. The major limitation of these newer forms of wireless communication is that they require an IP multicast infrastructure, which is difficult to scale and support. Additionally, major network outages tend to adversely affect the multicast service, often to the point of rendering it unusable. As mentioned earlier, SPB can provide convergence of multicast services on the order of hundreds of milliseconds. This is accomplished by eliminating the typical protocol overlay model of networking shown in figure 3 and creating a collapsed route switching substrate, which is Shortest Path Bridging. As the network is shortest path tree aware, it is also multicast distribution tree aware. My previous article discusses multicast in SPB and the major advancements in scale, performance and convergence time it provides. The diagram below shows a more symbolic representation of a major alert going out from the response center not only to the field response teams but to the NG911 PSAPs as well.

Figure 8. SPB Multicast used to provide all points alerts via multicast

With traditional networking technologies this would be a very difficult proposition, requiring the interaction of multiple virtualized PIM domains within MPLS. With SPB, its inherent multi-tenant capabilities lend themselves to easy distribution of multicast trees, each in a separate VPN environment within one network domain. Additionally, there is the benefit of sub-second convergence of the network in the event of failure or outage, which would be fast enough to be totally transparent to the multicast services running over it, whether they are audio, video, graphics or data. These traits are highly desirable and lend themselves well to critical communications infrastructure. Real time services become much more reliable when a resilient, scalable networking technology is used as the infrastructure substrate. This also complements the various layers of resiliency that can be built into other functional parts of the end to end solution, such as servers and storage as well as whole data centers. The end result is a very strong resiliency plan that can stand the worst of impacts and still survive… as long as valid SPB links exist.
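
The statement that a network which knows its shortest path trees also knows its multicast distribution trees can be shown with a small example. The Python sketch below builds a shortest path tree from the alert source with a breadth-first search, which is valid here only because every link is assumed to have equal cost, and then keeps just the branches that lead to receivers. The topology and node names are invented, and real SPB derives these trees from IS-IS with its own tie-breaking rules.

```python
from collections import deque


def shortest_path_tree(adjacency, root):
    """Return a parent map describing the shortest path tree rooted at root.

    Breadth-first search gives shortest paths when all links cost the same.
    """
    parent = {root: None}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for neighbour in sorted(adjacency.get(node, ())):
            if neighbour not in parent:
                parent[neighbour] = node
                queue.append(neighbour)
    return parent


def multicast_tree(adjacency, source, receivers):
    """Return the set of (upstream, downstream) links needed to reach all receivers."""
    parent = shortest_path_tree(adjacency, source)
    tree_links = set()
    for receiver in receivers:
        if receiver not in parent:
            continue  # receiver is currently unreachable
        node = receiver
        while parent[node] is not None:
            tree_links.add((parent[node], node))
            node = parent[node]
    return tree_links


# Illustrative topology: a response center alerting two field teams and a PSAP.
adjacency = {
    "center": {"core"},
    "core": {"center", "edge1", "edge2"},
    "edge1": {"core", "team1", "psap"},
    "edge2": {"core", "team2"},
    "team1": {"edge1"},
    "psap": {"edge1"},
    "team2": {"edge2"},
}
alert_tree = multicast_tree(adjacency, "center", ["team1", "team2", "psap"])
```

Because the branches are stored as a set, a link shared by several receivers, such as the one between the center and the core, appears once. That is the property that lets one alert be replicated only where the tree forks.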

In Summary

Let’s face it. No one ever wants to call 911. But when an emergency occurs we are always thankful for the service that it provides. Like many, I recall a time prior to the service. Many more have known nothing but. As this critical civil service moves into the future and begins to leverage the new technologies that are available, it will become more and more important to pay attention to the network infrastructure that will support them. The ‘Cloud’ works by the reach of the network. The services remain up by the resiliency that the network provides. In reality, this is nothing new. Service Providers have been using such practices for years. The thing that is really new is that IEEE 802.1aq Shortest Path Bridging provides for an infrastructure that is no longer out of reach for most State and Local Governments who are now analyzing the network requirements for true NG911 and ESInet evolution.

I would like to thank my esteemed colleague Mark Fletcher, a fellow Avaya Engineer, for his input and mentoring for this article. Mark has extensive experience in E911 and is an industry recognized expert in his field.