A Comparison of Hybrid Switching Systems

1. Introduction
---------------
The essential function of a data network is to forward packets from source to destination. Traditionally there have been two distinct mechanisms for packet forwarding: switching at the data-link layer (layer 2) and routing at the network layer (layer 3). Switches are usually engineered for low cost, high performance, and low management overhead. In contrast, routers are typically designed to provide greater functionality, control, and configurability at the expense of these other factors. Consequently, common network architectures use switches within small local administrative domains, and routers to control traffic across these domains.

The first real attack on this architecture came from ATM, which was designed as a switched data-link layer protocol with additional signaling protocols to manage large networks end-to-end. ATM was not successful at displacing existing architectures, in part because its connection-oriented model was a poor match for the popular datagram model employed by IP and IPX. Nevertheless, the desire to take advantage of low cost, scalable ATM switching hardware was clearly a major factor leading to the development of new ``hybrid switching'' systems.

In this paper, I'll discuss three such systems: Ipsilon's IP Switching, Cisco's Tag Switching, and 3Com's Fast IP. Each of these schemes provides a different way of mapping the network layer forwarding mechanism onto the data-link layer. The primary goal in this mapping process is to take advantage of the lower cost and higher performance that come from hardware forwarding implementations. For each system I will concentrate on issues relating to scalability and support for Quality of Service (QoS). I will not discuss hardware support for network layer routing (a la Rapid City), although the emergence of such hardware may challenge the implicit economic assumption behind hybrid switching.

2. Ipsilon's IP Switching
-------------------------
Ipsilon was the first to popularize the idea of hybrid switching with its IP Switching architecture. The Ipsilon approach maps traffic from layer 3 to layer 2 based on dynamic traffic ``flows''. A flow is an extended stream of packets sent from one host to another. This abstraction captures the notion that datagram networks act on dependent groups of packets driven by extended application conversations. Some flows are very short (e.g. a DNS lookup), while others can be relatively long (e.g. downloading a web page). Ipsilon capitalizes on the fact that common traffic patterns favor moderate to long-lived flows that are simple to identify. At a high level, an IP switch recognizes these long-lived flows and maps them onto switch identifiers that can then be forwarded directly in hardware. Short-lived flows are routed in the traditional fashion.

At the core of an Ipsilon network is a high speed layer 2 switch, such as ATM (we will assume ATM in this discussion, although Ipsilon is also looking at Frame Relay and DEC provides support for an FDDI switch). A ``switch controller'' is connected to a dedicated port on this switch and implements the mapping and routing functions. When a host sends data through an IP switch it uses a well-known virtual circuit (VC) identifier. The switch forwards all such traffic to the switch controller, which in turn looks up the next hop and output port and sends the packet back through the switch. In addition to this standard routing function, the switch controller is also responsible for ``flow classification''. Using heuristics based on source and destination address, TCP port number, and so on, the switch controller classifies those packets which have a high probability of being part of a long-lived flow. Once such a flow has been identified, the switch controller sends a message to the _upstream_ switch controller associating the flow with a unique VC.
The protocol for providing this flow information is called the Ipsilon Flow Management Protocol (IFMP) [Newman et al 96a]. When the upstream switch controller receives this message it will send all future packets from the specified flow using the specified VC. These packets are still routed by the switch controller until it receives a complementary message from the _downstream_ switch controller. At this point the switch controller can remove itself from the flow processing task and simply instruct the switch to directly switch all packets from a particular incoming VC to a particular port and outgoing VC (the Ipsilon General Switch Management Protocol, GSMP, is described in [Newman et al 96b]). Mappings between flows and VC's are made with an explicit timeout; if they are not explicitly refreshed then the mapping is broken and the VC's can be reused.

While it is possible for this mapping process to be applied end-to-end between hosts (and some vendors such as Adaptec provide ATM host drivers to support this), few enterprises have end-to-end ATM networks. Consequently, Ipsilon sells add-on ethernet flow modules which take incoming ethernet frames and convert them into ATM cells to be switched on an ATM backbone. The ethernet flow modules perform flow classification and participate in IFMP conversations, so such traffic can be switched as well.

2.1 IP Switching Scalability
----------------------------
The fundamental argument supporting the scaling of IP switching is that hardware crossbars and multi-stage switches can scale to high port densities and high bandwidths, while this is more difficult and expensive for traditional routers. On careful examination, however, there are a number of other concerns which affect this scheme's scalability. The two most immediate concerns have to do with the traffic-based nature of Ipsilon's mapping policy: how many packets are switched, and how many distinct flows can the system handle.
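To make the per-flow state concrete, here is a minimal sketch of the classification-and-binding cycle described above. The port-based heuristic, the timeout value, and the VC allocation policy are all illustrative assumptions, not Ipsilon's actual implementation:

```python
# Hypothetical sketch of Ipsilon-style flow classification and
# flow-to-VC binding. Heuristics and constants are illustrative.

LONG_LIVED_PORTS = {20, 21, 23, 80, 119}  # ftp-data, ftp, telnet, http, nntp
FLOW_TIMEOUT = 60.0  # seconds without refresh before the VC is reclaimed

def looks_long_lived(pkt):
    """Classify a packet as likely belonging to a long-lived flow."""
    return pkt["proto"] == "tcp" and pkt["dport"] in LONG_LIVED_PORTS

class FlowTable:
    def __init__(self):
        self.bindings = {}  # flow id -> (vc, last refresh time)
        self.next_vc = 32   # low VC numbers reserved (e.g. the default routed VC)

    def flow_id(self, pkt):
        return (pkt["src"], pkt["dst"], pkt["proto"], pkt["sport"], pkt["dport"])

    def handle(self, pkt, now):
        """Return ('switch', vc) for bound flows, else ('route', None)."""
        fid = self.flow_id(pkt)
        if fid in self.bindings:
            vc, _ = self.bindings[fid]
            self.bindings[fid] = (vc, now)   # traffic refreshes the mapping
            return ("switch", vc)
        if looks_long_lived(pkt):
            vc = self.next_vc                # an IFMP message would associate
            self.next_vc += 1                # this flow with the new VC upstream
            self.bindings[fid] = (vc, now)
            return ("switch", vc)
        return ("route", None)               # short-lived: routed as usual

    def expire(self, now):
        """Reclaim VCs for flows that were not refreshed in time."""
        dead = [f for f, (_, t) in self.bindings.items() if now - t > FLOW_TIMEOUT]
        for f in dead:
            del self.bindings[f]
```

Note that every switched flow holds a VC until its binding times out, which is exactly the state-versus-churn tradeoff discussed below.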
The effectiveness of IP switching is directly a function of the percentage of flows that are long-lived. In Ipsilon's experiments they found that 84% of packets were long enough and simple enough to be switched [Newman et al 96c, Newman et al 96d]. These results represent data from a particular exchange point (NLANR) at a particular time and are tightly tied to the success of TCP-based connection-oriented applications such as ftp and http. Should traffic patterns shift toward shorter bursts, or should this happen artificially through high degrees of aggregation, then IP switching may perform less well (although clearly current Internet trends favor their scheme). For instance, Scalable Reliable Multicast (SRM) protocols, such as that used by the popular whiteboard application ``wb'', rely on distributed application updates which may be provided by any client in a multicast group. Ipsilon supports IP multicast by mapping long-lived multicast transmitters to particular ATM multicast VC's, and so may not be well suited to the random traffic produced by this new communications style.

As packet rates and line bandwidths scale up exponentially, the switch controller may also become a bottleneck. In the current implementation, the switch controller is simply a 200MHz Pentium Pro running a specialized version of FreeBSD, and is capable of routing roughly 50,000 to 100,000 packets per second. If 84% of packets are switched, then at an aggregate rate somewhere between roughly 310,000 and 625,000 packets per second the switch controller can no longer keep up with the remaining 16% of packets which must be routed. Similarly, each switch controller must be able to support the overhead of current routing protocols (e.g. OSPF), and so a particular IP switch may have its performance limited by ``route flap'' in large networks (since Ipsilon does not support BGP and is clearly oriented towards the enterprise market this may not be a critical concern).
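The crossover point implied by these figures is simple to estimate: if the controller can route R packets per second and a fraction f of all traffic must still be routed, the controller saturates at an aggregate rate of R/f. A quick sketch of the arithmetic (the function name is mine):

```python
def controller_crossover(routed_pps, routed_fraction):
    """Aggregate packet rate at which the routed remainder
    saturates the switch controller."""
    return routed_pps / routed_fraction

# 84% of packets switched leaves 16% for the controller to route.
low = controller_crossover(50_000, 0.16)    # -> 312,500 pps aggregate
high = controller_crossover(100_000, 0.16)  # -> 625,000 pps aggregate
```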
There is also the question of how many flows real implementations of IP switching can handle. The ATM cell format specifies only 16 bits for the VCI field, limiting the number of unique flows a switch can handle to ~65,000, and many switches do not even support all 16 bits. Since each flow receives its own VC, IP switching is limited in the number of simultaneous flows that can be supported. If the timeouts on flow-to-VC mappings are too long this pressure is even higher, yet if the timeouts are too aggressive then some flows may churn between the switched and routed state.

2.2 IP Switching QoS
--------------------
Ipsilon is developing support to differentiate between qualities of service (QoS). The details of their implementation are not yet clear, but there are two basic elements. First, the flow classification process is modified to account for flow qualities (which may be communicated via the Resource Reservation Protocol (RSVP) for individual flows, or by a network manager using Class Based Queueing (CBQ)). Second, GSMP is extended to allow the flow-to-VC mapping to contain information about QoS. In current generation ATM switches, such QoS information is usually limited to either best-effort or real-time. Next generation switches are likely to support some variant of Weighted Fair Queuing (WFQ) with a small number of queues. One interesting challenge will be mapping higher level QoS parameters onto those provided by the particular underlying switch services.

3. Cisco Tag Switching
----------------------
The Cisco tag switching scheme is considerably different from Ipsilon's. Cisco chose to base their layer-3 to layer-2 mapping on network topology rather than network traffic [Rekhter et al 97]. In a tag switched network, edge routers forward packets according to standard routing practice. However, in the output stage a forwarded packet is labeled using a small tag that indicates the route to be taken.
Intermediate switches use this tag to forward packets at wire speed until they reach their destination. The routers are responsible for synchronizing their tag databases and programming the switches, using either extensions to traditional routing protocols (e.g. BGP) or a specialized Tag Distribution Protocol (TDP) [Doolan et al 96]. Tag switching allows these tag mappings to be driven by upstream nodes or by downstream nodes, and they can be constructed in advance or on demand. Unlike Ipsilon, Cisco seems to prefer downstream tag allocation, because they believe it is easier to integrate with ATM (at least in part because they have tried hard to allow their switches to support both tag switching and traditional ATM signaling). Much like IP switching's flow mappings, tag mappings are temporary and must be refreshed to stay alive.

Cisco promotes tag switching as being independent of network and data-link protocols, but this is to a large extent a marketing issue. Currently ATM is the only established layer-2 medium which has the tagging capabilities required to support tag switching without any modifications to the frame format. Supporting tag embedding within other protocols requires either a non-standard ``shim'' layer (e.g. a few words at the front of the ethernet packet) or the re-purposing of protocol fields (e.g. Cisco has suggested that the IPv6 flowspec field could be used for tag switching).

3.1 Tag Switching Scalability
-----------------------------
Tag switching has answered a number of the scalability questions of IP switching, but presents several unique problems as well. Because tag switching is driven by topology instead of traffic, its need for VC's is proportional to the number of routes rather than the number of flows (i.e. multiple flows may be aggregated under a single tag). However, since our current Internet routing tables are close to the maximum VC space, it seems reasonable that VC reuse may become an issue for tag switching as well.
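The aggregation argument can be made concrete with a small sketch: tag forwarding state is keyed by route prefix rather than by flow, so any number of flows toward the same prefix share one tag. The table contents and the longest-prefix lookup helper below are hypothetical illustrations, not Cisco's actual data structures:

```python
import ipaddress

# Hypothetical tag forwarding information base (TFIB): one tag per
# route, so state scales with the routing table, not with traffic.
TFIB = {
    ipaddress.ip_network("10.1.0.0/16"): 101,
    ipaddress.ip_network("10.2.0.0/16"): 102,
    ipaddress.ip_network("10.2.3.0/24"): 103,  # a more specific route
}

def tag_for(dst):
    """Longest-prefix match of the destination against the TFIB."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in TFIB if addr in net]
    if not matches:
        return None  # no route: fall back to the default routed path
    best = max(matches, key=lambda net: net.prefixlen)
    return TFIB[best]
```

Distinct flows to 10.1.4.2 and 10.1.9.9 both carry tag 101, while per-flow VC schemes would consume one VC for each; a packet to 10.2.3.7 takes tag 103 because the more specific route wins.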
Another advantage is that because tag switch routers generally classify packets pro-actively (at the edge) instead of reactively (as flows are discovered), they can switch almost all packets instead of a limited percentage. The importance of this advantage, of course, depends on the characteristics of the network traffic. Finally, the original tag switching proposal allows tags to be ``stacked'' (i.e. multiple tags in a single packet) to efficiently support hierarchical routing. For instance, a tag switch router might label a packet with two tags: the first to forward the packet to a new administrative domain, the second a tag unique to that domain specifying a local host or network (similar to the concept of loose source routing).

However, there are a few other technical problems presented by tag switching, notably dealing with routing loops and performance at packet exchange points. The IP protocol was designed with a Time-to-Live (TTL) field to control the lifetime of a packet within the network. Keeping packet lifetimes short is important so that sequence numbers can be safely reused and so that network bandwidth is not wasted in the event of routing loops. Consequently, routers in IP networks are responsible for decrementing the TTL field and discarding any IP packet whose TTL reaches zero. Tag switches are programmed using the topology information computed by edge routers, so they are subject to loops; however, they do not decrement TTL (an ATM switch has no idea what kind of network layer it is forwarding). Therefore, it's possible for a network layer routing loop to create a data-link layer forwarding loop. Without the TTL mechanism, a tag switched network may waste enormous amounts of bandwidth on these short-lived routing loops. Consider a loop in the route from some node A to a destination B.
Traffic sent from A to B will accumulate within the path of the loop and will persist until the loop disappears, creating congestion for any other packets in transit through those links (even if they are not destined for B).

Inter-domain network exchange points present another problem. Cisco positions tag switching primarily as a solution for scaling wide area networks. However, it does not address one of the critical performance problems of these networks, namely scaling packet forwarding through inter-domain exchange points (e.g. the MAE's, FIX's, etc.). Because such exchange points perform route aggregation, multiple flows will arrive there with the same tag but ultimately different destinations (route aggregation is necessary to keep the size of the routing problem small, in this case at the expense of the forwarding problem). Consequently, at these aggregation and de-aggregation points, tags must be ``split'' and the individual flows must be routed (undermining the benefit of using tag switching). In fact, this need to multiplex flows and tags requires that each of Cisco's tag switches be capable of performing traditional IP forwarding tasks (i.e. you can't just put your routers on the edges; the switches must be able to route if they have to as well).

3.2 Tag Switching QoS
---------------------
Cisco's solution for QoS seems similar to Ipsilon's. For data flows with explicit quality constraints, individual tags are allocated, and these tags are mapped onto the underlying switch QoS features. Cisco suggests that there may be a limited number of quality equivalence classes, and so it may be possible to share even these specific QoS tags as well (if not, the QoS tags will be managed in a purely flow-based fashion). One significant difference between the Cisco scheme and Ipsilon's is in which boxes can control the QoS decision.
IP switching requires each switch controller to perform an independent classification, while Cisco's classification happens primarily at the edges of the network. While purely edge-based classification is simpler, distributing the problem allows transit networks to enforce their own QoS policies independently.

4. 3Com's Fast IP
-----------------
3Com is a relative newcomer to the hybrid switching field, and technical documentation on Fast IP is scarce [Hart 97]. Unsurprisingly (since 3Com controls the host ethernet market and not the router market), the 3Com scheme is host-centric. Normal intra-subnet communications work as usual, as do the initial packets for an inter-subnet transmission (Fast IP requires routers to be present to forward packets for which layer 2 paths are not found). Once an inter-subnet flow has started, however, the sending host uses the distributed Next Hop Routing Protocol (dHNRP) to discover whether there is an end-to-end layer 2 path between it and the destination. The low level mapping uses draft IEEE VLAN standards. Roughly speaking, each subnet is mapped to a VLAN id, and these ids are learned by the switches that frames pass through. The dHNRP protocol is a way for a sending host to inquire about the VLAN id's of the destination and the intervening hops. If a complete VLAN id path can be constructed then the host labels future packets for that destination with the appropriate id (and they are switched directly). 3Com claims that this strategy doesn't require routers to be upgraded and allows the incremental deployment of host software (and presumably switches supporting VLAN's).

4.1 Fast IP Scalability
-----------------------
3Com targets Fast IP at the small enterprise and readily admits that Fast IP (like all VLAN-based standards) does not scale to large networks. They've announced a partnership with Cascade, whose IP Navigator WAN switching architecture is compatible with Fast IP.
IP Navigator is even more poorly documented than Fast IP, but it's clear that it uses Ipsilon's IFMP at the edges of the network to associate incoming flows with the appropriate VLAN id's.

4.2 Fast IP QoS
---------------
3Com has a very different approach to QoS than the other vendors. 3Com sees the need for traffic differentiation at the granularity of hosts rather than traffic classes or individual flows. Under the 3Com scheme, a priority is associated with a particular host, either statically, according to which user logs in, or by some other host priority assignment policy. All traffic from that host then receives that priority. This particular approach to QoS seems poorly considered, as it ignores the different network demands placed by different applications. Moreover, since it is host-centric, it makes the network management problem harder rather than simpler. Since one of the principal benefits of QoS support is thought to be better management of network resources (e.g. using CBQ to give telnet higher priority than ftp), this seems to be a significant deficiency.

5. Conclusion
-------------
In this paper I've looked at three relatively new approaches for mapping layer 3 forwarding decisions onto layer 2 switches. Ipsilon is currently the only one of these vendors to be shipping a product, and its offering is clearly the most mature. Cisco's proposal has a number of elegant features but is still not completely worked out, and is really geared more towards the WAN environment than the campus or enterprise. 3Com's offering attempts to leverage its control over the host networking market while keeping compatibility with existing campus network infrastructures. However, its host-centric structure is a network manager's nightmare. Finally, the economic assumptions behind hybrid switching (layer 3 is slow, layer 2 is fast) may be seriously threatened by an emerging generation of Gigabit Ethernet switches with custom layer-3 switching hardware.

References
----------
[Newman et al 96a] P. Newman, W. Edwards, R. Hinden, E. Hoffman, F. Ching Liaw, T. Lyon, and G. Minshall, Ipsilon Flow Management Protocol Specification for IPv4, Version 1.0, RFC 1953, May 1996.

[Newman et al 96b] P. Newman, W. Edwards, R. Hinden, E. Hoffman, F. Ching Liaw, T. Lyon, and G. Minshall, Ipsilon's General Switch Management Protocol Specification, Version 1.1, RFC 1987, August 1996.

[Newman et al 96c] P. Newman, T. Lyon, and G. Minshall, Flow Labeled IP: A Connectionless Approach to ATM, Proceedings of the 1996 IEEE Infocom Conference, San Francisco, CA, March 1996.

[Newman et al 96d] P. Newman, T. Lyon, and G. Minshall, Flow Labeled IP: Connectionless ATM Under IP, Networld+Interop, Las Vegas, April 1996.

[Doolan et al 96] P. Doolan, B. Davie, D. Katz, Y. Rekhter, and E. Rosen, Tag Distribution Protocol, Internet draft, September 1996.

[Hart 97] J. Hart, Fast IP: The Foundation for 3D Networking, 3Com white paper, March 1997.

[Rekhter et al 97] Y. Rekhter, B. Davie, D. Katz, E. Rosen, G. Swallow, and D. Farinacci, Tag Switching Architecture - Overview, Internet draft, January 1997.