A Comparison of Hybrid Switching Systems

1. Introduction
---------------
The essential function of a data network is to forward packets from source to destination. Traditionally there have been two distinct mechanisms for packet forwarding: switching at the data-link layer (layer 2) and routing at the network layer (layer 3). Switches are usually engineered for low cost, high performance, and low management overhead. In contrast, routers are typically designed to provide greater functionality, control, and configurability at the expense of these other factors. Consequently, common network architectures use switches within small local administrative domains, and routers to control traffic across these domains.

The first real attack on this architecture came from ATM, which was designed as a switched data-link layer protocol with additional signaling protocols to manage large networks end-to-end. ATM was not successful at displacing existing architectures, in part because its connection-oriented model was a poor match for the popular datagram model employed by IP and IPX. Nevertheless, the desire to take advantage of low cost, scalable ATM switching hardware was clearly a major factor leading to the development of new ``hybrid switching'' systems.

In this paper, I'll discuss three such systems: Ipsilon's IP Switching, Cisco's Tag Switching, and 3Com's Fast IP. Each of these schemes provides a different way of mapping the network layer forwarding mechanism onto the data-link layer. The primary goal in this mapping process is to take advantage of the lower cost and higher performance that come from hardware forwarding implementations. For each system I will concentrate on issues relating to scalability and support for Quality of Service (QoS). I will not discuss hardware support for network layer routing (a la Rapid City), although the emergence of such hardware may challenge the implicit economic assumption behind hybrid switching.

2. Ipsilon's IP Switching
-------------------------
Ipsilon was the first to popularize the idea of hybrid switching with its IP Switching architecture. The Ipsilon approach maps traffic from layer 3 to layer 2 based on dynamic traffic ``flows''. A flow is an extended stream of packets sent from one host to another. This abstraction captures the notion that datagram networks act on dependent groups of packets driven by extended application conversations. Some flows are very short (e.g. a DNS lookup), while others can be relatively long (e.g. downloading a web page). Ipsilon capitalizes on the fact that common traffic patterns favor moderate to long-lived flows that are simple to identify. At a high level, an IP switch recognizes these long-lived flows and maps them onto switch identifiers that can then be forwarded directly in hardware. Short-lived flows are routed in the traditional fashion.

At the core of an Ipsilon network is a high speed layer 2 switch, such as ATM (we will assume ATM in this discussion, although Ipsilon is also looking at Frame Relay and DEC provides support for an FDDI switch). A ``switch controller'' is connected to a dedicated port on this switch and implements the mapping and routing functions. When a host sends data through an IP switch it uses a well-known virtual circuit (VC) identifier. The switch forwards all such traffic to the switch controller, which in turn looks up the next hop and output port and sends the packet back through the switch. In addition to this standard routing function, the switch controller is also responsible for ``flow classification''. Using heuristics based on source and destination address, TCP port number, and so on, the switch controller classifies those packets which have a high probability of being part of a long-lived flow. Once such a flow has been identified, the switch controller sends a message to the _upstream_ switch controller associating the flow with a unique VC.
The protocol for providing this flow information is called the Ipsilon Flow Management Protocol (IFMP) [Newman et al 96a]. When the upstream switch controller receives this message it will send all future packets from the specified flow using the specified VC. These packets are still routed by the switch controller until it receives a complementary message from the _downstream_ switch controller. At this point the switch controller can remove itself from the flow processing task and simply instruct the switch to directly switch all packets from a particular incoming VC to a particular port and outgoing VC (the Ipsilon General Switch Management Protocol, GSMP, is described in [Newman et al 96b]). Mappings between flows and VC's are made with an explicit timeout; if they are not explicitly refreshed then the mapping is broken and the VC's can be reused.

While it is possible for this mapping process to be applied end-to-end between hosts (and some vendors such as Adaptec provide ATM host drivers to support this), few enterprises have end-to-end ATM networks. Consequently, Ipsilon sells add-on ethernet flow modules which take incoming ethernet frames and convert them into ATM cells to be switched on an ATM backbone. The ethernet flow modules perform flow classification and participate in IFMP conversations, so such traffic can be switched as well.

2.1 IP Switching Scalability
----------------------------
The fundamental argument supporting the scaling of IP switching is that hardware crossbars and multi-stage switches can scale to high port densities and high bandwidths, while this is more difficult and expensive for traditional routers. On careful examination, however, there are a number of other concerns which affect this scheme's scalability. The two most immediate concerns have to do with the traffic-based nature of Ipsilon's mapping policy: how many packets are switched, and how many distinct flows can the system handle.
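To make the per-flow state concrete, here is a minimal sketch of the classification-and-binding cycle described above. The port-based heuristic, the timeout value, and the VC allocation policy are all illustrative assumptions, not Ipsilon's actual implementation:

```python
# Hypothetical sketch of Ipsilon-style flow classification and
# flow-to-VC binding. Heuristics and constants are illustrative.

LONG_LIVED_PORTS = {20, 21, 23, 80, 119}  # ftp-data, ftp, telnet, http, nntp
FLOW_TIMEOUT = 60.0  # seconds without refresh before the VC is reclaimed

def looks_long_lived(pkt):
    """Classify a packet as likely belonging to a long-lived flow."""
    return pkt["proto"] == "tcp" and pkt["dport"] in LONG_LIVED_PORTS

class FlowTable:
    def __init__(self):
        self.bindings = {}  # flow id -> (vc, last refresh time)
        self.next_vc = 32   # low VC numbers reserved (e.g. the default routed VC)

    def flow_id(self, pkt):
        return (pkt["src"], pkt["dst"], pkt["proto"], pkt["sport"], pkt["dport"])

    def handle(self, pkt, now):
        """Return ('switch', vc) for bound flows, else ('route', None)."""
        fid = self.flow_id(pkt)
        if fid in self.bindings:
            vc, _ = self.bindings[fid]
            self.bindings[fid] = (vc, now)   # traffic refreshes the mapping
            return ("switch", vc)
        if looks_long_lived(pkt):
            vc = self.next_vc                # an IFMP message would associate
            self.next_vc += 1                # this flow with the new VC upstream
            self.bindings[fid] = (vc, now)
            return ("switch", vc)
        return ("route", None)               # short-lived: routed as usual

    def expire(self, now):
        """Reclaim VCs for flows that were not refreshed in time."""
        dead = [f for f, (_, t) in self.bindings.items() if now - t > FLOW_TIMEOUT]
        for f in dead:
            del self.bindings[f]
```

Note that every switched flow holds a VC until its binding times out, which is exactly the state-versus-churn tradeoff discussed below.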
The effectiveness of IP switching is directly a function of the percentage of flows that are long-lived. In Ipsilon's experiments they found that 84% of packets were long enough and simple enough to be switched [Newman et al 96c, Newman et al 96d]. These results represent data from a particular exchange point (NLANR) at a particular time and are tightly tied to the success of TCP-based connection-oriented applications such as ftp and http. Should traffic patterns shift toward shorter bursts, or should this happen artificially through high degrees of aggregation, then IP switching may perform less well (although clearly current Internet trends favor their scheme). For instance, Scalable Reliable Multicast (SRM) protocols, such as that used by the popular whiteboard application ``wb'', rely on distributed application updates which may be provided by any client in a multicast group. Ipsilon supports IP multicast by mapping long-lived multicast transmitters to particular ATM multicast VC's, and so may not be well suited to the random traffic produced by this new communications style.

As packet rates and line bandwidths scale up exponentially, the switch controller may also become a bottleneck. In the current implementation, the switch controller is simply a 200MHz Pentium Pro running a specialized version of FreeBSD, and is capable of routing roughly 50,000 to 100,000 packets per second. If 84% of packets are switched, then at an aggregate rate somewhere between roughly 310,000 and 625,000 packets per second the switch controller can no longer keep up with the remaining 16% of packets which must be routed. Similarly, each switch controller must be able to support the overhead of current routing protocols (e.g. OSPF), and so a particular IP switch may have its performance limited by ``route flap'' in large networks (since Ipsilon does not support BGP and is clearly oriented towards the enterprise market this may not be a critical concern).
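The crossover point implied by these figures is simple to estimate: if the controller can route R packets per second and a fraction f of all traffic must still be routed, the controller saturates at an aggregate rate of R/f. A quick sketch of the arithmetic (the function name is mine):

```python
def controller_crossover(routed_pps, routed_fraction):
    """Aggregate packet rate at which the routed remainder
    saturates the switch controller."""
    return routed_pps / routed_fraction

# 84% of packets switched leaves 16% for the controller to route.
low = controller_crossover(50_000, 0.16)    # -> 312,500 pps aggregate
high = controller_crossover(100_000, 0.16)  # -> 625,000 pps aggregate
```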
There is also the question of how many flows real implementations of IP switching can handle. The ATM cell format specifies only 16 bits for the VCI field, limiting the number of unique flows a switch can handle to ~65,000, and many switches do not even support all 16 bits. Since each flow receives its own VC, IP switching is limited in the number of simultaneous flows that can be supported. If the timeouts on flow-to-VC mappings are too long this pressure is even higher, yet if the timeouts are too aggressive then some flows may churn between the switched and routed state.

2.2 IP Switching QoS
--------------------
Ipsilon is developing support to differentiate between qualities of service (QoS). The details of their implementation are not yet clear, but there are two basic elements. First, the flow classification process is modified to account for flow qualities (which may be communicated via the Resource Reservation Protocol (RSVP) for individual flows, or by a network manager using Class Based Queueing (CBQ)). Second, GSMP is extended to allow the flow-to-VC mapping to contain information about QoS. In current generation ATM switches, such QoS information is usually limited to either best-effort or real-time. Next generation switches are likely to support some variant of Weighted Fair Queuing (WFQ) with a small number of queues. One interesting challenge will be mapping higher level QoS parameters onto those provided by the particular underlying switch services.

3. Cisco Tag Switching
----------------------
The Cisco tag switching scheme is considerably different from Ipsilon's. Cisco chose to base their layer-3 to layer-2 mapping on network topology rather than network traffic [Rekhter et al 97]. In a tag switched network, edge routers forward packets according to standard routing practice. However, in the output stage a forwarded packet is labeled using a small tag that indicates the route to be taken.
Intermediate switches use this tag to forward packets at wire speed until they reach their destination. The routers are responsible for synchronizing their tag databases and programming the switches, using either extensions to traditional routing protocols (e.g. BGP) or a specialized Tag Distribution Protocol (TDP) [Doolan et al 96]. Tag switching allows these tag mappings to be driven by upstream nodes or by downstream nodes, and they can be constructed in advance or on demand. Unlike Ipsilon, Cisco seems to prefer downstream tag allocation, because they believe it is easier to integrate with ATM (at least in part because they have tried hard to allow their switches to support both tag switching and traditional ATM signaling). Much like IP switching's flow mappings, tag mappings are temporary and must be refreshed to stay alive.

Cisco promotes tag switching as being independent of network and data-link protocols, but this is to a large extent a marketing issue. Currently ATM is the only established layer-2 medium which has the tagging capabilities required to support tag switching without any modifications to the frame format. Supporting tag embedding within other protocols requires either a non-standard ``shim'' layer (e.g. a few words at the front of the ethernet packet) or the re-purposing of protocol fields (e.g. Cisco has suggested that the IPv6 flowspec field could be used for tag switching).

3.1 Tag Switching Scalability
-----------------------------
Tag switching has answered a number of the scalability questions of IP switching, but presents several unique problems as well. Because tag switching is driven by topology instead of traffic, its need for VC's is proportional to the number of routes rather than the number of flows (i.e. multiple flows may be aggregated under a single tag). However, since our current Internet routing tables are close to the maximum VC space, it seems reasonable that VC reuse may become an issue for tag switching as well.
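The aggregation argument can be made concrete with a small sketch: tag forwarding state is keyed by route prefix rather than by flow, so any number of flows toward the same prefix share one tag. The table contents and the longest-prefix lookup helper below are hypothetical illustrations, not Cisco's actual data structures:

```python
import ipaddress

# Hypothetical tag forwarding information base (TFIB): one tag per
# route, so state scales with the routing table, not with traffic.
TFIB = {
    ipaddress.ip_network("10.1.0.0/16"): 101,
    ipaddress.ip_network("10.2.0.0/16"): 102,
    ipaddress.ip_network("10.2.3.0/24"): 103,  # a more specific route
}

def tag_for(dst):
    """Longest-prefix match of the destination against the TFIB."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in TFIB if addr in net]
    if not matches:
        return None  # no route: fall back to the default routed path
    best = max(matches, key=lambda net: net.prefixlen)
    return TFIB[best]
```

Distinct flows to 10.1.4.2 and 10.1.9.9 both carry tag 101, while per-flow VC schemes would consume one VC for each; a packet to 10.2.3.7 takes tag 103 because the more specific route wins.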
Another advantage is that because tag switch routers generally classify packets pro-actively (at the edge) instead of reactively (as flows are discovered), they can switch almost all packets instead of a limited percentage. The importance of this advantage, of course, depends on the characteristics of the network traffic. Finally, the original tag switching proposal allows tags to be ``stacked'' (i.e. multiple tags in a single packet) to efficiently support hierarchical routing. For instance, a tag switch router might label a packet with two tags: the first to forward the packet to a new administrative domain, the second a tag unique to that domain specifying a local host or network (similar to the concept of loose source routing).

However, there are a few other technical problems presented by tag switching, notably dealing with routing loops and performance at packet exchange points. The IP protocol was designed with a Time-to-Live (TTL) field to control the lifetime of a packet within the network. Keeping packet lifetimes short is important so that sequence numbers can be safely reused and so that network bandwidth is not wasted in the event of routing loops. Consequently, routers in IP networks are responsible for decrementing the TTL field and discarding any IP packet whose TTL reaches zero. Tag switches are programmed using the topology information computed by edge routers, so they are subject to loops; however, they do not decrement TTL (an ATM switch has no idea what kind of network layer it is forwarding). Therefore, it's possible for a network layer routing loop to create a data-link layer forwarding loop. Without the TTL mechanism, a tag switched network may waste enormous amounts of bandwidth on these short-lived routing loops. Consider a loop in the route from some node A to a destination B.
Traffic sent from A to B will accumulate within the path of the loop and will persist until the loop disappears, creating congestion for any other packets in transit through those links (even if they are not destined for B).

Inter-domain network exchange points present another problem. Cisco positions tag switching primarily as a solution for scaling wide area networks. However, it does not address one of the critical performance problems of these networks, namely scaling packet forwarding through inter-domain exchange points (e.g. the MAE's, FIX's, etc.). Because such exchange points perform route aggregation, multiple flows will arrive there with the same tag but ultimately different destinations (route aggregation is necessary to keep the size of the routing problem small, in this case at the expense of the forwarding problem). Consequently, at these aggregation and de-aggregation points, tags must be ``split'' and the individual flows must be routed (undermining the benefit of using tag switching). In fact, this need to multiplex flows and tags requires that each of Cisco's tag switches be capable of performing traditional IP forwarding tasks (i.e. you can't just put your routers on the edges; the switches must be able to route if they have to as well).

3.2 Tag Switching QoS
---------------------
Cisco's solution for QoS seems similar to Ipsilon's. For data flows with explicit quality constraints, individual tags are allocated, and these tags are mapped onto the underlying switch QoS features. Cisco suggests that there may be a limited number of quality equivalence classes, and so it may be possible to share even these specific QoS tags as well (if not, the QoS tags will be managed in a purely flow-based fashion). One significant difference between the Cisco scheme and Ipsilon's is in which boxes can control the QoS decision.
IP switching requires each switch controller to perform an independent classification, while Cisco's classification happens primarily at the edges of the network. While purely edge-based classification is simpler, distributing the problem allows transit networks to enforce their own QoS policies independently.

4. 3Com's Fast IP
-----------------
3Com is a relative newcomer to the hybrid switching field, and technical documentation on Fast IP is scarce [Hart 97]. Unsurprisingly (since 3Com controls the host ethernet market and not the router market), the 3Com scheme is host-centric. Normal intra-subnet communications work as usual, as do the initial packets for an inter-subnet transmission (Fast IP requires routers to be present to forward packets for which layer 2 paths are not found). Once an inter-subnet flow has started, however, the sending host uses the distributed Next Hop Routing Protocol (dHNRP) to discover whether there is an end-to-end layer 2 path between it and the destination. The low level mapping uses draft IEEE VLAN standards. Roughly speaking, each subnet is mapped to a VLAN id, and these ids are learned by the switches that frames pass through. The dHNRP protocol is a way for a sending host to inquire about the VLAN id's of the destination and the intervening hops. If a complete VLAN id path can be constructed then the host labels future packets for that destination with the appropriate id (and they are switched directly). 3Com claims that this strategy doesn't require routers to be upgraded and allows the incremental deployment of host software (and presumably switches supporting VLAN's).

4.1 Fast IP Scalability
-----------------------
3Com targets Fast IP at the small enterprise and readily admits that Fast IP (like all VLAN-based standards) does not scale to large networks. They've announced a partnership with Cascade, whose IP Navigator WAN switching architecture is compatible with Fast IP.
IP Navigator is even more poorly documented than Fast IP, but it's clear that it uses Ipsilon's IFMP at the edges of the network to associate incoming flows with the appropriate VLAN id's.

4.2 Fast IP QoS
---------------
3Com has a very different approach to QoS than the other vendors. 3Com sees the need for traffic differentiation at the granularity of hosts rather than traffic classes or individual flows. Under the 3Com scheme, a priority is associated with a particular host, either statically, according to which user logs in, or by some other host priority assignment policy. All traffic from that host then receives that priority. This particular approach to QoS seems poorly considered, as it ignores the different network demands placed by different applications. Moreover, since it is host-centric, it makes the network management problem harder rather than simpler. Since one of the principal benefits of QoS support is thought to be better management of network resources (e.g. using CBQ to give telnet higher priority than ftp), this seems to be a significant deficiency.

5. Conclusion
-------------
In this paper I've looked at three relatively new approaches for mapping layer 3 forwarding decisions onto layer 2 switches. Ipsilon is currently the only one of these vendors to be shipping a product, and its offering is clearly the most mature. Cisco's proposal has a number of elegant features but is still not completely worked out, and is really geared more towards the WAN environment than the campus or enterprise. 3Com's offering attempts to leverage its control over the host networking market while keeping compatibility with existing campus network infrastructures. However, its host-centric structure is a network manager's nightmare. Finally, the economic assumptions behind hybrid switching (layer 3 is slow, layer 2 is fast) may be seriously threatened by an emerging generation of Gigabit Ethernet switches with custom layer-3 switching hardware.

References
----------
[Newman et al 96a] P. Newman, W. Edwards, R. Hinden, E. Hoffman, F. Ching Liaw, T. Lyon, and G. Minshall, Ipsilon Flow Management Protocol Specification for IPv4, Version 1.0, RFC 1953, May 1996.

[Newman et al 96b] P. Newman, W. Edwards, R. Hinden, E. Hoffman, F. Ching Liaw, T. Lyon, and G. Minshall, Ipsilon's General Switch Management Protocol Specification, Version 1.1, RFC 1987, August 1996.

[Newman et al 96c] P. Newman, T. Lyon, and G. Minshall, Flow Labeled IP: A Connectionless Approach to ATM, Proceedings of the 1996 IEEE Infocom Conference, San Francisco, CA, March 1996.

[Newman et al 96d] P. Newman, T. Lyon, and G. Minshall, Flow Labeled IP: Connectionless ATM Under IP, Networld+Interop, Las Vegas, April 1996.

[Doolan et al 96] P. Doolan, B. Davie, D. Katz, Y. Rekhter, and E. Rosen, Tag Distribution Protocol, Internet draft, September 1996.

[Hart 97] J. Hart, Fast IP: The Foundation for 3D Networking, 3Com white paper, March 1997.

[Rekhter et al 97] Y. Rekhter, B. Davie, D. Katz, E. Rosen, G. Swallow, and D. Farinacci, Tag Switching Architecture - Overview, Internet draft, January 1997.