Bridge-Path vs. Route-Path Server Load Balancing
by Tony Bourke
11/14/2000
All you have to do is walk into any data center and you'll see them:
server load balancers. These devices are integral parts of today's Web sites,
and cornerstones to their scalability and reliability. Perhaps the most
challenging aspect of SLB (server load balancing) for network administrators
is figuring out the best way to implement the device in a
given network. This aspect of SLB is probably the least understood of all,
and the plethora of SLB products and vendors with their own diverse
installation methods only compounds the problem. Most network administrators
understand the basic premise of SLB, which is to make many servers appear as
one to an end user by distributing the traffic load to multiple servers.
What is not as well understood is how load balancers fit into a given network
architecture. There are many different ways of implementing SLB in a network,
and usually a given product is capable of several implementation methods.
There is method to the madness, however, since most SLB implementations fall
into one of two categories: bridge-path and route-path. Of those two, I think
route-path is the better way to go. Before I go into why I prefer route-path,
let's take a look at how the two differ.
A load balancer works by taking traffic on a VIP (Virtual IP), sending that
traffic to an available server, and then sending that server's reply back over
the Internet to its destination. The critical part here is that, in most
cases, traffic must traverse the load balancer on the way back out to the
Internet.

Figure 1-1: VIP traffic and the load balancer
In this diagram, the traffic from an Internet user hits the VIP on the load
balancer in step 1. In step 2, the traffic is diverted to an available Web
server. The Web server responds and sends the traffic back to the user in
step 3, passing through the load balancer on the way out. In step 4,
the traffic leaves the load balancer on its final journey to the user.
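The four steps above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the `LoadBalancer` class, the round-robin policy, and the host addresses are all made up for illustration, and real products track connections in far more detail.

```python
from itertools import cycle


class LoadBalancer:
    """Toy model of Figure 1-1: accept on the VIP, pick a server, relay the reply."""

    def __init__(self, vip, servers):
        self.vip = vip
        self._pool = cycle(servers)  # round-robin is an assumed policy
        self.connections = {}        # client address -> chosen real server

    def inbound(self, client, dst):
        # Steps 1 and 2: traffic arrives on the VIP and goes to an available server.
        if dst != self.vip:
            raise ValueError("traffic is not addressed to the VIP")
        if client not in self.connections:
            self.connections[client] = next(self._pool)
        return self.connections[client]

    def outbound(self, server, client):
        # Steps 3 and 4: the reply passes back through the load balancer,
        # which sends it on to the client as if it came from the VIP.
        if self.connections.get(client) != server:
            raise LookupError("no matching connection for this reply")
        return (self.vip, client)


# Hypothetical addresses for illustration only.
lb = LoadBalancer("208.20.20.100", ["208.20.20.11", "208.20.20.12"])
first = lb.inbound("198.51.100.7", "208.20.20.100")
reply = lb.outbound(first, "198.51.100.7")
```

The point of the sketch is step 3: `outbound` only works because the load balancer remembers which server it chose, which is why the reply has to come back through it.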
Because return traffic must usually traverse the load balancer, how that third
step in Figure 1-1 is accomplished is the fundamental difference between
route-path and bridge-path, and is a major factor in how load balancers are
integrated into a network. With the bridge-path method, the load
balancer is in the Layer 2 path of traffic bound for the Internet, acting as a
bridge between two separate networks. With route-path, the load balancer is in
the Layer 3 path of outbound server traffic and is the server's default route.
Route-path has several advantages over bridge-path, which make it
much more attractive. It's more flexible, easier to integrate into a given
network, and offers a number of different configurations. Bridge-path, on
the other hand, has several limitations, which restrict the available
configurations in a given network and cause problems with redundancy.
One of the biggest limitations with bridge-path comes from the basic nature of
Layer 2 traffic: you cannot have more than one path to a given target.
If there is more than one path, one of two things will most likely happen:
A bridging loop will be created, flooding the network with continuously
amplified Layer 2 frames (if you've ever seen this happen, you know it's a
hoot); or STP (Spanning-Tree Protocol) will shut down one of those paths,
which could also shut down other portions of a network. In a scenario where
there are two Layer 2 devices for redundancy, one of the Layer 2 devices must
be inactive to prevent a bridging loop.
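To see why a second active Layer 2 path is a problem, here is a rough simulation. The model is an assumption made for illustration: two shared segments joined by some number of bridges in parallel, with every bridge forwarding every broadcast copy it hears except the one it sent itself. The `flood` function is hypothetical and ignores MAC learning and STP entirely.

```python
def flood(active_bridges, rounds):
    """Count the frame copies in flight after each forwarding round.

    Assumed model: one broadcast frame is injected on one of two segments,
    and `active_bridges` bridges connect the two segments in parallel.
    """
    # Round 1: each active bridge copies the original frame to the far segment.
    in_flight = list(range(active_bridges))
    history = [len(in_flight)]
    for _ in range(rounds - 1):
        next_flight = []
        for sender in in_flight:
            # Every bridge except the sender hears this copy and forwards it back.
            next_flight.extend(b for b in range(active_bridges) if b != sender)
        in_flight = next_flight
        history.append(len(in_flight))
    return history


one_active = flood(1, 4)    # [1, 0, 0, 0]: the frame crosses once and is done
two_active = flood(2, 4)    # [2, 2, 2, 2]: copies circulate forever
three_active = flood(3, 4)  # [3, 6, 12, 24]: copies multiply every round
```

With a single active bridge the frame is delivered and disappears. With two, it never dies, and with three or more the number of copies grows each round, which is why one unit of a redundant pair has to stay out of the forwarding path.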
Let's take the following example of a bridge-path load balancing implementation (Figure 1-2). In this example, we have a pair of load balancers that sit
in front of the servers. Any traffic must traverse the load balancers on the
way in and out. The load balancers act as a bridge between the public and
server networks, and everything is on the same IP subnet (208.20.20.0/24).
One unit is active while the other unit sits in standby mode (not
forwarding Layer 2 packets).

Figure 1-2: Bridge-path flat-based SLB implementation example
In this redundant bridge-path load balancing scenario, it is a requirement
that only one unit be active in forwarding Layer 2 packets. Because of this
limitation with the bridge-path method, only one pair of load balancers can
be employed in a given network configuration. Two sets of load balancers
would have two active load balancers in a group of four, which would still
create multiple Layer 2 paths and thus a bridging loop.
Fail-over speed is also an issue with the bridge-path method. Depending on
the vendor and overall network infrastructure, STP is usually part of the
overall Layer 2 redundancy, and STP is not known for its speed. Fail-over
with STP, depending on implementation, can take well over ten seconds.
That is a veritable eternity where an Internet site is concerned. However, it
should be mentioned that many SLB products that employ the bridge-path method
also have some other way of doing redundancy, which is usually somewhat
quicker.
With the route-path method, traffic flow is controlled very easily by
setting the default route on a server. In this way, the load balancers do
not need to be in the direct Layer 2 path, as in bridge-path. Let's take
the following example of a route-path SLB implementation (Figure 1-3), which
gives the same SLB functionality as the bridge-path example.

Figure 1-3: Route-path flat-based SLB implementation example
The load balancers hang off the Layer 2 switches and are not in the Layer 2
path of the servers. Instead, the default route of the servers is set to the
floating IP address that exists on the active load balancer. This ensures
that traffic passes through the load balancers on the way out. As with the
bridge-path example, everything is on the same subnet, although now we are
dealing with only one VLAN, whereas with bridge-path we were dealing with two
networking segments (public network and server network).
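The way the default route steers return traffic can be shown with a small next-hop check. This is a sketch only: `next_hop` is a hypothetical helper, and the floating IP (208.20.20.1), server address, and client address are invented for illustration. Only the 208.20.20.0/24 subnet comes from the example above.

```python
import ipaddress


def next_hop(destination, server_subnet, default_gateway):
    """Return where a server sends a packet: directly, or via its default route."""
    dest = ipaddress.ip_address(destination)
    subnet = ipaddress.ip_network(server_subnet)
    if dest in subnet:
        # On-subnet hosts are reached directly over Layer 2.
        return str(dest)
    # Everything else, including replies to Internet clients, goes to the gateway.
    return default_gateway


FLOATING_IP = "208.20.20.1"  # assumed floating IP on the active load balancer

to_client = next_hop("198.51.100.7", "208.20.20.0/24", FLOATING_IP)
to_neighbor = next_hop("208.20.20.12", "208.20.20.0/24", FLOATING_IP)
```

A reply to an Internet client is off-subnet, so it is handed to the floating IP and therefore to the active load balancer, without the load balancer having to sit physically between the server and the switch.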
Redundancy is much simpler and easier to implement at Layer 3.
VRRP (Virtual Router Redundancy Protocol), or a similar protocol, is the usual
method of redundancy. VRRP utilizes floating IPs between two units, with one
unit having the floating IP active while the other is in standby, in case the
active unit fails. Fail-over can typically occur in five seconds or less.
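For a feel of where a figure like that comes from, the VRRP specification has a backup declare the master dead after three missed advertisements plus a small priority-based skew. The sketch below applies that formula with the protocol's default one-second advertisement interval. The function name is my own, and vendors' "similar protocols" may well use different timers.

```python
def master_down_interval(priority, advertisement_interval=1.0):
    """Seconds a VRRP backup waits without advertisements before taking over.

    Uses the timers defined in the VRRP specification:
        Skew_Time            = (256 - Priority) / 256
        Master_Down_Interval = 3 * Advertisement_Interval + Skew_Time
    """
    if not 1 <= priority <= 254:
        raise ValueError("backup priority must be between 1 and 254")
    skew_time = (256 - priority) / 256
    return 3 * advertisement_interval + skew_time


# Default priority of 100 with the default 1-second advertisement interval.
takeover_after = master_down_interval(100)  # 3.609375 seconds
```

The skew term makes higher-priority backups time out slightly sooner, so the preferred standby unit is the one that claims the floating IP.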
With the route-path method, you have the choice of using either one subnet
for the VIPs and real servers (called flat-based SLB), as used in the two
previous examples, or two subnets with the VIPs on a public network and the
real servers on a separate, usually private subnet (called NAT-based SLB).
With the bridge-path method, the VIPs and real servers must be on the same
subnet. Since the load balancer acts as a router in the route-path method,
you can perform the router function of NAT from one subnet to another.
Figure 1-4 is an example of a NAT-based SLB implementation:

Figure 1-4: NAT-based SLB Implementation
In this scenario, the load balancers are connected to two separate VLANs
using separate links, one being the public network and the other being the
private network, using a total of two ports (even on a switch-based load
balancer). The public network uses public IP address space while the
private server network uses private nonrouted RFC 1918 address space.
The VIPs exist in the public network with the servers exclusively on
the private network. The load balancer performs NAT on the inbound traffic,
and translates it back on the way out. There are floating IPs between the load
balancers on both networks: the VIPs accepting traffic on the public network,
and the default route of the servers on the private network. This NAT-based
scenario has several security advantages, including the ability to make the
load balancers act as firewalls by employing packet filtering.
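The two translations in the NAT-based scenario can be sketched as a pair of rewrites. This is a simplified model under assumptions: the `Packet` type, the function names, and all addresses are hypothetical, and ports and connection state are left out.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str


def nat_inbound(packet, vip, real_server):
    # Public side: the destination VIP is rewritten to a private real server.
    if packet.dst != vip:
        raise ValueError("packet is not addressed to the VIP")
    return replace(packet, dst=real_server)


def nat_outbound(packet, vip, real_server):
    # Private side: the server's private source address is rewritten to the VIP.
    if packet.src != real_server:
        raise ValueError("packet did not come from the real server")
    return replace(packet, src=vip)


VIP = "208.20.20.100"  # assumed public VIP
REAL = "10.0.0.11"     # assumed RFC 1918 server address
CLIENT = "198.51.100.7"

request = nat_inbound(Packet(src=CLIENT, dst=VIP), VIP, REAL)
response = nat_outbound(Packet(src=REAL, dst=CLIENT), VIP, REAL)
```

The client only ever sees the VIP, and the private server address never appears on the public network.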
There is a third way to handle outbound traffic, called DSR (Direct
Server Return). With DSR, traffic doesn't travel through the load
balancer on the way out. Through some networking trickery involving loopback
interface configuration and a process known as MAT (MAC Address Translation),
traffic is sent to the end user from the server already rewritten with the
source address of the VIP on the load balancer. With this step already
handled by the server, traffic can go unabated to the Internet. Since most
traffic for a typical Internet site is outbound rather than inbound, this
represents a significant savings of resources for the load balancer.
Normally, if a site's traffic ratio is one packet in for every ten packets out,
the load balancer handles all eleven packets. With DSR, the outbound traffic
never hits the load balancer, which handles only one out of every eleven packets.
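The arithmetic is simple enough to check directly. The helper below is hypothetical and just restates the one-in, ten-out ratio from the text as a fraction of total packets.

```python
from fractions import Fraction


def load_balancer_share(packets_in, packets_out, dsr):
    """Fraction of a site's packets that the load balancer has to handle."""
    total = packets_in + packets_out
    # With DSR only inbound packets touch the load balancer.
    handled = packets_in if dsr else total
    return Fraction(handled, total)


without_dsr = load_balancer_share(1, 10, dsr=False)  # 1, i.e. all eleven packets
with_dsr = load_balancer_share(1, 10, dsr=True)      # 1/11, roughly 9 percent
```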
The configuration is much like route-path, except the default route of the
servers is not the load balancer; rather, it's the IP address of the router
servicing that subnet (and the load balancer itself). Configuring DSR
requires some expertise in Layer 2 and Layer 3 dynamics and is much more
complicated than the other methods, so it's generally a good idea to use it only when there is a
specific need. DSR does not usually work with a bridge-path scenario.
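The DSR packet flow can be sketched as follows. The details here are my assumptions about how MAT is typically done, not something spelled out above: the load balancer rewrites only the destination MAC address, and the server accepts the frame because the VIP is configured on its loopback interface. The `Frame` type, function names, MAC addresses, and IP addresses are all hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Frame:
    dst_mac: str
    src_ip: str
    dst_ip: str


def mat_forward(frame, server_mac):
    # Load balancer: only the destination MAC changes; the IP header is untouched,
    # so the frame still carries the VIP as its destination address.
    return replace(frame, dst_mac=server_mac)


def server_reply(frame, loopback_ips, router_mac):
    # Server: accepts the frame because the VIP lives on its loopback interface,
    # then answers from the VIP straight to the router, bypassing the load balancer.
    if frame.dst_ip not in loopback_ips:
        raise ValueError("server does not own this destination address")
    return Frame(dst_mac=router_mac, src_ip=frame.dst_ip, dst_ip=frame.src_ip)


VIP = "208.20.20.100"  # assumed VIP, also configured on the server's loopback
incoming = Frame(dst_mac="00:00:5e:00:53:01", src_ip="198.51.100.7", dst_ip=VIP)
at_server = mat_forward(incoming, server_mac="00:00:5e:00:53:11")
reply = server_reply(at_server, loopback_ips={VIP}, router_mac="00:00:5e:00:53:fe")
```

Because the reply already carries the VIP as its source address, nothing is left for the load balancer to rewrite on the way out, which is what lets the server hand the traffic directly to the router.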
In general, switch-based products on the market tend to be more bridge-path
oriented, while the PC-based appliance load balancers tend to support only
route-path. However, most of the switch-based vendors also support the
route-path method, and one PC product that I know of does
bridge-path exclusively. Of course, these features vary depending on the
vendor.
Given the easier redundancy, more flexible configurations, and added
functionality, it's easy to see why I prefer the route-path method. As with
any network setup, your specific requirements will dictate how load balancing
is implemented. There may be cases where bridge-path is more
appropriate than route-path, but I think the majority of the time
route-path would be the optimal configuration. Your mileage will vary.