Scaling Web applications with Windows 2000 Advanced Server's Network Load Balancing
By Rick Strahl
Last Updated: 10/14/2000
An updated version of this article is available for Windows Server 2003.
With ever larger Web applications being built to service tremendous numbers of simultaneous users pounding away at Web sites, the issue of scaling applications beyond a single machine is often on the minds of Web application developers and network administrators. While hardware keeps improving to the point that high-powered single machines can handle tremendous loads, there will always be apps that push beyond what a single machine can service. In addition, for many administrators and IT planners it's often not good enough to say that a server can handle x number of users; they want redundancy, backup and overflow support, so that a Web server or hardware failure or an unexpected surge of visitors doesn't cripple the corporate Web site. In this article, Rick discusses the issues of scalability and how load balancing services can help provide redundancy and extra horsepower to large Web sites that need to reach beyond a single box.
When building Web applications that have the potential to 'go big', scalability should be at the top of the feature list for a Web site. There are many things that can be done to scale a Web application, starting with a smart application design that maximizes the hardware it runs on, proper tuning, and smart site layouts that minimize the traffic hitting your site. With today's multi-processor hardware and Windows 2000's ability to use up to 16 processors (in the yet-to-be-released Datacenter Server, 8 in Advanced Server) a single machine can go a long way toward serving a tremendous number of backend hits. However, some applications eventually reach a point where a single machine is just not enough, regardless of how much hardware you throw at it. Applications of this scope also need backup and redundancy, which requires multiple machines for the peace of mind of the administrators managing the Web server hardware. In this article I'll discuss one solution for scaling to multiple machines: the Microsoft Network Load Balancing service that comes with Windows 2000 Advanced Server and above. This built-in tool provides an easy mechanism for spreading TCP/IP traffic over multiple machines.
Windows 2000 includes a number of very different features for load balancing:
- Cluster Service
The Cluster Service is used with application servers, primarily to provide redundant backups. You'll see the Cluster Service implemented with SQL Server to provide multi-machine failover and replication features.
- COM+ Load Balancing
COM+ Load Balancing allows COM components deployed as MTS components to be installed on multiple machines and balanced through a central server that decides on which machine components are invoked. This feature has been discussed a bit in the technical computer press, but was pulled at the last minute before Windows 2000 shipped. It's now scheduled to be released with Datacenter Server later this year.
- Network Load Balancing
The focus of this article is on Network Load Balancing (NLB), which provides IP based load balancing for services such as HTTP/HTTPS, FTP, SMTP and so on. In this scenario, a single 'virtual' IP address handles incoming network traffic and balances it across a cluster of machines.
In the past there have been a number of products that have provided IP based load balancing, such as Resonate Central Dispatch, F5's Big IP and Cisco's Local Redirector. In addition there are pure hardware based solutions such as routers that provide round robin DNS services. Router solutions tend to be 'dumb' in that they simply change IP addresses for any hit that comes in; software tools tend to be smarter, using a machine-polling mechanism to check which servers are available and how loaded those servers are. Some newer routers provide both the routing hardware and load balancing software in their firmware. All these solutions work well and have proven themselves in production environments. Unfortunately, many of these services are very expensive and rather hard to install and administer.
Network Load Balancing in Windows 2000 Advanced Server (and higher) is the new kid on the block and promises to bring down the cost of load balancing into the affordable range for companies outside the Fortune 500 set. Keep in mind, though, that the Windows Network Load Balancing service doesn't provide some of the bells and whistles of the other tools. For example, Resonate provides dynamic rebalancing of hits based on server load, live graphical status reporting and administration, and routing of specific URLs to specific machines. Like many Windows services, the Network Load Balancing service is bare bones, but it has the key features you need to take advantage of load balancing quickly and comparatively cheaply. Many of the older tools were very expensive because they fall squarely into the Enterprise domain, where big dollars are expected to be paid for system management software. Many of these run into the tens of thousands of dollars for only a few load balanced machines. On the other hand, the Network Load Balancing service ships with Windows 2000 Advanced Server and above, which makes it affordable for smaller organizations to explore and test multiple server load balancing scenarios.
Farming the Web
The concept behind Network Load Balancing is pretty simple: You have a 'virtual' IP address that is configured on all the servers that are participating in the load balancing 'cluster' (a loose term that's unrelated to the Cluster Service mentioned above). Whenever a request is made on this virtual IP a network driver service intercepts the request for the IP address and re-routes the request to one of the machines in the Load Balancing cluster based on rules that you can configure for each machine in the cluster. Load balancing is provided at the protocol level, which allows any TCP/IP based service to be included in this scenario. Network Load Balancing is Microsoft's term for this technology. The process is also known as a Web Server Farm, or IP Dispatching.
Figure 1 A network load balancing cluster routes requests to a single virtual IP to available servers in the load balancing cluster. Note that each machine is self-sufficient and runs independent of the others. The database sits on a separate box(es) accessible by all servers. ##LOADBALANCING.JPG##
Network Load Balancing (NLB) facilitates the process of creating a Web Server Farm. A Web Server Farm is a redundant cluster of several Web servers serving a single IP address. The most common scenario is that each of the servers is identically configured, running the Web server and whatever local Web applications run on it. With IIS this might be custom ISAPI extensions or an application built with Active Server Pages. Each machine has its own copy of HTML, ASP and other script and image files. If backend applications like COM objects are running locally, these also run on each one of the servers in the cluster. In other words, each server is a fully functioning Web site that can operate on its own regardless of whether it's in the pool or not. The key is redundancy in addition to load balancing: if any machine in the cluster goes down, the virtual IP address will re-balance the incoming requests across the still-running servers in the cluster. The servers in the cluster need to be able to communicate with each other to exchange information about their load, and to perform even more basic checks to see if a server went down.
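The core dispatch idea can be sketched in a few lines. This is only an illustration, not Microsoft's actual filtering algorithm: conceptually, every node sees each packet sent to the virtual IP, and all nodes apply the same deterministic hash of the client address, so exactly one node accepts the packet while the rest drop it. The node names and the hash function here are made up for the example.

```python
import zlib

NODES = ["web1", "web2", "web3"]  # hypothetical cluster node names

def accepting_node(client_ip, nodes=NODES):
    """Return the one node that will answer this client's packet.
    Every node runs the same computation, so they all agree on the owner."""
    bucket = zlib.crc32(client_ip.encode()) % len(nodes)
    return nodes[bucket]

def after_failure(client_ip, dead_node):
    """After a node dies, survivors agree on a new mapping (convergence)."""
    survivors = [n for n in NODES if n != dead_node]
    return accepting_node(client_ip, survivors)
```

Because the mapping is computed independently on every node, there is no central dispatcher to fail; when a node dies, the survivors simply agree on a new mapping during convergence.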
Each server in the cluster is self-contained, which means it should be able to function without any other in the cluster with the exception of the database (which is not part of the NLB cluster). This means each server must be configured separately and run the Web server as well as any Web server applications that are running. If you're running a static site, all HTML files and images must be replicated across servers. If you are using ASP, those ASP pages must also be replicated. Source control programs like Visual SourceSafe can make this process relatively painless by allowing you to deploy updated files of a project (in Visual Interdev or FrontPage for example) to multiple locations simultaneously.
If you have COM components as part of your Web application things get more complicated, since the COM objects must be installed and configured on each of the servers. This isn't as simple as copying the files; it also requires re-registering the servers, plus potentially moving any additional support files (DLLs, configuration files if needed, non-SQL data files, etc.). If you're accessing databases you also need to configure the appropriate DSNs to allow each server to access the data source. In addition, if you're using in-process components you'll have to shut down the Web server to unload the components. You'll likely want to set up some scripts or batch files to perform these tasks in an automated fashion, pulling update files from a central deployment server. You can use the Windows Scripting Host (.vbs or .js files) along with the IIS Admin objects to automate much of this process. This is often tricky and can be a major job, especially if you have a large number of cluster nodes and updates are frequent; strict operational rules are often required to make this process reliable. In general the update process is likely to occur one machine at a time so that the Web site can continue to run while the changes and updates are made. In this scenario only one machine is taken down, updated with the latest version of the application, then put back online. Then the next machine in the cluster receives the same treatment.
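The one-machine-at-a-time update process described above can be sketched as a simple loop. Everything here is hypothetical scaffolding: in a real deployment the drain, deploy and restore callbacks might shell out to WLBS, file copy commands and regsvr32 on each node.

```python
def rolling_update(nodes, drain, deploy, restore):
    """Update one node at a time so the cluster keeps serving traffic.
    Each callback receives the node name; only one node is ever offline."""
    for node in nodes:
        drain(node)     # e.g. take the node out of the cluster rotation
        deploy(node)    # e.g. copy files, re-register COM servers
        restore(node)   # e.g. rejoin the cluster

# Illustrative run that just records the order of operations:
log = []
rolling_update(
    ["web1", "web2"],
    drain=lambda n: log.append(("drain", n)),
    deploy=lambda n: log.append(("deploy", n)),
    restore=lambda n: log.append(("restore", n)),
)
```

The point of the ordering is that each node is fully restored before the next one is drained, so the site never loses more than one machine's worth of capacity during an update.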
Since multiple redundant machines are involved in a cluster you'll want to have your data in a central location that can be accessed from all the cluster machines. It's likely that you will use a full client/server database like SQL Server in a Web farm environment, but you can also use file based data access like Visual FoxPro tables if those tables are kept in a central location accessed over a plain LAN connection.
Note that in heavy load balancing scenarios running a SQL backend, the database, not your application code, can become your bottleneck! Without going into details here, you need to think about what happens when you overload the database, which is essentially running on a single box. Max out that box and you have problems that are much harder to address than Web load balancing. At that point you need to think about splitting your databases so that some data can be written to other machines. For redundancy you can use the Microsoft Cluster Service to monitor and sync a backup system that can take over in case of failure of the primary server.
Network Load Balancing is very efficient and can come very close to doubling performance with each machine added to the cluster. There is some overhead involved, but I didn't notice it in my performance tests with Microsoft's Web Application Stress tool, with each machine adding very close to 100% of its standalone performance to the cluster.
You may notice that with this level of redundancy, increasing your load handling capability becomes simply a matter of adding additional machines to the cluster, which gives you practically unlimited application scalability (database permitting) if you need it.
Getting started with Network Load Balancing
Network load balancing in Windows 2000 is fairly easy to set up and run assuming you can manage to decipher the horrible documentation in the online help. In this section I'll take you through a configuration scenario that hopefully will make your installation and configuration more straightforward by highlighting the important aspects of the installation and running process.
Let's start by discussing what you need in order to use NLB. You'll need at least two machines running Windows 2000 Advanced Server or better, with at least one network card in each machine. You can also use multiple network cards: one for cluster communication and one with the dedicated IP address for all directly accessed resources. For testing it's a good idea to have yet another machine that can run a Web stress testing tool, which lets you see how the cluster works under load.
I'm going to use two machines here to demonstrate how to set up and run NLB. Assume the IP addresses for these machines are 192.168.0.1 and 192.168.0.2. I'm going to set up NLB on a 'virtual IP' (NLB calls this the 'primary IP address') of 192.168.0.10. In order to set up NLB, every machine in the cluster must configure this IP address in addition to its dedicated machine IP address(es). To do so, right-click on Network Neighborhood on the desktop, click Properties, then Internet Protocol (TCP/IP). The machine must have at least one fixed IP address in order for NLB to work; DHCP clients with automatically assigned addresses will not work, so if you're using DHCP make sure to add at least one static IP address to your machine configuration. Once you have a primary IP address for your machine, click the Advanced button and add 192.168.0.10 as a new IP address. Make sure that the subnet masks are the same on all of these IP addresses (255.255.255.0). Figure 2 shows what you should see in the IP display dialog.
Note that even though 192.168.0.10 is a virtual IP address, you can tie domain names to it with DNS. So your master domain name, such as www.yoursite.com, would point at this virtual IP address in the DNS record.
Figure 2 You need to add the virtual IP to your machine's TCP/IP configuration.
You now have an IP address that the virtual IP can bind to. Go back out to the Local Area Connection Properties and notice the Network Load Balancing option in the list of network services. Check the checkbox and click the entry to bring up the service properties.
Figure 3 NLB is provided as a network level service that shows up in Local Area Connection Properties.
The first page of this multi-tab dialog (see Figure 4) shows the cluster parameters. Here you put information about the cluster, such as the virtual IP address, the subnet mask, and whether to use multicast or unicast messages. There are also additional options that allow you to control the cluster remotely via the command line tools that the service provides. Note that this applet has a number of user interface bugs and quirks that make using this dialog a little less than optimal, so be sure to check the values before moving on to the next page or clicking OK. In addition, the documentation and help file are very scattered and it's hard to find things. The best help is the '?' help from the dialog's control box. Drop the question mark onto individual fields for the best documentation on individual items in each of the dialogs.
Figure 4 The cluster parameters contain the virtual IP address and multicast support options.
NLB calls the virtual IP the Primary IP Address, which is confusing to say the least, because primary IP could mean the primary IP of your system or of the cluster. Virtual IP is the term most commonly used in other load balancing packages, and I think it describes the concept much better. This Primary IP is the cluster's IP address, which will be used to access all the sites in the cluster. Public DNS entries such as www.yoursite.com will be bound to this 'virtual' IP address. Set this IP address to the new IP we added in Figure 2, in this case 192.168.0.10, on all of our cluster machines, and adjust the subnet mask to 255.255.255.0.
The full Internet name is used only for remote administration, as an identifier for the machine. If you're using a single network adapter you'll want to enable multicast support to allow the network card to handle traffic both for the cluster and for the dedicated IP address.
The Host parameters (see Figure 5) configure the cluster machine's native IP settings and how the cluster loads. The Dedicated IP Address is the main physical IP address of the machine, used to access the machine without going through the cluster. This IP address can be accessed directly, or NLB can use it to route virtual IP traffic to the machine. In other words, you can access the machine via 192.168.0.1 or 192.168.0.10, but only the latter runs through the Load Balancing Cluster.
Another important parameter is the Priority setting, which doubles as a unique ID for the server in the cluster. Each server must have a unique ID, and the server with the lowest priority value (which is the highest priority!) handles all of the default network traffic that is not specifically configured in the port rules. This means default network traffic such as files passed between servers and standard network messages that don't fall within the range of the port rules described next. It doesn't affect the configured ports, which are handled on an equal (presumably first-come, first-served) basis between the cluster machines.
The server with the lowest ID handles the default network traffic serviced by the virtual IP. If the first server in the cluster fails, the one with the next higher priority becomes the controller. Note that the actual load balancing settings are configured separately in the Port Rules settings, with percentages that should amount to 100%, and are not affected by the priority setting. The main thing that's important is that each server gets a unique ID, since the cluster manager reports on the servers using the priority ID (for example, when converging IP addresses on startup).
You can dynamically add and delete servers from the cluster with the Active flag keeping the other parameters of the server intact.
Figure 5 The host parameters determine the host's native IP address that the NLB cluster uses to communicate with your server.
Individual machines in the cluster are configured through port rules. The rules determine how the cluster balances the load among machines, with rules for percentage-based balancing as well as for sending specific ports to specific machines in the cluster.
Each machine can be configured to service a specific range of IP ports. By default NLB serves all ports, and this default setting is probably sufficient if you're servicing specific traffic such as Web access. Web traffic typically comes in on port 80 for standard HTTP, with secure HTTPS coming in on port 443. The port range is handy because it lets you indirectly control which protocols are load balanced. It also comes in handy for routing all secure traffic to a specific server, so you have to install a secure certificate on only one of the machines in the cluster rather than all of them.
In addition to limiting ports you can also configure how cluster affinity works. Affinity controls whether incoming requests from a given client are always bound to the same cluster node. There are three affinity settings:
- None
Setting the affinity to None means the next available node in the cluster services the hit. Requests are never tied to a specific machine in the cluster.
- Single
Every request from a particular client IP address is routed to the same server, providing a 'sticky' server for that client.
- Class C
Like Single, but all requests from the same Class C address range (the first three octets of the IP address) are routed to the same server on every hit.
Affinity may be required when servers keep state in order to track users through the site. For example, Active Server Pages Sessions are machine specific, and in order to keep using ASP Sessions users must hit the same server each time. Affinity can also help performance in scenarios where users are tracked through a site, because of the inherent caching that occurs in Web applications. For example, SSL/HTTPS connections are much slower if they are always rebuilt from scratch rather than reusing some of the certificate information already negotiated with the same clients.
On the other hand, affinity adds a small bit of overhead to the routing of requests, because the IP routing manager potentially has to wait for the machine in question to free up if it is busy. Affinity is useful when you have stateful operations on your servers that benefit from caching. In very high volume environments, scalability will be better without affinity.
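The three affinity modes can be sketched as follows, under the stated assumptions that Single makes the full client IP sticky and Class C makes the whole first-three-octets range sticky; None is shown as a simple round-robin stand-in for "next available node". The node names and hash choice are illustrative.

```python
import zlib

NODES = ["web1", "web2", "web3"]  # hypothetical cluster nodes
_rr = [0]                          # round-robin cursor for the None mode

def pick_node(client_ip, affinity):
    """Choose a node for a request according to the affinity setting."""
    if affinity == "none":
        _rr[0] = (_rr[0] + 1) % len(NODES)   # no stickiness at all
        return NODES[_rr[0]]
    if affinity == "single":
        key = client_ip                       # whole address is sticky
    else:  # "classc"
        key = ".".join(client_ip.split(".")[:3])  # whole /24 range is sticky
    return NODES[zlib.crc32(key.encode()) % len(NODES)]
```

With Single, repeated requests from one client always land on the same node; with Class C, all clients behind the same Class C range (a proxy farm, for example) land together.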
Each server also has to be assigned a handling priority, which is given as a percentage; the percentages should combine to 100% for the entire cluster, although this is not a requirement. Optionally you can specify Equal, which tries to split the load evenly among all of the servers in the cluster.
The typical scenario for load balancing is that you configure each server in the cluster with a single rule, which other than the load percentages will be identical across machines. But you can create multiple rules and have each handled by different machines in the cluster. By default rules are set up for multiple hosts, which means the rule is handled by multiple machines in the cluster. You can also configure a rule to be served by a single host. For example, you could set up a server to be part of the general cluster, but also configure it as the single host that serves SSL requests on port 443.
Figure 6 The port rules determine how the cluster node handles requests. The settings include whether client requests stick to this specific cluster node (affinity), the priority of how requests are handled and which ports are serviced. Multiple rules can be created for a single node.##portrules.jpg##
It's very important that you set up every server with exactly the same port rules, varying only the load factor, or your cluster will not work! For example, Figure 6 shows port rules for ports 80 and 443, as I want this server to service all of the SSL requests. The other machine in the cluster should not service port 443, but you still need to configure a port rule for port 443 on it with a load factor of 0%. This was a bit confusing at first and is not described correctly in the documentation, so make sure you remember this when the time comes to set up your own servers.
Mismatched port rules can cause serious headaches when you're testing; I suggest you start by setting up all your servers with exactly the same port rules to make sure everything works before experimenting with custom rules.
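The port rule behavior described above can be sketched as a small routing table. The rules, host names and load factors are illustrative, mirroring the example of sending all port 443 traffic to a single certificate-equipped host while splitting port 80 by load weight; unmatched ports fall through to the default-traffic host.

```python
import random

# Illustrative port rules: 443 is served by a single host, 80 is spread
# over multiple hosts by load factor (weights sum to 100 here).
RULES = [
    {"ports": range(443, 444), "mode": "single", "host": "web1"},
    {"ports": range(80, 81), "mode": "multiple",
     "weights": {"web1": 70, "web2": 30}},
]

def route(port, rnd=random.random):
    """Pick the host for a request on the given port, or None if no rule
    matches (default traffic then goes to the lowest-priority host)."""
    for rule in RULES:
        if port in rule["ports"]:
            if rule["mode"] == "single":
                return rule["host"]
            # weighted pick across hosts for a multiple-host rule
            roll = rnd() * sum(rule["weights"].values())
            for host, weight in rule["weights"].items():
                roll -= weight
                if roll <= 0:
                    return host
    return None
```

Note that every node would carry the same RULES table with only the weights differing per machine, which is exactly why mismatched rules break the cluster.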
Administering the Service
The Network Load Balancing service also comes with a command line utility: WLBS.exe. This utility allows you to view and refresh, in the live cluster, the settings made in the dialogs above. The WLBS program lets you refresh the settings of the cluster and each machine, view current settings, get machines to converge, and administer machines remotely. To get an idea of the commands available in this archaic interface, simply run WLBS from the command prompt to see all of the options.
Figure 7 - The command line based WLBS utility lets you administer the cluster and each of the machines in it.
The two most important commands to remember are Query and Reload. Reload takes the settings made in the dialogs and reloads them from the registry into the currently running cluster server immediately. The documentation states that each cluster machine will occasionally refresh the new settings automatically, but in my tests this wasn't happening, so I used a manual Reload to make it work.
Once you've reloaded settings you have to re-converge the cluster, which is fancy talk for making the cluster see your changes. After reloading I noticed that the cluster often stopped dead; issuing a WLBS Query re-converged the cluster and fired it back up. Query also shows you the priority IDs of all the servers currently in the pool. If you disconnect one of the servers in the pool, you'll see its ID removed within 10 seconds or so of downtime; put it back in and it will show up again and become part of the pool. This is useful for troubleshooting and seeing which servers are available. You can run this on any of the cluster machines; in fact, if you have problems you should do this to make sure all the machines can 'see' each other over the network (although this won't guarantee the cluster node is operational).
Note that even if the cluster is not operational (for example, if your port rules don't match) you will still see the cluster nodes with the Query command. There appears to be no way to see exactly whether the cluster is functional, and no error information of any sort is provided, so getting the service up and running can take some trial and error.
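The membership behavior observed with Query can be modeled as a heartbeat timeout: a node that stops sending heartbeats for roughly 10 seconds is dropped from the converged membership list. The 10-second window matches what I observed, but the actual convergence protocol and timings are internal to NLB; this is just a sketch.

```python
TIMEOUT = 10.0  # seconds of silence before a node is considered down

def converge(last_heartbeat, now):
    """Return the priority IDs of nodes still considered alive, given a
    map of priority ID -> timestamp of last heartbeat."""
    return sorted(pid for pid, seen in last_heartbeat.items()
                  if now - seen <= TIMEOUT)

beats = {1: 100.0, 2: 100.0}
assert converge(beats, 105.0) == [1, 2]  # both nodes healthy
beats[1] = 114.0                          # node 1 keeps beating; node 2 silent
assert converge(beats, 115.0) == [1]      # node 2 dropped after >10s
```

This also matches the reconnect behavior: as soon as a returning node sends heartbeats again, its ID reappears in the membership list.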
Putting it all together
Once you've configured the servers in the cluster, it's time to see how the service works.
For my first setup I configured two servers with no affinity and equal load weight. One of the machines is a PII 400 while the other is a rather old notebook P266. With equal balancing both machines should get the same amount of traffic even though the older box is much less capable.
I also set up a separate machine running Microsoft's Web Application Stress Tool (WAST; for more info see "Load Testing Web Applications using Microsoft's Web Application Stress Tool") to simulate a large number of users hitting my online Web store application on my development servers. The app runs a Visual FoxPro backend (several pooled COM server instances on each machine) against SQL Server. A separate machine makes the test more realistic, but you could also run the tool on one of the cluster nodes if necessary. I configured WAST for a continuous load of 300 simultaneous users without any delays between requests. Since there are no delays between requests, this load is much closer to 1,500-2,000 actual users on the site, since real users typically take some time between requests before moving on to the next link. In my previous tests the breaking point of the application was right around the 200 user mark on the single PII 400; I figured that with load balancing I could add approximately a 50% gain, for around 300 users total.
I let the test run for an hour to get an idea of the load and stability of the service. The results were as I expected, and I was able to service the 300 users easily. The old P266 machine was loaded very heavily, running very close to 100% load, while the PII 400 was running at just below 50% load. I then rebalanced the handling priorities to 70% on the PII 400 and 30% on the P266, and both machines then ran at close to 80% load. I was actually able to bump the user count to 350 and still get good performance, with dynamic results returning in under 1.5 seconds.
When I started adding additional users I noticed that the load percentages on the Web servers weren't getting any worse, but performance was slowing down regardless, with results now taking 2.5 to 4 seconds to return. After some checking it turned out that the SQL Server backend was starting to lag under the load. Checking the SQL box revealed that this machine was running very close to 90% load continuously, mostly in the SQL Server process. So while the Web servers were pounding away on the SQL Server, they sat mostly idle, not taking up any CPU while waiting for the queries to return; hence the slower response but lighter CPU usage on the Web servers.
My test bed obviously doesn't consist of high end server machines. You can expect much better performance on high end hardware both for the actual Web application and VFP backend as well as the SQL Server. But testing scenarios such as this using a stress testing tool are crucial to find bottlenecks in a Web application. It's not always easy to pinpoint the bottleneck as your weakest link (in this case the SQL Server on a relatively low power machine) can drag down all other aspects of the application.
Finally, I decided to test a failure scenario by pulling the plug on one of my servers. With both nodes running I disconnected one; after 10 seconds all requests ended up going to the still-active node, providing the anticipated redundancy. No requests were lost, although the single machine left working became very overloaded and response times dropped considerably.
Load Balancing and your Web applications
Running an application on more than one machine introduces potential challenges into the design and layout of the application. If your Web app is not 100% stateless you will run into potential problems with resources required on specific machines. You'll want to think about this as you design your Web applications rather than retrofitting at the last minute.
For example, if you're using a Visual FoxPro backend and you're accessing local FoxPro data in any way, that data must be shared on a central network drive in order for all of the cluster servers to be able to access those files. This includes 'system' files of the application itself; in Web Connection this would mean things like the log file and the session management table, which would have to be shared on a network drive somewhere. It can also involve things like registry and INI file settings that may be used for configuration of servers. When you build these types of configurations, try to build them so that the configuration information can be loaded and maintained from a central store, or can be replicated easily on all machines.
If you're using Active Server Pages, you should know that ASP's useful Session and Application objects do not work across multiple machines. This means you either have to run the cluster with Single affinity to keep clients coming back to the same machine, or you have to come up with a different session management scheme that stores session data in a more central store such as a database. I personally believe in the latter, because most e-Commerce applications already require databases to track users anyway for traffic and site statistics. Adding session state to these tables tends to be trivial and doesn't add significant overhead.
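A minimal sketch of the central session store approach, using an in-memory SQLite database as a stand-in for the shared SQL backend (the table and column names are made up): any node in the cluster can save or load a session by the token carried in the client's cookie, so no affinity is required.

```python
import sqlite3

def open_store():
    """Open the shared session store (an in-memory DB stands in for the
    central SQL Server all cluster nodes would connect to)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE sessions (token TEXT PRIMARY KEY, data TEXT)")
    return db

def save_session(db, token, data):
    # Upsert so repeated writes from any node just replace the state.
    db.execute("INSERT OR REPLACE INTO sessions VALUES (?, ?)", (token, data))

def load_session(db, token):
    row = db.execute("SELECT data FROM sessions WHERE token = ?",
                     (token,)).fetchone()
    return row[0] if row else None

# Node A writes the session; node B, sharing the database, can read it back:
db = open_store()
save_session(db, "abc123", "cart=3 items")
```

The design trade-off is exactly the one discussed above: you give up ASP's built-in Session object but gain the freedom to route every request to any node.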
Finally, load balancing can allow you to scale applications across multiple machines relatively easily. To add more load handling capability, just add more machines. However, remember that when you build applications this way, your weakest link can bring down the entire load balancing scheme. If the SQL backend that all of your cluster nodes access is maxed out, no amount of additional machines in the load balancing cluster will improve performance. The SQL backend is your weakest link, and the only way to wring better performance out of it is to upgrade hardware or start splitting databases onto separate servers. Other load balancing software like Resonate and Local Redirector also has to worry about bottlenecks in the IP manager machine that routes IP requests. NLB is much better in this respect than many other load balancing solutions that require a central manager node, since NLB uses every machine in the cluster as a manager that communicates with the others, and every machine helps equally in picking up IP requests.
Cocked and loaded? Not yet
Windows 2000's Network Load Balancing service is a welcome addition to the scalability services provided by the operating system. It provides basic load balancing features that are easy to set up and run once you fight through the bad documentation. I hope this document helps make that process easier. The service is quick to configure and works transparently behind the scenes without any administration fuss. Once configured correctly it worked well, and performance and stability were excellent. I did not have any problems in several high volume WAST tests that each ran over a 24 hour period.
But several minor bugs in the administrative interface, an antiquated command line administration tool, the fact that changes don't take effect immediately and can cause the cluster to stop responding for a short period of time, and the bad documentation make this product feel like a 1.0 release in need of an update. Ironically, I couldn't even find much additional information on the MS Web site or in the Knowledge Base. It seems Microsoft is not really advertising or pushing this technology hard at this time. But as I mentioned, aside from the quirky interface, operation of the service was reliable, and none of the issues described are showstoppers that should stop you from using this powerful tool.
Rick Strahl is president of West Wind Technologies on Maui, Hawaii. The company specializes in Web and distributed application development and tools with focus on Windows 2000 and Visual Studio. Rick is author of West Wind Web Connection, a powerful and widely used Web application framework for Visual FoxPro, West Wind HTML Help Builder and co-author of Visual WebBuilder. He's also a Microsoft Most Valuable Professional, and a frequent contributor to FoxPro magazines and books. He is co-publisher and co-editor of CoDe magazine, and his book, "Internet Applications with Visual FoxPro 6.0", is published by Hentzenwerke Publishing. For more information please visit: http://www.west-wind.com/.