Building Large Scale Web Applications
with Visual FoxPro
By Rick Strahl, West Wind Technologies
http://www.west-wind.com/
An Example:
Egghead.com's Surplus Direct Site
Site Statistics
Site Configuration
Web Development Issues
Performance
Data Optimization
To SQL or not to SQL
Code Optimization
Web Site Optimization
Web Server Optimization
Security
Scaling applications
Visual FoxPro and MultiThreading
Apartment Model Threading in 6.0
Microsoft Transaction Server
Pool Managers in Web Connection and FoxISAPI
Scaling and Load Balancing
Scaling with multiple processors
Scaling with multiple machines
Scaling with redundant servers and IP routing
The Development Process
Team Development
Source Control
Integration of Code and HTML
Overview
Find out about common issues involved in building Web applications that handle large hit
loads and require a team of developers, artists and HTML designers to build.
This paper includes discussion of Web development in a team environment and integration
of Visual FoxPro into this environment. You'll see how performance and scalability affect
Visual FoxPro development, with discussion of load balancing and tuning both the
application and the Web server (Internet Information Server). Other topics include
security for Web applications, site maintenance and management as they relate to Visual
FoxPro, and keeping track of site statistics.
This document discusses the following:
- Example of a high volume Web site: Egghead.com's Surplus Direct Site
- Discussion of implementation and development issues
- Application and data Performance
- VFP data vs. SQL Server
- Web Site and Web Server Performance
- Scalability
- Several load balancing scenarios
- Security
- Team Development for HTML and code
- Site Administration and Statistics
- Functionality comparison of Web Connection/FoxISAPI vs. Active Server based applications
What's Large Scale anyway?
There are many types of applications that can be built with Visual FoxPro. When it
comes to building applications that perform in a high transaction environment and require
sophisticated development scenarios, the rules of typical development change somewhat.
Most successful commercial Web applications fall into this category.
When I talk about large scale Web development, realize that this is a relative term that
depends on your particular environment and how you approach an application. In general I
judge 'large scale' by two particular issues:
- Web Site Operation
- Development Process
A site can be very heavily used and thus fall into the large scale bucket by the sheer
transaction volume and load it must deal with. Other applications involve a business
environment complex enough that the sheer size of the application makes it large scale.
Some applications have both.
Web Site Operation
This is probably the most common aspect people look at when judging the 'size' of a Web
application. What's the volume of traffic, how many hits to this or that page, how many
backend hits, how many records go into the database each day. These issues are extremely
critical for server based applications as they need to be carefully balanced against the
capacity of the system(s) that are running the application. Overload the system and you
lock up the Web site with disastrous results and loss of money.
- Performance and Scalability
The typical measure for performance is how much traffic is occurring on the site. One
has to be careful when looking at these numbers to understand what they mean, especially
when comparing them to other sites and applications. Scalability deals with balancing the
load that the traffic incurs on the hardware it's running on. This can be a very daunting
task, especially if your needs go beyond a single server machine. We'll talk more about
performance and scalability in detail.
- Data Volume
Many large scale Web sites also generate huge data volumes. Dealing with very high
transaction volumes on the data server is a critical issue when designing applications, as
contention and locking issues are much more likely to come to the fore than in typical
standalone business applications. Data management also becomes a very serious issue, as
you may have trouble storing all the information that's being captured. For example, one
of the sites I work with generates log entries for every hit incurred and every visitor
coming to the site, along with every order placed. A typical day may generate close to
1 million new records in the database; storing and moving this data is difficult to deal
with, especially when there's 'no good time for maintenance' on the server.
- Site Administration Issues
As with data, site administration is also a big issue. Adding new software or
hardware can put a serious strain on a busy, live application. Web sites are expected to
be up 24 hours a day, 7 days a week, and any glitch can lose valuable customers. There are
also software issues here: how to update code seamlessly in a distributed
environment, how to handle errors and logging, and so on.
The Development Process
Performance is the glamour spot when talking about big applications, but at least as
much attention and effort needs to go into tuning the development process as goes into
performance and scalability.
- Scope of Business Logic/Code
Obviously the bigger and more complex an application, the more resources in terms of
personnel have to be thrown at it. Building complex applications is always a difficult
task, but in the Web environment there are additional issues of performance, scalability
and security to deal with.
- Complexity of HTML Design/Graphics
Building the actual HTML interface for an application is usually handled by a team
separate from the development team, and as with code, the more expansive the site the more
complex this job becomes. Many commercial sites also constantly customize HTML to keep
their sites interesting, so this aspect continues well beyond the initial development.
- Integration of Code and HTML
The full application is really a mixture of both the logic elements and the visual
front end, which consists of HTML and graphics in most cases. Integration is a crucial
issue: coders will need to know a little about HTML, and the graphics staff will need to
know a little about code.
- Team Development
In order to pull all of the groups together it's vital to let everyone have access to
the same code, HTML and data. Version control software is a requirement to make this work
well. Setting up an environment that allows for testing and deployment is also a must,
both to keep the testing process separate from the online application and to make it easy
to duplicate the test environment on the production servers when the time comes to
deploy.
Egghead.com / Surplus Direct:
An Example Site
Egghead.com is on its way to becoming one of the biggest computer/software resellers on
the Net. Egghead.com consists of three sites: the main Egghead site, the Surplus Direct
Discount Warehouse and Surplus Auction. The two Surplus sites were acquired by
Egghead last year. Both Surplus sites are running Visual FoxPro based applications to
drive the Web sites.
All of the sites are ranked within the top 20 of the busiest commerce Internet sites.
In order to demonstrate Visual FoxPro in a live, high volume application I want to show
the Surplus Direct Discount Warehouse site. This is a fairly straightforward shopping site
that allows reviewing and ordering from a catalog of inventory online.
This site sells previous-version software and hardware - stuff that is 'surplus' to
manufacturers and other distributors. Items are sold at rock bottom prices and are
advertised via a printed catalog sent out by mail and ads in all the major computer
hardware magazines like Computer Shopper.
The company uses one of the heaviest advertising plans on the Internet to promote their
site, which has been ranked as high as the #5 commercial site on the Web by PCMeter (a
popular Web visitor rating service), with corresponding traffic on the site. Surplus runs
large scale Web advertising programs on Yahoo, Netscape, InfoSeek, Excite, Shareware.com
and several other of the highest volume sites that take advertising online.
Online catalog of hardware/software
Inventory catalog
Site displays an inventory of between 2,000 and 3,000 items in various categories.
Online, secure ordering
Visitors can pick items for ordering and purchase them securely using SSL encryption,
with email confirmation.
Online Credit Card Validation
Credit cards are validated online via an ISAPI extension and the validated response is
processed in VFP for order completion.
Electronic Software Download
(feature has been temporarily removed pending vendor agreements)
A special section of the online site allows purchase of items that can be
downloaded immediately. The Visual FoxPro application interfaces with an extended version
of the Web Connection ISAPI interface that handles communication with the CyberSource
third party packaging and authorization software running on a dedicated ESD server over
the Internet.
Extensive custom promotional features
The site includes a number of promotional features to 'lure' potential customers: free
items advertised on banners, free items with orders over a certain dollar amount, rotating
banners on the site, weekly specials displayed in frames, email specials, featured items,
a hot items list, etc.
Sub Sites
The Surplus site often features specific vendor sub-sites. For example, a special
'build your own Everex computer' site ran for a few months. There are frequent vendor
'plug-in' apps that can be hooked in with minimal effort.
Extensive Site Management tools
Total Remote Administration
Most aspects of the site can be administered remotely via an
HTML Admin interface that allows for server management features as well as statistics
and maintenance operations on the data, such as data transfers with the HP mini.
Detailed Site and User Statistics
The site keeps track of detailed information about individual
hits and shopper information in order to determine traffic patterns. Shoppers are tracked
through the site anonymously, and valuable information about where they came from and how
much traffic they generate is recorded in a shoppers table. The information can be
displayed at a glance in online graphs and is also exported for more detailed daily
reports that are run and presented in Excel.
Customization Tasks
Several tasks such as rotating banner administration and special
displays are also handled through the HTML interface. Site designers can use the HTML
interface to add and delete items for these tasks, allowing a fluid workflow for site
administration.
Data Updates
The data uploads and downloads to and from the HP mini are also
administered through the HTML interface. Orders are exported once every hour and inventory
is imported three times a day.
Running offline Web site
Both of the Surplus sites run as 'offline' Web sites, meaning they don't access the
main business application directly.
HP mini Point of sale system
The main company application runs on an HP mini computer and the Web site is running
offline from the mini. Data is transferred several times a day to update inventory on the
Web and import orders captured online.
Security issues
The offline view serves as a security buffer against fraud. Orders go through a rigorous
3-step input validation process before being taken into the Point of Sale system. Web data
entry has serious fraud potential, so extra steps are taken to minimize fraud.
Accuracy issue
Web based data entry is often not as accurate as that taken by a qualified phone
technician in the phone center. Although the Web site provides extensive validation, there
are some things that can't be checked online, such as obviously bad order amounts. Web
orders are scrutinized by a detailed import routine that rejects orders based on stringent
rules. A VFP conversion program exports orders to the mini's import format.
Site Statistics
Typical: 250,000 server hits/100,000 visits
Peaks (exclusive front page ads on Yahoo and Netscape):
3 million total pages a day
500,000+ VFP server hits a day
28,000 VFP server hits per hour
200,000 unique visitors per day
The application is outrunning three dedicated T1 lines
Site Configuration
The Surplus site runs on several separate machines in a server pool. Each machine is a
fully functional Web server plus HTML and Visual FoxPro backends. Each server is fully
redundant: if one fails it drops out of the server pool, but the site continues to
run. Data is stored in SQL Server on a separate server.
- 3 Dual Processor Pentium Pro 200s
- 1 Dual Processor Tandem Redundant Data Server
- 256 megs memory/dual 2 gig SCSI
- T3 direct Internet connection
- Windows NT Server 4.0
- Internet Information Server 3.0
- West Wind Web Connection
- Visual FoxPro 5.0a
- SQL Server 6.5 data backend
- Active Server Pages for non-data tasks
Web Development Issues at Surplus
Let's take a look at some of the development issues that need to be dealt with when
building applications for this high volume Web environment. We'll look at the following
topics in more detail:
- Performance
- Scalability
- Security
- Team Development
- HTML application development
- Site administration
- Site statistics
Performance
Performance is extremely critical in high transaction environments. Any application
needs to run as fast as it possibly can, but for high transaction environments tuning and
making sure code runs at its optimum is crucial as slow requests can tie up valuable
resources that might be needed by the next request in line. There are a number of areas
that should be focused on.
Data Optimization
Since we're dealing with database applications here, data optimization is the most
important piece of the puzzle. Database operations tend to be the slowest operations in
any Web application and also the most resource intensive, so optimizations here can bring
the biggest benefits.
- Rushmore Optimization
If you're using local data, making sure that all of your queries take advantage of
Rushmore is probably the best thing you can do to speed up performance. Make sure you set
up indexes to match your queries and make sure the queries match the indexes exactly.
Also, make sure you have tags on DELETED() if you're running with SET DELETED ON. Take the
time to review performance and SQL optimization with VFP's ShowPlan function SYS(3054).
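The following is a minimal sketch of this idea; the inventory table, tag names and field
names are hypothetical, but the pattern of matching index tags to the WHERE clause and
verifying the result with SYS(3054) applies generally:

*** Index tags that let Rushmore optimize a typical catalog query
USE inventory
INDEX ON category  TAG category
INDEX ON DELETED() TAG deleted        && needed when running with SET DELETED ON

*** Turn on ShowPlan to verify the query is fully optimized
= SYS(3054, 1)
SELECT sku, descript, price ;
   FROM inventory ;
   WHERE category = "SOFTWARE" ;
   INTO CURSOR TQuery
= SYS(3054, 0)                        && ShowPlan output off again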
- Pull only data you need
When using local data it's tempting to simply run SELECT * queries and perform
further filtering in secondary queries because it's more convenient.
However, pulling that extra data takes time, especially if you're using remote servers.
Make sure queries bring down only the data you really need.
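As a quick illustration (table and field names are hypothetical), compare pulling
everything against pulling just the columns the page actually renders:

*** Convenient but wasteful: drags every column of every matching row
lcCategory = "SOFTWARE"
SELECT * FROM inventory WHERE category = lcCategory INTO CURSOR TAll

*** Pull only what the page displays
SELECT sku, descript, price ;
   FROM inventory ;
   WHERE category = lcCategory ;
   INTO CURSOR TProducts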
- Avoid Updateable views for queries
Updateable views tend to be quite a bit slower than read-only views. If you're running
data queries that use the same views as some update routines, consider creating separate
views for the query and the update to localize the overhead of the updateable view to the
operation that really needs it.
If you can skip views altogether you'll gain even more performance, although the
difference isn't as drastic as with updateable views. SQL pass-through for remote servers
and direct local data access are always faster than views.
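Here's a hedged sketch of the pass-through approach for a remote backend; the 'websql'
ODBC data source, table and field names are assumptions for illustration:

*** Query the backend with SQL pass-through instead of a remote view
lcCategory = "SOFTWARE"
lnHandle = SQLCONNECT("websql")
IF lnHandle > 0
   lnResult = SQLEXEC(lnHandle, ;
      "SELECT sku, descript, price FROM inventory WHERE category = ?lcCategory", ;
      "TProducts")
   IF lnResult < 0
      = AERROR(laError)           && laError[2] holds the backend's error message
   ENDIF
   = SQLDISCONNECT(lnHandle)
ENDIF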
- VFP is much faster than SQL backends
A question of whether to migrate an application to a SQL backend invariably comes up
when dealing with high performance applications. Keep in mind that VFP has an amazingly
fast data engine; if speed is what you're after, VFP is the best choice. For server
applications, use of Visual FoxPro as a local data source is very appropriate, as you have
only a small number of simultaneous VFP clients accessing the data at once. Data is
usually local to the machine the server is running on or on a fast, low volume network;
the connection is not from the user to the database, but from the VFP server app to the
data. The environment is also very controlled, so many of the data integrity issues that
can crop up in typical network applications with large numbers of clients are not so much
of an issue.
There are good reasons to migrate to a SQL backend, but many applications may not have a
need for them. I'll discuss the pros and cons of SQL in the next section.
- Minimize long requests or offload them
Transaction based systems work best with short requests. Long requests can tie up
valuable resources and load up the CPUs, which can bring a server to its knees. Whenever
possible, long requests should be offloaded to other servers (a special maintenance server
maybe) or possibly a SQL server. If you have to run maintenance operations regardless, try
to schedule them for off-peak hours.
To SQL or Not To SQL
Using Visual FoxPro Data
- Native data is great for server backends
For server applications, use of Visual FoxPro as a local data source is very appropriate,
as you have only a small number of simultaneous VFP clients accessing the data at once.
Data is usually local to the machine the server is running on or on a fast, low volume
network; the connection is not from the user to the database, but from the VFP server app
to the data. The environment is also very controlled, so many of the data integrity issues
that can crop up in typical network applications with large numbers of clients are not so
much of an issue.
- Performance is much better than SQL
Keep in mind that VFP has an amazingly fast data engine; if speed is what you're
after, VFP is the best choice. When we converted from local data to a SQL backend, data
access turned out to be between 2-3x slower for short requests and up to 5-10 times slower
for complex queries. Some of this can be attributed to remoting the data to a separate
machine, but even in testing against a local SQL server the performance drop can be
considerable. Plan for it!
- Problems only with batch updates against heavily used, live data
The Surplus site ran for over a year against local VFP data, up to a volume of about
250,000 backend hits a day. The only problems that occurred had less to do with the volume
of requests than with data updates. The site needs to import new inventory data several
times a day, and it was necessary to import it while the site was still running in live
mode, while people were reading this data. This tended to corrupt indexes frequently,
causing mysterious crashes and data consistency errors. There are ways to work around
these issues, but at the Surplus site these workarounds were not appropriate. In addition,
when using multiple server machines, access to remote tables gets more complex in making
sure that network paths are available, properly mapped and accessible. ODBC and a SQL
backend make it easier to centralize data access in this sort of distributed environment.
Using SQL Server Backend
- Improved stability
The main reason for the move to SQL at Surplus was for better stability. Getting away
from the DBF/CDX file structure has resulted in much improved uptime. The environment and
nature of online batch updates mandated this move to get around data consistency problems
encountered with native VFP data.
- 50+% performance loss (SQL and Net overhead)
Overall performance of the site applications dropped by about 50% when the move to SQL
Server occurred. This is something you should plan on if you make the move from local
data.
- Lightened CPU Load on Web server
Once you move to a SQL backend much of the data processing load migrates to the SQL
Server box, away from the Web server/VFP machines. Running a complex query on a VFP
frontend won't cripple the Web server as the CPU intensive query now runs on the SQL
server. The idle CPU power can go towards serving other requests.
The flip side is that the SQL Server is now busier, which means it processes other queries
more slowly, which in turn increases CPU load on that machine. This is where load
balancing comes in to decide where processing load should occur.
- More complex data access issues
The move to SQL also brought some very difficult data administration issues with it. SQL
servers are very hard to deal with for large scale update commands and purges that occur
on data that's in heavy use. Locking issues can make it impossible to update data that's
being operated on; extensive use of stored procedures and sectioned processing is often
required to accomplish tasks like clearing out log files. General administration of SQL
Server is also more difficult: tuning the server, security, maintaining data partitions,
backup and performance monitoring are not trivial tasks and require personnel with either
previous experience or training.
- Was it worth it? YES!
In retrospect, was it worth giving up the huge performance benefit of local data? Most
definitely in this case. Stability of the application went way up, with far fewer problems
of the site getting locked or stuck with data corruption issues. Centralized data access
and management through SQL Server has also streamlined the administration process and
allows automating some of the heavy data maintenance tasks in the middle of the night.
Code Optimization
The next step in performance tuning deals with code optimization.
- Timing requests and logging
The first step is to determine where bottlenecks lie. If you use any sort of Web tool
it's a good idea to log Web requests and their times. This will give you a rough idea of
where the slowest hits and requests are. Web Connection includes built-in logging (which
can optionally be turned off), so with it you have an immediate way to judge request
lengths. With other tools this may be a little more difficult because they may not have a
central entry point that can be easily logged. In that case you'll have to create timing
code for each request manually.
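A minimal sketch of such manual timing, assuming a hypothetical ProcessRequest entry point
and a requestlog table you create for the purpose:

*** Wrap the request entry point and write one log record per request
LOCAL lnStart, lnSecs
lnStart = SECONDS()

THIS.ProcessRequest()             && hypothetical request entry point

lnSecs = SECONDS() - lnStart      && note: SECONDS() rolls over at midnight
INSERT INTO requestlog (reqtime, reqsecs, reqname) ;
   VALUES (DATETIME(), lnSecs, "ProcessRequest")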
- Profile your code
Once you've isolated bottlenecks, examine the code and trace it down to the individual
routines that run. Is it code that can be simplified? VFP has many ways to accomplish some
tasks - is there a way to use different commands that might be faster? Are you
unnecessarily calling functions or methods repeatedly that could run inline? Are you
creating objects that might be better served as function calls?
- New Coverage Analyzer can really help!
VFP 6.0's new coverage analyzer can help you determine bottlenecks in your code. The
analyzer summarizes the data that is captured by SET COVERAGE TO.
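For example (the log file name and ProcessRequest method are arbitrary assumptions), you
can capture coverage data around the suspect request and then feed the log to the Coverage
Profiler:

*** Log line-by-line execution times while the request runs
SET COVERAGE TO c:\temp\coverage.log ADDITIVE
THIS.ProcessRequest()             && the code you want profiled
SET COVERAGE TO                   && stop logging

*** Analyze the log with the VFP 6 Coverage Profiler
DO (_COVERAGE) WITH "c:\temp\coverage.log"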
- Generic framework vs. hand optimized code
If you're using a framework that helps you with HTML generation, realize that some
powerful functions may not be the fastest way to accomplish things. Most of these
functions have to be generic and thus have to test the environment and often work on
dynamic determination of types and so on. In other words, framework code is almost always
slower than hand optimized code. For example, in Web Connection there are several routines
that can dynamically create HTML displays of VFP table/cursor data. The code has to check
the type of every field and then decide how to display it based on that type. Compare that
to simply saying lcOutput = lcOutput + TQuery.Company; no surprise that the straight
code can run as much as twice as fast.
Just understand the tradeoff between hand coding and framework code. Sometimes the
functionality benefits will outweigh the performance hit, but if your request needs to
run faster and everything else fails, this can be the boost you might need. Again,
profiling will help in this determination.
- VFP6 String operations drastically improved
Visual FoxPro 6.0 has drastically improved string operations that involve
concatenation. Repeatedly calling commands like lcOutput = lcOutput + lcString can run
several hundred times faster than in VFP 5 on large strings. In VFP 5.0 file output was
almost always faster than string concatenation; this may no longer be the case in VFP 6.
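To make the pattern concrete, here's a sketch of building a page as a string from a query
cursor (the TQuery cursor and its Company field follow the framework example above):

*** Build an HTML table in memory through repeated concatenation -
*** fast in VFP 6, much slower in VFP 5 for large result sets
lcOutput = "<html><body><table>"
SELECT TQuery
SCAN
   lcOutput = lcOutput + "<tr><td>" + TRIM(TQuery.Company) + "</td></tr>"
ENDSCAN
lcOutput = lcOutput + "</table></body></html>"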
- Class instantiation
Class instantiation is still one of the slower operations in VFP even though VFP 6 has
slightly improved performance here too. Consider creating objects and keeping them around
for reuse rather than recreating them all of the time.
The smallest and quickest loading class in VFP continues to be the non-visual RELATION
class. The fastest loading visual class is Custom.
Also keep in mind that method calls are considerably slower than UDF() calls.
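One way to apply the 'keep objects around for reuse' advice above is to cache a worker
object as a property of your server and instantiate it only on first use. The oShipRates
property and ShipRateCalculator class are hypothetical names, and oShipRates is assumed to
be a defined property of the server class:

*** Create the helper object once and reuse it on subsequent requests
IF TYPE("THIS.oShipRates") # "O"
   *** Lightweight base classes (Relation, Custom) instantiate fastest
   THIS.oShipRates = CREATEOBJECT("ShipRateCalculator")
ENDIF
lnRate = THIS.oShipRates.GetRate("US", 2.5)    && plain reuse, no per-request CREATEOBJECT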
- Stay away from Macros and Eval()
This one's pretty obvious, but it bears bringing up. This is somewhat related to the
'generic' framework code, which often has to rely on EVAL()s in order to dynamically
determine field or object names. Eval() and Macros are extremely powerful, but try to
avoid them as much as possible.
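Here's a small illustration of the difference; lcFieldName and the TQuery cursor are
assumptions, and the point is simply that the EVAL() version re-evaluates the name
expression on every row while the direct version compiles to a plain field reference:

lcFieldName = "Company"
lcOutput = ""
SELECT TQuery
SCAN
   lcOutput = lcOutput + EVAL("TQuery." + lcFieldName)   && generic framework style - slower
ENDSCAN

SELECT TQuery
SCAN
   lcOutput = lcOutput + TQuery.Company                  && hard-coded - faster
ENDSCAN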
Web Site Optimization
Home page optimization
The home page of a busy commercial Web site probably gets upwards of 75% of all the
traffic that occurs on the site, so it's fairly important that this page is as optimized
as it can be. If the page can be static, by all means make it so. If it's mildly dynamic
(capture the referring link, assign a user ID etc.), consider using an ASP page and
storing the info in a cookie. Do anything to avoid database access or otherwise calling a
backend server on the homepage. This is not the case at Surplus, where data access is
unavoidable because of the volume of data that is tracked about the user.
Also important is the size of the HTML and images: keep it to a minimum to avoid
clogging the network. The Surplus homepage, even with all those graphics on it, comes down
at less than 45k.
Optimizing site flow to minimize hits
If a site is laid out well it takes fewer steps to go where you need to go and get the
job done on the site. Fewer steps mean fewer hits, which means less load on the site. This
means trying to avoid intermediate menus and providing a clear flow that sometimes
combines multiple functions on a single page.
Use ASP or HTM for non-data logic pages
On the Surplus site there are a number of pages that are simply display pages:
hand customized pages that don't display anything that comes out of the database. These
pages are ASP pages; ASP is used to handle the user ID passing that's required to
track users through the site. Other sites may not even need ASP and can use plain HTML
pages instead.
Can you use generated static pages?
We're all database people here, but sometimes it might be better not to think in
terms of data retrieved from a database. Really <s>. A lot of CPU cycles can be
saved if pages that rarely change are pre-generated to a static page and then simply read
from disk by the Web Server instead of being dynamically generated.
At Surplus, category lists are a good example. These pages change maybe once a week, so
there's really no need to generate them each time somebody views them.
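A hedged sketch of the idea: render the page once from the database and write it to a
plain .htm file that the Web server can serve directly. The RenderCategoryPage method and
output path are hypothetical; STRTOFILE() is VFP 6 only, so under VFP 5 you'd use
FCREATE()/FWRITE() instead:

*** Pre-generate a rarely changing category page to a static file
lcHTML = THIS.RenderCategoryPage("SOFTWARE")          && hypothetical render method
= STRTOFILE(lcHTML, "d:\web\catalog\software.htm")    && served as plain HTML from here on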
Minimize maintenance operations
The biggest killers on the Surplus site are maintenance operations, with requests that
can take up to 5 minutes to run (worst case). These operations put extreme load on the
backend server, so the best thing to do is to isolate them and have them run at night.
Web Server Optimization
Following are a few suggestions for improving performance on IIS 4.0 that can be
important for heavily loaded sites:
- Reduce Connection Timeouts
By default IIS sets up a connection timeout of 900 seconds. That means that with HTTP 1.1,
once you connect to a site your connection with that site is maintained for 15 minutes
even if you go to the homepage and no further. Connections eat up valuable resources, so
it's a good idea to reduce this number to a more reasonable value. Be careful though: the
connection timeout must accommodate your longest actual request, including maintenance
operations. If you set the value too short, any maintenance request that exceeds this
timeout will return an error.
- Limit connections for busy sites
If you ever get to a point where you are so absolutely overloaded with traffic that
you are basically bringing your site to its knees, consider reducing the number of
connections that can be made to your site. While you'll be failing users with 'Server is
too busy' messages from the Web server, at least you'll be able to service the requests
that are making it in. Nobody is served by you accepting 10,000 connections that choke
your server, but maybe 5,000 connections let you run and keep at least half of the traffic
happy.
This value is set way too high for typical NT based application server systems; I
doubt any single NT Web server running a database app will handle more than 5,000
simultaneous connections.
- ASP options
- Turn off ASP Sessions if not used
If you're not using ASP or you're using ASP but not ASP Sessions, turn them off!
Sessions cause significant overhead on the Web server and assign cookies to clients
entirely on their own.
- Turn on Page Buffering
If you are using ASP turn on Page Buffering. Page buffering builds pages in their
entirety in memory or cache prior to dumping the output back to the Web server. This is
typically much faster than the direct WriteClient() approach that ASP uses normally. It
also gives you more flexibility: Only with buffering can you easily modify the HTTP header
and cookies after some text has already been output.
- Limiting threads created by IIS
IIS creates new threads for each ISAPI request; if one or more requests hang,
it's very easy for IIS to generate a huge number of threads very quickly. If your system
runs with too many threads it starts to slow down drastically. There's an option you can
set in the registry to prevent this from happening by telling IIS to limit the number of
threads it creates. I suggest no more than 50 threads per processor (note: IIS uses
24 threads internally, so you'll want to add that value to the count). These settings work
in IIS 4 as well as IIS 3:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\InetInfo\Parameters\PoolThreadlimit
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\InetInfo\Parameters\ThreadTimeout
Both values are DWORD. Set the timeout to 15 minutes (900 seconds) or so.
You can use the IISReg utility on the CD to set these two values. IISReg also configures
the IIS 3 ASP options from above, but doesn't affect the IIS 4 ASP options.
Visual FoxPro and Multi Threading
ISAPI is multithreaded - VFP is not!
IIS and ISAPI run in a multithreaded environment that allows many simultaneous
requests to happen concurrently. Visual FoxPro on the other hand is single threaded and
must therefore simulate multithreading.
VFP 6 improves scalability somewhat!
OK, so Microsoft has hyped VFP 6.0 as a good COM citizen by supporting
Apartment Model threading and MTS. While VFP 6.0 can indeed build Apartment Model threaded
COM objects and those objects can be hosted inside of MS Transaction Server, the
concurrency issues we faced with VFP 5 COM servers continue to be a problem.
The problem is the way that the VFP runtime is implemented. While it's now possible to
load multiple instances of an object into the same process, it's not possible to get both
instances of the same object to operate simultaneously. For example, if IIS loads two
instances of your object from two simultaneously accessed ASP pages, only one will run at
a time; the second request will be queued. The following scenario presents a
problem: assume Method1 runs a 15 second query and Method2 returns a .1 second string
result. If Method1 gets fired first, the call to Method2 will block for 15 seconds before
the second user gets a result page. In addition, this blocking will cause requests to
start backing up to the point where the server may never catch up.
Without concurrency it's extremely hard to build scalable
applications even in medium load scenarios.
- Apartment Model Threading in VFP 6.0
Apartment Model threading makes it possible to load multiple InProcess
components simultaneously. Visual FoxPro can manage multiple simultaneous 'apartments'
(threads) of your object in memory, one for each active client. Each apartment
is guaranteed to get called from the same thread context that created the server in order
to provide a thread safe environment to run your application in. Conversion of a tool as
complex as VFP to this threading model is no trivial task, but the benefits gained are a
huge improvement over VFP 5's inability to run more than one instance of a given InProc
COM object in any process. Unfortunately, multiple instances of the same object in the
same process (such as IIS) cannot operate simultaneously, as method code is blocked by the
first method that gets control. Even though IIS may have multiple object references, it
can only run one method of a given server at a time. You can however get two separate
servers to run inside of IIS and get them to operate simultaneously. The figure above
demonstrates the concurrency model, where scalability can be achieved only by breaking
functionality out into many objects in order to circumvent the single server issue.
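For reference, a VFP COM component is just an OLEPUBLIC class compiled into an InProc DLL
or OutOfProc EXE project with BUILD DLL or BUILD EXE; the class, method and table names
below are hypothetical:

*** Minimal COM-exposed business object
DEFINE CLASS WebCatalog AS Custom OLEPUBLIC

   FUNCTION GetItemCount(tcCategory)
      *** Count matching inventory items and return the number to the client
      SELECT COUNT(*) FROM inventory ;
         WHERE category = tcCategory ;
         INTO ARRAY laCount
      RETURN laCount[1]
   ENDFUNC

ENDDEFINE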
- Support for Transaction Server (MTS)
Apartment Model threading also makes it possible to run VFP components
inside of Microsoft Transaction Server. Unfortunately, the blocking issues also affect the
way VFP components run inside of MTS. Just as with objects called directly from a client,
the issue simply migrates to the package level: instead of IIS being the client
process, an MTS package now becomes the host. The same problems arise, as the same
object in a package cannot be accessed simultaneously due to the blocking in the VFP
runtime. Concurrent method calls can only be achieved by hosting objects in
multiple separate MTS packages. With MTS this blocking issue is even more pronounced than
with directly accessed objects, as all object access on the machine is routed through the
MTS package. This means that multiple client processes (say the IIS process and a VFP
application both using this object on a server) now have to fight for contention.
With all of these problems, what does Transaction Server provide? Transaction Server
provides a wrapping layer around COM that offers limited scalability features,
distributed and multi-step transaction control for client side code, a simplified security
model that allows configuration of InProc servers at the component level with 'roles', and
resource dispensers for system services such as ODBC.
Frankly, I'm not impressed with the functionality that MTS provides even when used with
objects that do support full concurrency. But a discussion of scalability wouldn't be
complete without at least mentioning this topic. My concerns are these:
- MTS in its current state does not provide server pooling; it only
provides server deactivation and reactivation of unused servers without the client knowing
about it. It works like this: you connect to a server, then call MTS's SetAbort or
SetComplete method, which releases the server's context and actually causes MTS to release
the server. If you call the server again with the same reference, MTS automatically
reloads the COM object and then calls the method. There's a lot of overhead here, but it
can be useful if you have huge numbers of simultaneous clients who have persistent
connections to your server. Typical Web applications don't work this way though; current
models use either a limited pool of client connections, or each request loads and unloads
the server on its own (CreateObject and Release, which is typical for ASP applications).
The only scenario that would fit the MTS activation model is assigning object references
to ASP Session vars and keeping the reference around, but this would probably be as slow
or slower than loading and unloading the object on each page, with no gain in scalability
at all.
MTS is a two step COM interface that consists of a proxy and an OutOfProcess server that
hosts your InProc COM object. Your component runs InProcess, but the MTX context proxy is
called as an OutOfProc component from the client, which in turn routes the call to your
actual component. If you get the feeling this would be slower than a direct call, you're
right. InProc components are noticeably faster with direct calls, and even OutOfProc calls
end up slightly faster than MTS. Keep in mind this affects only the calling process
(method calls plus parameter passing and property assignments), not the operation of your
server itself. That will always be the same.
Server pooling is a feature that's documented but not implemented yet in MTS 2.0. In my
opinion, this is the most important feature for making scalable applications possible. If
properly implemented, this feature would allow specification of a number of these objects
to be loaded, and each time a request is made for a client, MTS would hand off one of
these already loaded references. Configuration would let you specify how many objects to
load and which machine to run them on, along with potentially load balancing them based on
the load on each machine. It's a mystery to me why MS has not implemented at least the
basic pooling functionality now; it's only a slightly more complex step from the limited
activation/deactivation scheme in use now.
The problem here is that if VFP does not improve its concurrency model it will not be able
to take advantage of object pooling once it becomes available.
- ODBC resource dispensing is now available directly in the ODBC drivers, so MTS isn't
required for this at all (although this is a frequently mentioned 'feature').
- When it comes down to it, the only features of MTS that seem to make real sense are:
distributed and two step transactions, role based security, and easier deployment of
components with automatic registration and security configuration. But for scalability or
speed, MTS is not really bringing much to the party in its current state.
Pool Managers in FoxISAPI/Web Connection
Web Connection and FoxISAPI include a built-in pool manager that
allows managing multiple object references and handing these references to clients as
needed. If one instance is busy, another reference is handed off; if all instances
are busy, requests are queued and serviced when a pool reference becomes available. These
components run best as Single Use, Out of Process EXE servers and allow for immediate
deployment on local and remote machines via DCOM. While Out of Process components are
slightly slower (in this scenario), call overhead is measured in fractional milliseconds
per Web request. EXE servers also afford better stability, process isolation that won't
crash the client (IIS) on an error, and a configurable security environment.
Currently, this model provides the best scalability you can achieve with VFP COM objects;
at Surplus over 50 servers run and service a single application.
Pool of Single Use EXE or InProc DLL servers
For scalable Web applications, the pool managers built into these tools provide much
better control over the load placed on a machine by allowing a pool of objects to be
created and references passed out of that pool. The benefit here is that the servers are
already loaded and only direct method calls occur. ISAPI talks directly to the pool
manager, which is built into the ISAPI extension. The pool is also managed and kept to a
predetermined size, so if your server gets too busy you don't overload your machine with
too many simultaneously running servers that would kill the machine on CPU load;
pending requests are queued until servers become available.
Full Admin Control over servers
Since the ISAPI extension controls the pool manager, it's possible to control the servers
via basic functionality built into the extension. Both FoxISAPI and Web Connection provide
the ability to unload and reload servers and to enter a maintenance mode that allows a
single server to run for EXCLUSIVE data access. Web Connection also allows for online
server updates, automatic restarting of hung or timed-out servers, recovery from VFP
server exceptions, and the ability to run in full Admin mode that blocks all server access
except for the logged in administrator without taking the site down.
These features sound really esoteric, but once you start running a high volume site you
realize that taking a site down just to replace a COM component is a major issue. The same
goes for maintenance operations: with ASP it's next to impossible to guarantee single user
(EXCLUSIVE) access to data; the only way to get it is to stop the Web server and run a
separate component (or the same component) while the server's down.
Understanding CPU load and Speed
When examining load on a site it's crucial to understand how the application is
performing on a given machine. When talking about load we're mostly looking at the CPU
load that is incurred by the application. This load is affected by all system components
such as disk and memory, but shows itself most consistently in the level of CPU usage. As
disks get saturated queries slow down and use more CPU power to get to data. As memory
runs out more data is stored on disk rather than in memory cache and you get more CPU load
to access the data.
The key pieces to look at are:
- How many requests can you handle over a given period?
Load is determined by looking at a given number of hits over a given period of time.
The load that is incurred on the machine can be measured by the CPU usage that's incurred
for this traffic.
In rough terms this means a site that's running 1,000 10-second requests in an hour
carries a similar load to a site that's running 20,000 half-second requests (both work out
to roughly 10,000 CPU-seconds), assuming the sites are running similar pieces of software
and hardware. Actually, this is not quite accurate, as the site running the 10 second
requests is probably running much higher CPU loads for the queries that are processing. So
you see there's no easy way to judge an application's load, but as a rough guideline, use
requests over time multiplied by request length.
- Based on No. of CPUs, processor speed and request processing time
Load is obviously affected by the horsepower of the machine it's running on. Two CPUs
can almost double performance (85-90% is more accurate). Faster disk and more memory can
also improve load capability drastically.
Most of the time the slower a request runs the more load it incurs on a server. While
running, the request is using up CPU cycles and database access.
There are exceptions to this rule however: If you're offloading processing to other
machines such as a SQL backend or server accessed via DCOM on remote machines you may
leave request handlers running at close to 0% CPU load while the remote machines are
chugging away.
- Additional instances help responsiveness, but not load on CPU
We've talked about running multiple instances and multi-threading in order to achieve
better scalability. Remember that multi-threading is not a silver bullet. It will not give
you better performance, only better responsiveness.
For example: a request taking 10 seconds and a request taking 1 second can run at the same
time, so the client waiting for the 1 second request doesn't have to wait 11 seconds for
his response. The bottom line is that while the 1 second request runs it may run a little
slower than it normally would because the 10 second request is already running, so it
actually takes 2 seconds. The 10 second request also slows down a little because of
the increased load on the CPU and now takes 11 seconds. You've provided better
responsiveness, but you've actually increased the total processing time by 2 seconds; the
total now takes 13 instead of 11 seconds.
Adding multiple simultaneous requests will actually reduce overall performance, as the CPU
or CPUs must schedule simultaneous threads or processes. Hence the need to limit the
amount of simultaneously running operations. The more simultaneous processes run, the
slower they get. Multiple processors can help in this scenario.
- Test your expected load!
The most important issue in relation to load is to be ready for it. If you're running
a growing Web site, you'll probably run into a situation at some point where your traffic
outruns the resources of your configuration or hardware. You'll want to avoid this
as much as possible by testing your load capability and knowing how much you'll be able to
handle.
At Surplus we've had 3 occasions where we hit the wall with hardware and software. Twice
the hardware couldn't take the load, resulting in locked servers running at solid 100% CPU
loads and two 100% maxed T1s (the full bandwidth available at the time). The other time
the traffic was so large that the backend servers simply could not keep up. The only way
to get a working site running again was to limit connections to 3,000 at a time to allow
at least some of the traffic to succeed. This is something you want to avoid at all
costs!!!
Test your expected load and be ready to add more hardware if necessary; this seems
obvious, but it is a very common problem with growing Web applications that are running
their first big promotions!
- WebHammer
A cheap and simple tool to test your Web site with is WebHammer. It's used to
repeatedly 'hammer' a site with HTTP hits. You can set up as many as 32 simultaneous
threads kicking out requests, and you can run multiple instances on multiple machines to
create a variety of different requests.
http://www.genusa.com/iis/webhamr2.html
Scalability
With Internet commerce growing at over 100% each year, it's very likely that a
commercial Web site will run into growing pains. Scalability issues come to the fore
especially when it comes to running applications that outrun a single Visual FoxPro
server, and even more so when having to run more than a single server machine
simultaneously in order to handle volume.
- Web servers are multithreaded, Visual FoxPro is not: multiple instances of VFP are required
As previously discussed, Visual FoxPro is limited to single threaded operation and
requires multiple simultaneous instances (either InProc or OutOfProc) in order to handle
concurrent request processing. Apartment model threading and the Web Connection/FoxISAPI
pool managers allow working around this issue as discussed in the previous section.
- Best scalability is achieved with multiple processors in a single box
When it's time to throw more hardware resources at the application, the best way
to scale is to add more processors to the local box. Each processor gives about an 85-90%
performance increase for CPU processing. IIS is multi-threaded and can internally take
advantage of multiple processors through the NT SMP architecture. Visual FoxPro can also
run concurrently on two processors, especially if using OutOfProc components (untested
with InProc, but it should work with Apartment threaded servers as well). Because
OutOfProc components are essentially separate applications, NT can throw processing onto
one or the other processor. NT handles the details of scheduling and balances load on the
processors without any change of code.
You may want to play with the Control Panel's System settings for performance: whether to
give more priority to the foreground tasks (Visual FoxPro, which is not running as a
service) or the background tasks (IIS, which is). I tend to run maximum foreground
priority since the VFP servers typically do the hardest work.
The best load factor seems to be 2 VFP sessions per processor, plus IIS sharing the
processors, but that may vary based on your load and whether you use a SQL backend or
whether you call remote components.
- For additional scalability, networked machines can be used to spread load
When one machine is not enough anymore, the next step is to move out to the network in
order to spread the processing load across multiple machines. NT to date supports only
4 processors natively and the network allows you to go beyond that (NT 5 promises built-in
support for 8 processors). You can add additional machines that are accessible to the Web
server to handle application or data tasks. The most common step may be to move to a SQL
backend that handles data access on the server, freeing the local VFP servers from a big
chunk of CPU load. Keep in mind that for short requests VFP still incurs a fair amount of
load for remote SQL requests through the ODBC connections; it takes about .25 seconds or
so after a SQL call for CPU load to drop back to idle, so short requests may not benefit
from data offloading as much as you might expect. Once multiple machines are involved it
becomes important to have a localized data store, and at Surplus the move to multiple
machines was one of the deciding factors in the decision to switch to a SQL Server
backend.
You can also offload application logic to other machines using DCOM to instantiate COM
objects on remote machines. For an application like Web Connection this means you can run
a pool of servers of the same object that runs both locally on the Web server as well as
on one or more remote machines. You can also offload processing more distinctly by calling
specific business objects directly on other application servers. So you may have the
Product Information Database sitting on one machine, the ship rate calculation engine
sitting on another and the order processing server on yet another. Either way offloads
some processing onto other machines.
Drawbacks to this approach are complexity and performance. Components that run
both locally and on remote machines are difficult to configure so they correctly share
data and application paths. Also, getting them registered on another machine may be
difficult; luckily VFP 6.0 includes a new function, CREATEOBJECTEX(), that allows direct
instantiation of objects on a remote machine without having to tinker with
configuring the server locally to run on the remote.
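A brief sketch of the call; the ProgID, machine name and method are assumptions for
illustration:

*** Instantiate the same COM server locally or on a remote machine via DCOM
loLocal  = CREATEOBJECT("SurplusWeb.WebCatalog")
loRemote = CREATEOBJECTEX("SurplusWeb.WebCatalog", "APPSERVER2")
? loRemote.GetItemCount("SOFTWARE")     && this method call travels over DCOM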
Performance is also an issue. Remote objects take a lot longer to load, and calling
methods on such a server is comparatively slow, as parameters and return values have to
travel over the network. If you're sharing common data there's also the issue of accessing
that data across the network, often with UNC pathing, which can be very slow. This
approach has another potential problem: there's still a bottleneck with a single Web
server servicing requests! If you get to a point where a single Web server can't take the
incoming requests, there's no easy way to scale up.
Despite the drawbacks, scaling with COM over the network can provide significant
scalability, as you are now offloading large amounts of functionality to other machines,
and all but the heaviest usage sites on the Web can probably make do with a single Web
server scenario.
- IP Routing/Dispatch Manager can load balance multiple redundant servers
So, where do you go when a single Web server isn't enough? At Surplus this scenario
cropped up a while ago when even offloading to remote servers for processing was choking
the main machine with Web Server and ISAPI requests. The solution lies in using
multiple servers simultaneously by employing either a hardware or software solution to
route IP requests to multiple servers. The way this works is that either a router or some
IP routing software accepts an incoming IP request and then routes it to one of
the available Web servers which are configured to service requests for this IP address.
IOW, you set up a pool of servers that can respond to a TCP/IP request for a 'phantom' or
'virtual' IP address.
The simplest of these mechanisms is DNS Round Robin routing, which is supported by most
DNS servers. The DNS server takes each request for an IP address and routes it to a set of
different IP addresses set up in the DNS registry in round robin fashion. DNS round robin
works, but if any one of the servers goes down you have the issue of getting some requests
that fail, which may be hard to catch if multiple servers are involved. Basically round
robin is a 'dumb' solution as it doesn't have internal knowledge of the network or
application or hardware. Thus if a server dies DNS will continue to include that server in
its round robin loop.
Hardware solutions involve routers that can handle this job at the hardware level. These
routers are also smart enough to see if a machine is down to skip that sequence in the
routing loop, but routers can rely only on hardware to determine the down status. Removing
servers from the pool is not quite straightforward and often requires fiddling with the
router or router configuration files directly. This can take time and often requires the
router to be reset which can take up to a minute to complete.
The third solution, and the one used at Surplus, is software based. A dedicated machine
serves as a dispatch manager that receives all incoming TCP/IP traffic for a 'virtual' IP
address (in this case the main Web site's IP address). The individual pooled servers also
run a piece of software that makes it possible for the dispatch manager to talk to the Web
servers and get crucial information
about their current status and load. This information includes performance data such as
current CPU load, network and disk load, concurrent connections and hits, etc. IOW, it
tells the dispatch manager how busy the box is. Based on this information the dispatch
manager schedules TCP/IP requests to the least busy servers and actually balances the
load. Since this is a software solution it's easy to move servers out of the pool:
you simply bring up the dispatch software and tell it to unload a server from the pool.
The software in use at Surplus is called Resonate. It works well now, although there
were some initial problems in getting it to work properly on the network. The software
communication pieces are Java based (surprisingly). It is also very expensive, as all of
the high end solutions in this area appear to be.
Expensive or not, this solution has made it possible for the Surplus sites to grow at
staggering rates. There no longer is any worry about the Web server or even the FoxPro
applications being a bottleneck. If front end CPU loads get too high, it's easy to add
another machine to the pool and provide additional horsepower. The front ends are also
fully self-contained! They contain the Web server and an executable of the server
application(s) as well as the HTML (which is a little different than the slide above),
which means any single server failure will not take the site down. The decision was made
to store HTML locally as well to minimize network traffic and make each machine truly
independent.
Of course, the next problem to address is the only remaining bottleneck in this scenario:
the SQL Server backend. SQL Server is starting to show serious strain processing the
data volume when traffic loads are high. While the 3 front end machines in the Surplus
site typically run at 20-30% load, the SQL backend is running at 80% and providing data
sluggishly. But that's beyond the scope of this discussion <s>
I would expect more solutions to become available for making IP routing available at lower
prices. It looks like hardware vendors are working on hybrid solutions that build software
into the routers, making it possible to tell a machine's state in a way specific to a
given operating system.
Security on the Web
When building database Web applications, security is important. You wouldn't want to
capture orders online, including credit card numbers, and then have somebody steal the
entire order/customer file with that sensitive information.
Security comes in many flavors and applies to different aspects of a Web site. Is the
information you pass over the Web safe? And how do you keep people from accessing certain
parts of your application?
Keep it simple - let NT do the work
Windows NT provides excellent, though somewhat complex, security features that should
address the majority of your security needs. NT allows configuration of security at the
file level as well as the directory level. Web directories need to have Read and typically
Execute (or Script) rights set to allow Web clients to access the pages.
NT uses an account named IUSR_<machine name> to identify anonymous users to the Web site,
and rights must be given to this user for any public areas that visitors to your site
should be able to access. Beyond that however, make sure you remove any IUSR_ references
(they shouldn't be there in the first place), and also the Everyone account.
Also, be careful in playing with the rights of the IUSR_ account in User Manager. When
working with IIS and COM it's very easy to give the IUSR_ account Admin rights to get some
security issues resolved, which is fine while developing; just don't forget to undo
this setting once you put your site online.
Keep data in an unmapped path
Data security should be a top priority on your list. If you keep sensitive data on your
Web server, first of all make sure that the data is not accessible via a relative path
over the Web. Ideally the data should reside in a totally off limits area away from the
Web site in an unmapped path. Even better, if the data can sit on another machine and be
accessed only over a non-TCP/IP network connection, you can just about eliminate your risk
of data piracy (at the cost of overhead for the network access). For extra security you
can also consider putting the data access on a separate network leg and using a non-TCP/IP
protocol on that leg to disallow access.
Setting rights on directory and files
If you must have data in a Web-relative path so that it can be downloaded via an
HTML link by authorized personnel, make sure you set the proper password rights on these
directories to disallow anonymous access by Web users. If you use IE 3 and IIS, NT's
Challenge/Response mechanism ties securely into NT's security system. With other Web
servers the security of passwords passed over the Web varies.
File Security with NT Challenge Response Validation
NT supports Challenge/Response validation for access to files, which means that if you're
accessing a page and IUSR_ doesn't have rights, NT will try to validate your user account
through the local machine, or through a domain if you have IIS configured to run against a
specific domain server. If you are a user of the local network you may not be prompted for
a password; if you aren't, NT will present a login dialog and validate you. If you
type the correct password you're allowed access. Security in this fashion works both at
the directory level (which really just delegates down to the file level) and the file
level.
Make sure you set the Allow NT Challenge Response option in the IIS configuration.
Using Authentication from your code
You can also force authentication from dynamically generated result pages with Basic
Authentication. Authentication occurs as part of the HTTP header passed back to the Web
server/browser which interprets the header and pops up a validation box.
There are two steps to make this happen:
- On the password protected request, check whether the user is authenticated by checking
for the REMOTE_USER (ASP) or Authenticated Username (Web Connection/FoxISAPI) server
variable. If it's empty the user is not validated.
- If not authenticated, send back an authentication request that pops up the dialog. If
the dialog's response is successful, the request in step 1 is re-run. This time the user
will be authenticated and should be allowed access.
Here's a simple example:
************************************************************************
* wwDemoProcess :: Authentication
*********************************
*** Function: Demonstrate how to check authorization for users
************************************************************************
FUNCTION Authentication
LOCAL lcUserName, loCGI
*** Easier reference
loCGI=THIS.oCGI
*** Try to retrieve the Authenticated Username
lcUserName=loCGI.ServerVariables("Authenticated Username")
IF EMPTY(lcUserName)   && Any validation of the user could go here...
   *** Send Password Dialog - if the response is successful this request is re-run
   THIS.oHTML.HTMLAuthenticate(loCGI.GetServername())
   RETURN
ENDIF
THIS.StandardPage("You've been validated for this request...",;
   "You've entered a username of <b>"+lcUserName+"</b><p>"+;
   "Subsequent requests for this server won't prompt you for "+;
   "a password again until you shut down your browser.")
RETURN
The actual authentication request is implemented via a special HTTP header that is
returned instead of a regular HTML document. The following code generates the actual
password box popup when sent back to the Web server:
**********************************************************************
* wwHTTPHeader :: Authenticate
******************************
*** Function: Sends the authorization content type header
*** Use to pop up Security Dialog and force authentication.
*** You can use Authentication Username (CGI)
*** to retrieve the entered user name if valid...
*** Pass: tcRealm - Domain to log in to.
*** tcErrorText - Error message to display when failing
*** Return: nothing or string if tlNoOutput=.T.
**********************************************************************
FUNCTION Authenticate
LPARAMETERS tcRealm, tcErrorText
tcRealm=IIF(type("tcRealm")="C",tcRealm,"")
tcErrorText=IIF(type("tcErrorText")="C",tcErrorText,;
"<h2>Gotta enter your password to get in!</h2>")
THIS.cOutput=[HTTP/1.0 401 Not Authorized]+CR+;
[WWW-Authenticate: basic realm="]+ tcRealm + ["]+ CR+CR +;
[<HTML>]+tcErrorText+[</HTML>]
ENDFUNC
* Authenticate
Authentication provides a built-in mechanism tied to the operating system to validate
users. Once authenticated, you can always check the user's username, which is passed along
with each subsequent request until the browser is shut down.
You can also implement your own security scheme, bypassing authentication altogether, by
creating an HTML page that asks for login information. You can then capture the login
information on your own and validate it against a user table, denying access if it's invalid.
Note though that you need to set some sort of flag that can be checked on each request to
make sure the user does not access unauthorized pages or requests directly simply by
typing the URL.
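If you roll your own scheme, the validation itself is straightforward VFP code. Here's a
minimal sketch, assuming a hypothetical users table with username and password fields; how
the form variables get retrieved from the login page depends on your framework and is not
shown here:
*** Minimal sketch only - table and field names are hypothetical
FUNCTION ValidateLogin
LPARAMETERS lcUserName, lcPassword
LOCAL llValid
IF !USED("users")
   USE users IN 0 SHARED        && hypothetical user table
ENDIF
SELECT users
LOCATE FOR UPPER(ALLTRIM(username)) == UPPER(ALLTRIM(lcUserName)) AND ;
           ALLTRIM(password) == ALLTRIM(lcPassword)
llValid = FOUND()
RETURN llValid
ENDFUNC
On a successful login you would then set the flag mentioned above (a cookie or a record in
a session table, for example) and check it at the top of every protected request.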
Note that basic authentication is not encrypted unless you combine it with a secure
transaction request (HTTPS)!
Do you need secure transactions?
By default none of the information that travels over the Web is encrypted in any way.
All the information returned to your backend programs from the Web browser, including
HTML form variables, server information and authentication information, travels in the
clear. This means somebody with a protocol analyzer could potentially snatch passwords,
IDs or credit card numbers while they are in transit.
Secure server transactions use certificate based encryption with a private and public
key to encrypt all the content that flows between the Web server and browser. Keys are
administered by a few third party certificate authorities at around $250 for a year. You
create a key request with the server's Key Manager utility and fill out an online
submission form for a key request (see www.verisign.com
for more information on obtaining a key). The server sends the key request, which is used
to generate your private key. This key is returned to you as a file and merged with your
existing key to provide the secure certificate on your site. Once installed, using secure
transactions means accessing the HTTPS protocol instead of HTTP; a simple change to your
URL is all that is required once the key is in place to make a transaction secure.
http://www.west-wind.com/wconnect/wc.dll?wwDemo~SecureCheck
https://www.west-wind.com/wconnect/wc.dll?wwDemo~SecureCheck
do the same thing and can be handled identically in code; the latter is encrypted and
secure. To check whether a request is secure you can check the SERVER_PORT or Server Port
server variable.
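For example, a request method could check the port like this. This is just a sketch using
the same ServerVariables() access shown in the authentication example above; 443 is the
standard SSL port:
*** Rough sketch: is the current request running over HTTPS?
LOCAL llSecure
llSecure = ( ALLTRIM(THIS.oCGI.ServerVariables("Server Port")) == "443" )
IF !llSecure
   *** Sensitive page requested over plain HTTP - redirect the user to the
   *** https:// version of the URL or refuse the request here
ENDIF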
Not all browsers support secure transactions, and attempts to access a secure page with a
non-secure browser will cause the page request to fail. Tell those users to get a browser
from this century, Ok?
Do you need secure transactions? If your site captures sensitive information like
credit cards - definitely. If you're using a custom password scheme with passwords entered
on HTML pages - probably. For general applications? Probably not.
Secure transactions are easy to implement. You simply use the https:// prefix instead
of http:// to reference links. But secure transactions are much, much slower than
non-secure transactions, so it's a good idea to use them only when you need them. On the
Surplus site, for example, the site runs in secure mode only when actually capturing the
order information from the user and for some of the maintenance tasks; all other site
operations run non-secure.
Web Development as a team
Building complex Web sites typically involves more people than just programmers. Web
applications tend to bring together a variety of skills:
- Programming and Code/Data Design
This is us. Regardless of the type of application, you'll likely find at least one
developer (more likely a few) on every dynamic, data driven Web application. The
application design group is responsible for providing the business logic and data access
as well as some elements of overall application flow and design, which impacts the HTML
design group.
- HTML Design
Web applications are invariably HTML based and visually oriented (there are exceptions
though). The visual aspect of commercial Web sites is very important as it presents the
'corporate image' to the world. The HTML design team is responsible for creating site flow
and the actual layout of pages. They work closely with the Graphics design team. They also
work to a degree with the programming group to integrate data driven elements into their
forms.
- Graphics Design
Graphics design plays a large part at Surplus, as the splashy look of the site is
determined mainly by the images on its pages. Images change constantly as specials are
updated daily and weekly. In addition, the graphics group also manages the product
photographs that show on the Web site. Graphics are crucial to the site and are optimized
for speed. For example, the entire Surplus homepage, text plus all of the 50 or so images
on the page, together comes to less than 70k bytes. Considering the amount of information
on that page that's fairly impressive.
- Network and Security Administration
From the previous discussion on scalability it should be fairly obvious that network
know-how is extremely important. It's good for developers to have a solid grasp of
network issues, but many tasks like router configuration, managing the DNS server,
optimizing net throughput, dealing with multiple servers and the Resonate configuration
really should be handled by a dedicated network administrator or group. Security
configuration is also a huge issue, and having somebody well versed in NT security
is important when running into mysterious login dialogs and managing the various internal
users who access the Web site and network.
- Web Design/Marketing for site
Finally, there are the people on the business end of the Web site. This job has
recently migrated into the upper management ranks, as the merger into Egghead.com has
essentially created an Internet-only business.
Decisions made here deal with how to attract traffic to the site, what promotions to run,
what products to feature on the site and so on. In addition, this group keeps track of
statistics for the site by analyzing the data that the application and various Web
statistics tools capture; this information is then used in making further decisions about
promotions and the like.
Team members (for the Surplus and Auction sites):
At Surplus the team breaks down as follows:
- 2 onsite programmers (full time)
- 2 offsite developer consultants
- 3 HTML designers
- 2 Email Mailing List managers
- 2 Graphic and Animation Artists
- 1 Network Administrator plus several techs
- Operators who monitor the site 24/5
- 1 Site Manager
Considering the volume and income generated by the Web sites this staff is rather
modest.
Source Code Control
In an environment where multiple people are involved in the development process,
source control is extremely important to make sure the integrity of code and HTML documents
is kept intact. Source control is applied to the Visual FoxPro project and the custom ISAPI
DLL extensions to the Web Connection framework, as well as to the HTML pages. Graphics are
not under source control.
- SourceSafe Integration built in
Visual FoxPro has built-in support for Visual SourceSafe, and integration with the VFP
environment is smooth through the Project Manager. While Project Manager access works fine,
I find myself using SourceSafe directly.
- Source Files, Support Files, docs
SourceSafe has the ability to store all Visual FoxPro code files like PRGs, classes,
forms, menus, DBCs etc. In addition, it's also possible to store support files and
documentation as part of the project to keep everything in one place.
- HTML Designers
- Use Visual InterDev
The HTML designers at Surplus don't like Visual InterDev's and FrontPage's HTML editors,
so VI is only lightly used. VI does make it possible to integrate directly with SourceSafe
from within the VI project.
- Or SourceSafe interactively
The preferred method for the HTML designers is to use HomePage and then use SourceSafe
manually to check files in and out. The HTML project is also set up as a Web project,
which allows all changes made on the staging server to be 'deployed' to multiple
Web servers with a single operation.
- Consultants & Staff
- Easy project download
Using SourceSafe provides an easy way to download an entire project to a local machine
for local development. A consultant can be brought in and within half an hour can have a
local environment set up to develop against the staging server.
- SourceSafe works over TCP/IP remotely
Another really cool feature is SourceSafe's ability to connect to a remote machine
over the Internet. For example, I can work out of my office by connecting to the Surplus
server on which I have an account. By pointing SrcSafe.ini on my local machine at
the SourceSafe database on the Surplus site I can access the project remotely:
; The two important paths used by SourceSafe.
Data_Path = \\111.111.111.111\c$\program files\devstudio\vss\data
- Separate Staging Test Server
All development and testing occurs on local machines and is then 'staged' on a
separate staging/test server that very closely matches the configuration and setup of the
online server. As such it contains IIS, the dispatch manager client software (it can run
as part of the Resonate pool), a local SQL Server database that matches the online
server, a full copy of the VFP project and all HTML and graphics. Product photos and
graphics are also uploaded here first and then transferred to the live site by an auto
update program that finds files that have changed. The staging server serves both as a
backup of the live sites (not so much an issue with multiple Web servers now) and as a
closely matched testing environment for the online site.
Everything is first tested on the staging server before moving things to the live site.
HTML is moved by using the SourceSafe Web/Deploy option. The Web deploy path points at
multiple paths on the individual Web servers.
The VFP servers are compiled and tested on the staging server. When everything runs Ok,
the data source is switched to the live SQL site and data changes are re-tested. Once this
testing is complete the EXE is moved to the online site. Since the Web Connection servers
never change their OLEPUBLIC interfaces no re-registration of the server on the live site
is required.
Integrating HTML and Code
HTML generation is probably the most 'different' aspect of Web application development
compared to traditional desktop applications. At Surplus it was extremely important to
work closely with the HTML design team in creating pages that could be visually maintained
by the design staff. It wouldn't have been sufficient to build a FoxPro backend
application that does all of the HTML generation internally. Instead, the tools
needed to provide a mechanism for mixing HTML with minimal code/expression syntax so
dynamic information from the database could be displayed in the HTML.
Today you have many options to build HTML based applications whether you use a script
based engine like Active Server Pages or a code based engine such as Web Connection.
At Surplus a combination of code and scripting is used. All requests fire a method
inside of a class that processes the mainline business logic that needs to occur on
a request. When the code is complete it calls an HTML page stored on disk that has
FoxPro expressions embedded in it. The script page uses an Active Server-like scripting
language (from an older version of Web Connection, using different tags) to allow embedding
of simple expressions like field names or PRIVATE/PUBLIC variables into a page. Any valid
string-based FoxPro expression can also be embedded; this includes FoxPro native
functions as well as User Defined Functions (UDFs). In addition, blocks of code can also
be embedded inside of the page, but this is avoided at Surplus due to the speed cost of
interpreting the code at runtime.
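For illustration only, a script page fragment might look something like the following. The
tags here are ASP-style placeholders rather than the older Web Connection tag syntax
actually used at Surplus, and the cursor and field names are made up, but the idea is the
same: plain HTML with string-valued FoxPro expressions that are evaluated when the request
runs.
<!-- Illustrative fragment - tag syntax and field names are placeholders -->
<h2>Welcome back, <%= TRIM(TCustomer.FirstName) %></h2>
<p>Today's special: <%= TRIM(TSpecial.ProductName) %> for only
<%= TRANSFORM(TSpecial.Price, "$99,999.99") %></p>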
HTML is the Front End Interface
For Web applications HTML is the front end to the user. HTML usage can be simple, using
basic HTML as the lowest common denominator so all browsers can access the pages, or
advanced, taking advantage of the most recent browser enhancements and actually embedding
advanced functionality on the client side in the HTML page.
At Surplus the focus is on making the page run on as many browsers as possible and
creating pages that are small to download, so HTML extensions and scripting are kept to a
minimum. This has changed recently as some interface scripting has been added to pages to
allow for basic visual effects such as changing buttons etc.
Understand the limitations of HTML
Even the newest HTML standards don't provide the functionality you'd expect from a
typical GUI development environment. DHTML, introduced in IE 4.0, takes a huge step in the
right direction, but building complex forms and user interfaces is currently a far cry
from using, say, the Form Designer in Visual FoxPro. The event model in the browser is also
more limited, and trapping events and responding to them is a little more complex and can
require a fair amount of code.
Data connectivity
Pure HTML makes no provisions for data connectivity! If you're dealing with typical
server based Web applications like Surplus Direct you're seeing an application that's
driven entirely by the server. The server generates the HTML for a page and recreates the
entire page whenever the user makes new choices and updates.
Again, DHTML makes provisions for data connectivity, but at the cost of a substantial
installation on the client side, which is usually not an option for public commercial
applications; no one wants to wait around for 20 minutes to download a set of data
ActiveX controls and the client side ADO engine at 28.8k. Most of these technologies also
require IE 4 exclusively, which leaves out a large portion of the market.
The bottom line is that commercial sites will continue to be driven by heavy server side
applications that rely on the server accessing the data and generating HTML from it.
Keep HTML and Code separate
Whenever possible try to build your application in such a way that business logic and
HTML are clearly separated and don't reside in the same place. If you're using a tool like
Web Connection or FoxISAPI that should be easy, as most of the code will sit in a VFP
project and most of the HTML will sit in pages stored in a Web directory. If you're using
Active Server Pages it's easy to end up with pages that heavily mix HTML and code, which
are a bear to maintain. With ASP it is a good idea to create 'code modules' as ASP include
files or use ASP pages that act as router pages, containing code that performs the logic
and routes off to the actual display pages.
The reason for all of this is two-fold: for one, it's easier to maintain code in a code
environment! I don't care how much better Visual InterDev has gotten in the latest rev, it
has nothing on the VFP or VB development environments in richness. Also, even with syntax
color highlighting (which helps a lot), it's difficult to have to look through a huge HTML
page just to find the two-line snippet of code that was embedded in the middle of the
page. Keeping the code out of the way is also important when passing pages off to the HTML
team. Most of the HTML team probably doesn't know how the database logic works, nor
should they have to look at it and be tempted to mess with it. There are some useful
things that should be accessible to designers, but these should be kept to a minimum.
Typical items that should be accessible are database fields, some known (and hopefully
documented) variables that might be required and maybe basic operations for handling HTTP
headers like redirects and Cookie read/write etc.
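As a rough illustration, modeled on the Authenticate method shown earlier, a redirect is
nothing more than another HTTP header that the framework can expose as a simple one-line
call (the method name here is a placeholder, not the actual framework API):
*** Sketch only: send an HTTP redirect instead of an HTML document
FUNCTION Redirect
LPARAMETERS tcUrl
THIS.cOutput=[HTTP/1.0 302 Moved Temporarily]+CR+;
             [Location: ]+tcUrl+CR+CR
ENDFUNC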
Scripting/Templates for data and display logic
I'm a little biased toward the 'code drives the HTML' approach to development, and this is
the approach used at Surplus. Basically you have an application that handles each
request and then branches off to a script page to handle display of the HTML.
This functionality is actually implemented at the Visual FoxPro code level within the
Web Connection framework, which handles the script parsing.
Whether you use Web Connection or Active Server Pages, scripting is a necessary part of
development. Scripting makes it possible to keep the display logic in an easily
maintainable medium, a simple text file that can be edited and updated simply by copying
the file to the Web server. Imagine if every time you changed an image in a page you had
to recompile your application!
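In skeleton form, a request handler in this model might look like the following sketch.
The method name and the template call are placeholders rather than the actual Surplus code
or the exact framework API, but they show the division of labor: VFP runs the business
logic, then hands the result off to a script page on disk for display.
*** Sketch only - names and the template call are placeholders
FUNCTION ShowSpecials
*** 1) Mainline business logic runs first, entirely in VFP
SELECT * FROM products INTO CURSOR TSpecials WHERE special
*** 2) Display is delegated to a script page that embeds the cursor's
***    fields - the HTML team can edit that page without touching this code
THIS.ExpandScriptPage("specials.htm")
RETURN
ENDFUNC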
Site Administration
When running a high volume site, any downtime is a problem that can drive away customers.
Hence, it's important to get through any administration tasks as quickly as possible to
keep the site running at full operation.
- Server Management
When working with Visual FoxPro code we're essentially dealing with COM objects that
have been compiled into a DLL or EXE. One of the key requirements is the ability to load
and unload servers according to the load of the site: the busier the site, the more servers
are loaded into the pool, up to roughly 2.5 server instances per processor on the box.
Servers can be added or removed at run time without taking the site down or restarting.
It's also possible to run in 'maintenance' mode with a single instance, which makes it
possible to run operations that require EXCLUSIVE data access or other admin functionality
(this doesn't matter much any longer now that all data sits in SQL Server, but it's a
serious requirement for VFP data).
- Locking out users
It's also very useful to have functionality that allows you to lock out users from a
server so that no one can access the VFP backend server.
- Updating code online
It's also possible to update code online without shutting down the Web server. This is
one of those times when lock-out mode is useful: all users are kept out with a busy
flag while they see a page that says to hold on for a second. In the meantime, the ISAPI
extension copies the updated COM object into place; once copying is complete the
busy flag is reset and servers are immediately reloaded. This process takes 10-20 seconds
and is completely automated.
- Minimizing maintenance routines
Maintenance routines that aren't absolutely necessary should be relegated to the
middle of the night or another time that's not busy. This seems really obvious, but it
always amazes me to see code changes and data intensive operations occur during the heat
of the day. It should be obvious that making code changes 10 minutes before a huge
promotion starts may not be a good idea.
Remember that maintenance operations often affect more than just the single machine. For
example data updates can seriously load the SQL backend and prevent access from other
front ends.
- Offline or online data?
One decision that is often overlooked is whether you should use online or offline
data. By online I mean data directly linked to your mainline business application.
At Surplus we're running an offline application that does not talk directly to the HP Mini
during the course of the day; only imports and exports actually interact with the HP. In
this case the choice is fairly obvious, as the interface to the HP is not trivial and would
not work very well in online operation.
However, many other businesses will have options to run either online or offline. Offline
operation has many advantages that you should look into. In particular you can:
- Optimize performance and data layout for Web application
The mainline business application is not loaded by the Web site. Successful Web
applications have the tendency to grow rapidly. If you're running an online system you
have to be prepared to extend your scalability issues to the mainline application. If that
application is a typical file or client/server business application you may find that it
won't be ready to deal with the transaction volume thrown at it from the Web. By running
offline and merging only the relevant data using batch operations (see the sketch after
this list) you minimize the load on the business application.
It also allows you to maximize performance for the Web transactions, as opposed to the full
range of operations that might be required in the mainline application. In addition, you
can possibly keep data files smaller, holding only the relevant data rather than the full
scale data kept in the mainline business app.
- Forces additional security check
Having the Web application offline also provides a buffer against fraudulent orders.
For example, at Surplus orders are exported from the Web app into a separate FoxPro
application that re-validates the order data and then calls a special import program that
runs on the HP. This program then diligently performs fraud checks and credit
card validation, all offline from the Web site, essentially double-checking the not so
thorough validation that occurred on the Web site.
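As a rough sketch of the batch approach referenced above (the table and field names are
hypothetical, not the Surplus schema), a nightly export might look like this:
*** Sketch only: move new Web orders to a transfer table that the offline
*** import program picks up later
SELECT * FROM weborders INTO TABLE c:\transfer\neworders WHERE !exported
SELECT weborders
REPLACE exported WITH .T. FOR !exported   && flag the rows just transferred
USE IN neworders                          && close the output table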
Visual FoxPro or Active Server Pages
I've shown a lot of functionality in this document and related it to Visual FoxPro and
Web Connection, because that's what was used for the Surplus application. The table
below compares some features/functionality of Active Server Pages and Visual FoxPro to put
some of the issues discussed into the context of Active Server. Keep in mind that I'm a
little biased as the author of Web Connection, but I do believe the points made here are
valid and fair.
Visual FoxPro vs. Active Server Pages

Visual FoxPro: Business logic lives mostly in VFP, optionally in COM and minimally in
scripted pages.
It's possible to test and debug applications without using COM, which makes the
development process in VFP a lot easier.
Active Server Pages: Business logic lives in scripts and COM objects only.
Typical ASP applications keep a lot of business logic in scripts. For more complex
operations the only way to extend the functionality is to use COM objects, which cannot be
unloaded without shutting down the Web server or at least the Web 'application'.

Visual FoxPro: Code drives HTML.
When using a VFP based solution like Web Connection or FoxISAPI the focus is on using
code and objects to address the business logic. The environment encourages working within
the development environment and using classes to access business logic. Objects created in
code can even be passed forward into scripted pages. HTML and scripting are used more
towards the end of displaying the results, although you also have the option of mixing
code and HTML. The environment does not encourage this, though.
Active Server Pages: HTML drives code.
Active Server relies heavily on scripting to tie logic together. Since HTML and
scripts live in the same page it often turns out that the HTML is the driving force of the
page, using the scripting to figure out the display. In my opinion this is backwards and
ends up mixing business rules with interface code.
This type of implementation can be avoided with ASP, but the architecture certainly
doesn't encourage it.

Visual FoxPro: Easy HTML generation requires a framework.
If you're using a VFP based tool you either need to build your own library of high
level functions or use a framework supplied by a vendor. This can be good or bad:
some tools provide lots of functionality, but more importantly you can use VFP to extend
the framework in any way you see fit.
Active Server Pages: All HTML works through the scripting engine.
With ASP the scripting engine and ASP's built-in objects allow creating output and
retrieving form and server data. It's built in, and the engine provides just about all the
basic functionality you need. ASP does not provide generic page generation for data
displays and other high level functions; you have to build that yourself.

Visual FoxPro: Easy development and debugging inside of VFP.
Building applications within Visual FoxPro makes it possible to use VFP interactively
and even debug live requests within VFP, including setting breakpoints and stepping
through code. Errors show up and can be fixed right away.
Active Server Pages: COM development is complex with no way of debugging live components,
and scripts are not conducive to lots of code.
Objects can only run as COM objects and cannot be debugged while running inside of the
IIS process. Some complex object passing cannot be debugged at all, as ASP's
intrinsic objects are not available for you to test with outside of the IIS environment.
Debugging components without a debugger is a drag!

Visual FoxPro: Code updates require a recompile; scripts can be updated online at any time.
Servers created with VFP require recompilation and updating online. With Web
Connection it's possible to update the server without shutting down the Web server.
Scripts are just text files and can be updated at any time.
Active Server Pages: Scripts can be uploaded at any time; COM objects require a
server shutdown.
This is probably the strongest feature ASP has going for it: updating a script is as
simple as copying a file, and this encompasses both HTML and code. Things get more complex
with COM objects though; if you use and update them you have to shut down the Web
server, or the Web application at least.

Visual FoxPro: Fairly complex one-time setup.
Setting up a VFP based solution is fairly complex as it involves properly registering
servers and making sure all the paths are properly set and configured. Troubleshooting
setup issues can easily frustrate new and experienced users.
Active Server Pages: Easy setup for scripts, complex setup for COM objects.
Scripts are easy: set Script rights on a directory, install the files and
you're off and running. You need an ODBC data source for data access, but that's about it.
COM object setup is complex as you have to deal with security issues and proper server
configuration. Since ASP works efficiently only with in-process servers, your servers have
no UI and are difficult to debug if there's an error at startup.

Visual FoxPro: VFP servers can scale much better.
As discussed previously, the pool managers in Web Connection and FoxISAPI provide some
of the best ways to scale Visual FoxPro applications to multiple servers, both on local and
remote machines. A properly designed application can also run on multiple Web servers
simultaneously.
Active Server Pages: ASP can't scale to multiple machines.
ASP cannot run on multiple machines and maintain context information such as Session
objects and object references. You can call remote components, but the configuration
for this is tricky. You'll need to use Transaction Server to make this work right and get
the security settings configured correctly while still ending up with a scalable server.