Building Large Scale Web Applications
with Visual FoxPro
By Rick Strahl, West Wind Technologies
http://www.west-wind.com/
An Example:
Egghead.com's Surplus Direct Site
Site Statistics
Site Configuration
Web Development Issues
Performance
Data Optimization
To SQL or not to SQL
Code Optimization
Web Site Optimization
Web Server Optimization
Security
Scaling applications
Visual FoxPro and MultiThreading
Apartment Model Threading in 6.0
Microsoft Transaction Server
Pool Managers in Web Connection and FoxISAPI
Scaling and Load Balancing
Scaling with multiple processors
Scaling with multiple machines
Scaling with redundant servers and IP routing
The Development Process
Team Development
Source Control
Integration of Code and HTML
Overview
Find out about common issues involved in building Web applications that handle large hit
loads and require a team of developers, artists and HTML designers to build.
This paper includes discussion of Web development in a team environment and integration
of Visual FoxPro into this environment. You'll see how performance and scalability affect
Visual FoxPro development, with discussion of load balancing and tuning both the
application and the Web server (Internet Information Server). Other topics include
security for Web applications, site maintenance and management as they relate to Visual
FoxPro, and keeping track of site statistics.
This document discusses the following:
- Example of a high volume Web site: Egghead.com's Surplus Direct Site
- Discussion of implementation and development issues
- Application and data Performance
- VFP data vs. SQL Server
- Web Site and Web Server Performance
- Scalability
- Several load balancing scenarios
- Security
- Team Development for HTML and code
- Site Administration and Statistics
- Functionality comparison of Web Connection/FoxISAPI vs. Active Server based applications
What's Large Scale anyway?
There are many types of applications that can be built with Visual FoxPro. When it
comes to building applications that perform in a high transaction environment and require
sophisticated development scenarios, the rules of typical development change somewhat.
Most successful commercial Web applications fall into this category.
When I talk about large scale Web development, realize that this is a relative term that
depends on your particular environment and how you approach an application. In general I
judge 'large scale' by two particular issues:
- Web Site Operation
- Development Process
A site can be very heavily used and thus fall into the large scale bucket by the sheer
transaction volume and load it must deal with. Other applications involve a business
environment complex enough that the sheer size of the application makes it large scale.
Some applications have both.
Web Site Operation
This is probably the most common aspect people look at when judging the 'size' of a Web
application. What's the volume of traffic, how many hits to this or that page, how many
backend hits, how many records go into the database each day. These issues are extremely
critical for server based applications as they need to be carefully balanced against the
capacity of the system(s) that are running the application. Overload the system and you
lock up the Web site with disastrous results and loss of money.
- Performance and Scalability
The typical measure for performance is how much traffic is occurring on the site. One
has to be careful when looking at these numbers to understand what they mean, especially
when comparing them to other sites and applications. Scalability deals with balancing the
load that the traffic incurs on the hardware it's running on. This can be a very daunting
task, especially if your needs go beyond a single server machine. We'll talk more about
performance and scalability in detail.
- Data Volume
Many large scale Web sites also generate huge data volumes. Dealing with very high
transaction volumes on the data server is a critical issue when designing applications, as
contention and locking issues are much more likely to come to the fore than in typical
standalone business applications. Data management also becomes a very serious issue, as
you may have trouble storing all the information that's being captured. For example, one
of the sites I work with generates log entries for every hit incurred and every visitor
coming to the site, along with every order placed. A typical day may generate close to
1 million new records in the database; storing and moving this data is difficult to deal
with, especially when there's 'no good time for maintenance' on the server.
- Site Administration Issues
As with data, site administration is also a big issue. Adding new software or
hardware can put a serious strain on a busy, live application. Web sites are expected to
be up 24 hours a day, 7 days a week, and any glitch can lose valuable customers. There are
also software issues here: how to update code seamlessly in a distributed
environment, how to handle errors and logging, and so on.
The Development Process
Performance is the glamour spot when talking about big applications, but at least as
much attention and effort needs to go into tuning the development process as goes into
performance and scalability.
- Scope of Business Logic/Code
Obviously the bigger and more complex an application, the more resources in terms of
personnel have to be thrown at it. Building complex applications is always a difficult
task, but in the Web environment there are additional issues of performance, scalability
and security to deal with.
- Complexity of HTML Design/Graphics
Building the actual HTML interface for an application is usually handled by a team
separate from the development team, and as with code, the more expansive the site the more
complex this job becomes. Many commercial sites also constantly customize HTML to keep
their sites interesting, so this aspect continues well beyond the initial development.
- Integration of Code and HTML
The full application is really a mixture of both the logic elements and the visual
front end, which consists of HTML and graphics in most cases. Integration is a crucial
issue: coders will need to know a little about HTML, and the graphics staff will need to
know a little about code.
- Team Development
In order to pull all of the groups together it's vital to let everyone have access to
the same code, HTML and data. Version control software is a requirement to make this work
well. Setting up an environment that allows for testing and deployment is also a must,
both to keep the testing process separate from the online application and to make it easy
to duplicate the test environment on the production servers when the time comes to
deploy.
Egghead.com / Surplus Direct:
An Example Site
Egghead.com is on its way to becoming one of the biggest computer/software resellers on
the Net. Egghead.com consists of three sites: the main Egghead site, the Surplus Direct
Discount Warehouse and Surplus Auction. The two Surplus sites were acquired by
Egghead last year. Both Surplus sites are running Visual FoxPro based applications to
drive the Web sites.
All of the sites are ranked within the top 20 of the busiest commerce Internet sites.
In order to demonstrate Visual FoxPro in a live, high volume application I want to show
the Surplus Direct Discount Warehouse site. This is a fairly straightforward shopping site
that allows reviewing and ordering from a catalog of inventory online.
This site sells previous-version software and hardware - stuff that is 'surplus' to
manufacturers and other distributors. Items are sold at rock bottom prices and are
advertised via a printed catalog sent out by mail and ads in all the major computer
hardware magazines like Computer Shopper.
The company uses one of the heaviest advertising plans on the Internet to promote their
site, which has been ranked as high as the #5 commercial site on the Web by PCMeter (a
popular Web visitor rating service), with corresponding traffic on the site. Surplus runs
large scale Web advertising programs on Yahoo, Netscape, InfoSeek, Excite, Shareware.com
and several other of the highest volume sites that take advertising online.
Online catalog of hardware/software
Inventory catalog
Site displays an inventory of between 2,000 and 3,000 items in various categories.
Online, secure ordering
Visitors can pick items for ordering and purchase them securely using SSL encryption,
with email confirmation.
Online Credit Card Validation
Credit cards are validated online via an ISAPI extension and the validated response is
processed in VFP for order completion.
Electronic Software Download
(feature has been temporarily removed pending vendor agreements)
A special section of the online site allows purchase of items that can be
downloaded immediately. The Visual FoxPro application interfaces with an extended version
of the Web Connection ISAPI interface that handles communication with the CyberSource
third party packaging and authorization software running on a dedicated ESD server over
the Internet.
Extensive custom promotional features
The site includes a number of promotional features to 'lure' potential customers: free
items advertised on banners, free items with orders over a certain dollar amount, rotating
banners on the site, weekly specials displayed in frames, email specials, featured items,
a hot items list, etc.
Sub Sites
The Surplus site often features specific vendor sub-sites. For example, a special
'build your own Everex computer' site ran for a few months. There are frequent vendor
'plug-in' apps that can be hooked in with minimal effort.
Extensive Site Management tools
Total Remote Administration
Most aspects of the site can be administered remotely via an
HTML Admin interface that allows for server management features as well as statistics
and maintenance operations on the data, such as data transfers with the HP mini.
Detailed Site and User Statistics
The site keeps track of detailed information about individual
hits and shopper information in order to determine traffic patterns. Shoppers are tracked
through the site anonymously, and valuable information about where they came from and how
much traffic they generate is recorded in a shoppers table. The information can be
displayed at a glance in online graphs and is also exported for more detailed daily
reports that are run and presented in Excel.
Customization Tasks
Several tasks such as rotating banner administration and special
displays are also handled through the HTML interface. Site designers can use the HTML
interface to add and delete items for these tasks, allowing a fluid workflow for site
administration.
Data Updates
The data uploads and downloads to and from the HP mini are also
administered through the HTML interface. Orders are exported once every hour and inventory
is imported three times a day.
Running offline Web site
Both of the Surplus sites run as 'offline' Web sites, meaning they don't access the
main business application directly.
HP mini Point of sale system
The main company application runs on an HP mini computer and the Web site is running
offline from the mini. Data is transferred several times a day to update inventory on the
Web and import orders captured online.
Security issues
The offline view serves as a security buffer against fraud. Orders go through a rigorous
3-step input validation process before being taken into the Point of Sale system. Web data
entry has serious fraud potential, so extra steps are taken to minimize fraud.
Accuracy issue
Web based data entry is often not as accurate as that taken by a qualified phone
technician in the phone center. Although the Web site provides extensive validation, there
are some things that can't be checked online, such as obviously bad order amounts. Web
orders are scrutinized by a detailed import routine that rejects orders based on stringent
rules. A VFP conversion program exports orders to the mini's import format.
Site Statistics
Typical: 250,000 server hits/100,000 visits
Peaks (exclusive front page ads on Yahoo and Netscape):
3 million total pages a day
500,000+ VFP server hits a day
28,000 VFP server hits per hour
200,000 unique visitors per day
The application is outrunning three dedicated T1 lines
Site Configuration
The Surplus site runs on several separate machines in a server pool. Each machine is a
fully functional Web server plus HTML and Visual FoxPro backends. Each server is fully
redundant: if one fails it drops out of the server pool, but the site continues to
run. Data is stored in SQL Server on a separate server.
- 3 Dual Processor Pentium Pro 200s
- 1 Dual Processor Tandem Redundant Data Server
- 256 megs memory/dual 2 gig SCSI
- T3 direct Internet connection
- Windows NT Server 4.0
- Internet Information Server 3.0
- West Wind Web Connection
- Visual FoxPro 5.0a
- SQL Server 6.5 data backend
- Active Server Pages for non-data tasks
Web Development Issues at Surplus
Let's take a look at some of the development issues that need to be dealt with when
building applications for this high volume Web environment. We'll look at the following
topics in more detail:
- Performance
- Scalability
- Security
- Team Development
- HTML application development
- Site administration
- Site statistics
Performance
Performance is extremely critical in high transaction environments. Any application
needs to run as fast as it possibly can, but for high transaction environments tuning and
making sure code runs at its optimum is crucial as slow requests can tie up valuable
resources that might be needed by the next request in line. There are a number of areas
that should be focused on.
Data Optimization
Since we're dealing with database applications here, data optimization is the most
important piece of the puzzle. Database operations tend to be the slowest operations in
any Web application and also the most resource intensive, so optimizations here can bring
the biggest benefits.
- Rushmore Optimization
If you're using local data, making sure that all of your queries take advantage of
Rushmore is probably the best thing you can do to speed up performance. Make sure you set
up indexes to match your queries and make sure the queries match the indexes exactly.
Also, make sure you have tags on DELETED() if you're running with SET DELETED ON. Take the
time to review performance and SQL optimization with VFP's ShowPlan function SYS(3054).
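The following is a minimal sketch of this idea; the inventory table, tag names and field
names are hypothetical, but the pattern of matching index tags to the WHERE clause and
verifying the result with SYS(3054) applies generally:

*** Index tags that let Rushmore optimize a typical catalog query
USE inventory
INDEX ON category  TAG category
INDEX ON DELETED() TAG deleted        && needed when running with SET DELETED ON

*** Turn on ShowPlan to verify the query is fully optimized
= SYS(3054, 1)
SELECT sku, descript, price ;
   FROM inventory ;
   WHERE category = "SOFTWARE" ;
   INTO CURSOR TQuery
= SYS(3054, 0)                        && ShowPlan output off again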
- Pull only data you need
When using local data it's tempting to simply run SELECT * queries and perform
further filtering in secondary queries because it's more convenient.
However, pulling that extra data takes time, especially if you're using remote servers.
Make sure queries bring down only the data you really need.
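As a quick illustration (table and field names are hypothetical), compare pulling
everything against pulling just the columns the page actually renders:

*** Convenient but wasteful: drags every column of every matching row
lcCategory = "SOFTWARE"
SELECT * FROM inventory WHERE category = lcCategory INTO CURSOR TAll

*** Pull only what the page displays
SELECT sku, descript, price ;
   FROM inventory ;
   WHERE category = lcCategory ;
   INTO CURSOR TProducts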
- Avoid Updateable views for queries
Updateable views tend to be quite a bit slower than read-only views. If you're running
data queries that use the same views as some update routines, consider creating separate
views for the query and the update to localize the overhead of the updateable view to the
operation that really needs it.
If you can skip views altogether you'll gain even more performance, although the
difference isn't as drastic as with updateable views. SQL pass-through for remote servers
and direct local data access are always faster than views.
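Here's a hedged sketch of the pass-through approach for a remote backend; the 'websql'
ODBC data source, table and field names are assumptions for illustration:

*** Query the backend with SQL pass-through instead of a remote view
lcCategory = "SOFTWARE"
lnHandle = SQLCONNECT("websql")
IF lnHandle > 0
   lnResult = SQLEXEC(lnHandle, ;
      "SELECT sku, descript, price FROM inventory WHERE category = ?lcCategory", ;
      "TProducts")
   IF lnResult < 0
      = AERROR(laError)           && laError[2] holds the backend's error message
   ENDIF
   = SQLDISCONNECT(lnHandle)
ENDIF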
- VFP is much faster than SQL backends
A question of whether to migrate an application to a SQL backend invariably comes up
when dealing with high performance applications. Keep in mind that VFP has an amazingly
fast data engine; if speed is what you're after, VFP is the best choice. For server
applications, use of Visual FoxPro as a local data source is very appropriate, as you have
only a small number of simultaneous VFP clients accessing the data at once. Data is
usually local to the machine the server is running on or on a fast, low volume network;
the connection is not from the user to the database, but from the VFP server app to the
data. The environment is also very controlled, so many of the data integrity issues that
can crop up in typical network applications with large numbers of clients are not so much
of an issue.
There are good reasons to migrate to a SQL backend, but many applications may not have a
need for them. I'll discuss the pros and cons of SQL in the next section.
- Minimize long requests or offload them
Transaction based systems work best with short requests. Long requests can tie up
valuable resources and load up the CPUs, which can bring a server to its knees. Whenever
possible, long requests should be offloaded to other servers (a special maintenance server
maybe) or possibly a SQL server. If you have to run maintenance operations regardless, try
to schedule them for off-peak hours.
To SQL or Not To SQL
Using Visual FoxPro Data
- Native data is great for server backends
For server applications, use of Visual FoxPro as a local data source is very appropriate,
as you have only a small number of simultaneous VFP clients accessing the data at once.
Data is usually local to the machine the server is running on or on a fast, low volume
network; the connection is not from the user to the database, but from the VFP server app
to the data. The environment is also very controlled, so many of the data integrity issues
that can crop up in typical network applications with large numbers of clients are not so
much of an issue.
- Performance is much better than SQL
Keep in mind that VFP has an amazingly fast data engine; if speed is what you're
after, VFP is the best choice. When we converted from local data to a SQL backend, data
access turned out to be between 2-3x slower for short requests and up to 5-10 times slower
for complex queries. Some of this can be attributed to remoting the data to a separate
machine, but even in testing against a local SQL server the performance drop can be
considerable. Plan for it!
- Problems only with batch updates against heavily used, live data
The Surplus site ran for over a year against local VFP data, up to a volume of about
250,000 backend hits a day. The only problems that occurred had less to do with the volume
of requests than with data updates. The site needs to import new inventory data several
times a day, and it was necessary to import it while the site was still running in live
mode, while people were reading this data. This tended to corrupt indexes frequently,
causing mysterious crashes and data consistency errors. There are ways to work around
these issues, but at the Surplus site these workarounds were not appropriate. In addition,
when using multiple server machines, access to remote tables gets more complex in making
sure that network paths are available, properly mapped and accessible. ODBC and a SQL
backend make it easier to centralize data access in this sort of distributed environment.
Using SQL Server Backend
- Improved stability
The main reason for the move to SQL at Surplus was for better stability. Getting away
from the DBF/CDX file structure has resulted in much improved uptime. The environment and
nature of online batch updates mandated this move to get around data consistency problems
encountered with native VFP data.
- 50+% performance loss (SQL and Net overhead)
Overall performance of the site applications dropped by about 50% when the move to SQL
Server occurred. This is something you should plan on if you make the move from local
data.
- Lightened CPU Load on Web server
Once you move to a SQL backend much of the data processing load migrates to the SQL
Server box, away from the Web server/VFP machines. Running a complex query on a VFP
frontend won't cripple the Web server as the CPU intensive query now runs on the SQL
server. The idle CPU power can go towards serving other requests.
The flip side is that the SQL Server is now busier, which means it processes other queries
more slowly, which in turn increases CPU load on that machine. This is where load
balancing comes in to decide where processing load should occur.
- More complex data access issues
The move to SQL also brought some very difficult data administration issues with it. SQL
servers are very hard to deal with for large scale update commands and purges that occur
on data that's in heavy use. Locking issues can make it impossible to update data that's
being operated on; extensive use of stored procedures and sectioned processing is often
required to accomplish tasks like clearing out log files. General administration of SQL
Server is also more difficult: tuning the server, security, maintaining data partitions,
backup and performance monitoring are not trivial tasks and require personnel with either
previous experience or training.
- Was it worth it? YES!
In retrospect, was it worth giving up the huge performance benefit of local data? Most
definitely in this case. Stability of the application went way up, with far fewer problems
of the site getting locked or stuck with data corruption issues. Centralized data access
and management through SQL Server has also streamlined the administration process and
allows automating some of the heavy data maintenance tasks in the middle of the night.
Code Optimization
The next step in performance tuning deals with code optimization.
- Timing requests and logging
The first step is to determine where bottlenecks lie. If you use any sort of Web tool
it's a good idea to log Web requests and their times. This will give you a rough idea of
where the slowest hits and requests are. Web Connection includes built-in logging (which
can optionally be turned off), so with it you have an immediate way to judge request
lengths. With other tools this may be a little more difficult because they may not have a
central entry point that can be easily logged. In that case you'll have to create timing
code for each request manually.
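A minimal sketch of such manual timing, assuming a hypothetical ProcessRequest entry point
and a requestlog table you create for the purpose:

*** Wrap the request entry point and write one log record per request
LOCAL lnStart, lnSecs
lnStart = SECONDS()

THIS.ProcessRequest()             && hypothetical request entry point

lnSecs = SECONDS() - lnStart      && note: SECONDS() rolls over at midnight
INSERT INTO requestlog (reqtime, reqsecs, reqname) ;
   VALUES (DATETIME(), lnSecs, "ProcessRequest")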
- Profile your code
Once you've isolated bottlenecks, examine the code and trace it down to the individual
routines that run. Is it code that can be simplified? VFP has many ways to accomplish some
tasks - is there a way to use different commands that might be faster? Are you
unnecessarily calling functions or methods repeatedly that could run inline? Are you
creating objects that might be better served as function calls?
- New Coverage Analyzer can really help!
VFP 6.0's new coverage analyzer can help you determine bottlenecks in your code. The
analyzer summarizes the data that is captured by SET COVERAGE TO.
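For example (the log file name and ProcessRequest method are arbitrary assumptions), you
can capture coverage data around the suspect request and then feed the log to the Coverage
Profiler:

*** Log line-by-line execution times while the request runs
SET COVERAGE TO c:\temp\coverage.log ADDITIVE
THIS.ProcessRequest()             && the code you want profiled
SET COVERAGE TO                   && stop logging

*** Analyze the log with the VFP 6 Coverage Profiler
DO (_COVERAGE) WITH "c:\temp\coverage.log"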
- Generic framework vs. hand optimized code
If you're using a framework that helps you with HTML generation, realize that some
powerful functions may not be the fastest way to accomplish things. Most of these
functions have to be generic and thus have to test the environment and often work on
dynamic determination of types and so on. In other words, framework code is almost always
slower than hand optimized code. For example, in Web Connection there are several routines
that can dynamically create HTML displays of VFP table/cursor data. The code has to check
the type of every field and then decide how to display it based on that type. Compare that
to simply saying lcOutput = lcOutput + TQuery.Company; no surprise that the straight
code can run as much as twice as fast.
Just understand the tradeoff between hand coding and framework code. Sometimes the
functionality benefits will outweigh the performance hit, but if your request needs to
run faster and everything else fails, this can be the boost you might need. Again,
profiling will help in this determination.
- VFP6 String operations drastically improved
Visual FoxPro 6.0 has drastically improved string operations that involve
concatenation. Repeatedly calling commands like lcOutput = lcOutput + lcString can run
several hundred times faster than in VFP 5 on large strings. In VFP 5.0 file output was
almost always faster than string concatenation; this may no longer be the case in VFP 6.
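To make the pattern concrete, here's a sketch of building a page as a string from a query
cursor (the TQuery cursor and its Company field follow the framework example above):

*** Build an HTML table in memory through repeated concatenation -
*** fast in VFP 6, much slower in VFP 5 for large result sets
lcOutput = "<html><body><table>"
SELECT TQuery
SCAN
   lcOutput = lcOutput + "<tr><td>" + TRIM(TQuery.Company) + "</td></tr>"
ENDSCAN
lcOutput = lcOutput + "</table></body></html>"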
- Class instantiation
Class instantiation is still one of the slower operations in VFP even though VFP 6 has
slightly improved performance here too. Consider creating objects and keeping them around
for reuse rather than recreating them all of the time.
The smallest and quickest loading class in VFP continues to be the non-visual RELATION
class. The fastest loading visual class is Custom.
Also keep in mind that method calls are considerably slower than UDF() calls.
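One way to apply the 'keep objects around for reuse' advice above is to cache a worker
object as a property of your server and instantiate it only on first use. The oShipRates
property and ShipRateCalculator class are hypothetical names, and oShipRates is assumed to
be a defined property of the server class:

*** Create the helper object once and reuse it on subsequent requests
IF TYPE("THIS.oShipRates") # "O"
   *** Lightweight base classes (Relation, Custom) instantiate fastest
   THIS.oShipRates = CREATEOBJECT("ShipRateCalculator")
ENDIF
lnRate = THIS.oShipRates.GetRate("US", 2.5)    && plain reuse, no per-request CREATEOBJECT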
- Stay away from Macros and Eval()
This one's pretty obvious, but it bears bringing up. This is somewhat related to the
'generic' framework code, which often has to rely on EVAL()s in order to dynamically
determine field or object names. Eval() and Macros are extremely powerful, but try to
avoid them as much as possible.
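Here's a small illustration of the difference; lcFieldName and the TQuery cursor are
assumptions, and the point is simply that the EVAL() version re-evaluates the name
expression on every row while the direct version compiles to a plain field reference:

lcFieldName = "Company"
lcOutput = ""
SELECT TQuery
SCAN
   lcOutput = lcOutput + EVAL("TQuery." + lcFieldName)   && generic framework style - slower
ENDSCAN

SELECT TQuery
SCAN
   lcOutput = lcOutput + TQuery.Company                  && hard-coded - faster
ENDSCAN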
Web Site Optimization
Home page optimization
The home page of a busy commercial Web site probably gets upwards of 75% of all the
traffic that occurs on the site, so it's fairly important that this page is as optimized
as it can be. If the page can be static, by all means make it so. If it's mildly dynamic
(capture the referring link, assign a user ID etc.), consider using an ASP page and
storing the info in a cookie. Do anything to avoid database access or otherwise calling a
backend server on the homepage. This is not the case at Surplus, where data access is
unavoidable because of the volume of data that is tracked about the user.
Also important is the size of the HTML and images: keep it to a minimum to avoid
clogging the network. The Surplus homepage, even with all those graphics on it, comes down
at less than 45k.
Optimizing site flow to minimize hits
If a site is laid out well it takes fewer steps to go where you need to go and get the
job done on the site. Fewer steps mean fewer hits, which means less load on the site. This
means trying to avoid intermediate menus and providing a clear flow that sometimes
combines multiple functions on a single page.
Use ASP or HTM for non-data logic pages
On the Surplus site there are a number of pages that are simply display pages:
hand customized pages that don't display anything that comes out of the database. These
pages are ASP pages; ASP is used to handle the user ID passing that's required to
track users through the site. Other sites may not even need ASP and can use plain HTML
pages instead.
Can you use generated static pages?
We're all database people here, but sometimes it might be better not to think in
terms of data retrieved from a database. Really <s>. A lot of CPU cycles can be
saved if pages that rarely change are pre-generated to a static page and then simply read
from disk by the Web Server instead of being dynamically generated.
At Surplus, category lists are a good example. These pages change maybe once a week, so
there's really no need to generate them each time somebody views them.
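A hedged sketch of the idea: render the page once from the database and write it to a
plain .htm file that the Web server can serve directly. The RenderCategoryPage method and
output path are hypothetical; STRTOFILE() is VFP 6 only, so under VFP 5 you'd use
FCREATE()/FWRITE() instead:

*** Pre-generate a rarely changing category page to a static file
lcHTML = THIS.RenderCategoryPage("SOFTWARE")          && hypothetical render method
= STRTOFILE(lcHTML, "d:\web\catalog\software.htm")    && served as plain HTML from here on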
Minimize maintenance operations
The biggest killers on the Surplus site are maintenance operations, with requests that
can take up to 5 minutes to run (worst case). These operations put extreme load on the
backend server, so the best thing to do is to isolate them and have them run at night.
Web Server Optimization
Following are a few suggestions for improving performance on IIS 4.0 that can be
important for heavily loaded sites:
- Reduce Connection Timeouts
By default IIS sets up a connection timeout of 900 seconds. That means that with HTTP 1.1,
once you connect to a site your connection with that site is maintained for 15 minutes
even if you go to the homepage and no further. Connections eat up valuable resources, so
it's a good idea to reduce this number to a more reasonable value. Be careful though: the
connection timeout must accommodate your longest actual request, including maintenance
operations. If you set the value too short, any maintenance request that exceeds this
timeout will return an error.
- Limit connections for busy sites
If you ever get to a point where you are so absolutely overloaded with traffic that
you are basically bringing your site to its knees, consider reducing the number of
connections that can be made to your site. While you'll be failing users with 'Server is
too busy' messages from the Web server, at least you'll be able to service the requests
that are making it in. Nobody is served by you accepting 10,000 connections that choke
your server, but maybe 5,000 connections let you run and keep at least half of the traffic
happy.
This value is set way too high for typical NT based application server systems; I
doubt any single NT Web server running a database app will handle more than 5,000
simultaneous connections.
- ASP options
- Turn off ASP Sessions if not used
If you're not using ASP or you're using ASP but not ASP Sessions, turn them off!
Sessions cause significant overhead on the Web server and assign cookies to clients
entirely on their own.
- Turn on Page Buffering
If you are using ASP turn on Page Buffering. Page buffering builds pages in their
entirety in memory or cache prior to dumping the output back to the Web server. This is
typically much faster than the direct WriteClient() approach that ASP uses normally. It
also gives you more flexibility: Only with buffering can you easily modify the HTTP header
and cookies after some text has already been output.
- Limiting threads created by IIS
IIS creates new threads for each ISAPI request; if one or more requests hang,
it's very easy for IIS to generate a huge number of threads very quickly. If your system
runs with too many threads it starts to slow down drastically. There's an option you can
set in the registry to prevent this from happening by telling IIS to limit the number of
threads it creates. I suggest no more than 50 threads per processor (note: IIS uses
24 threads internally, so you'll want to add that value to the count). These settings work
in IIS 4 as well as IIS 3:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\InetInfo\Parameters\PoolThreadlimit
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\InetInfo\Parameters\ThreadTimeout
Both values are DWORD. Set the timeout to 15 minutes (900 seconds) or so.
You can use the IISReg utility on the CD to set these two values. IISReg also configures
the IIS 3 ASP options from above, but doesn't affect the IIS 4 ASP options.
Visual FoxPro and Multi Threading
ISAPI is multithreaded - VFP is not!
IIS and ISAPI run in a multithreaded environment that allows many simultaneous
requests to happen concurrently. Visual FoxPro on the other hand is single threaded and
must therefore simulate multithreading.
VFP 6 improves scalability somewhat!
OK, so Microsoft has hyped VFP 6.0 as a good COM citizen by supporting
Apartment Model threading and MTS. While VFP 6.0 can indeed build Apartment Model threaded
COM objects and those objects can be hosted inside of MS Transaction Server, the
concurrency issues we faced with VFP 5 COM servers continue to be a problem.
The problem is the way that the VFP runtime is implemented. While it's now possible to
load multiple instances of an object into the same process, it's not possible to get both
instances of the same object to operate simultaneously. For example, if IIS loads two
instances of your object from two simultaneously accessed ASP pages, only one will run at
a time; the second request will be queued. The following scenario presents a
problem: assume Method1 runs a 15 second query and Method2 returns a .1 second string
result. If Method1 gets fired first, the call to Method2 will block for 15 seconds before
the second user gets a result page. In addition, this blocking will cause requests to
start backing up to the point where the server may never catch up.
Without concurrency it's extremely hard to build scalable
applications even in medium load scenarios.
- Apartment Model Threading in VFP 6.0
Apartment Model threading makes it possible to load multiple InProcess
components simultaneously. Visual FoxPro can manage multiple simultaneous 'apartments'
(threads) of your object in memory, one for each active client. Each apartment
is guaranteed to get called from the same thread context that created the server in order
to provide a thread safe environment to run your application in. Conversion of a tool as
complex as VFP to this threading model is no trivial task, but the benefits gained are a
huge improvement over VFP 5's inability to run more than one instance of a given InProc
COM object in any process. Unfortunately, multiple instances of the same object in the
same process (such as IIS) cannot operate simultaneously, as method code is blocked by the
first method that gets control. Even though IIS may have multiple object references, it
can only run one method of a given server at a time. You can however get two separate
servers to run inside of IIS and get them to operate simultaneously. The figure above
demonstrates the concurrency model, where scalability can be achieved only by breaking
functionality out into many objects in order to circumvent the single server issue.
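For reference, a VFP COM component is just an OLEPUBLIC class compiled into an InProc DLL
or OutOfProc EXE project with BUILD DLL or BUILD EXE; the class, method and table names
below are hypothetical:

*** Minimal COM-exposed business object
DEFINE CLASS WebCatalog AS Custom OLEPUBLIC

   FUNCTION GetItemCount(tcCategory)
      *** Count matching inventory items and return the number to the client
      SELECT COUNT(*) FROM inventory ;
         WHERE category = tcCategory ;
         INTO ARRAY laCount
      RETURN laCount[1]
   ENDFUNC

ENDDEFINE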
- Support for Transaction Server (MTS)
Apartment Model threading also makes it possible to run VFP components
inside of Microsoft Transaction Server. Unfortunately, the blocking issues also affect the
way VFP components run inside of MTS. Just as with objects called directly from a client,
the issue simply migrates to the package level: instead of IIS being the client
process, an MTS package now becomes the host. The same problems arise, as the same
object in a package cannot be accessed simultaneously due to the blocking in the VFP
runtime. Concurrent method calls can only be achieved by hosting objects in
multiple separate MTS packages. With MTS this blocking issue is even more pronounced than
with directly accessed objects, as all object access on the machine is routed through the
MTS package. This means that multiple client processes (say the IIS process and a VFP
application both using this object on a server) now have to fight for contention.
With all of these problems, what does Transaction Server provide? Transaction Server
provides a wrapping layer around COM that offers limited scalability features,
distributed and multi-step transaction control for client side code, a simplified security
model that allows configuration of InProc servers at the component level with 'roles', and
resource dispensers for system services such as ODBC.
Frankly, I'm not impressed with the functionality that MTS provides even when used with
objects that do support full concurrency. But a discussion of scalability wouldn't be
complete without at least mentioning this topic. My concerns are these:
- MTS in its current state does not provide server pooling; it only
provides server deactivation and reactivation of unused servers without the client knowing
about it. It works like this: you connect to a server, then call MTS's SetAbort or
SetComplete method, which releases the server's context and actually causes MTS to release
the server. If you call the server again with the same reference, MTS automatically
reloads the COM object and then calls the method. There's a lot of overhead here, but it
can be useful if you have huge numbers of simultaneous clients who have persistent
connections to your server. Typical Web applications don't work this way though; current
models use either a limited pool of client connections, or each request loads and unloads
the server on its own (CreateObject and Release, which is typical for ASP applications).
The only scenario that would fit the MTS activation model is assigning object references
to ASP Session vars and keeping the reference around, but this would probably be as slow
or slower than loading and unloading the object on each page, with no gain in scalability
at all.
MTS is a two step COM interface that consists of a proxy and an OutOfProcess server that
hosts your InProc COM object. Your component runs InProcess, but the MTX context proxy is
called as an OutOfProc component from the client, which in turn routes the call to your
actual component. If you get the feeling this would be slower than a direct call, you're
right. InProc components are noticeably faster with direct calls, and even OutOfProc calls
end up slightly faster than MTS. Keep in mind this affects only the calling process
(method calls plus parameter passing and property assignments), not the operation of your
server itself. That will always be the same.
Server pooling is a feature that's documented but not implemented yet in MTS 2.0. In my
opinion, this is the most important feature for making scalable applications possible. If
properly implemented, this feature would allow specification of a number of these objects
to be loaded, and each time a request is made for a client, MTS would hand off one of
these already loaded references. Configuration would let you specify how many objects to
load and which machine to run them on, along with potentially load balancing them based on
the load on each machine. It's a mystery to me why MS has not implemented at least the
basic pooling functionality now; it's only a slightly more complex step from the limited
activation/deactivation scheme in use now.
The problem here is that if VFP does not improve its concurrency model it will not be able
to take advantage of object pooling once it becomes available.
- ODBC resource dispensing is now available directly in the ODBC drivers, so MTS isn't
required for this at all (although this is a frequently mentioned 'feature').
- When it comes down to it, the only features of MTS that seem to make real sense are:
distributed and two step transactions, role based security, and easier deployment of
components with automatic registration and security configuration. But for scalability or
speed, MTS is not really bringing much to the party in its current state.
Pool Managers in FoxISAPI/Web Connection
Web Connection and FoxISAPI include a built-in pool manager that
allows managing multiple object references and handing these references to clients as
needed. If one instance is busy, another reference is handed off; if all instances
are busy, requests are queued and serviced when a pool reference becomes available. These
components run best as Single Use, Out of Process EXE servers and allow for immediate
deployment on local and remote machines via DCOM. While Out of Process components are
slightly slower (in this scenario), call overhead is measured in fractional milliseconds
per Web request. EXE servers also afford better stability, process isolation that won't
crash the client (IIS) on an error, and a configurable security environment.
Currently, this model provides the best scalability you can achieve with VFP COM objects;
at Surplus over 50 servers run and service a single application.
Pool of Single Use EXE or InProc DLL servers
For scalable Web applications, the pool managers built into these tools provide much
better control over the load placed on a machine by allowing a pool of objects to be
created and references passed out of that pool. The benefit here is that the servers are
already loaded and only direct method calls occur. ISAPI talks directly to the pool
manager, which is built into the ISAPI extension. The pool is also managed and kept to a
predetermined size, so if your server gets too busy you don't overload your machine with
too many simultaneously running servers that would kill the machine on CPU load;
pending requests are queued until servers become available.
Full Admin Control over servers
Since the ISAPI extension controls the pool manager, it's possible to control the servers
via basic functionality built into the extension. Both FoxISAPI and Web Connection provide
the ability to unload and reload servers and to enter a maintenance mode that allows a
single server to run for EXCLUSIVE data access. Web Connection also allows for online
server updates, automatic restarting of hung or timed-out servers, recovery from VFP
server exceptions, and the ability to run in full Admin mode that blocks all server access
except for the logged in administrator without taking the site down.
These features sound really esoteric, but once you start running a high volume site you
realize that taking a site down just to replace a COM component is a major issue. The same
goes for maintenance operations: with ASP it's next to impossible to guarantee single user
(EXCLUSIVE) access to data; the only way to get it is to stop the Web server and run a
separate component (or the same component) while the server's down.
Understanding CPU load and Speed
When examining load on a site it's crucial to understand how the application is
performing on a given machine. When talking about load we're mostly looking at the CPU
load that is incurred by the application. This load is affected by all system components
such as disk and memory, but shows itself most consistently in the level of CPU usage. As
disks get saturated queries slow down and use more CPU power to get to data. As memory
runs out more data is stored on disk rather than in memory cache and you get more CPU load
to access the data.
The key pieces to look at are:
- How many requests can you handle over a given period?
Load is determined by looking at a given number of hits over a given period of time.
The load that is incurred on the machine can be measured by the CPU usage that's incurred
for this traffic.
In rough terms this means a site that's running 1,000 10-second requests in an hour
carries a similar load to a site that's running 20,000 half-second requests (both work out
to roughly 10,000 CPU-seconds), assuming the sites are running similar pieces of software
and hardware. Actually, this is not quite accurate, as the site running the 10 second
requests is probably running much higher CPU loads for the queries that are processing. So
you see there's no easy way to judge an application's load, but as a rough guideline, use
requests over time multiplied by request length.
- Based on No. of CPUs, processor speed and request processing time
Load is obviously affected by the horsepower of the machine it's running on. Two CPUs
can almost double performance (85-90% is more accurate). Faster disk and more memory can
also improve load capability drastically.
Most of the time the slower a request runs the more load it incurs on a server. While
running, the request is using up CPU cycles and database access.
There are exceptions to this rule however: If you're offloading processing to other
machines such as a SQL backend or server accessed via DCOM on remote machines you may
leave request handlers running at close to 0% CPU load while the remote machines are
chugging away.
- Additional instances help responsiveness, but not load on CPU
We've talked about running multiple instances and multi-threading in order to achieve
better scalability. Remember that multi-threading is not a silver bullet. It will not give
you better performance, only better responsiveness.
For example: a request taking 10 seconds and a request taking 1 second can run at the same
time, so the client waiting for the 1 second request doesn't have to wait 11 seconds for
his response. The bottom line is that while the 1 second request runs it may run a little
slower than it normally would because the 10 second request is already running, so it
actually takes 2 seconds. The 10 second request also slows down a little because of
the increased load on the CPU and now takes 11 seconds. You've provided better
responsiveness, but you've actually increased the total processing time by 2 seconds; the
total now takes 13 instead of 11 seconds.
Adding multiple simultaneous requests will actually reduce overall performance, as the CPU
or CPUs must schedule simultaneous threads or processes. Hence the need to limit the
amount of simultaneously running operations. The more simultaneous processes run, the
slower they get. Multiple processors can help in this scenario.
- Test your expected load!
The most important issue in relation to load is to be ready for it. If you're running
a growing Web site, you'll probably run into a situation at some point where your traffic
outruns the resources of your configuration or hardware. You'll want to avoid this
as much as possible by testing your load capability and knowing how much you'll be able to
handle.
At Surplus we've had 3 occasions where we hit the wall with hardware and software. Twice
the hardware couldn't take the load, resulting in locked servers running at solid 100% CPU
loads and two 100% maxed T1s (the full bandwidth available at the time). The other time
the traffic was so large that the backend servers simply could not keep up. The only way
to get a working site running again was to limit connections to 3,000 at a time to allow
at least some of the traffic to succeed. This is something you want to avoid at all
costs!!!
Test your expected load and be ready to add more hardware if necessary; this seems
obvious, but it is a very common problem with growing Web applications that are running
their first big promotions!
- WebHammer
A cheap and simple tool to test your Web site with is WebHammer. It's used to
repeatedly 'hammer' a site with HTTP hits. You can set up as many as 32 simultaneous
threads kicking out requests, and you can run multiple instances on multiple machines to
create a variety of different requests.
http://www.genusa.com/iis/webhamr2.html
Scalability
With Internet commerce growing at over 100% each year, it's very likely that a
commercial Web site will run into growing pains. Scalability issues come to the fore
especially when it comes to running applications that outrun a single Visual FoxPro
server, and even more so when having to run more than a single server machine
simultaneously in order to handle volume.
- Web servers are multithreaded, Visual FoxPro is not: multiple instances of VFP are required
As previously discussed, Visual FoxPro is limited to single threaded operation and
requires multiple simultaneous instances (either InProc or OutOfProc) in order to handle
concurrent request processing. Apartment model threading and the Web Connection/FoxISAPI
pool managers allow working around this issue as discussed in the previous section.
- Best scalability is achieved with multiple processors in a single box
When it's time to throw more hardware resources at the application, the best way
to scale is to add more processors to the local box. Each processor gives about an 85-90%
performance increase for CPU processing. IIS is multi-threaded and can internally take
advantage of multiple processors through the NT SMP architecture. Visual FoxPro can also
run concurrently on two processors, especially if using OutOfProc components (untested
with InProc, but it should work with Apartment threaded servers as well). Because
OutOfProc components are essentially separate applications, NT can throw processing onto
one or the other processor. NT handles the details of scheduling and balances load on the
processors without any change of code.
You may want to play with the Control Panel's System settings for performance: whether to
give more priority to the foreground tasks (Visual FoxPro, which is not running as a
service) or the background tasks (IIS, which is). I tend to run maximum foreground
priority since the VFP servers typically do the hardest work.
The best load factor seems to be 2 VFP sessions per processor, plus IIS sharing the
processors, but that may vary based on your load and whether you use a SQL backend or
whether you call remote components.
- For additional scalability, networked machines can be used to spread load
When one machine is not enough anymore, the next step is to move out to the network in
order to spread the processing load across multiple machines. NT to date supports only
4 processors natively and the network allows you to go beyond that (NT 5 promises built-in
support for 8 processors). You can add additional machines that are accessible to the Web
server to handle application or data tasks. The most common step may be to move to a SQL
backend that handles data access on the server, freeing the local VFP servers from a big
chunk of CPU load. Keep in mind that for short requests VFP still incurs a fair amount of
load for remote SQL requests through the ODBC connections; it takes about .25 seconds or
so after a SQL call for CPU load to drop back to idle, so short requests may not benefit
from data offloading as much as you might expect. Once multiple machines are involved it
becomes important to have a localized data store, and at Surplus the move to multiple
machines was one of the deciding factors in the decision to switch to a SQL Server
backend.
You can also offload application logic to other machines using DCOM to instantiate COM
objects on remote machines. For an application like Web Connection this means you can run
a pool of servers of the same object that runs both locally on the Web server as well as
on one or more remote machines. You can also offload processing more distinctly by calling
specific business objects directly on other application servers. So you may have the
Product Information Database sitting on one machine, the ship rate calculation engine
sitting on another and the order processing server on yet another. Either way offloads
some processing onto other machines.
Drawbacks to this approach are complexity and performance. Components that run
both locally and on remote machines are difficult to configure so they correctly share
data and application paths. Also, getting them registered on another machine may be
difficult; luckily VFP 6.0 includes a new function, CREATEOBJECTEX(), that allows direct
instantiation of objects on a remote machine without having to tinker with
configuring the server locally to run on the remote.
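A brief sketch of the call; the ProgID, machine name and method are assumptions for
illustration:

*** Instantiate the same COM server locally or on a remote machine via DCOM
loLocal  = CREATEOBJECT("SurplusWeb.WebCatalog")
loRemote = CREATEOBJECTEX("SurplusWeb.WebCatalog", "APPSERVER2")
? loRemote.GetItemCount("SOFTWARE")     && this method call travels over DCOM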
Performance is also an issue. Remote objects take a lot longer to load, and calling
methods on such a server is comparatively slow, as parameters and return values have to
travel over the network. If you're sharing common data there's also the issue of accessing
that data across the network, often with UNC pathing, which can be very slow. This
approach has another potential problem: there's still a bottleneck with a single Web
server servicing requests! If you get to a point where a single Web server can't take the
incoming requests, there's no easy way to scale up.
Despite the drawbacks, scaling with COM over the network can provide significant
scalability, as you are now offloading large amounts of functionality to other machines,
and all but the heaviest usage sites on the Web can probably make do with a single Web
server scenario.
- IP Routing/Dispatch Manager can load balance multiple redundant servers
So, where do you go when a single Web server isn't enough? At Surplus this scenario
cropped up a while ago when even offloading to remote servers for processing was choking
the main machine with Web Server and ISAPI requests. The solution lies in using
multiple servers simultaneously by employing either a hardware or software solution to
route IP requests to multiple servers. The way this works is that either a router or some
IP routing software accepts an incoming IP request and then routes it to one of
the available Web servers which are configured to service requests for this IP address.
IOW, you set up a pool of servers that can respond to a TCP/IP request for a 'phantom' or
'virtual' IP address.
The simplest of these mechanisms is DNS Round Robin routing, which is supported by most
DNS servers. The DNS server takes each request for an IP address and routes it to a set of
different IP addresses set up in the DNS registry in round robin fashion. DNS round robin
works, but if any one of the servers goes down you have the issue of getting some requests
that fail, which may be hard to catch if multiple servers are involved. Basically round
robin is a 'dumb' solution as it doesn't have internal knowledge of the network or
application or hardware. Thus if a server dies DNS will continue to include that server in
its round robin loop.
Hardware solutions involve routers that can handle this job at the hardware level. These
routers are also smart enough to see if a machine is down to skip that sequence in the
routing loop, but routers can rely only on hardware to determine the down status. Removing
servers from the pool is not quite straightforward and often requires fiddling with the
router or router configuration files directly. This can take time and often requires the
router to be reset which can take up to a minute to complete.
The third solution, and the one used at Surplus, is software based. A dedicated machine
serves as a dispatch manager that receives all incoming TCP/IP traffic for a 'virtual' IP
address (in this case the main Web site's IP address). The individual pooled servers also
run a piece of software that makes it possible for the dispatch manager to talk to the Web
servers and get crucial information
about their current status and load. This information includes performance data such as
current CPU load, network and disk load, concurrent connections and hits, etc. IOW, it
tells the dispatch manager how busy the box is. Based on this information the dispatch
manager schedules TCP/IP requests to the least busy servers and actually balances the
load. Since this is a software solution it's easy to move servers out of the pool:
you simply bring up the dispatch software and tell it to unload a server from the pool.
The software in use at Surplus is called Resonate. It works well now, although there
were some initial problems in getting it to work properly on the network. The software
communication pieces are Java based (surprisingly). It is also very expensive, as all of
the high end solutions in this area appear to be.
Expensive or not, this solution has made it possible for the Surplus sites to grow at
staggering rates. There no longer is any worry about the Web server or even the FoxPro
applications being a bottleneck. If front end CPU loads get too high, it's easy to add
another machine to the pool and provide additional horsepower. The front ends are also
fully self-contained! They contain the Web server and an executable of the server
application(s) as well as the HTML (which is a little different than the slide above),
which means any single server failure will not take the site down. The decision was made
to store HTML locally as well to minimize network traffic and make each machine truly
independent.
Of course, the next problem to address is the only remaining bottleneck in this scenario:
the SQL Server backend. SQL Server is starting to show serious strain processing the
data volume when traffic loads are high. While the 3 front end machines in the Surplus
site typically run at 20-30% load, the SQL backend is running at 80% and providing data
sluggishly. But that's beyond the scope of this discussion <s>
I would expect more solutions to become available for making IP routing available at lower
prices. It looks like hardware vendors are working on hybrid solutions that build software
into the routers, making it possible to tell a machine's state in a way specific to a
given operating system.
Security on the Web
When building database Web applications, security is important. You wouldn't want to
capture orders online, including credit card numbers, and then have somebody steal the
entire order/customer file with that sensitive information.
Security comes in many flavors and applies to different aspects of a Web site. Is the
information you pass over the Web safe? And how do you keep people from accessing certain
parts of your application?
Keep it simple - let NT do the work
Windows NT provides excellent, though somewhat complex, security features that should
address the majority of your security needs. NT allows configuration of security at the
file level as well as the directory level. Web directories need to have Read and typically
Execute (or Script) rights set to allow Web clients to access the pages.
NT uses an account named IUSR_<machine name> to identify anonymous users to the Web site,
and rights must be given to this user for any public areas that visitors to your site
should be able to access. Beyond that however, make sure you remove any IUSR_ references
(they shouldn't be there in the first place), and also the Everyone account.
Also, be careful in playing with the rights of the IUSR_ account in User Manager. When
working with IIS and COM it's very easy to give the IUSR_ account Admin rights to get some
security issues resolved, which is fine while developing; just don't forget to undo
this setting once you put your site online.
Keep data in an unmapped path
Data security should be a top priority on your list. If you keep sensitive data on your
Web server, first of all make sure that the data is not accessible via a relative path
over the Web. Ideally the data should reside in a totally off limits area away from the
Web site in an unmapped path. Even better, if the data can sit on another machine and be
accessed only over a non-TCP/IP network connection, you can just about eliminate your risk
of data piracy (at the cost of overhead for the network access). For extra security you
can also consider putting the data access on a separate network leg and using a non-TCP/IP
protocol on that leg to disallow access.
Setting rights on directory and files
If you must have data in a Web-relative path so that it can be downloaded via an
HTML link by authorized personnel, make sure you set the proper password rights on these
directories to disallow anonymous access by Web users. If you use IE 3 and IIS, NT's
Challenge/Response mechanism ties securely into NT's security system. With other Web
servers the security of passwords passed over the Web varies.
File Security with NT Challenge Response Validation
NT supports Challenge/Response validation for access to files, which means that if you're
accessing a page and IUSR_ doesn't have rights, NT will try to validate your user account
through the local machine, or through a domain if you have IIS configured to run against a
specific domain server. If you are a user of the local network you may not be prompted for
a password; if you aren't, NT will present a login dialog and validate you. If you
type the correct password you're allowed access. Security in this fashion works both at
the directory level (which really just delegates down to the file level) and the file
level.
Make sure you set the Allow NT Challenge Response option in the IIS configuration.
Using Authentication from your code
You can also force authentication from dynamically generated result pages with Basic
Authentication. Authentication occurs as part of the HTTP header passed back to the Web
server/browser which interprets the header and pops up a validation box.
There are two steps to make this happen:
- On the password protected request, check whether the user is authenticated by checking
for the REMOTE_USER (ASP) or Authenticated Username (Web Connection/FoxISAPI) server
variable. If it's empty the user is not validated.
- If not authenticated, send back an authentication request that pops up the dialog. If
the dialog's response is successful, the request in step 1 is re-run. This time the user
will be authenticated and should be allowed access.
Here's a simple example:
************************************************************************
* wwDemoProcess :: Authentication
*********************************
*** Function: Demonstrate how to check authorization for users
************************************************************************
FUNCTION Authentication
LOCAL lcUserName, loCGI
*** Easier reference
loCGI=THIS.oCGI
*** Try to retrieve the Authenticated Username
lcUserName=loCGI.ServerVariables("Authenticated Username")
IF EMPTY(lcUserName)   && Any validation of the user could go here...
   *** Send Password Dialog - if the response is successful this request is re-run
   THIS.oHTML.HTMLAuthenticate(loCGI.GetServername())
   RETURN
ENDIF
THIS.StandardPage("You've been validated for this request...",;
   "You've entered a username of <b>"+lcUserName+"</b><p>"+;
   "Subsequent requests for this server won't prompt you for "+;
   "a password again until you shut down your browser.")
RETURN
The actual authentication request is implemented via a special HTTP header that is
returned instead of a regular HTML document. The following code generates the actual
password box popup when sent back to the Web server:
**********************************************************************
* wwHTTPHeader :: Authenticate
******************************
*** Function: Sends the authorization content type header
*** Use to pop up Security Dialog and force authentication.
*** You can use Authentication Username (CGI)
*** to retrieve the entered user name if valid...
*** Pass: tcRealm - Domain to log in to.
*** tcErrorText - Error message to display when failing
*** Return: nothing or string if tlNoOutput=.T.
**********************************************************************
FUNCTION Authenticate
LPARAMETERS tcRealm, tcErrorText
tcRealm=IIF(type("tcRealm")="C",tcRealm,"")
tcErrorText=IIF(type("tcErrorText")="C",tcErrorText,;
"<h2>Gotta enter your password to get in!</h2>")
THIS.cOutput=[HTTP/1.0 401 Not Authorized]+CR+;
[WWW-Authenticate: basic realm="]+ tcRealm + ["]+ CR+CR +;
[<HTML>]+tcErrorText+[</HTML>]
ENDFUNC
* Authenticate
Authentication provides a built-in mechanism tied to the operating system to validate
users. Once authenticated, you can always check the user's username, which is passed along
with each subsequent request until the browser is shut down.
You can also implement your own security scheme, bypassing authentication altogether, by
creating an HTML page that asks for login information. You can then capture the login
information on your own and validate it against a user table, denying access if it's invalid.
Note though that you need to set some sort of flag that can be checked on each request to
make sure the user does not access unauthorized pages or requests directly simply by
typing the URL.
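If you roll your own scheme, the validation itself is straightforward VFP code. Here's a
minimal sketch, assuming a hypothetical users table with username and password fields; how
the form variables get retrieved from the login page depends on your framework and is not
shown here:
*** Minimal sketch only - table and field names are hypothetical
FUNCTION ValidateLogin
LPARAMETERS lcUserName, lcPassword
LOCAL llValid
IF !USED("users")
   USE users IN 0 SHARED        && hypothetical user table
ENDIF
SELECT users
LOCATE FOR UPPER(ALLTRIM(username)) == UPPER(ALLTRIM(lcUserName)) AND ;
           ALLTRIM(password) == ALLTRIM(lcPassword)
llValid = FOUND()
RETURN llValid
ENDFUNC
On a successful login you would then set the flag mentioned above (a cookie or a record in
a session table, for example) and check it at the top of every protected request.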
Note that basic authentication is not encrypted unless you combine it with a secure
transaction request (HTTPS)!
Do you need secure transactions?
By default none of the information that travels over the Web is encrypted in any way.
All the information returned to your backend programs from the Web browser, including
HTML form variables, server information and authentication information, travels in the
clear. This means somebody with a protocol analyzer could potentially snatch passwords,
IDs or credit card numbers while they are in transit.
Secure server transactions use certificate based encryption with a private and public
key to encrypt all the content that flows between the Web server and browser. Keys are
administered by a few third party certificate authorities at around $250 for a year. You
create a key request with the server's Key Manager utility and fill out an online
submission form for a key request (see www.verisign.com
for more information on obtaining a key). The server sends the key request, which is used
to generate your private key. This key is returned to you as a file and merged with your
existing key to provide the secure certificate on your site. Once installed, using secure
transactions means accessing the HTTPS protocol instead of HTTP; a simple change to your
URL is all that is required once the key is in place to make a transaction secure.
http://www.west-wind.com/wconnect/wc.dll?wwDemo~SecureCheck
https://www.west-wind.com/wconnect/wc.dll?wwDemo~SecureCheck
do the same thing and can be handled identically in code; the latter is encrypted and
secure. To check whether a request is secure you can check the SERVER_PORT or Server Port
server variable.
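For example, a request method could check the port like this. This is just a sketch using
the same ServerVariables() access shown in the authentication example above; 443 is the
standard SSL port:
*** Rough sketch: is the current request running over HTTPS?
LOCAL llSecure
llSecure = ( ALLTRIM(THIS.oCGI.ServerVariables("Server Port")) == "443" )
IF !llSecure
   *** Sensitive page requested over plain HTTP - redirect the user to the
   *** https:// version of the URL or refuse the request here
ENDIF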
Not all browsers support secure transactions, and attempts to access a secure page with a
non-secure browser will cause the page request to fail. Tell those users to get a browser
from this century, Ok?
Do you need secure transactions? If your site captures sensitive information like
credit cards - definitely. If you're using a custom password scheme with passwords entered
on HTML pages - probably. For general applications? Probably not.
Secure transactions are easy to implement. You simply use the https:// prefix instead
of http:// to reference links. But secure transactions are much, much slower than
non-secure transactions, so it's a good idea to use them only when you need them. On the
Surplus site, for example, the site runs in secure mode only when actually capturing the
order information from the user and for some of the maintenance tasks; all other site
operations run non-secure.
Web Development as a team
Building complex Web sites typically involves more people than just programmers. Web
applications tend to bring together a variety of skills:
- Programming and Code/Data Design
This is us. Regardless of the type of application, you'll likely find at least one
developer (more likely a few) on every dynamic, data driven Web application. The
application design group is responsible for providing the business logic and data access
as well as some elements of overall application flow and design, which impacts the HTML
design group.
- HTML Design
Web applications are invariably HTML based and visually oriented (there are exceptions
though). The visual aspect of commercial Web sites is very important as it presents the
'corporate image' to the world. The HTML design team is responsible for creating site flow
and the actual layout of pages. They work closely with the Graphics design team. They also
work to a degree with the programming group to integrate data driven elements into their
forms.
- Graphics Design
Graphics design plays a large part at Surplus, as the splashy look of the site is
determined mainly by the images on its pages. Images change constantly as specials are
updated daily and weekly. In addition, the graphics group also manages the product
photographs that show on the Web site. Graphics are crucial to the site and are optimized
for speed. For example, the entire Surplus homepage, text plus all of the 50 or so images
on the page, together comes to less than 70k bytes. Considering the amount of information
on that page that's fairly impressive.
- Network and Security Administration
From the previous discussion on scalability it should be fairly obvious that network
know-how is extremely important. It's good for developers to have a solid grasp of
network issues, but many tasks like router configuration, managing the DNS server,
optimizing net throughput, dealing with multiple servers and the Resonate configuration
really should be handled by a dedicated network administrator or group. Security
configuration is also a huge issue, and having somebody well versed in NT security
is important when running into mysterious login dialogs and managing the various internal
users who access the Web site and network.
- Web Design/Marketing for site
Finally, there are the people on the business end of the Web site. This job has
recently migrated into the upper management ranks, as the merger into Egghead.com has
essentially created an Internet-only business.
Decisions made here deal with how to attract traffic to the site, what promotions to run,
what products to feature on the site and so on. In addition, this group keeps track of
statistics for the site by analyzing the data that the application and various Web
statistics tools capture; this information is then used in making further decisions about
promotions and the like.
Team members (for the Surplus and Auction sites):
At Surplus the team breaks down as follows:
- 2 onsite programmers (full time)
- 2 offsite developer consultants
- 3 HTML designers
- 2 Email Mailing List managers
- 2 Graphic and Animation Artists
- 1 Network Administrator plus several techs
- Operators who monitor the site 24/5
- 1 Site Manager
Considering the volume and income generated by the Web sites this staff is rather
modest.
Source Code Control
In an environment where multiple people are involved in the development process,
source control is extremely important to make sure the integrity of code and HTML documents
is kept intact. Source control is applied to the Visual FoxPro project and the custom ISAPI
DLL extensions to the Web Connection framework, as well as to the HTML pages. Graphics are
not under source control.
- SourceSafe Integration built in
Visual FoxPro has built-in support for Visual SourceSafe, and integration with the VFP
environment is smooth through the Project Manager. While Project Manager access works fine,
I find myself using SourceSafe directly.
- Source Files, Support Files, docs
SourceSafe has the ability to store all Visual FoxPro code files like PRGs, classes,
forms, menus, DBCs etc. In addition, it's also possible to store support files and
documentation as part of the project to keep everything in one place.
- HTML Designers
- Use Visual InterDev
The HTML designers at Surplus don't like Visual InterDev's and FrontPage's HTML editors,
so VI is only lightly used. VI does make it possible to integrate directly with SourceSafe
from within the VI project.
- Or SourceSafe interactively
The preferred method for the HTML designers is to use HomePage and then use SourceSafe
manually to check files in and out. The HTML project is also set up as a Web project,
which allows all changes made on the staging server to be 'deployed' to multiple
Web servers with a single operation.
- Consultants & Staff
- Easy project download
Using SourceSafe provides an easy way to download an entire project to a local machine
for local development. A consultant can be brought in and within half an hour can have a
local environment set up to develop against the staging server.
- SourceSafe works over TCP/IP remotely
Another really cool feature is SourceSafe's ability to connect to a remote machine
over the Internet. For example, I can work out of my office by connecting to the Surplus
server on which I have an account. By pointing SrcSafe.ini on my local machine at
the SourceSafe database on the Surplus site I can access the project remotely:
; The two important paths used by SourceSafe.
Data_Path = \\111.111.111.111\c$\program files\devstudio\vss\data
- Separate Staging Test Server
All development and testing occurs on local machines and is then 'staged' on a
separate staging/test server that very closely matches the configuration and setup of the
online server. As such it contains IIS, the dispatch manager client software (it can run
as part of the Resonate pool), a local SQL Server database that matches the online
server, a full copy of the VFP project and all HTML and graphics. Product photos and
graphics are also uploaded here first and then transferred to the live site by an auto
update program that finds files that have changed. The staging server serves both as a
backup of the live sites (not so much an issue with multiple Web servers now) and as a
closely matched testing environment for the online site.
Everything is first tested on the staging server before moving things to the live site.
HTML is moved by using the SourceSafe Web/Deploy option. The Web deploy path points at
multiple paths on the individual Web servers.
The VFP servers are compiled and tested on the staging server. When everything runs Ok,
the data source is switched to the live SQL site and data changes are re-tested. Once this
testing is complete the EXE is moved to the online site. Since the Web Connection servers
never change their OLEPUBLIC interfaces no re-registration of the server on the live site
is required.
Integrating HTML and Code
HTML generation is probably the most 'different' aspect of Web application development
compared to traditional desktop applications. At Surplus it was extremely important to
work closely with the HTML design team in creating pages that could be visually maintained
by the design staff. It wouldn't have been sufficient to build a FoxPro backend
application that does all of the HTML generation internally. Instead, the tools
needed to provide a mechanism for mixing HTML with minimal code/expression syntax so
dynamic information from the database could be displayed in the HTML.
Today you have many options to build HTML based applications whether you use a script
based engine like Active Server Pages or a code based engine such as Web Connection.
At Surplus a combination of code and scripting is used. All requests fire a method
inside of a class that processes the mainline business logic that needs to occur on
a request. When the code is complete it calls an HTML page stored on disk that has
FoxPro expressions embedded in it. The script page uses an Active Server-like scripting
language (from an older version of Web Connection, using different tags) to allow embedding
of simple expressions like field names or PRIVATE/PUBLIC variables into a page. Any valid
string-based FoxPro expression can also be embedded; this includes FoxPro native
functions as well as User Defined Functions (UDFs). In addition, blocks of code can also
be embedded inside of the page, but this is avoided at Surplus due to the speed cost of
interpreting the code at runtime.
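For illustration only, a script page fragment might look something like the following. The
tags here are ASP-style placeholders rather than the older Web Connection tag syntax
actually used at Surplus, and the cursor and field names are made up, but the idea is the
same: plain HTML with string-valued FoxPro expressions that are evaluated when the request
runs.
<!-- Illustrative fragment - tag syntax and field names are placeholders -->
<h2>Welcome back, <%= TRIM(TCustomer.FirstName) %></h2>
<p>Today's special: <%= TRIM(TSpecial.ProductName) %> for only
<%= TRANSFORM(TSpecial.Price, "$99,999.99") %></p>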
HTML is the Front End Interface
For Web applications HTML is the front end to the user. HTML usage can be simple, using
basic HTML as the lowest common denominator so all browsers can access the pages, or
advanced, taking advantage of the most recent browser enhancements and actually embedding
advanced functionality on the client side in the HTML page.
At Surplus the focus is on making the page run on as many browsers as possible and
creating pages that are small to download, so HTML extensions and scripting are kept to a
minimum. This has changed recently as some interface scripting has been added to pages to
allow for basic visual effects such as changing buttons etc.
Understand the limitations of HTML
Even the newest HTML standards don't provide the functionality you'd expect from a
typical GUI development environment. DHTML, introduced in IE 4.0, takes a huge step in the
right direction, but building complex forms and user interfaces is currently a far cry
from using, say, the Form Designer in Visual FoxPro. The event model in the browser is also
more limited, and trapping events and responding to them is a little more complex and can
require a fair amount of code.
Data connectivity
Pure HTML makes no provisions for data connectivity! If you're dealing with typical
server based Web applications like Surplus Direct you're seeing an application that's
driven entirely by the server. The server generates the HTML for a page and recreates the
entire page whenever the user makes new choices and updates.
Again, DHTML makes provisions for data connectivity, but at the cost of a substantial
installation on the client side, which is usually not an option for public commercial
applications; no one wants to wait around for 20 minutes to download a set of data
ActiveX controls and the client side ADO engine at 28.8k. Most of these technologies also
require IE 4 exclusively, which leaves out a large portion of the market.
The bottom line is that commercial sites will continue to be driven by heavy server side
applications that rely on the server accessing the data and generating HTML from it.
Keep HTML and Code separate
Whenever possible try to build your application in such a way that business logic and
HTML are clearly separated and don't reside in the same place. If you're using a tool like
Web Connection or FoxISAPI that should be easy, as most of the code will sit in a VFP
project and most of the HTML will sit in pages stored in a Web directory. If you're using
Active Server Pages it's easy to end up with pages that heavily mix HTML and code, which
are a bear to maintain. With ASP it is a good idea to create 'code modules' as ASP include
files or use ASP pages that act as router pages, containing code that performs the logic
and routes off to the actual display pages.
The reason for all of this is two-fold: for one, it's easier to maintain code in a code
environment! I don't care how much better Visual InterDev has gotten in the latest rev, it
has nothing on the VFP or VB development environments in richness. Also, even with syntax
color highlighting (which helps a lot), it's difficult to have to look through a huge HTML
page just to find the two-line snippet of code that was embedded in the middle of the
page. Keeping the code out of the way is also important when passing pages off to the HTML
team. Most of the HTML team probably doesn't know how the database logic works, nor
should they have to look at it and be tempted to mess with it. There are some useful
things that should be accessible to designers, but these should be kept to a minimum.
Typical items that should be accessible are database fields, some known (and hopefully
documented) variables that might be required and maybe basic operations for handling HTTP
headers like redirects and Cookie read/write etc.
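As a rough illustration, modeled on the Authenticate method shown earlier, a redirect is
nothing more than another HTTP header that the framework can expose as a simple one-line
call (the method name here is a placeholder, not the actual framework API):
*** Sketch only: send an HTTP redirect instead of an HTML document
FUNCTION Redirect
LPARAMETERS tcUrl
THIS.cOutput=[HTTP/1.0 302 Moved Temporarily]+CR+;
             [Location: ]+tcUrl+CR+CR
ENDFUNC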
Scripting/Templates for data and display logic
I'm a little biased toward the 'code drives the HTML' approach to development, and this is
the approach used at Surplus. Basically you have an application that handles each
request and then branches off to a script page to handle display of the HTML.
This functionality is actually implemented at the Visual FoxPro code level within the
Web Connection framework, which handles the script parsing.
Whether you use Web Connection or Active Server Pages, scripting is a necessary part of
development. Scripting makes it possible to keep the display logic in an easily
maintainable medium, a simple text file that can be edited and updated simply by copying
the file to the Web server. Imagine if every time you changed an image in a page you had
to recompile your application!
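In skeleton form, a request handler in this model might look like the following sketch.
The method name and the template call are placeholders rather than the actual Surplus code
or the exact framework API, but they show the division of labor: VFP runs the business
logic, then hands the result off to a script page on disk for display.
*** Sketch only - names and the template call are placeholders
FUNCTION ShowSpecials
*** 1) Mainline business logic runs first, entirely in VFP
SELECT * FROM products INTO CURSOR TSpecials WHERE special
*** 2) Display is delegated to a script page that embeds the cursor's
***    fields - the HTML team can edit that page without touching this code
THIS.ExpandScriptPage("specials.htm")
RETURN
ENDFUNC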
Site Administration
When running a high volume site, any downtime is a problem that can drive away customers.
Hence, it's important to get through any administration tasks as quickly as possible to
keep the site running at full operation.
- Server Management
When working with Visual FoxPro code we're essentially dealing with COM objects that
have been compiled into a DLL or EXE. One of the key requirements is the ability to load
and unload servers according to the load of the site: the busier the site, the more servers
are loaded into the pool, up to roughly 2.5 server instances per processor on the box.
Servers can be added or removed at run time without taking the site down or restarting.
It's also possible to run in 'maintenance' mode with a single instance, which makes it
possible to run operations that require EXCLUSIVE data access or other admin functionality
(this doesn't matter much any longer now that all data sits in SQL Server, but it's a
serious requirement for VFP data).
- Locking out users
It's also very useful to have functionality that allows you to lock out users from a
server so that no one can access the VFP backend server.
- Updating code online
It's also possible to update code online without shutting down the Web server. This is
one of those times when lock-out mode is useful: all users are kept out with a busy
flag while they see a page that says to hold on for a second. In the meantime, the ISAPI
extension copies the updated COM object into place; once copying is complete the
busy flag is reset and servers are immediately reloaded. This process takes 10-20 seconds
and is completely automated.
- Minimizing maintenance routines
Maintenance routines that aren't absolutely necessary should be relegated to the
middle of the night or another time that's not busy. This seems really obvious, but it
always amazes me to see code changes and data intensive operations occur during the heat
of the day. It should be obvious that making code changes 10 minutes before a huge
promotion starts may not be a good idea.
Remember that maintenance operations often affect more than just the single machine. For
example data updates can seriously load the SQL backend and prevent access from other
front ends.
- Offline or online data?
One decision that is often overlooked is whether you should use online or offline
data. By online I mean data directly linked to your mainline business application.
At Surplus we're running an offline application that does not talk directly to the HP Mini
during the course of the day; only imports and exports actually interact with the HP. In
this case the choice is fairly obvious, as the interface to the HP is not trivial and would
not work very well in online operation.
However, many other businesses will have options to run either online or offline. Offline
operation has many advantages that you should look into. In particular you can:
- Optimize performance and data layout for Web application
The mainline business application is not loaded by the Web site. Successful Web
applications have the tendency to grow rapidly. If you're running an online system you
have to be prepared to extend your scalability issues to the mainline application. If that
application is a typical file or client/server business application you may find that it
won't be ready to deal with the transaction volume thrown at it from the Web. By running
offline and merging only the relevant data using batch operations (see the sketch after
this list) you minimize the load on the business application.
It also allows you to maximize performance for the Web transactions, as opposed to the full
range of operations that might be required in the mainline application. In addition, you
can possibly keep data files smaller, holding only the relevant data rather than the full
scale data kept in the mainline business app.
- Forces additional security check
Having the Web application offline also provides a buffer against fraudulent orders.
For example, at Surplus orders are exported from the Web app into a separate FoxPro
application that re-validates the order data and then calls a special import program that
runs on the HP. This program then diligently performs fraud checks and credit
card validation, all offline from the Web site, essentially double-checking the not so
thorough validation that occurred on the Web site.
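As a rough sketch of the batch approach referenced above (the table and field names are
hypothetical, not the Surplus schema), a nightly export might look like this:
*** Sketch only: move new Web orders to a transfer table that the offline
*** import program picks up later
SELECT * FROM weborders INTO TABLE c:\transfer\neworders WHERE !exported
SELECT weborders
REPLACE exported WITH .T. FOR !exported   && flag the rows just transferred
USE IN neworders                          && close the output table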
Visual FoxPro or Active Server Pages
I've shown a lot of functionality in this document and related it to Visual FoxPro and
Web Connection, because that's what was used for the Surplus application. The table
below compares some features/functionality of Active Server Pages and Visual FoxPro to put
some of the issues discussed into the context of Active Server. Keep in mind that I'm a
little biased as the author of Web Connection, but I do believe the points made here are
valid and fair.
Visual FoxPro vs. Active Server Pages

Visual FoxPro: Business logic lives mostly in VFP, optionally in COM and minimally in
scripted pages.
It's possible to test and debug applications without using COM, which makes the
development process in VFP a lot easier.
Active Server Pages: Business logic lives in scripts and COM objects only.
Typical ASP applications keep a lot of business logic in scripts. For more complex
operations the only way to extend the functionality is to use COM objects, which cannot be
unloaded without shutting down the Web server or at least the Web 'application'.

Visual FoxPro: Code drives HTML.
When using a VFP based solution like Web Connection or FoxISAPI the focus is on using
code and objects to address the business logic. The environment encourages working within
the development environment and using classes to access business logic. Objects created in
code can even be passed forward into scripted pages. HTML and scripting are used more
towards the end of displaying the results, although you also have the option of mixing
code and HTML. The environment does not encourage this, though.
Active Server Pages: HTML drives code.
Active Server relies heavily on scripting to tie logic together. Since HTML and
scripts live in the same page it often turns out that the HTML is the driving force of the
page, using the scripting to figure out the display. In my opinion this is backwards and
ends up mixing business rules with interface code.
This type of implementation can be avoided with ASP, but the architecture certainly
doesn't encourage it.

Visual FoxPro: Easy HTML generation requires a framework.
If you're using a VFP based tool you either need to build your own library of high
level functions or use a framework supplied by a vendor. This can be good or bad:
some tools provide lots of functionality, but more importantly you can use VFP to extend
the framework in any way you see fit.
Active Server Pages: All HTML works through the scripting engine.
With ASP the scripting engine and ASP's built-in objects allow creating output and
retrieving form and server data. It's built in, and the engine provides just about all the
basic functionality you need. ASP does not provide generic page generation for data
displays and other high level functions; you have to build that yourself.

Visual FoxPro: Easy development and debugging inside of VFP.
Building applications within Visual FoxPro makes it possible to use VFP interactively
and even debug live requests within VFP, including setting breakpoints and stepping
through code. Errors show up and can be fixed right away.
Active Server Pages: COM development is complex with no way of debugging live components,
and scripts are not conducive to lots of code.
Objects can only run as COM objects and cannot be debugged while running inside of the
IIS process. Some complex object passing cannot be debugged at all, as ASP's
intrinsic objects are not available for you to test with outside of the IIS environment.
Debugging components without a debugger is a drag!

Visual FoxPro: Code updates require a recompile; scripts can be updated online at any time.
Servers created with VFP require recompilation and updating online. With Web
Connection it's possible to update the server without shutting down the Web server.
Scripts are just text files and can be updated at any time.
Active Server Pages: Scripts can be uploaded at any time; COM objects require a
server shutdown.
This is probably the strongest feature ASP has going for it: updating a script is as
simple as copying a file, and this encompasses both HTML and code. Things get more complex
with COM objects though; if you use and update them you have to shut down the Web
server, or the Web application at least.

Visual FoxPro: Fairly complex one-time setup.
Setting up a VFP based solution is fairly complex as it involves properly registering
servers and making sure all the paths are properly set and configured. Troubleshooting
setup issues can easily frustrate new and experienced users.
Active Server Pages: Easy setup for scripts, complex setup for COM objects.
Scripts are easy: set Script rights on a directory, install the files and
you're off and running. You need an ODBC data source for data access, but that's about it.
COM object setup is complex as you have to deal with security issues and proper server
configuration. Since ASP works efficiently only with in-process servers, your servers have
no UI and are difficult to debug if there's an error at startup.

Visual FoxPro: VFP servers can scale much better.
As discussed previously, the pool managers in Web Connection and FoxISAPI provide some
of the best ways to scale Visual FoxPro applications to multiple servers, both on local and
remote machines. A properly designed application can also run on multiple Web servers
simultaneously.
Active Server Pages: ASP can't scale to multiple machines.
ASP cannot run on multiple machines and maintain context information such as Session
objects and object references. You can call remote components, but the configuration
for this is tricky. You'll need to use Transaction Server to make this work right and get
the security settings configured correctly while still ending up with a scalable server.