String encoding, and dealing with data returned over a Web connection, is arguably one of the most confusing subjects I've run into while working in .NET. All strings in .NET are Unicode (stored as UTF-16, two bytes per character for most text), so any bytes that arrive from outside must be converted with a specific encoding before they display properly. When you retrieve data over the Web, it arrives as a binary stream, and in order to use it as a string it must be decoded. Different content might require different encodings, and you have to control how the bytes are decoded. This basically involves telling the stream reader which code page the incoming data uses. .NET provides a number of tools to facilitate this, including the Encoding class, which allows you to easily switch encodings for specific operations. Many classes and conversion tools take an Encoding instance as a parameter or property to perform their encoding and decoding. To help with finding out what encoding is used, the HttpWebResponse object exposes the character set declared in the response's Content-Type header through its CharacterSet property (the similarly named ContentEncoding property reports the Content-Encoding header, such as gzip, rather than the text encoding). Unfortunately, very few Web servers return this information in their headers, so it's difficult to dynamically discover which encoding to use. Code page 1252 is the best all-around choice for Western content, and I tend to use that as the default if no character set can be determined. The following code is useful when creating an Encoding instance: Encoding enc; try { enc = Encoding.GetEncoding(Response.CharacterSet); } catch { enc = Encoding.GetEncoding(1252); } If you are returning binary data, store it in a byte array (byte[]) or stream it directly to whatever output you need to deal with. For example, if you download a file, don't store it in a string first; stream it straight into a file on disk. |
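The fallback logic described above can be sketched as a small helper. This is a minimal sketch, not a definitive implementation: the class name ResponseText and the methods PickEncoding and Decode are hypothetical, and the charset name is assumed to be whatever the response reported (possibly null or empty). The RegisterProvider call assumes a modern runtime (.NET Core 3.0 or later), where code page 1252 is not available by default; on the classic .NET Framework that line is unnecessary.

```csharp
using System;
using System.Text;

// Hypothetical helper: picks an Encoding for a response body and decodes it.
public static class ResponseText
{
    static ResponseText()
    {
        // Assumption: running on .NET Core 3.0+ / .NET 5+, which only ships a
        // handful of encodings by default. This makes code page 1252 available.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // charsetName is the character set the server reported, if any.
    public static Encoding PickEncoding(string charsetName)
    {
        if (!string.IsNullOrEmpty(charsetName))
        {
            try
            {
                // Some servers quote the value, so strip whitespace and quotes.
                return Encoding.GetEncoding(charsetName.Trim().Trim('"'));
            }
            catch (ArgumentException)
            {
                // Unknown or malformed name: fall through to the default.
            }
            catch (NotSupportedException)
            {
                // Known name, but not supported on this platform.
            }
        }
        // Default for Western content, as suggested in the text.
        return Encoding.GetEncoding(1252);
    }

    // Decodes the raw response bytes into a .NET string.
    public static string Decode(byte[] body, string charsetName)
    {
        return PickEncoding(charsetName).GetString(body);
    }
}
```

A caller would pass the raw bytes plus the reported character set, for example ResponseText.Decode(bodyBytes, response.CharacterSet). Catching only the two specific exception types, rather than everything, keeps unrelated failures visible.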
If you've worked at all with .NET you've probably found out about streams by
now. Streams are very flexible abstractions for dealing with blocks of data
that are, well, streaming: data that is not necessarily complete by the time
you start reading it. Streams are efficient because they read and write data
sequentially for the most part (some streams, such as files, also support
random access). In most cases streams are mapped to things like files or
network I/O. Streams can also be applied to strings, memory-mapped files, and
any number of other things that require reading and writing large blocks of
data. Streams manage the underlying access to ensure the integrity of the
data, so you can start reading before all of the data is available. .NET uses
streams for most network I/O, so accessing HTTP, FTP, and even raw sockets
provides a fairly consistent interface across protocols. In these situations
you usually end up with an input stream and an output stream. WebRequest and
WebResponse (the base classes of HttpWebRequest and HttpWebResponse) have
methods that return the respective streams: GetRequestStream() returns the
stream you write to, and GetResponseStream() returns the stream you read from. |
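To make the sequential read/write pattern concrete, here is a minimal sketch of copying one stream into another in fixed-size chunks. The class name StreamCopy and the 4 KB buffer size are illustrative assumptions; any readable Stream can serve as the input and any writable Stream as the output.

```csharp
using System.IO;

// Hypothetical helper: copies everything from one stream to another.
public static class StreamCopy
{
    // Returns the total number of bytes copied.
    public static long Copy(Stream input, Stream output)
    {
        byte[] buffer = new byte[4096]; // chunk size is an arbitrary choice
        long total = 0;
        int read;

        // Read may return fewer bytes than requested (common on network
        // streams); it returns 0 only when the end of the stream is reached.
        while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            output.Write(buffer, 0, read);
            total += read;
        }
        return total;
    }
}
```

With a Web response, the input would typically be the stream returned by GetResponseStream() and the output a FileStream, so a download goes to disk without ever being held in a string. The loop only ever holds one buffer's worth of data in memory, which is why it works before the full content has arrived.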
.NET strings are objects and as such require some overhead when they are
created. They are also immutable, so common operations such as concatenation with the + operator are very expensive when performed in tight loops: a new object is created and the old one discarded on each iteration. Building strings of more than a few kilobytes in this manner gets slow in a hurry! Realizing that string building is a very common task, the .NET Framework includes a StringBuilder class that is optimized for the job. It works on a preallocated character buffer that data is appended into, rather than creating a new object every time. For large strings built in loops, StringBuilder can be hundreds of times faster than plain string concatenation, and it reduces memory usage considerably. When running in tight loops you should avoid using the + operator with strings or with any objects that get converted to strings. Instead, use StringBuilder's Append method, or its AppendFormat method, which appends data using a format template without the overhead of separate intermediate string objects. |
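As a rough illustration of the loop-building advice above, the following sketch uses StringBuilder.AppendFormat instead of the + operator. The class name ReportBuilder, the row format, and the capacity estimate of 16 characters per row are all assumptions made for the example.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical helper: builds a multi-line string inside a loop.
public static class ReportBuilder
{
    public static string BuildRows(int count)
    {
        // Presizing the buffer avoids repeated growth; the estimate of
        // 16 characters per row is a guess, not a requirement.
        StringBuilder sb = new StringBuilder(Math.Max(16, count * 16));

        for (int i = 0; i < count; i++)
        {
            // Appends straight into the builder's buffer; no intermediate
            // string is created for the concatenated result.
            sb.AppendFormat(CultureInfo.InvariantCulture,
                            "Row {0}: {1}\n", i, i * i);
        }
        return sb.ToString();
    }
}
```

The equivalent loop written as result += "Row " + i + ": " + (i * i) + "\n" would allocate a fresh, ever longer string on each pass, which is exactly the cost the StringBuilder version avoids.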
Delegates are an important concept in .NET. They are used frequently in code that implements event handling or any sort of dynamic code transfer where a calling routine provides a callback function for a handler process. You can think of a delegate as a type-safe function pointer. Delegates are actually objects that encapsulate the function pointer and give the compiler a function signature that any method assigned to the delegate must match. If you're familiar with C++, it's like a pointer to a function plus a typedef wrapped into a single object. The most common use of delegates is the event handler, which uses a delegate to fire events. When the event publisher fires the event, the delegate assigned to handle that event is called, and your event subscriber object can then handle the event simply by implementing a method with the matching signature in your class. In multithreaded scenarios, delegates are used to call the user's thread entry point code. All of this greatly simplifies handling function pointers. For more information on object interface development, check out the article .NET Interface-Based Programming in this issue. |
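The publisher/subscriber arrangement described above might look like the following minimal sketch. All of the names here (ProgressHandler, Downloader, ProgressLog) are hypothetical and exist only to show the moving parts: a delegate type that fixes the signature, an event based on that delegate, and a subscriber method that matches it.

```csharp
using System;
using System.Collections.Generic;

// The delegate type defines the signature every handler must match.
public delegate void ProgressHandler(string message);

// Publisher: exposes an event and fires it at interesting moments.
public class Downloader
{
    public event ProgressHandler Progress;

    public void Run()
    {
        OnProgress("started");
        // ... the real work would happen here ...
        OnProgress("finished");
    }

    private void OnProgress(string message)
    {
        // Copy to a local first so the null check and the call see the
        // same delegate, even if subscribers change on another thread.
        ProgressHandler handler = Progress;
        if (handler != null)
        {
            handler(message);
        }
    }
}

// Subscriber: handles the event with an ordinary method.
public class ProgressLog
{
    public readonly List<string> Messages = new List<string>();

    public void HandleProgress(string message)
    {
        Messages.Add(message);
    }
}
```

A subscriber hooks up with downloader.Progress += log.HandleProgress; and the compiler rejects any method whose signature does not match ProgressHandler. The same idea applies to threads, where ThreadStart in System.Threading is the delegate type that wraps the thread's entry point method.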