String and Character Encoding in .NET

String encoding, and dealing with data returned over a Web connection, is arguably one of the most confusing subjects I've run into while working in .NET. All strings in .NET are Unicode, stored internally as UTF-16 (two bytes per code unit), so text that arrives in any other form has to be converted before it displays properly. When you retrieve data over the Web it arrives as a binary stream of bytes, and in order to use it as a string those bytes must be decoded. Different content might use different encodings, so you have to control how the bytes are decoded. This basically involves telling the stream reader which encoding (code page) the incoming data is in.

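Encoding enc;
try {
    // Assumes Response.ContentEncoding holds a character set name such as "utf-8"
    enc = Encoding.GetEncoding(Response.ContentEncoding);
}
catch (ArgumentException) {
    // Missing or unrecognized name: fall back to code page 1252 (Western European)
    enc = Encoding.GetEncoding(1252);
}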
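To make the effect of picking an encoding concrete, here is a minimal sketch that decodes a block of bytes through a StreamReader. The class and method names (DecodingSketch, Decode) are made up for illustration, and the fallback is UTF-8 rather than code page 1252, only because 1252 is not available on newer .NET runtimes unless the code pages provider is registered.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper for illustration; not part of the .NET Framework.
public static class DecodingSketch
{
    // Decodes raw bytes into a .NET string using the named character set.
    // Falls back to UTF-8 (an assumption of this sketch) when the name is
    // null, empty, or not recognized.
    public static string Decode(byte[] raw, string charsetName)
    {
        Encoding enc;
        try
        {
            enc = Encoding.GetEncoding(charsetName);
        }
        catch (ArgumentException)
        {
            enc = Encoding.UTF8;
        }

        // The reader converts bytes to characters using the chosen encoding.
        using (var stream = new MemoryStream(raw))
        using (var reader = new StreamReader(stream, enc))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Calling Decode(new byte[] { 0x63, 0x61, 0x66, 0xE9 }, "iso-8859-1") yields "café", because 0xE9 is "é" in Latin-1. The same bytes read as UTF-8 do not produce "é", since a lone 0xE9 is not a valid UTF-8 sequence. That mismatch is exactly the kind of garbled text you see when the wrong encoding is chosen.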

Streams and StreamReader in .NET

If you've worked at all with .NET you've probably found out about streams by now. Streams are very flexible abstractions for dealing with blocks of data that are, well, streaming: built from data that is not necessarily complete by the time you start reading it. Streams are efficient because for the most part they read and write data sequentially (some streams, such as files, also support random access). In most cases streams are mapped to things like files or network I/O inputs and outputs. Streams can also be applied to strings, memory mapped files, and any number of other things that require reading from and writing to large blocks of data. Streams manage the underlying access to ensure the integrity of the data, so you can start reading before all of the data is available.

.NET uses streams for most of the network I/O environment, so access to HTTP, FTP, and even sockets goes through a fairly consistent interface across protocols. In these situations you usually end up with an input stream and an output stream. Both WebRequest and WebResponse (the base classes of the HttpWebRequest and HttpWebResponse objects) have methods that return the respective streams, which you write to and read from.
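As a rough sketch of what sequential reading looks like, the helper below pulls text out of any readable stream in fixed-size chunks. StreamSketch and ReadAllText are names invented for this example. The code works against a MemoryStream just as it would against the stream returned by WebResponse.GetResponseStream(), which is the point of the abstraction.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper for illustration; not part of the .NET Framework.
public static class StreamSketch
{
    // Reads a stream to the end in chunks of bufferSize characters and
    // returns the decoded text. The reader (and the stream) is closed
    // when the method returns.
    public static string ReadAllText(Stream input, Encoding enc, int bufferSize = 4096)
    {
        var result = new StringBuilder();
        var buffer = new char[bufferSize];

        using (var reader = new StreamReader(input, enc))
        {
            int read;
            // Read returns 0 once the end of the stream is reached.
            while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
            {
                result.Append(buffer, 0, read);
            }
        }
        return result.ToString();
    }
}
```

For example, StreamSketch.ReadAllText(new MemoryStream(Encoding.UTF8.GetBytes("hello")), Encoding.UTF8) returns "hello". The caller never needs to know how many bytes the source holds up front, only when the reader reports that nothing is left.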

Use StringBuilder for string concatenation

String concatenations are very expensive when performed in tight loops. Because strings are immutable, a new object is created and the old one discarded on each iteration of the loop. Building strings of more than a few kilobytes in this manner gets slow in a hurry! Realizing that string building is a very common task, the .NET Framework includes a StringBuilder class, which works on a preallocated, growable character buffer that data is appended into rather than creating a new object every time. For large numbers of appends StringBuilder can be dramatically faster than plain string concatenation, and the gap widens as the string grows, because each concatenation copies everything built so far. It also reduces memory usage considerably. When running in tight loops you should avoid using the + operator with strings or with any objects that get converted to strings. Instead you can use Append, or the AppendFormat method, which appends data using a string template without the overhead of separate intermediate string objects.
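Here is a small side-by-side sketch of the two approaches. ConcatSketch and its method names are hypothetical, and the initial capacity of count * 12 is just a rough guess at the final length for this particular template.

```csharp
using System;
using System.Text;

// Hypothetical comparison for illustration only.
public static class ConcatSketch
{
    // Slow path: every += allocates a new string and copies the old contents.
    public static string BuildWithConcat(int count)
    {
        string s = "";
        for (int i = 0; i < count; i++)
        {
            s += "Line " + i + "\r\n";
        }
        return s;
    }

    // Faster path: appends into one growable buffer and creates the final
    // string once, in ToString().
    public static string BuildWithStringBuilder(int count)
    {
        var sb = new StringBuilder(count * 12); // presize to avoid regrowth
        for (int i = 0; i < count; i++)
        {
            sb.AppendFormat("Line {0}\r\n", i);
        }
        return sb.ToString();
    }
}
```

Both methods produce the same text. The difference only shows up in time and memory as count grows, so it is worth timing both with your own data before drawing conclusions about how large the gain is.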

Delegates

Delegates are an important concept in .NET. They are used frequently in code that implements event handling, or in any situation where a calling routine hands a callback function to a handler process. You can think of a delegate as a type safe function pointer. Delegates are actually objects that encapsulate the function pointer and give the compiler a function signature that any method called through the delegate must match. If you're familiar with C++, it's like a pointer to a function plus a typedef wrapped into a single object.
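A minimal sketch of the callback pattern described above might look like the following. ProgressHandler, DelegateSketch, Process and Demo are all names made up for this example.

```csharp
using System;
using System.Collections.Generic;

// The delegate type fixes the signature: one string in, nothing returned.
public delegate void ProgressHandler(string message);

// Hypothetical worker for illustration only.
public static class DelegateSketch
{
    // Reports progress through whatever callback the caller hands in.
    public static void Process(string[] items, ProgressHandler onProgress)
    {
        foreach (string item in items)
        {
            if (onProgress != null)
            {
                onProgress("Processed " + item);
            }
        }
    }

    public static List<string> Demo()
    {
        var log = new List<string>();

        // Any method with a matching signature can be assigned.
        ProgressHandler handler = log.Add;

        // Delegates are multicast: += chains a second target that is
        // invoked after the first.
        handler += message => log.Add(message.ToUpperInvariant());

        Process(new[] { "a", "b" }, handler);
        return log;
    }
}
```

Demo() returns a list containing "Processed a", "PROCESSED A", "Processed b" and "PROCESSED B", in that order. Trying to assign a method that takes an int instead of a string would fail at compile time, which is the type safety a raw function pointer does not give you.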