Friday, November 30, 2018, 1:11:52 PM
XML is not as popular as it once was, but there's still a lot of XML-based configuration and data floating around today. Just today I was working on a conversion routine that needs to generate XML-formatted templates, and one thing I needed was an easy way to generate a properly encoded XML string.
Stupid Pet Tricks
I'll preface this by saying that your need to generate XML as standalone strings should be a rare occurrence. In most cases you'll want to use a proper XML API, whether it's an XmlDocument, XmlWriter or LINQ to XML, to generate your XML. When you use those tools, the conversion from strings (and most other types) to encoded XML is built in and mostly automatic.
However, in this case I have a huge block of mostly static XML text, and creating the entire document with a structured XML API seems like overkill when I really just need to inject a few simple values.
So in this case I'm looking for a way to format individual values as XML, and for most types the XmlConvert static class works well.
Should be easy, right? Well...
XmlConvert works well, except for string conversions, which it doesn't support. XmlConvert.ToString() handles just about all of the common base types, but it has no overload that turns a string into properly encoded XML content.
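For reference, here's what XmlConvert produces for a few of the types it does support. This is a minimal sketch: the class name XmlConvertSamples is made up for illustration, and the values in the comments reflect the invariant, XML Schema style formatting that XmlConvert uses.

```csharp
using System;
using System.Xml;

// Hypothetical sample class: shows XmlConvert.ToString() for some supported types.
// Note there is no XmlConvert.ToString(string) overload to encode plain text.
public static class XmlConvertSamples
{
    public static string[] FormatSamples()
    {
        return new[]
        {
            XmlConvert.ToString(true),      // "true" (lower case, as XML Schema expects)
            XmlConvert.ToString(1234.5m),   // "1234.5" (culture independent)
            XmlConvert.ToString(new DateTime(2018, 11, 30, 13, 11, 52),
                                XmlDateTimeSerializationMode.Unspecified) // "2018-11-30T13:11:52"
        };
    }
}
```

Formatting values this way is easy; it's only plain strings that need one of the workarounds below.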
Reading an encoded XML Value
There are a number of different ways that you can generate XML output and all of them basically involve creating some sort of XML structure and reading the value out of the 'rendered' document.
The most concise way I've found is the following:
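using System.Xml.Linq;

public static string XmlString(string text)
{
    // Let LINQ to XML encode the text node, then return just the node's text
    return new XElement("t", text).LastNode.ToString();
}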
which you can call with:
XmlString("Brackets & stuff <> and \"quotes\" and more 'quotes'.").Dump();
and which produces:
Brackets &amp; stuff &lt;&gt; and "quotes" and more 'quotes'.
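One edge case with the XElement one-liner: LINQ to XML stores an empty string without creating a text node (and ignores null content entirely), so LastNode is null and the call throws a NullReferenceException. A guarded version might look like this; the class name XmlEncodeSafe is only for illustration.

```csharp
using System.Xml.Linq;

// Hypothetical wrapper around the XElement approach with an empty/null guard.
public static class XmlEncodeSafe
{
    public static string XmlString(string text)
    {
        // Empty or null input produces no text node, so LastNode would be null
        if (string.IsNullOrEmpty(text))
            return string.Empty;

        // XText.ToString() escapes &, < and > but leaves quotes alone,
        // which is what element content needs
        return new XElement("t", text).LastNode.ToString();
    }
}
```

Calling `XmlEncodeSafe.XmlString("a < b & \"c\"")` should give `a &lt; b &amp; "c"`, and an empty string simply comes back empty.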
If you don't want to use LINQ to XML, you can use an XmlDocument instead:
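using System.Xml;
private static XmlDocument _xmlDoc;   // cached; note this is not thread-safe
public static string XmlString(string text)
{
    _xmlDoc = _xmlDoc ?? new XmlDocument();
    var el = _xmlDoc.CreateElement("t");
    el.InnerText = text;   // assign as raw text...
    return el.InnerXml;    // ...and read it back XML encoded
}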
Note that using XmlDocument is considerably slower than XElement, even with the document caching used above.
SecurityElement.Escape() is a built-in .NET function that performs XML encoding. It's a single function, so it's easy to call, but it always encodes both single and double quotes, with no option to turn that off. That's OK, but it produces extra characters when you're encoding element content, since only attributes need quotes encoded. The function is also considerably slower than the other mechanisms mentioned here.
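To see the extra quote encoding in practice, here's a small sketch that uses SecurityElement.Escape() for both an attribute and an element body. The class and BuildDoc() helper are hypothetical; only SecurityElement.Escape() is the real framework API.

```csharp
using System.Security;

// Hypothetical helper: builds <doc note="...">...</doc> using SecurityElement.Escape().
public static class SecurityEscapeSample
{
    // Escape() always replaces all five characters: < > & " '
    // so the element body gets &quot; and &apos; even though it doesn't need them.
    public static string BuildDoc(string note, string body) =>
        "<doc note=\"" + SecurityElement.Escape(note) + "\">" +
        SecurityElement.Escape(body) + "</doc>";
}
```

The result is valid XML either way; the element text just carries a few more entities than strictly necessary.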
If you don't want to deal with adding a reference to LINQ to XML or even System.Xml, you can also write a simple code routine. XML strings really only need 5 characters escaped (3 if you're encoding element content), and XML parsers reject characters below CHR(32) other than tabs, carriage returns and line feeds, so the routine should throw for those.
The simple code to do this looks like this:
using System;
using System.Text;

/// <summary>
/// Turns a string into a properly XML Encoded string.
/// Uses simple string replacement.
/// Also see XmlUtils.XmlString() which uses XElement
/// to handle additional extended characters.
/// </summary>
/// <param name="text">Plain text to convert to XML Encoded string</param>
/// <param name="encodeQuotes">
/// If true encodes single and double quotes.
/// When embedding element values quotes don't need to be encoded.
/// When embedding attributes quotes need to be encoded.
/// </param>
/// <returns>XML encoded string</returns>
/// <exception cref="InvalidOperationException">Invalid character in XML string</exception>
public static string XmlString(string text, bool encodeQuotes = false)
{
    var sb = new StringBuilder(text.Length);

    foreach (var chr in text)
    {
        if (chr == '<')
            sb.Append("&lt;");
        else if (chr == '>')
            sb.Append("&gt;");
        else if (chr == '&')
            sb.Append("&amp;");

        // special handling for quotes
        else if (encodeQuotes && chr == '\"')
            sb.Append("&quot;");
        else if (encodeQuotes && chr == '\'')
            sb.Append("&apos;");

        // Legal sub-chr32 characters
        else if (chr == '\n')
            sb.Append("\n");
        else if (chr == '\r')
            sb.Append("\r");
        else if (chr == '\t')
            sb.Append("\t");
        else
        {
            if (chr < 32)
                throw new InvalidOperationException("Invalid character in Xml String. Chr " +
                                                    Convert.ToInt16(chr) + " is illegal.");
            sb.Append(chr);
        }
    }

    return sb.ToString();
}
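The function above only rejects characters below 0x20, which covers the common case. The XML 1.0 Char production is slightly stricter: U+FFFE, U+FFFF and unpaired surrogate halves are illegal too, while properly paired surrogates (emoji and other characters above U+FFFF) are fine. If you need that level of checking without pulling in System.Xml, a scan along these lines should work. XmlCharValidator and FindIllegalXmlChar() are made-up names, and this is a sketch rather than a drop-in replacement.

```csharp
// Hypothetical validator for the XML 1.0 "Char" production:
//   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
public static class XmlCharValidator
{
    // Returns the index of the first illegal character, or -1 if the string is legal.
    public static int FindIllegalXmlChar(string text)
    {
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (c == '\t' || c == '\n' || c == '\r' ||
                (c >= '\u0020' && c <= '\uD7FF') ||
                (c >= '\uE000' && c <= '\uFFFD'))
                continue;

            // A valid UTF-16 surrogate pair encodes #x10000-#x10FFFF, which is legal
            if (i + 1 < text.Length && char.IsSurrogatePair(c, text[i + 1]))
            {
                i++; // skip the low surrogate
                continue;
            }

            return i; // control character, lone surrogate, U+FFFE or U+FFFF
        }
        return -1;
    }
}
```

You could call this before encoding and throw with the returned position, which makes the error message a bit more useful than just the character code.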
Attributes vs. Elements
Notice that the function above optionally supports quote encoding. By default quotes are not encoded.
That's because element content doesn't require encoded quotes: there are no string delimiters to worry about inside an XML element. This is legal XML:
<doc>This is a "quoted" string. So is 'this'!</doc>
However, if you are generating an XML string for an attribute, you do need to encode quotes, because quotes are the attribute's delimiters. Makes sense, right?
<doc note="This is a &quot;quoted&quot; string. So is &apos;this&apos;!" />
Encoding the single quotes as &apos; isn't required in this example, because the attribute delimiter is ". So this is actually more correct:
<doc note="This is a &quot;quoted&quot; string. So is 'this'!" />
However, both are valid XML. The string function above encodes single and double quotes when the encodeQuotes parameter is set to true, which is what you want when setting attribute values.
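Since only the quote that matches the delimiter has to be escaped, an attribute encoder can take the delimiter into account and produce the "more correct" form shown above. This is a hypothetical helper, not part of the XmlString() function; it also emits character references for tabs and line breaks, because XML parsers normalize literal whitespace in attribute values to spaces. It does not check for illegal control characters.

```csharp
using System;
using System.Text;

// Hypothetical helper: encodes text for an attribute delimited by `delimiter`.
public static class XmlAttributeEncoder
{
    public static string EncodeAttribute(string text, char delimiter = '"')
    {
        if (delimiter != '"' && delimiter != '\'')
            throw new ArgumentException("Delimiter must be a double or single quote.", nameof(delimiter));

        var sb = new StringBuilder(text.Length);
        foreach (var chr in text)
        {
            if (chr == '<') sb.Append("&lt;");
            else if (chr == '>') sb.Append("&gt;");
            else if (chr == '&') sb.Append("&amp;");
            // only the quote that matches the delimiter is escaped
            else if (chr == delimiter) sb.Append(chr == '"' ? "&quot;" : "&apos;");
            // preserve whitespace that attribute-value normalization would turn into spaces
            else if (chr == '\t') sb.Append("&#x9;");
            else if (chr == '\n') sb.Append("&#xA;");
            else if (chr == '\r') sb.Append("&#xD;");
            else sb.Append(chr);
        }
        return sb.ToString();
    }
}
```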
The following LINQPad code demonstrates:
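var doc = new XmlDocument();
doc.LoadXml("<d><t>This is &amp; a \"test\" and a 'tested' test</t></d>");   // raw & would not parse
var node = doc.CreateElement("d2");
node.InnerText = "this & that <doc> and \"test\" and 'tested'";   // element text
var attr = doc.CreateAttribute("note");   // second parameter would be a namespace URI, not the value
attr.Value = "this & that <doc> and \"test\" and 'tested'";
node.Attributes.Append(attr);
doc.DocumentElement.AppendChild(node);
doc.Dump();   // LINQPad; elsewhere write out doc.OuterXml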
The document looks like this:
<d>
  <t>This is &amp; a "test" and a 'tested' test</t>
  <d2 note="this &amp; that &lt;doc&gt; and &quot;test&quot; and 'tested'">this &amp; that &lt;doc&gt; and "test" and 'tested'</d2>
</d>
Bottom line: Elements don't require quotes to be encoded, but attributes do.
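LINQ to XML follows the same rules. Here's a sketch of the XmlDocument demo redone with XElement and XAttribute; the class and method names are only for illustration.

```csharp
using System.Xml.Linq;

// Hypothetical demo: the same text written as an attribute value and as element content.
public static class LinqToXmlQuoteDemo
{
    public static string Build()
    {
        const string value = "this & that <doc> and \"test\" and 'tested'";
        var el = new XElement("d2", new XAttribute("note", value), value);

        // The attribute gets &quot; for double quotes (the delimiter),
        // the element text leaves both kinds of quotes alone.
        return el.ToString(SaveOptions.DisableFormatting);
    }
}
```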
This falls into the premature optimization bucket, but I was curious how these mechanisms perform relative to each other. It would seem that XElement, and especially XmlDocument, would be very slow, since they process the value as an XML document or fragment that has to be loaded and parsed.
I was very surprised to find that the fastest and most consistent solution across various sizes of text was XElement, which was faster than my string implementation. For small amounts of text (under a few hundred characters) the string and XElement implementations were roughly the same, but as strings got larger, XElement became considerably faster.
XmlDocument, even the cached version, was slower: roughly 50% slower with small strings, and many times slower with larger strings, getting incrementally slower as the string size grew.
Surprisingly, the slowest of them all was SecurityElement.Escape(), which was nearly twice as slow as the XmlDocument approach.
Whatever XElement is doing to encode the element, it's very efficient, and it's built into the framework and maintained by Microsoft, so I would recommend that solution. The exception is if you want to avoid the XML assembly references, in which case the custom string solution works just as well with smaller strings and comes reasonably close with large strings.
Take all of these results with a grain of salt: all of these approaches are fast enough for one-off encoding, and unless you're encoding strings in loops or large batches, the performance difference is not a concern.
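If you want to take rough measurements of your own, a simple Stopwatch loop is enough to see relative differences. This is a minimal sketch, not the harness behind the results described above; EncoderTiming and Time() are made-up names, and for trustworthy numbers a dedicated tool such as BenchmarkDotNet is a better choice.

```csharp
using System;
using System.Diagnostics;

// Hypothetical timing helper: runs an encoder repeatedly over the same input.
// Rough only: a single warm-up call, no statistics, and GC/JIT noise is included.
public static class EncoderTiming
{
    public static TimeSpan Time(Func<string, string> encoder, string input, int iterations)
    {
        encoder(input); // warm-up call so first-call JIT compilation isn't measured

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            encoder(input);
        sw.Stop();

        return sw.Elapsed;
    }
}
```

You might compare `s => new XElement("t", s).LastNode.ToString()` against `s => SecurityElement.Escape(s)` with inputs of a few dozen characters and then a few thousand, since the differences described above depend on string size.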
If you want to play around with the different approaches, here's a Gist that you can load into LINQPad that you can just run:
XML string encoding is something you hopefully won't have to do much of, but I've tripped over it enough times to take the time to write it up here. Again, in most cases my recommendation is to write XML using a proper XML API (XmlDocument or XDocument/XElement), but in the few cases where you just need to jam a couple of values into a large document, nothing beats simple string replacement for simplicity and easy maintenance. That's the one edge case where a function like XmlString() makes sense.
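To make that edge case concrete, here's a sketch of injecting a few values into a mostly static XML template. The {{name}} placeholder syntax and the XmlTemplate class are assumptions for illustration; the values only land in element content, so quotes are left unencoded, and the XElement approach is used for the encoding.

```csharp
using System.Collections.Generic;
using System.Xml.Linq;

// Hypothetical template filler using a made-up {{name}} placeholder syntax.
public static class XmlTemplate
{
    // Element-content encoding via XElement; empty input has no text node.
    static string XmlString(string text) =>
        string.IsNullOrEmpty(text) ? string.Empty : new XElement("t", text).LastNode.ToString();

    public static string Fill(string template, IDictionary<string, string> values)
    {
        foreach (var pair in values)
            template = template.Replace("{{" + pair.Key + "}}", XmlString(pair.Value));
        return template;
    }
}
```

For attribute placeholders you'd swap in an encoder that also escapes the delimiting quote, such as the custom XmlString() with encodeQuotes set to true.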