What’s the easiest way to create an XML document in C++?

Oct 24, 2013

In some recent posts, I’ve been sharing with you how we develop software at Trade-Ideas.  Before I answer the question above, I think we should take a moment to ponder the question.  What’s the easiest way…
I’ve worked with a lot of engineers in a lot of organizations.  It seems like some programmers and software engineers are their own worst enemies.  They make their own lives difficult.  When we develop libraries, tools, and processes we do it to improve our lives.  When one of these things gets in your way, look at it and find out why.  Maybe it needs to be fixed, replaced, or just thrown away.  Things aren’t always perfect.  Most of our choices come with a trade-off.  But someone needs to examine the tools and processes.  If you follow them blindly you will get stuck on things that shouldn’t have been a problem in the first place.
There are a lot of good general purpose XML libraries out there.  Now they come standard with Java, PHP and many other languages.  But ask yourself: is this general purpose library the best for my purpose?  Very often you will want to write another library on top of the standard one.  Or, you might even want to start from scratch.  Either way, you end up with reusable code that’s good for you.
This article is focused on the way our server creates an XML document from scratch.  We also have some interesting client libraries, which are very good at reading XML.  In some cases we actually read an XML file, make a small number of changes, and then save the result.  That last case is rare for me.  The general purpose library has to deal with that case, and sometimes that makes it hard to do the things that I care more about.
Consider the following XML fragments.
  • <FORMULA>Price &gt; 5 &amp; Volume &gt; 1000000</FORMULA>
  • <FORMULA   >Price &gt; 5 &amp; Volume &gt; 1000000</FORMULA>
  • <FORMULA><!-- selected from config screen -->Price &gt; 5 &amp;<!-- created by optimizer --> Volume &gt; 1000000</FORMULA>
  • <FORMULA><![CDATA[Price > 5 & Volume > 1000000]]></FORMULA>
  • <FORMULA><![CDATA[Price > 5 ]]><![CDATA[& Volume > 1000000]]></FORMULA>

These all mean the same thing.  When you read the data, you should always see that the text is “Price > 5 & Volume > 1000000”.  My libraries let me read and write the text as a simple string.  I don’t care which format is used.
What’s the alternative?  I haven’t saved all the bad code samples I’ve seen over the years.  (Feel free to add your own in the comments!)  To read this data you typically have to read a list (possibly a recursive list) of items.  Each CDATA section is an item.  Each comment is an item.  You have to know which items to throw away, and which ones to interpret.  Combine all of these to make one string.  Writing is typically easier than reading, but not as easy as it should be.
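To make that chore concrete, here is a minimal sketch of the flattening step a caller typically ends up writing.  The names Item, ItemKind and flattenText are hypothetical; they imitate the shape of a general purpose parser's output and are not taken from any real library.

```cpp
#include <string>
#include <vector>

// Hypothetical item model, loosely imitating what a general purpose
// parser hands back for the content of one element.
enum class ItemKind { Text, CData, Comment };

struct Item
{
  ItemKind kind;
  std::string value;  // Assumed to be unescaped already by the parser.
};

// Combine the pieces that carry text and throw away the comments.
std::string flattenText(const std::vector<Item> &items)
{
  std::string result;
  for (const Item &item : items)
    if (item.kind != ItemKind::Comment)
      result += item.value;
  return result;
}
```

With a helper like this, the third and fifth fragments above both collapse to the same plain string, which is the only thing the caller wanted in the first place.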
The model used by the standard library is very powerful, but who cares?  I know I have that when I need it.  Instead, I want tools that make it easy to do what I usually do.
I told you what I don’t care about.  So, what are my concerns?  I care about 
  • memory management, 
  • quoting/escaping, 
  • performance, and 
  • language encoding.

Memory management is easy.  As long as you don’t mind some changes to the normal way that XML libraries work, you can use STL to handle all of your memory management needs.
I use one object filled with STL data structures to handle each XML element.  (Each of the samples listed above is one element.)  I call my class XmlNode.
An XmlNode knows about its children, but not its parent.  This makes life a lot easier.  Honestly, I don’t have any idea why most XML libraries keep a pointer from each element to the XML document.  It causes nothing but trouble.
Copying an XmlNode object will copy its contents.  We almost never use a pointer to an XmlNode.  All of this is consistent with normal STL data structures.
Occasionally we’ll use swap() as a quick way to move elements from one place to another.  A call to XmlNode.swap() will call swap() on each of the STL data structures.  More often, we’ll build a tree in place and/or pass the XmlNode objects by reference.  I.e. we’ll say 
void addData(XmlNode &parent, SomeTypeOfData toBeEncoded) {…} 
rather than 
XmlNode encode(SomeTypeOfData toBeEncoded) {…; return result;}
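The value semantics and the build in place style can be sketched with a cut-down node.  MiniNode and addCount are hypothetical stand-ins used only for illustration; the real XmlNode appears in the listing further down.

```cpp
#include <string>
#include <vector>

// Cut-down, hypothetical stand-in for XmlNode.  Every member is an STL
// type, so the compiler-generated copy, assignment and destructor do
// all of the memory management.
struct MiniNode
{
  std::string name;
  std::string text;
  std::vector<MiniNode> orderedChildren;
};

// Build in place: append one child to a parent the caller already owns,
// instead of returning a freshly built node by value.
void addCount(MiniNode &parent, int count)
{
  parent.orderedChildren.push_back(MiniNode());
  MiniNode &child = parent.orderedChildren.back();
  child.name = "COUNT";
  child.text = std::to_string(count);
}
```

Copying a MiniNode copies the whole subtree, so editing the copy never touches the original.  That is the same behavior you get from any other STL container.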
It’s interesting to note that XmlNode has so many public fields.  Traditionally you make most if not all of your fields private.  Then you add a lot of methods, like addNode(), getNthNode(), etc.  That was essential when you were doing your own memory management.  But when I expose an STL data structure, it is safe and self contained; I’m not worried that someone will break anything.  The benefit of this is that the user can now use so many methods and algorithms provided by STL, and I don’t have to do a thing.  Do I really want to expose an iterator?  A reverse iterator?  A count method?  A sort method?  The list goes on seemingly forever.  By exposing the member variables the way I did, I exposed a very powerful (and well known and well documented) interface with very little work.
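As a rough illustration of what the public containers buy you, the sketch below sorts and counts children with nothing but standard algorithms.  MiniNode, sortChildrenByText and countEmptyChildren are hypothetical names; the point is only that no extra methods on the node class are needed.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical cut-down node with a public STL container.
struct MiniNode
{
  std::string text;
  std::vector<MiniNode> orderedChildren;
};

// Sort the children by their text.  The node class offers no sort
// method; std::sort works directly on the exposed vector.
void sortChildrenByText(MiniNode &node)
{
  std::sort(node.orderedChildren.begin(), node.orderedChildren.end(),
            [](const MiniNode &a, const MiniNode &b)
            { return a.text < b.text; });
}

// Count the children that have no text, again with a stock algorithm.
std::size_t countEmptyChildren(const MiniNode &node)
{
  return static_cast<std::size_t>(
    std::count_if(node.orderedChildren.begin(), node.orderedChildren.end(),
                  [](const MiniNode &child) { return child.text.empty(); }));
}
```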
Quoting and escaping is all automatic.  The library user only deals with the original strings.  There is a way to give the library a hint about CDATA vs. escaping, but that’s optional.
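The two quoting styles can be sketched in a few lines.  This is a simplified illustration, not the library code: escapeText and wrapInCdata are hypothetical names, and the sketch ignores control characters and character encodings, which the real routines in the listing below also handle.

```cpp
#include <string>

// Escape the characters that have a special meaning in XML text.
std::string escapeText(const std::string &input)
{
  std::string output;
  for (char ch : input)
    switch (ch)
      {
      case '&': output += "&amp;"; break;
      case '<': output += "&lt;"; break;
      case '>': output += "&gt;"; break;
      case '"': output += "&quot;"; break;
      default: output += ch;
      }
  return output;
}

// Wrap the text in a CDATA section.  A literal "]]>" in the input would
// end the section early, so it is split across two sections.
std::string wrapInCdata(const std::string &input)
{
  std::string output = "<![CDATA[";
  std::string::size_type start = 0;
  for (;;)
    {
      const std::string::size_type found = input.find("]]>", start);
      if (found == std::string::npos)
        break;
      // Keep the "]]" in this section, then open a new section that
      // will begin with the ">".
      output.append(input, start, found - start + 2);
      output += "]]><![CDATA[";
      start = found + 2;
    }
  output.append(input, start, std::string::npos);
  output += "]]>";
  return output;
}
```

Either function can be applied to the same original string, and a conforming parser gives the reader back that same string.  That is why the caller never needs to care which one was used.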
The quoting routines include some special logic for dealing with old clients.  I’ll describe the problem further in a moment.  But the library user doesn’t have to care.  That, like the standard parts of quoting, is completely automatic.
Performance was tricky.  The initial version created a lot of strings.  And it was way too slow.  My first attempt at optimizing this was very complicated.  I tried to reuse one string, and I tried to guess in advance how big to make it.  I eventually realized that all the guessing was not necessary.  STL strings use a very simple algorithm, and that’s something you should have seen in school.  As long as you keep appending to a string, rather than creating a new string, it will be very fast. KISS!
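The difference between the two styles can be sketched as follows.  Both functions are hypothetical and much simpler than the real serializer; they only contrast building new strings on every step with appending to one output string.

```cpp
#include <string>
#include <vector>

// Slow pattern: every pass builds several temporary strings and then
// replaces the result with a brand new one.
std::string concatenateElements(const std::vector<std::string> &names)
{
  std::string result;
  for (const std::string &name : names)
    result = result + "<" + name + " />";
  return result;
}

// Fast pattern: the caller supplies one string and everything is
// appended to it.  std::string grows its buffer as needed, so no
// size guessing is required.
void appendElements(std::string &output, const std::vector<std::string> &names)
{
  for (const std::string &name : names)
    {
      output += '<';
      output += name;
      output.append(" />", 3);
    }
}
```

Both produce the same text.  The second one simply avoids the temporaries, and it composes well with recursive calls because each level keeps appending to the same string.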
Notice my call to a standard library routine to verify the character encoding.  This is surprisingly fast.  Like the futex / mutex from a previous article, I didn’t believe it could be that fast.  I had to make sure my test program was correct, and I finally found some documentation confirming my results.  As long as you are only verifying the string, and not creating a new string, it is very fast.
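A minimal sketch of that verification trick is shown below.  It assumes a POSIX system, where passing NULL as the destination makes mbstowcs() count the characters without storing them.  The names decodesInCurrentLocale and selectUtf8Locale are hypothetical, the locale names are guesses that vary between systems, and unlike the library version this sketch does not handle embedded NUL bytes.

```cpp
#include <clocale>
#include <cstddef>
#include <cstdlib>
#include <string>

// Returns true if the current locale can decode every byte of s.
// Nothing is written anywhere; mbstowcs() only counts, and reports
// (size_t)-1 when it meets an illegal sequence.
bool decodesInCurrentLocale(const std::string &s)
{
  return std::mbstowcs(NULL, s.c_str(), 0) != static_cast<std::size_t>(-1);
}

// Try to switch character handling to a UTF-8 locale.  Returns false if
// neither of the assumed locale names is installed on this machine.
bool selectUtf8Locale()
{
  return std::setlocale(LC_CTYPE, "C.UTF-8") != NULL
    || std::setlocale(LC_CTYPE, "en_US.UTF-8") != NULL;
}
```

The check is only a UTF-8 check when a UTF-8 locale is active, which is why the library comments insist on calling setlocale() in main().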
The character encoding was a little bit tricky.  My preferred output is, of course, UTF-8.  The problem is the input.  For historical reasons we have some data which is not unicode.  Unfortunately, it’s not all tagged, so you don’t know what it is.
(This is important for things like the name of a strategy.  The user can type anything he wants for this.  He can type things in his own native language, like “مستويات قياسية جديدة وهبوط”, “Neue Hochs und Tiefs”, or “Новые максимумы и минимумы”.  We store the strategy, including its name, in our cloud.  Later we will send the strategy name back to the user’s computer, as part of a list of choices.)
In the old days MS Windows had the idea of code pages.  You could set up your computer to work in one specific language.  The number 65 would always correspond to the letter “A”.  But the number 139 might mean “כ” if your computer speaks Hebrew, “Л” if your computer speaks Russian or Bulgarian, or any number of other things for other languages.  This system worked great as long as you stayed on one computer.  The system falls apart when people using different languages try to exchange documents.
XML is picky.  I have to say which character encoding I’m using.  If I don’t say anything explicitly, that’s just shorthand for saying that I’m using UTF-8.  There is no way to specify “I don’t know, let the user’s computer figure it out.”  Long, long ago, that’s exactly how our library worked.  And some people are still using old versions of the client.  Some people have moved on from that, but still have old data stored on our servers.  We don’t want to break any old code if we can help it.
How do we deal with this?  We always spit out UTF-8.  We start by looking at the data to see if it is already valid UTF-8.  If so, we copy it as is.  If not, we assume that it is LATIN-1.  That’s a very common encoding, useful for English, French, Spanish, and similar languages.  We convert from LATIN-1 to UTF-8.  A modern client will read the UTF-8 with no problems, and will not know that we did any translation.  If the original language really was LATIN-1, the user will see exactly what he expected, without knowing any of these details.  But we do some tricks in this case.  We use a very special way of encoding the letters.  A modern client will see the same thing regardless of whether we copied the letters as is, or whether we had to convert them.  But the really old clients will see something different.  We send them a list of numbers.  If they sent us the number 139, we will send 139 right back to them.  And they will interpret it in the default way for their language.  So, even though we didn’t know the right language, we could encode it correctly.  We encoded it in such a way that a modern client won’t choke, and an old client will get exactly what was expected.
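The decision described above can be sketched in a locale-independent way.  This is an illustration under stated assumptions, not the library code: looksLikeUtf8 is a hand-written structural check swapped in for the library's mbstowcs() call (it does not reject overlong forms or surrogates), encodeForAnyClient is a hypothetical name, and the escaping of characters such as & and < is left out to keep the focus on the encoding step.

```cpp
#include <cstdio>
#include <string>

// Simplified structural check: every lead byte must be followed by the
// right number of continuation bytes.
bool looksLikeUtf8(const std::string &s)
{
  std::string::size_type i = 0;
  while (i < s.size())
    {
      const unsigned char lead = s[i];
      std::string::size_type extra;
      if (lead < 0x80) extra = 0;
      else if ((lead & 0xE0) == 0xC0) extra = 1;
      else if ((lead & 0xF0) == 0xE0) extra = 2;
      else if ((lead & 0xF8) == 0xF0) extra = 3;
      else return false;  // Stray continuation byte or invalid lead.
      if (i + extra >= s.size())
        return false;     // Sequence is cut off by the end of the string.
      for (std::string::size_type k = 1; k <= extra; k++)
        if ((static_cast<unsigned char>(s[i + k]) & 0xC0) != 0x80)
          return false;
      i += extra + 1;
    }
  return true;
}

// Valid UTF-8 is copied as is.  Otherwise each byte above 127 becomes a
// decimal character reference, so byte 139 is sent as "&#139;".
std::string encodeForAnyClient(const std::string &input)
{
  if (looksLikeUtf8(input))
    return input;
  std::string output;
  for (char c : input)
    {
      const unsigned char byte = c;
      if (byte < 128)
        output += c;
      else
        {
          char buffer[16];
          const int count =
            std::snprintf(buffer, sizeof buffer, "&#%d;", static_cast<int>(byte));
          output.append(buffer, count);
        }
    }
  return output;
}
```

A Unicode-aware parser reads &#233; as the Latin-1 character é, while an old client that simply turns the number back into a byte gets its original byte and shows it in its own code page.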
This is a common theme.  Our service has been around for a long time.  We have to keep making changes, and some of these changes are completely unexpected.  What if we didn’t do all of this?  A user who saved data with the old client might cause the new client to crash.  Or a user who still has the old client might have all of his native language scrambled.
Remember, we don’t have full control of our users.  Old client libraries might stay in service for years.  Sometimes we integrate with large companies who don’t update their software on our schedule.  But we also have a lot of individual users and small groups.  If you wrote your software to deal with our old library, and it works, you don’t want to rewrite your code each time we come out with something new.
I considered glossing over this last issue, the character encodings.  When I first started writing this, I felt embarrassed that I didn’t do things better the first time.  But looking back, it was a reasonable mistake.  I used the standards that were popular at the time.  And I bet I’m not the only one who needs this.  Please let me know if you have a similar problem, especially if this code helps you solve your problem.
My library is shown below, in its entirety.  This references MiscSupport.h, which I discussed in a previous article.  For the most part I’m quite proud of this code.  It works well.
If I could start over I might change a few small things.  I was a bit sloppy with the nomenclature.  For example, I should have said “element” rather than “node”.  And maybe I should have said “.newChild()” rather than “[-1]”.  These aren’t important changes.  These certainly aren’t important enough to make me change and retest all of the code that uses this library!
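For what it is worth, the friendlier spelling could be a one-line member.  The sketch below uses a hypothetical cut-down MiniNode; newChild() does not exist in the library shown here.

```cpp
#include <string>
#include <vector>

// Hypothetical cut-down node, used only to show the alternative name.
struct MiniNode
{
  std::string text;
  std::vector<MiniNode> orderedChildren;

  // Same effect as the "[-1]" idiom: append an empty child and return
  // a reference to it.
  MiniNode &newChild()
  {
    orderedChildren.push_back(MiniNode());
    return orderedChildren.back();
  }
};
```

Whichever spelling is used, the returned reference points into a std::vector, so it should be used before the next child is added to the same parent.  Growing the vector can move the existing children.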
==> XmlSupport.h <==
#ifndef __XmlSupport_h_
#define __XmlSupport_h_

#include <map>
#include <vector>
#include <string>
#include "../shared/MiscSupport.h"

class XmlNode
{
private:
  void addToString(std::string &output, const std::string &recommendedName) const;
public:
  std::string name;
  PropertyList properties;
  std::map< std::string, class XmlNode > namedChildren;
  std::vector< class XmlNode > orderedChildren;
  std::string text;
  bool useCdata;
  std::string asString(const std::string &recommendedName = "API") const;
  void clear();
  bool empty() const;
  XmlNode &operator[](std::string name)
  {
    return namedChildren[name];
  }
  XmlNode &operator[](int i)
  {
    if (i < 0)
      {
        return *orderedChildren.insert(orderedChildren.end(), XmlNode());
      }
    else
      {
        return orderedChildren[i];
      }
  }
  void swap(XmlNode &other)
  {
    name.swap(other.name);
    properties.swap(other.properties);
    namedChildren.swap(other.namedChildren);
    orderedChildren.swap(other.orderedChildren);
    text.swap(other.text);
    bool temp = other.useCdata;
    other.useCdata = useCdata;
    useCdata = temp;
  }
  XmlNode() : useCdata(false) { }

  // This means that if we send a string as .text or a .property[], the result
  // will be byte for byte the same as when we started.  It might be quoted,
  // but that's okay as long as the client can unquote it and get what we
  // started with.  This is talking about a UTF-8 client.  This ignores the
  // magic we sometimes do to get bytes to the Delphi client.
  //
  // Be sure to call setlocale(LC_ALL, ""); in the main program!  Otherwise
  // this will not work right for UTF-8 characters.
  static bool binarySafe(std::string const &);
};

#endif

==> XmlSupport.C <==
#include <stdlib.h>
#include <string.h>
#include <iconv.h>

#include "XmlSupport.h"

/* We were spending A LOT of time converting XML from the internal format to
 * a string.  This was especially true when trying to send real-time alerts
 * to people immediately after the open.  So I optimized this operation.
 *
 * For my test data the optimized version of asString takes about 1/6 of the
 * time as the un-optimized version.  There were two large components of this
 * change.  By reusing one string, rather than creating a lot of strings, we
 * reduced the time significantly.  And I got a surprisingly big improvement by
 * using a char * to iterate in htmlspecialchars rather than string::iterator.
 *
 * A previous version of this code tried to use reserve to set the strings to
 * the correct size in advance.  Unfortunately we needed to do a lot of work
 * to get this value correct.  Although it was an improvement over the original
 * code, this version is more than twice as fast.  And this version is much
 * simpler!
 *
 * I also tried switching from a string to a rope.  (I did this before I did
 * any optimizations.)  That change actually made asString take 3 times as
 * long.  That was a big surprise.  Even when I completely disabled
 * htmlspecialchars, the rope version was slower than the original (complete)
 * version.  Based on the implementation described on the SGI web site, I
 * thought this would have been an ideal case for ropes.
 */

#include <stdio.h>

static bool isValidUtf8(std::string s)
{
  // mbstowcs() used in this way is surprisingly fast.
  const char *start = s.c_str();
  const char *const end = start + s.size();
  while (start < end)
    {
      if (mbstowcs(NULL, start, 0) == (size_t)-1)
        // We have already found an illegal sequence, so we can stop now.
        return false;
      start += strlen(start) + 1;
    }
  return true;
}

// XML messages can only contain valid strings.  Delphi was pretty lax here,
// taking any character it didn't know as itself.  But Java definitely would
// bomb on bad sequences, and presumably C# will as well.
//
// There are three separate issues here.
//
// 1)  XML doesn’t like most control codes, i.e. characters less than 32.  We
//     solve that by completely removing those codes.  These values *are* valid
//     UTF-8 characters.  I’m not sure how these get in there, but we did
//     have a problem once, so we added this step.  I’ve never seen this step
//     cause a problem.
//
//     I’m not sure if this step is complete.  There are other characters
//     which are listed as invalid or discouraged in an XML document.  I
//     haven’t seen a problem, but maybe it’s worth testing.  To handle all of
//     these, we’d have to change the algorithm.  I’m treating 0x7f as a bad
//     character and throwing it out because it is easy and I don’t think that
//     will hurt.
//
// 2) There are a few characters like < and > which would confuse XML. We quote
//    those more or less like PHP’s htmlspecialchars() function.  This always
//    works perfectly.
//
// 3) The database might contain strings which are not valid UTF-8.  Typically
//     stuff from the web site is UTF-8, where stuff from (the Delphi version
//     of) TI Pro is in an encoding which varies from one computer to the next.
//     That encoding is never shared with the server; the server just copies
//     the values from the client, stores them, and sends them back, verbatim.
//
//     Since we are not 100% sure what’s in the database, we have to make some
//     assumptions.  First we check if the string could be interpreted as a
//     valid UTF-8 string.  In that case we send the string as is.
//
//     If the string is not valid as UTF-8, then we quote every byte that is
//     greater than 127.  We quote it as a number, not a name.  A parser that
//     works with unicode will convert those into latin-1 characters, which is
//     a reasonable guess at what they are.  The Delphi parser will convert
//     these back into the original number, so the copy will look just like
//     the original.
//
//     This strategy will always generate XML that is legal.  It will usually
//     generate values which are correct.  I suppose it could fail because
//     the special characters just happen to look like valid UTF-8 when in
//     fact they were created by another encoding, but that doesn’t seem
//     likely.  Also, if someone was using an encoding other than latin-1 (like
//     hebrew) on the Delphi client, and saved values, then tried to load his
//     settings from Java or C#, he’ll find his settings have been changed into
//     random Latin-1 characters.  That last case is bound to happen, but not
//     a lot.  It’s the best we can do.
//
//     Of course, if something was stored in UTF-8 by a client that knows
//     unicode, and then the user switches to the Delphi client, his special
//     characters will all turn into random Latin-1.  
//
// Note:  The quoting described here only works if your locale is set properly.
// Normally it is enough to say setlocale(LC_ALL, "") in main(), assuming your
// environment variables are not too crazy.  If you forget that step, however,
// things won’t be so bad.  By default, C++ uses a locale with only characters
// between 0 and 127.  The Delphi clients will see pretty much the same thing
// as always.  The unicode clients trying to read valid unicode stuff from the
// database will see the problem with random Latin-1 characters.  Even if they 
// were only using valid Latin-1 characters, the message will still not be
// correct.  Of course, if the message only contains characters between 0
// and 127, everything will work fine.

static void xmlQuote(std::string &output, std::string input)
{
  const char * const inputStart = input.data();
  const char * const inputEnd = inputStart + input.size();
  const bool validUtf8 = isValidUtf8(input);
  for (char const *it = inputStart; it != inputEnd; it++)
    {
      switch(*it)
        {
        case '&' :
          output.append("&amp;", 5);
          break;
        case '"' :
          output.append("&quot;", 6);
          break;
        case '<' :
          output.append("&lt;", 4);
          break;
        case '>' :
          output.append("&gt;", 4);
          break;
        case '\t' :
          // These seem to be the only valid control characters in XML.
          // The others will cause a problem, even if they are quoted.
          // http://www.w3.org/TR/xml11/#charsets
          // http://www.w3.org/TR/REC-xml/#charsets
          output += *it;
          break;
        case '\n' :
          // Some XML readers seem to turn \r and \n into " ".
          // Mike found this problem with GWT.
          // I think I saw some similar problems where \r\n were being changed
          // into \n, or something similar, but only on certain browsers.
          output.append("&#10;", 5);
          break;
        case '\r' :
          output.append("&#13;", 5);
          break;
        default :
          {
            int ch = (unsigned char)(*it);
            if ((ch < 32) || (ch == 127))
              { // Ignore all invalid control characters.  Do not try to
                // quote them.  It will change the error message, but the
                // Java code still cannot deal with them.
                // http://bytes.com/forum/thread86931.html
                // The Java code definitely bombed on character 0x0b.  I got
                // a different error message when I quoted it, but it still
                // failed.
              }
            else if (validUtf8 || (ch < 127))
              { // Normal printable ASCII or UTF-8 characters.
                output += *it;
              }
            else
              { // The Delphi client will convert these back to the bytes that
                // we see here.  For clients that understand unicode, this
                // translation will be correct as long as the source was in
                // Latin-1.
                char buffer[30];
                // According to
                // http://www.htmlhelp.com/reference/html40/entities/
                // browsers offer better support for decimal notation than hex.
                int count = sprintf(buffer, "&#%d;", ch);
                output.append(buffer, count);
                // I assume that this option will not add a huge performance
                // penalty, since this only applies to occasional messages,
                // like the contents of the config window.  It will not apply
                // to the alert messages which make up the bulk of our data.
              }
          }
        }
    }
}

// cdataQuote is an alternative to xmlQuote.  The output is designed to be
// used in XML.  In some cases the output might be more readable and/or
// shorter than from xmlQuote.
//
// For the most part, cdataQuote just wraps the input between <![CDATA[
// and ]]> tags.  But there are some special cases.  Most obviously, if
// we have ]]> in the input, we have to do some special logic to that.
//
// As with xmlQuote, we ensure that the output only contains valid characters.
// If we see something that is not valid in the input, we try to fix it.
// As with xmlQuote, invalid control characters are deleted.  If we see 
// invalid bytes above the valid ASCII range, we assume they are LATIN-1 and
// convert them into unicode.  Unlike xmlQuote, this code will try to convert
// the LATIN-1 codes into equivalent UTF-8 codes.  There are some codes that
// are valid but meaningless, such as character 127.  That is to say that they
// don't hurt the XML parser (like a null would) but they don't convey any
// meaning, either.  cdataQuote removes all of these when translating,
// not just the ones it has to.
//
// The result is that cdataQuote will correctly encode valid LATIN-1 data,
// in case some is still hiding in the database.  And it will always spit out
// valid XML.  But it doesn’t deal with some of the special cases dealing with
// Delphi and other character sets.  It would be better to call xmlQuote if you
// think you are talking with the Delphi code.  cdataQuote is initially aimed
// at our javascript proxy.  Note that the main program has to choose to
// use cdata quoting.  We don’t try to pick it automatically.

class CdataQuoter
{
private:
  std::string utf8Convert[256];
  std::string latin1Convert[256];
  static const std::string start;
  static const std::string restart;
  static const std::string end;
public:
  CdataQuoter();
  void quote(std::string &output, std::string const &input) const;
};

const std::string CdataQuoter::start = "<![CDATA[";
const std::string CdataQuoter::restart = "]]]]><![CDATA[>";
const std::string CdataQuoter::end = "]]>";

void CdataQuoter::quote(std::string &output, std::string const &input) const
{
  output += start;
  const bool validUtf8 = isValidUtf8(input);
  std::string const * const convert = validUtf8?&utf8Convert[0]:&latin1Convert[0];
  char const *current = input.c_str();
  char const *const stop = current + input.size();
  while (current < stop)
    if (current[0] == ']' && current[1] == ']' && current[2] == '>')
      {
        output += restart;
        current += 3;
      }
    else
      {
        output += convert[(unsigned char)*current];
        current++;
      }
      }
  output += end;
}

CdataQuoter::CdataQuoter()
{
  // I believe these are the only valid control characters in XML.
  utf8Convert[(int)'\t'] = latin1Convert[(int)'\t'] = "\t";
  utf8Convert[(int)'\r'] = latin1Convert[(int)'\r'] = "\r";
  utf8Convert[(int)'\n'] = latin1Convert[(int)'\n'] = "\n";
  // Normal ASCII characters.
  for (int i = 32; i < 127; i++)
    utf8Convert[i] = latin1Convert[i] = std::string(1, (char)i);
  // If we know that the input was valid utf-8, we copy these byte for
  // byte.
  for (int i = 128; i < 256; i++)
    utf8Convert[i] = std::string(1, (char)i);
  // If we don’t have UTF-8, we assume we have latin-1 and convert that to
  // UTF-8
  iconv_t cd = iconv_open("UTF8", "LATIN1");
  for (int i = 160; i < 256; i++)
    {
      iconv(cd, NULL, NULL, NULL, NULL);
      char input = i;
      char *inputPtr = &input;
      char output[5];
      char *outputPtr = output;
      size_t inputLeft = 1;
      size_t outputLeft = 5;
      size_t result =
        iconv(cd, &inputPtr, &inputLeft, &outputPtr, &outputLeft);
      if (result != (size_t)-1)
        latin1Convert[i] = std::string(output, outputPtr - output);
    }
  iconv_close(cd);
}

static CdataQuoter cdataQuoter;

bool XmlNode::binarySafe(std::string const &data)
{
  if (!isValidUtf8(data))
    return false;
  for (std::string::const_iterator it = data.begin();
       it != data.end();
       it++)
    {
      const unsigned char ch = *it;
      if ((ch < ' ') && (ch != '\n') && (ch != '\r') && (ch != '\t'))
        return false;
      if (ch == 127)
        // Treat delete just like any other control character.
        return false;
    }
  return true;
}

std::string XmlNode::asString(const std::string &recommendedName) const
{
  std::string result;
  // Surprisingly, adding a reserve statement does not help by any 
  // measurable amount.  In other versions of this code, when we were
  // creating a lot more strings, reserve was very helpful.
  //result.reserve(14000);
  addToString(result, recommendedName);
  return result;
}

static const std::string s_NODE = "NODE";

void XmlNode::addToString(std::string &output, const std::string &recommendedName) const
{
  std::string realName;
  if (name.empty())
    {
      realName = recommendedName;
    }
  else
    {
      realName = name;
    }
  // We should really check to make sure that the name is valid.  Due to a bug
  // we sometimes saw a name that was “2”, or similar.  The Delphi code seems
  // to ignore this.  The Java code threw an exception.
  output += '<';
  output += realName;
  for (PropertyList::const_iterator property = properties.begin();
       property != properties.end();
       property++)
    {
      output += ' ';
      output += property->first;
      output.append("=\"", 2);
      xmlQuote(output, property->second);
      output += '"';
    }
  if (namedChildren.empty() && orderedChildren.empty() && text.empty())
    {
      output.append(" />", 3);
    }
  else
    {
      output += '>';
      for (std::map< std::string, XmlNode >::const_iterator namedChild =
             namedChildren.begin();
           namedChild != namedChildren.end();
           namedChild++)
        {
          namedChild->second.addToString(output, namedChild->first);
        }
      for (std::vector< XmlNode >::const_iterator orderedChild =
             orderedChildren.begin();
           orderedChild != orderedChildren.end();
           orderedChild++)
        {
          orderedChild->addToString(output, s_NODE);
        }
      if (useCdata)
        cdataQuoter.quote(output, text);
      else
        xmlQuote(output, text);
      output.append("</", 2);
      output += realName;
      output += '>';
    }
}

void XmlNode::clear()
{
  name.clear();
  properties.clear();
  namedChildren.clear();
  orderedChildren.clear();
  text.clear();
}

bool XmlNode::empty() const
{
  return name.empty() 
    && properties.empty() 
    && namedChildren.empty() 
    && orderedChildren.empty() 
    && text.empty();
}

#ifdef __UNIT_TEST_

// g++ -Wall -ggdb -D__UNIT_TEST_ XmlSupport.C MiscSupport.C

#include <iostream>
#include <locale.h>

void test(XmlNode &parent, std::string name, std::string value)
{
  XmlNode &node = parent[-1];
  node.properties["name"] = name;
  node["default"].text = value;
  XmlNode &cdata = node["cdata"];
  cdata.text = value;
  cdata.useCdata = true;
  node.properties["mode"] = XmlNode::binarySafe(value)?"text":"binary";
}

int main(int, char **)
{
  setlocale(LC_ALL, "");
  XmlNode main;
  test(main, "empty", "");
  test(main, "simple", "simple");
  test(main, "special", "&\"<>");
  test(main, "real top list", "<API><TOPLIST SHORT_FORM=\"form=1&amp;show0=D_Symbol&amp;col_ver=1&amp;sort=MaxRV&amp;X_NYSE=on&amp;X_ARCA=on&amp;X_AMEX=on&amp;XN=on\" TYPE=\"info\" WINDOW=\"TL1\" WINDOW_NAME=\"\"><COLUMNS><c_D_Symbol CODE=\"D_Symbol\" DESCRIPTION=\"Symbol\" FORMAT=\"\" TEXT_FIELD=\"1\" TEXT_HEADER=\"1\" de_DESCRIPTION=\"Kürzel\" /></COLUMNS><SORT_BY FIELD=\"RV\" /></TOPLIST></API>");
  test(main, "XML", "<API><FIRST NAME=\"philip\">philip</FIRST><MORE>&amp;&lt</MORE></API>");
  std::string toTest;
  for (int i = 32; i < 127; i++)
    toTest += (char)i;
  test(main, "ascii", toTest);
  toTest += "¡¢£¤¥¦§¨©ª«¬®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ";
  test(main, "utf8", toTest);
  toTest.clear();
  for (int i = 1; i < 256; i++)
    toTest += (char)i;
  test(main, "all bytes but null", toTest);
  toTest.clear();
  for (int i = '~'; i < 165; i++)
    toTest += (char)i;
  test(main, "past ascii", toTest);
  toTest = '\0' + toTest;
  test(main, "null & past ascii", toTest);
  test(main, "cdata", "<![CDATA[<MAIN><TEST /></MAIN>]]>");
  for (int i = 0; i < 256; i++)
    test(main, "byte " + ntoa(i), std::string(1, i));
  std::cout<<"<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"<<main.asString()<<std::endl;
}

#endif

I hope you find this library useful.  Again, it was aimed at a very special purpose.  Maybe your needs are the same as mine, and maybe they aren’t.  Again, I’m very proud of this code, and it works well.  Maybe sometime I’ll show you some of my mistakes.  We can learn from those, right?  But for now, here’s a nice piece of code that you might want to copy.