Every year or so I manage to uncover a gap in my knowledge of strings, character sets, and encodings. I have come to accept this as part of the cycle of mastering (or attempting to master) any given topic. Whenever it happens, I fall back on the most fundamental mechanisms of learning: reading, practice, and memorization.

Reading

These are the two articles I use to roll back my atrophied memory on strings:

Practice

The problem I ran into this time came from the need to serialize a simple object into a valid UTF-8 XML string. The following is the code I was originally using to get this done:

using System;
using System.IO;
using System.Xml.Serialization;

public class SomeItem
{
    public string SomeDataElement;
    public string AnotherElement;
}

class Program
{
    static void Main(string[] args)
    {
        SomeItem si = new SomeItem();

        XmlSerializer xml = new XmlSerializer(si.GetType());
        StringWriter stringwriter = new StringWriter();

        // The serializer takes the declared encoding from the writer.
        xml.Serialize(stringwriter, si);

        Console.WriteLine(stringwriter.ToString());
        Console.Read();
    }
}

Pretty straightforward stuff. However, the XML declaration in the resulting string always defaulted to UTF-16, as follows:

<?xml version="1.0" encoding="utf-16"?>
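
To see where that declaration comes from, it can help to ask a plain StringWriter which encoding it reports. The following is only a small sketch; the class name EncodingProbe and its helper method are made up for illustration and are not part of the original code.

```csharp
using System;
using System.IO;

public static class EncodingProbe
{
    // Hypothetical helper: returns the name of the encoding
    // that a freshly created StringWriter reports.
    public static string DefaultStringWriterEncodingName()
    {
        using (var writer = new StringWriter())
        {
            // WebName is the IANA-style name, the same kind of name
            // that appears in an XML declaration.
            return writer.Encoding.WebName;
        }
    }

    public static void Main()
    {
        Console.WriteLine(DefaultStringWriterEncodingName()); // prints "utf-16"
    }
}
```

The serializer has no other hint about the target encoding when it is handed a text writer, so the writer's answer is what ends up in the declaration.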


The thing to internalize here is that .NET strings are Unicode and, more specifically, are held in memory as UTF-16. A StringWriter therefore reports UTF-16 as its encoding, and XmlSerializer copies that into the XML declaration. Whenever you write characters out through a writer or a stream, you should state explicitly how you want them encoded. In this particular example the “Encoding” property of StringWriter is read-only, so I had to inherit from StringWriter and override it as follows:

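using System;
using System.IO;
using System.Text;
using System.Xml.Serialization;

// A StringWriter that reports UTF-8 instead of the default UTF-16.
public class StringWriterUTF8 : StringWriter
{
    public override Encoding Encoding
    {
        get { return Encoding.UTF8; }
    }
}

class Program
{
    static void Main(string[] args)
    {
        // SomeItem is the same class defined in the first listing.
        SomeItem si = new SomeItem();

        XmlSerializer xml = new XmlSerializer(si.GetType());
        StringWriterUTF8 stringwriter = new StringWriterUTF8();
        xml.Serialize(stringwriter, si);

        // The declaration now reads encoding="utf-8".
        Console.WriteLine(stringwriter.ToString());
        Console.Read();
    }
}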
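
One caveat: overriding Encoding only changes what the declaration says. The value returned by ToString() is still an ordinary .NET string, which is UTF-16 in memory. If what you need in the end is UTF-8 bytes (for a file or a network stream, say), one possible alternative is to let an XmlWriter encode straight into a stream. This is a sketch under that assumption, not the approach used above; the names Utf8XmlSketch and SerializeToUtf8Bytes are hypothetical.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

public class SomeItem
{
    public string SomeDataElement;
    public string AnotherElement;
}

public static class Utf8XmlSketch
{
    // Hypothetical helper: serializes an object to UTF-8 encoded XML bytes.
    public static byte[] SerializeToUtf8Bytes(object value)
    {
        var settings = new XmlWriterSettings
        {
            // Passing false leaves out the byte order mark.
            Encoding = new UTF8Encoding(false),
            Indent = true
        };

        var serializer = new XmlSerializer(value.GetType());
        using (var stream = new MemoryStream())
        {
            // Disposing the XmlWriter flushes it; the stream stays open
            // because CloseOutput defaults to false.
            using (var writer = XmlWriter.Create(stream, settings))
            {
                serializer.Serialize(writer, value);
            }
            return stream.ToArray();
        }
    }

    public static void Main()
    {
        var item = new SomeItem { SomeDataElement = "caf\u00e9" };
        byte[] bytes = SerializeToUtf8Bytes(item);

        // Decode only for display; the bytes themselves are UTF-8.
        Console.WriteLine(Encoding.UTF8.GetString(bytes));
    }
}
```

Here the declaration and the actual byte encoding agree, because the same XmlWriterSettings drive both.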

Memorization

Well, that is what this post is for…


