A Custom Text Encoding Generator For Silverlight
Unlike the .NET platform, Silverlight only provides two text encodings out of the box: UTF-8 (UTF8Encoding class) and UTF-16 (UnicodeEncoding class).
Accordingly, if you find yourself in a situation where you need to encode or decode data with another encoding (e.g. iso-8859-1), you’ll have to write your own Encoding class (or delegate the work to a server-side service).
I found myself in this exact situation yesterday, and came up with a little tool that automates the process. The Encoding Generator is a WPF application that takes the name or code page of a well-known encoding and generates source code for a custom Encoding class that compiles under Silverlight.
Current version: 1.0.0, 2010.03.31, requires .NET 3.5 SP1 or higher
How Does It Work?
Specifying the Encoding
To specify the encoding you want to use, you can enter either the name or the numeric code page of a well-known encoding. As soon as you enter a valid value, information about the encoding is displayed in the panel on the right-hand side (see screenshot).
As an example of valid encoding names and code pages, here are some values you can enter to tell the tool to generate an iso-8859-1 encoder (see screenshot):
- iso-8859-1 (name)
- latin1 (name)
- 28591 (code page)
A list of encodings can be found here.
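The name and code page inputs listed above can be resolved on the regular .NET Framework with `Encoding.GetEncoding`. The following is only a minimal sketch of that lookup, not the tool's actual code; the `EncodingLookupSketch` class and its `Resolve` method are hypothetical names chosen for illustration.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of the Encoding Generator itself.
public static class EncodingLookupSketch
{
    // Accepts either a numeric code page ("28591") or an encoding name ("iso-8859-1").
    public static Encoding Resolve(string input)
    {
        int codePage;
        if (int.TryParse(input, out codePage))
        {
            // Numeric input is treated as a code page.
            return Encoding.GetEncoding(codePage);
        }

        // Anything else is treated as an encoding name or alias.
        return Encoding.GetEncoding(input);
    }
}
```

Calling `EncodingLookupSketch.Resolve("iso-8859-1")` and `EncodingLookupSketch.Resolve("28591")` should return the same encoding. `Encoding.GetEncoding` throws if the name or code page is not known to the runtime.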
Fallback Character
The tool lets you specify a fallback character, which is used as a default when a character or byte value that has no mapping in the encoding is processed during encoding or decoding. If you don’t specify one, the generated encoding class will crash at runtime should it receive data that cannot be properly encoded or decoded.
Single-Byte Encoding Limitation
The generated class only works if a single byte can be translated into a single character and vice versa. Accordingly, if you try to generate code for an encoding that uses several bytes per character (e.g. utf-8), the generator shows an error message.
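On the regular .NET Framework, the `Encoding.IsSingleByte` property reports whether an encoding uses exactly one byte per character. How the tool itself performs this check is not stated here, so the following is only a sketch of one way to do it; `SingleByteCheckSketch` and `CanGenerate` are hypothetical names.

```csharp
using System;
using System.Text;

// Hypothetical helper, not taken from the tool's source code.
public static class SingleByteCheckSketch
{
    // Returns true only for encodings that map one byte to one character.
    public static bool CanGenerate(Encoding encoding)
    {
        return encoding.IsSingleByte;
    }
}
```

With this check, `Encoding.GetEncoding("iso-8859-1")` would be accepted, while `Encoding.UTF8` would be rejected.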
Byte Range
You need to specify the byte range of the encoding. For example, ASCII supports only 128 characters, and therefore has a byte range of 128 bytes. Most other encodings support a byte range of 256 bytes, though. 256 is the maximum value that can be specified, as a single byte cannot deliver more values (the byte data type covers a numeric range from 0 – 255).
Testing
The generator also creates an NUnit test class that compares the results of the generated class against the original encoding. Accordingly, this test class is supposed to run in a regular .NET environment, not in Silverlight (if the original encoding that is used in the test was available in Silverlight, you wouldn’t have to generate a custom encoding class in the first place…).
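The comparison described above could look roughly like the following. This sketch deliberately avoids NUnit so that it stays self-contained, and it is not the test code the tool generates; `EncodingComparisonSketch` and `MatchesReference` are hypothetical names.

```csharp
using System;
using System.Text;

// Hypothetical comparison helper meant to run on the regular .NET Framework.
public static class EncodingComparisonSketch
{
    // Decodes all 256 byte values with both encodings, then encodes the
    // reference text again, and reports whether the results are identical.
    public static bool MatchesReference(Encoding candidate, Encoding reference)
    {
        byte[] allBytes = new byte[256];
        for (int i = 0; i < allBytes.Length; i++)
        {
            allBytes[i] = (byte)i;
        }

        // Decoding must produce the same text.
        string expectedText = reference.GetString(allBytes);
        string actualText = candidate.GetString(allBytes);
        if (expectedText != actualText)
        {
            return false;
        }

        // Encoding that text must produce the same bytes.
        byte[] expectedBytes = reference.GetBytes(expectedText);
        byte[] actualBytes = candidate.GetBytes(expectedText);
        if (expectedBytes.Length != actualBytes.Length)
        {
            return false;
        }
        for (int i = 0; i < expectedBytes.Length; i++)
        {
            if (expectedBytes[i] != actualBytes[i])
            {
                return false;
            }
        }
        return true;
    }
}
```

In a real test, `candidate` would be an instance of the generated class and `reference` the original encoding obtained through `Encoding.GetEncoding`.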
Internals
At runtime, the generated class relies on mapping tables to translate characters to bytes and vice versa. For every request, it simply looks up each character or byte value in these translation tables.
The generator creates these translation tables on the fly in the form of a static array and dictionary.
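To illustrate the array and dictionary lookup, here is a minimal sketch of a single-byte encoding class. It is not the code the tool emits: the class name `Latin1SketchEncoding`, the field names, and the `?` fallback are assumptions made for this example. It also fills its tables in a loop, which only works because iso-8859-1 maps every byte value to the Unicode code point with the same number; tables for other encodings would have to be written out explicitly.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical single-byte encoding; all names are illustrative only.
public class Latin1SketchEncoding : Encoding
{
    // Decoding table: the index is the byte value, the element is the character.
    private static readonly char[] ByteToChar = new char[256];

    // Encoding table: reverse lookup from character to byte value.
    private static readonly Dictionary<char, byte> CharToByte = new Dictionary<char, byte>();

    // Assumed fallback character for characters that have no mapping.
    private const char Fallback = '?';

    static Latin1SketchEncoding()
    {
        // For iso-8859-1, byte value and code point are the same number.
        for (int i = 0; i < ByteToChar.Length; i++)
        {
            ByteToChar[i] = (char)i;
            CharToByte[(char)i] = (byte)i;
        }
    }

    // One byte per character, so all counts are identical to the input length.
    public override int GetByteCount(char[] chars, int index, int count) { return count; }
    public override int GetCharCount(byte[] bytes, int index, int count) { return count; }
    public override int GetMaxByteCount(int charCount) { return charCount; }
    public override int GetMaxCharCount(int byteCount) { return byteCount; }

    public override int GetBytes(char[] chars, int charIndex, int charCount, byte[] bytes, int byteIndex)
    {
        for (int i = 0; i < charCount; i++)
        {
            byte value;
            if (!CharToByte.TryGetValue(chars[charIndex + i], out value))
            {
                // Unmapped character: substitute the fallback character's byte.
                value = CharToByte[Fallback];
            }
            bytes[byteIndex + i] = value;
        }
        return charCount;
    }

    public override int GetChars(byte[] bytes, int byteIndex, int byteCount, char[] chars, int charIndex)
    {
        for (int i = 0; i < byteCount; i++)
        {
            // Every byte value has an entry in the 256-element table.
            chars[charIndex + i] = ByteToChar[bytes[byteIndex + i]];
        }
        return byteCount;
    }
}
```

With a smaller byte range (for example 128 for ASCII), the decoding side would also need a fallback for byte values outside the table; this sketch omits that case.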
Performance
The library doesn’t contain any performance tweaks and performs much slower than the built-in encodings that rely on all sorts of black magic. However, as long as you don’t have to encode or decode huge amounts of data, this shouldn’t be noticeable.
Here are the results from my machine for 10000 iterations:
- Encoding the whole character table to a byte array (256 characters)
  - 17 milliseconds with the built-in encoding
  - 94 milliseconds with the generated encoding
- Decoding the bytes back into a string
  - 2 milliseconds with the built-in encoding
  - 46 milliseconds with the custom encoding
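A measurement of this kind can be taken with `System.Diagnostics.Stopwatch`. The sketch below is not the benchmark used for the numbers above; `EncodingTimingSketch` and `Time` are hypothetical names, and results will vary by machine and runtime.

```csharp
using System;
using System.Diagnostics;
using System.Text;

// Hypothetical timing helper for rough comparisons only.
public static class EncodingTimingSketch
{
    // Runs the given action the requested number of times and
    // returns the elapsed wall-clock time in milliseconds.
    public static long Time(int iterations, Action action)
    {
        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }
}
```

For example, `EncodingTimingSketch.Time(10000, () => encoding.GetBytes(text))` would time the encoding step for a given `encoding` and `text`, and the same call with `encoding.GetString(bytes, 0, bytes.Length)` would time the decoding step.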
“Silverlight only provides two text encodings out of the box: UTF-8 and Unicode.”
UTF-8 *is* Unicode. Do you mean UTF-16 ?
Daniel,
I am aware of that. The encoding classes that come out of the box in Silverlight are called UTF8Encoding and UnicodeEncoding so I decided to stick to the terminology. But you’re right, I could have been more clear – I updated the posting accordingly. Thanks for the feedback.
You save me! Thanks!
I am really impressed. This is a great piece of work. Thanks for sharing it!
Hi,
Your encoding generator has been a blessing for me. I have been writing a keyboard app for WP7 and this has been vital for me.
I am looking into supporting Nordic languages, but I can’t get the encoding generator to take the latin6 / iso-8859-10 codes.
I’ll look into the code eventually. Thought I’d ask you first, as I have my hands full building a decent frequency word list.
Don’t worry about it. I think I have finally figured out how to use utf-8 🙂 Will know for sure tomorrow.
Saved me 2 hours of work. Great idea to autogenerate it.
Very good solution! It definitely solved my problem. Thanks for sharing!
Nice job. I’m trying to add ASCII support, but when I call my generated class I get the following error:
“An item with the same key has already been added.”
any ideas?
thanks and nice work.
Hi, I am trying to load a Windows-1252 XML file, and I am getting an error in Silverlight. I saw the class “A custom encoding class that provides encoding capabilities for the ‘Central European (Windows)’ encoding under Silverlight”,
but I don’t know how to use it in Silverlight; I mean, how to pass the XML to it.
Please any help is appreciated.
I’m impressed and grateful
Hi,
thanks a lot, saved me a lot of time.
Great work 😉
This is a brilliant idea and brilliant execution, thank you!
Thank you man, you’re a crack!!
Thanks for sharing its very useful tool, not to mention the auto-generated unit tests!
Thanks a lot! Saved me a lot of time to decode a web page from ISO-8859-15. Works perfectly!
Thank you! This helped me alot!
Hi Philipp,
Can you please make one that can generate an encoding for BIG5 (two bytes)? We desperately need that. Thank you!
Todd
Thank you for publishing this, wonderful work, you saved me too 😉
You made my day, thank you a lot! Very useful and nice application!
You are my hero! It works perfectly and save me a lot of time, thank you for sharing your work 🙂
Thanks! I spent like 2-3h trying to get 1252 to utf8 to work.
This helped me a bit: http://msdn.microsoft.com/en-us/library/kdcak6ye.aspx
But in the end, what solved my issue was this: when reading the text files with a StreamReader, all I had to do was specify the generated class. Like this:
Stream s = Application.GetResourceStream(new Uri("TestData/file.xml", UriKind.Relative)).Stream;
var reader = new StreamReader(s, new Windows1252Encoding());
Where the Windows1252Encoding class is what the Silverlight-application generated for me.
BIG THANKS to the author!
Hi Philipp, I need an example of how to use a class generated by your tool.
Would it be possible for you to post a small example?
Thanks in advance.
I am speechless. I was about to throw away 3 days’ worth of work because of this problem, and then I found this.
I really can’t thank you enough!!
best tool ever!!!
thank you , thank you, thank you!
Thank you for publishing this, Great idea to auto generate it, you saved me from Headache and lot of time…
This utility is godsend, really. Pretty good use in Lightswitch. I cannot be grateful enough for your sharing. Thanks again!
And thanks from me.Very helpful tool
Hi there, this is a great tool but not exactly what I need.
I’m looking for directions to a site or video detailing how I could make a custom encoder/decoder for my custom cipher. The alphabet contains a-z, A-Z, all numbers, punctuation, symbols and, well, everything; each one has a corresponding number. The only difficulty is that this cipher requires you to multiply each character’s number by its position in the word or number.
For example:
a = 2
t = 9
Ok so that would mean that at would = 2,18
Then tat would = 9,4,27.
Just hoping someone has the knowhow to help me program this or point me to someone who does.