pipe-menus.. character encoding, high ascii and special xml chars
logan at dct.com
Tue Oct 28 12:54:24 EST 2003
One item common to all of my pipe-menu apps is ob3_string_to_xml_safe(), which converts a given string to its html-like unsigned hex counterpart (one numeric character reference per byte).
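gchar *ob3_string_to_xml_safe(const gchar *str)
{
    gchar *s = NULL;

    /* reject NULL and empty strings */
    g_return_val_if_fail(str != NULL && *str != '\0', NULL);
    const guchar *p = (const guchar *) str;   /* unsigned, so bytes >= 0x80 stay positive */
    GString *string = g_string_new(NULL);
    while (*p)
        g_string_append_printf(string, "&#x%x;", *p++ & 0xff);
    s = string->str;
    g_string_free(string, FALSE);   /* frees the wrapper only; caller g_free()s s */
    return s;
}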
ob3 becomes &#x6f;&#x62;&#x33;.
Initially I used an ord()-like conversion, but some characters would return negative values (É as an example). The xml parser would freak on &#-55;
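The negative value comes from the character type being signed: the byte 0xC9 (É in iso-8859-1) read as a signed char is -55. A minimal plain-C sketch of the difference, assuming nothing from GLib; naive_ref() and masked_ref() are hypothetical names used only for illustration:

```c
#include <stdio.h>

/* Hypothetical helper: format a byte the "ord()-like" way, through a
 * signed char. Bytes >= 0x80 come out negative. */
static int naive_ref(char *out, size_t n, signed char c)
{
    return snprintf(out, n, "&#%d;", c);
}

/* Hypothetical helper: mask to the low 8 bits first, so the value is
 * always in 0..255 before it is printed as hex. */
static int masked_ref(char *out, size_t n, signed char c)
{
    return snprintf(out, n, "&#x%x;", c & 0xff);
}
```

Passing -55 to naive_ref() gives the &#-55; that the parser rejects; masked_ref() gives &#xc9; for the same byte.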
The xml encoding could be set to say iso-8859-1, which would allow for such a character to be processed without any special encoding. I thought about printing <?xml version="1.0" encoding="iso-8859-1"?> at the start of every <openbox_pipe_menu>, but since ob3 defaults to UTF-8, I'm not sure if this would be an expected behavior... or if I did change it, would people even notice/care?
If iso-8859-1 were used, characters like <>&'" still have to be encoded, either by their builtin xml entities, or using some other conversion. Then I started to think about how much additional work is needed to process this unsigned hex conversion.
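For reference, the builtin entity route for those five characters could look roughly like this. It is a plain-C sketch, and xml_builtin_entity() is a hypothetical name, not something taken from the pipe-menu apps:

```c
#include <stddef.h>

/* Hypothetical helper: return the predefined XML entity for c, or NULL
 * if c is not one of the five characters XML predefines entities for. */
static const char *xml_builtin_entity(char c)
{
    switch (c) {
    case '<':  return "&lt;";
    case '>':  return "&gt;";
    case '&':  return "&amp;";
    case '\'': return "&apos;";
    case '"':  return "&quot;";
    default:   return NULL;
    }
}
```

A caller would append the returned entity when it is non-NULL and the raw character otherwise. GLib's g_markup_escape_text() also escapes these five characters, which may be enough if only the special characters are a concern.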
A fairly simple test could be performed within ob3_string_to_xml_safe()...
    gchar c = *p++;

    if (!g_ascii_isprint(c) || (c == '<' || c == '>' ...))
        g_string_append_printf(string, "&#x%x;", c & 0xff);   /* encode as hex reference */
    else
        g_string_append_printf(string, "%c", c);              /* pass through unchanged */
With this, only characters outside of the 32-126 range and the special xml characters would be converted. That means less (and more readable) output, and likely less work for the xml parser.
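Put together, the selective version might look something like the following. This is a sketch in plain C rather than GLib so that it stands alone; xml_safe_selective() is a hypothetical name, and the set of special characters (<>&'") is an assumption taken from the list mentioned earlier, since the test above leaves it elided:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical plain-C analogue of the selective conversion: bytes outside
 * 32..126 and the characters <>&'" become hex character references,
 * everything else is copied through. Caller free()s the result. */
static char *xml_safe_selective(const char *str)
{
    const unsigned char *p = (const unsigned char *) str;
    /* Worst case: every byte becomes "&#xNN;" (6 chars), plus terminator. */
    char *out = malloc(strlen(str) * 6 + 1);
    char *w = out;

    if (out == NULL)
        return NULL;

    for (; *p; p++) {
        int printable = (*p >= 32 && *p <= 126);

        if (!printable || strchr("<>&'\"", *p) != NULL)
            w += sprintf(w, "&#x%x;", (unsigned int) *p);
        else
            *w++ = (char) *p;
    }
    *w = '\0';
    return out;
}
```

With this sketch, a<b comes out as a&#x3c;b, while plain text such as ob3 passes through untouched.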
Guess I'm just looking for some thoughts on this stuff. If anyone has experience with this, or knows of some good reading (usenet posts, etc), I'd appreciate it.