Craig Box's journeys, stories and notes...


Asshat space (or wordpress c2 a0, for search-fu)

Somehow, WordPress is inserting C2 A0 characters in my feed, which means that Planet NZTech can't parse them, so my posts don't show up until I find them manually and fix them.

C2 A0 is the UTF-8 encoding of U+00A0, the Unicode non-breaking space. It could be because of my habit of hitting Space twice after a sentence: perhaps WordPress realises one of them has to be non-breaking. Whatever it is, it's irritating.
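For anyone who wants to check the byte values, here is a minimal Python sketch. The helper name `utf8_hex` is made up for illustration; it just shows that U+00A0 becomes the two bytes C2 A0 in UTF-8, while an ordinary space stays a single byte.

```python
import unicodedata

NBSP = "\u00a0"  # U+00A0, the non-breaking space


def utf8_hex(text: str) -> str:
    """Return the UTF-8 bytes of `text` as space-separated hex pairs."""
    return " ".join(f"{byte:02X}" for byte in text.encode("utf-8"))


print(unicodedata.name(NBSP))  # official Unicode name of the character
print(utf8_hex(NBSP))          # bytes of the non-breaking space
print(utf8_hex(" "))           # bytes of an ordinary space, for comparison
```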

It doesn't happen in the output under ISO-8859-1. It's only on Windows, doing a diff of the feed as downloaded on my UTF-8 Linux server, that I actually see the problem.

Badly configured UTF-8 systems often end up with the symbol A-with-circumflex (Â) before the character. In #wlug, we lovingly call this character "the asshat". I had thought that putting it in would stop this post from being picked up, but it seems there's an â in HTML just for my asshat character.
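The asshat is easy to reproduce on purpose. The sketch below assumes the "bad configuration" is simply UTF-8 bytes being decoded as ISO-8859-1; `misread_as_latin1` is a hypothetical helper, not anything WordPress or planetplanet actually calls.

```python
def misread_as_latin1(text: str) -> str:
    """Encode `text` as UTF-8, then decode those bytes as ISO-8859-1.

    This imitates a program that receives UTF-8 but assumes Latin-1.
    """
    return text.encode("utf-8").decode("iso-8859-1")


# Two spaces after a full stop, the first one non-breaking.
original = "One sentence.\u00a0 Another sentence."
garbled = misread_as_latin1(original)

# The C2 byte turns into a visible A-with-circumflex; the A0 byte is
# still a non-breaking space, so only the asshat shows up on screen.
print(garbled)
print("\u00c2" in garbled)
```

Every character in the U+0080 to U+00BF range starts with the byte C2 in UTF-8, which is why the same Â turns up in front of so many different symbols.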

I've also found I can see them with LANG=iso-8859-1 less index.html. This explains why I couldn't find them to start with - less runs in UTF-8 by default, which draws it as a space!
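Looking at the raw bytes sidesteps the locale question entirely. Here is a rough Python sketch for locating the offending bytes and swapping them for plain spaces; the function names and the sample string are invented for illustration, and whether a blanket replacement is safe for a given feed is an assumption you'd want to check first.

```python
NBSP_UTF8 = b"\xc2\xa0"  # UTF-8 encoding of U+00A0


def find_nbsp_offsets(data: bytes) -> list:
    """Return the byte offset of every C2 A0 pair in `data`."""
    offsets = []
    start = data.find(NBSP_UTF8)
    while start != -1:
        offsets.append(start)
        start = data.find(NBSP_UTF8, start + len(NBSP_UTF8))
    return offsets


def replace_nbsp(data: bytes) -> bytes:
    """Replace every UTF-8 non-breaking space with an ordinary space."""
    return data.replace(NBSP_UTF8, b" ")


# Stand-in for a downloaded feed; a real file could be loaded with
# pathlib.Path("index.html").read_bytes() instead.
sample = "First sentence.\u00a0 Second sentence.".encode("utf-8")

print(find_nbsp_offsets(sample))
print(replace_nbsp(sample))
```

Searching bytes rather than decoded text works here because, in valid UTF-8, C2 followed by A0 can only ever be U+00A0.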

Unfortunately, it works fine on Planet WLUG, so it's fixed in newer planetplanet, which doesn't work for Follower at the moment.

Not much can really be fixed at this point, so this writeup can act as a "this is the problem" in case anyone Googles for "wordpress c2 a0".


One Response to “Asshat space (or wordpress c2 a0, for search-fu)”

  1. John McPherson says:

    "Badly configured UTF-8 systems often end up with the symbol A-with-circumflex (Â) before the character."

    That is what happens when you view an accented western utf-8 character as if it were latin/iso-8859-1. Eg the pound character £ is 0xc2 0xa3 in UTF-8, and if you view it as iso-8859-1 instead, you'll see Â£ because Â is 0xc2 and £ happens to be 0xa3 in 8859-1. If all apps defaulted to utf-8 instead of some defaulting to iso-8859-1, there wouldn't be a problem 🙂
