Decoding strings with an unknown encoding

2 thoughts on “Decoding strings with an unknown encoding”

  1. “new String(bytes[], encoding) does no such thing”

    Sure it does. Observe:

    byte[] invalidUTF8 = {-128};
    String mystery = new String(invalidUTF8, "UTF-8");
    System.out.println((int) mystery.charAt(0));
    // prints 65533

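    ...and 65533, FFFD in hex, happens to be the Unicode REPLACEMENT CHARACTER (U+FFFD). The constructor does not fail on the malformed byte; it silently substitutes that character.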
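If you would rather get an error than a silent U+FFFD, you can use a `java.nio.charset.CharsetDecoder` that reports malformed input. This is useful, for example, as a signal that a guessed encoding is wrong. The sketch below contrasts the two behaviours on the same byte array. The class and method names (`StrictDecode`, `lenient`, `strict`) are illustrative only. The approach is one option, not necessarily what the original post intended.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictDecode {
    // Lenient: the String constructor replaces malformed bytes with U+FFFD.
    static String lenient(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Strict: a decoder set to REPORT throws instead of substituting.
    // (REPORT is already the default for newDecoder(); it is set explicitly for clarity.)
    static String strict(byte[] bytes) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }

    public static void main(String[] args) {
        byte[] invalidUTF8 = {-128}; // 0x80: a lone continuation byte, not valid UTF-8
        System.out.println((int) lenient(invalidUTF8).charAt(0)); // prints 65533
        try {
            strict(invalidUTF8);
            System.out.println("decoded");
        } catch (CharacterCodingException e) {
            // prints "rejected: MalformedInputException"
            System.out.println("rejected: " + e.getClass().getSimpleName());
        }
    }
}
```

When trying candidate encodings, the strict variant lets you move on to the next guess as soon as the exception is thrown. The lenient one would hand back a string containing U+FFFD that you would then have to search for.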
