[physfs] Unicode conversions fail outside BMP
166291 at gmail.com
Sat Nov 3 11:56:38 EDT 2012
Recently I've been trying to get my application to run on Windows. At
first I found that it didn't load files properly when they had a certain
phrase in them that's meant to break things that don't support Unicode.
Long story short, I wrote this:
int main(int argc, char** argv)
{
    PHYSFS_mount(PHYSFS_getBaseDir(), "", 0);
    PHYSFS_File* file = PHYSFS_openRead("𝓲");
}
(If you see a box please try and copy this in your IDE. MSVC displays it
On Linux it works: it writes 'No such file or directory' to my console.
On Wine + MinGW (I know, I know, I'll explain in a tick why I don't
think they're at fault. I haven't got PhysFS to work on Windows yet.) it
returns 'Invalid name.' instead. What?
Digging deeper into the code, I found that doPlatformExists in windows.c
fails. After manually printing the UTF-16 string to a file and reading it
back with iconv, it turns out my lovely character has been turned into a
question mark, which happens to be an invalid character in a Windows
file name.
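For reference, a minimal sketch of why this character is a problem for a 16-bit conversion. It assumes the character in the test filename is U+1D4F2, whose UTF-8 encoding is the four bytes F0 9D 93 B2. decode_utf8_4 is a made-up helper for illustration only, not a PhysFS function, and it does no validation of its input.

```c
#include <stdint.h>

/* Hypothetical helper (not part of PhysFS): decode one 4-byte UTF-8
 * sequence into a code point. Assumes the input is well-formed. */
static uint32_t decode_utf8_4(const unsigned char *s)
{
    return ((uint32_t) (s[0] & 0x07) << 18) |  /* 3 payload bits */
           ((uint32_t) (s[1] & 0x3F) << 12) |  /* 6 payload bits */
           ((uint32_t) (s[2] & 0x3F) << 6)  |  /* 6 payload bits */
            (uint32_t) (s[3] & 0x3F);          /* 6 payload bits */
}
```

Feeding it F0 9D 93 B2 gives 0x1D4F2, which is above 0xFFFF and so cannot be stored in a single 16-bit unit.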
Changing the code up so it bypasses the Unicode conversions:
static int doPlatformExists(LPWSTR wpath)
{
    /* Hard-coded UTF-16 path, used only when the converted one fails. */
    LPWSTR newpath = L"Z:\\home\\jookia\\Staging\\test-𝓲";
    if(pGetFileAttributesW(wpath) == PHYSFS_INVALID_FILE_ATTRIBUTES)
        wpath = newpath;
    BAIL_IF_MACRO
    (
        pGetFileAttributesW(wpath) == PHYSFS_INVALID_FILE_ATTRIBUTES,
        winApiStrError(), 0
    );
    return(1);
} /* doPlatformExists */
With that change, the program writes 'File not found.' to my console.
It seems the error is in utf8ToUcs2 not accounting for surrogates:
/* !!! BLUESKY: UTF-16 surrogates? */
if (cp > 0xFFFF)
    cp = UNICODE_BOGUS_CHAR_CODEPOINT;
I know UCS-2 technically doesn't account for surrogates, but Windows
uses UTF-16. Commenting out that check produces a weird path, because the
code point still isn't converted properly, but it does bring up
'File not found.' instead of 'Invalid name.'.
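A rough sketch of what handling surrogates could look like, in place of substituting the bogus code point. This is not PhysFS code: encode_utf16 is a hypothetical helper, and how it would be wired into utf8ToUcs2 (which would then need room for two output units per code point) is left open.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of PhysFS): encode one Unicode code point
 * as UTF-16. Writes 1 or 2 code units to out and returns how many were
 * written, or 0 if cp is not a valid Unicode scalar value. */
static size_t encode_utf16(uint32_t cp, uint16_t out[2])
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;  /* out of range, or a lone surrogate */

    if (cp <= 0xFFFF) {
        out[0] = (uint16_t) cp;  /* BMP: one unit, same as UCS-2 */
        return 1;
    }

    cp -= 0x10000;  /* 20 bits remain */
    out[0] = (uint16_t) (0xD800 + (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t) (0xDC00 + (cp & 0x3FF));  /* low surrogate */
    return 2;
}
```

Assuming the test character is U+1D4F2, encode_utf16(0x1D4F2, out) returns 2 and fills out with 0xD835, 0xDC00 + 0xF2 = 0xDCF2, which is the pair Windows would expect in the wide path.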
So... Are there any plans to fix this? Are patches welcome?