[physfs] Unicode conversions fail outside BMP

Jookia 166291 at gmail.com
Sat Nov 3 11:56:38 EDT 2012


Hello!

Recently I've been trying to get my application to run on Windows. At 
first I found that it didn't load files properly when their names 
contained a certain string that's designed to break software that 
doesn't handle Unicode properly.

Long story short, I wrote this:

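#include <physfs.h>
#include <stdio.h>

int main(int argc, char** argv)
{
   PHYSFS_File* file;

   PHYSFS_init(argv[0]);
   PHYSFS_mount(PHYSFS_getBaseDir(), "", 0);

   /* The filename is a single character outside the BMP. */
   file = PHYSFS_openRead("𝓲");
   if (file == NULL)
      printf("%s\n", PHYSFS_getLastError()); /* only print on failure */
   else
      PHYSFS_close(file);

   PHYSFS_deinit();
   return 0;
}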

(If you see a box instead of the character, try copying it into your 
IDE. MSVC displays it fine.)

On Linux it works. It writes 'No such file or directory' to my console. 
Fantastic!

On Wine + MinGW (I know, I know; I'll explain in a tick why I don't 
think they're at fault. I haven't got PhysFS working on real Windows 
yet.) it reports 'Invalid name.' instead. What?

Digging deeper into the code, I found that doPlatformExists in windows.c 
fails. After dumping the converted UTF-16 string to a file and decoding 
it with iconv, it turns out my lovely character has been turned into a 
question mark, which is not a valid character in a Windows filename.
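For anyone who wants to repeat that check, a dump like this can be produced with a minimal sketch such as the following. `dump_utf16le` is a hypothetical helper, not part of PhysFS, and it takes plain `uint16_t` units instead of an LPWSTR so it also builds outside Windows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical debugging helper (not part of PhysFS): writes `len` UTF-16
 * code units to `out` as little-endian bytes, the byte order Windows uses,
 * so the file can be decoded with:
 *     iconv -f UTF-16LE -t UTF-8 dump.bin
 * Returns 1 on success, 0 if a write fails. */
static int dump_utf16le(FILE *out, const uint16_t *units, size_t len)
{
    size_t i;

    assert(out != NULL && (units != NULL || len == 0));
    for (i = 0; i < len; i++)
    {
        unsigned char bytes[2];
        bytes[0] = (unsigned char) (units[i] & 0xFF);        /* low byte first */
        bytes[1] = (unsigned char) ((units[i] >> 8) & 0xFF); /* then high byte */
        if (fwrite(bytes, 1, sizeof(bytes), out) != sizeof(bytes))
            return 0;
    }
    return 1;
}
```

On Windows the units can be taken directly from the LPWSTR, since WCHAR is 16 bits there. A correctly converted 𝓲 should show up in the dump as the surrogate pair 0xD835 0xDCF2 rather than a single '?'.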

Changing doPlatformExists so that it bypasses the Unicode conversion:

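static int doPlatformExists(LPWSTR wpath)
{
     /* Debugging hack: a hard-coded wide literal, so the compiler builds
        the UTF-16 path instead of utf8ToUcs2. */
     LPWSTR newpath = L"Z:\\home\\jookia\\Staging\\test-𝓲";

     /* If the converted path isn't found, retry with the literal. */
     if(pGetFileAttributesW(wpath) == PHYSFS_INVALID_FILE_ATTRIBUTES)
     {
         wpath = newpath;
     }

     BAIL_IF_MACRO
     (
         pGetFileAttributesW(wpath) == PHYSFS_INVALID_FILE_ATTRIBUTES,
         winApiStrError(), 0
     );
     return(1);
} /* doPlatformExists */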

This makes the program print 'File not found.' to my console, which is 
the result I'd expect.

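It seems the problem is that utf8ToUcs2 doesn't account for surrogates. 
Any code point above U+FFFF gets replaced with 
UNICODE_BOGUS_CHAR_CODEPOINT, which is where the question mark comes 
from:
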
         /* !!! BLUESKY: UTF-16 surrogates? */
         if (cp > 0xFFFF)
             cp = UNICODE_BOGUS_CHAR_CODEPOINT;

I know UCS-2 by definition has no surrogates, but the Windows 
wide-character APIs take UTF-16. Commenting out that check produces a 
garbled path, since the code point still isn't converted properly, but 
at least I get 'File not found.' instead of 'Invalid name.'
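In case a patch would help, here is a rough sketch of what a surrogate-aware UTF-8 to UTF-16 conversion could look like. It uses plain C types rather than PhysFS's internal ones, and the function names, the `cap` parameter and the '?' replacement for malformed input are assumptions for illustration, not existing PhysFS API; it is not a drop-in patch for physfs_unicode.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BOGUS_CODEPOINT '?'  /* assumed replacement, mirroring the '?' seen above */

/* Decodes one UTF-8 sequence from *src, advancing *src past it.
 * Returns the code point, or BOGUS_CODEPOINT for malformed input. */
static uint32_t decode_utf8(const unsigned char **src)
{
    const unsigned char *s = *src;
    uint32_t cp;
    int extra, i;

    if (s[0] < 0x80)                { cp = s[0];        extra = 0; }
    else if ((s[0] & 0xE0) == 0xC0) { cp = s[0] & 0x1F; extra = 1; }
    else if ((s[0] & 0xF0) == 0xE0) { cp = s[0] & 0x0F; extra = 2; }
    else if ((s[0] & 0xF8) == 0xF0) { cp = s[0] & 0x07; extra = 3; }
    else { *src = s + 1; return BOGUS_CODEPOINT; }  /* invalid lead byte */

    for (i = 1; i <= extra; i++)
    {
        if ((s[i] & 0xC0) != 0x80)   /* truncated sequence (also stops at NUL) */
        {
            *src = s + i;
            return BOGUS_CODEPOINT;
        }
        cp = (cp << 6) | (s[i] & 0x3F);
    }
    *src = s + 1 + extra;

    /* Reject overlong forms, UTF-16 surrogates and values past U+10FFFF. */
    if ((extra == 1 && cp < 0x80) || (extra == 2 && cp < 0x800) ||
        (extra == 3 && cp < 0x10000) || cp > 0x10FFFF ||
        (cp >= 0xD800 && cp <= 0xDFFF))
        return BOGUS_CODEPOINT;
    return cp;
}

/* Hypothetical surrogate-aware replacement for a UTF-8 -> UCS-2 routine:
 * converts the NUL-terminated UTF-8 string `src` into `dst`, which holds
 * `cap` UTF-16 code units (cap >= 1). Output is always NUL-terminated and a
 * surrogate pair is never split when space runs out. Returns the number of
 * units written, excluding the terminating NUL. */
static size_t utf8_to_utf16_sketch(const char *src, uint16_t *dst, size_t cap)
{
    const unsigned char *s = (const unsigned char *) src;
    size_t n = 0;

    assert(src != NULL && dst != NULL && cap >= 1);
    while (*s != '\0')
    {
        uint32_t cp = decode_utf8(&s);
        if (cp <= 0xFFFF)
        {
            if (n + 1 >= cap)
                break;                  /* no room for the unit plus the NUL */
            dst[n++] = (uint16_t) cp;
        }
        else
        {
            if (n + 2 >= cap)
                break;                  /* don't emit half a surrogate pair */
            cp -= 0x10000;
            dst[n++] = (uint16_t) (0xD800 | (cp >> 10));
            dst[n++] = (uint16_t) (0xDC00 | (cp & 0x3FF));
        }
    }
    dst[n] = 0;
    return n;
}
```

The output buffer could then be handed to the W APIs as an LPWSTR, since WCHAR is 16 bits on Windows. Refusing to split a pair at the end of the buffer avoids producing a lone high surrogate, which would be just as broken as the '?'.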

So... Are there any plans to fix this? Are patches welcome?

Thanks,
Jookia.


More information about the physfs mailing list