[physfs] Unicode conversions fail outside BMP
    Jookia
    166291 at gmail.com
    Sat Nov  3 11:56:38 EDT 2012

Hello!
Recently I've been trying to get my application to run on Windows. At 
first I found that it didn't load files properly when their names 
contained a certain phrase that's meant to break things that don't 
support Unicode properly.
Long story short, I wrote this:
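#include <physfs.h>
#include <stdio.h>
int main(int argc, char** argv)
{
   (void) argc;
   PHYSFS_init(argv[0]);
   PHYSFS_mount(PHYSFS_getBaseDir(), "", 0);
   /* The file name is a single character outside the BMP. */
   PHYSFS_File* file = PHYSFS_openRead("𝓲");
   if (file == NULL)
      printf("%s\n", PHYSFS_getLastError()); /* only valid after a failure */
   else
      PHYSFS_close(file);
   PHYSFS_deinit();
   return 0;
}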
(If you see a box, please try copying this into your IDE. MSVC displays 
it fine.)
On Linux it works. It writes 'No such file or directory' to my console. 
Fantastic!
On Wine + MinGW (I know, I know, I'll explain in a tick why I don't 
think they're at fault. I haven't got PhysFS to work on Windows yet.) it 
prints 'Invalid name.'. What?
Digging deeper into the code, I found that windows.c's doPlatformExists 
fails. After manually printing out the UTF-16 string to a file and then 
using iconv to read it, it turns out my lovely character has turned into 
a question mark, which happens to be an invalid name.
I changed the code so it bypasses the Unicode conversions:
static int doPlatformExists(LPWSTR wpath)
{
     /* Hard-coded wide path, used when the converted path is rejected. */
     LPWSTR newpath = L"Z:\\home\\jookia\\Staging\\test-𝓲";
     if(pGetFileAttributesW(wpath) == PHYSFS_INVALID_FILE_ATTRIBUTES)
     {
         wpath = newpath;
     }
     BAIL_IF_MACRO
     (
         pGetFileAttributesW(wpath) == PHYSFS_INVALID_FILE_ATTRIBUTES,
         winApiStrError(), 0
     );
     return(1);
} /* doPlatformExists */
With this change the program writes 'File not found.' to my console.
It seems the error is in utf8ToUcs2 not accounting for surrogates:
         /* !!! BLUESKY: UTF-16 surrogates? */
         if (cp > 0xFFFF)
             cp = UNICODE_BOGUS_CHAR_CODEPOINT;
I know UCS-2 technically doesn't account for surrogates, but we're using 
UTF-16 on Windows. Commenting out that check produces a weird path, since 
the character still isn't converted properly, but it does bring up 
'File not found.'
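For reference, here is a minimal sketch of what handling surrogates could look like. This is not the PhysFS code and not a proposed patch; encode_utf16 is a hypothetical name, and replacing invalid input with '?' is my assumption about the desired fallback.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not the actual PhysFS code: writes the UTF-16
   encoding of cp into out (room for 2 units) and returns the number of
   units written. Invalid input becomes '?'. */
static size_t encode_utf16(uint32_t cp, uint16_t *out)
{
    assert(out != NULL);
    /* Beyond the Unicode range, or a lone surrogate value. */
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
        out[0] = (uint16_t) '?';
        return 1;
    }
    /* Inside the BMP: one unit, same value as the code point. */
    if (cp <= 0xFFFF) {
        out[0] = (uint16_t) cp;
        return 1;
    }
    /* Outside the BMP: split the 20 remaining bits into two halves. */
    cp -= 0x10000;
    out[0] = (uint16_t)(0xD800 + (cp >> 10));   /* high surrogate */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF)); /* low surrogate */
    return 2;
}
```

For the character in my test (U+1D4F2) this gives the two units 0xD835 0xDCF2. A caller would need to size its output buffer for up to two units per code point.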
So... Are there any plans to fix this? Are patches welcome?
Thanks,
Jookia.
    
    