[physfs] Unicode conversions fail outside BMP
darkuranium at gmail.com
Sun Nov 4 06:23:24 EST 2012
I'd just like to note that (in my personal experience, take it for what it
is) while Windows *in theory* supports UTF-16, in practice it cannot handle
surrogate pairs properly and is therefore more like UCS-2.
I've noticed this on multiple occasions, including while doing (forced by
the university, I might add) C# development.
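In UTF-16, a code point above U+FFFF is stored as two 16-bit units: a high surrogate (U+D800..U+DBFF) followed by a low surrogate (U+DC00..U+DFFF). As a minimal sketch of what "handling pairs" means on the reading side, the C below combines a pair back into one code point and counts code points in a buffer. The helper names are hypothetical and come from neither PhysFS nor the Windows API; the character from the test program further down is assumed to be U+1D4F2, whose UTF-16 form is 0xD835 0xDCF2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Nonzero if u is a high (leading) surrogate, U+D800..U+DBFF. */
static int is_high_surrogate(uint16_t u) { return u >= 0xD800 && u <= 0xDBFF; }

/* Nonzero if u is a low (trailing) surrogate, U+DC00..U+DFFF. */
static int is_low_surrogate(uint16_t u) { return u >= 0xDC00 && u <= 0xDFFF; }

/* Combine a valid surrogate pair into one code point in U+10000..U+10FFFF. */
static uint32_t utf16_decode_pair(uint16_t hi, uint16_t lo)
{
    assert(is_high_surrogate(hi) && is_low_surrogate(lo));
    return 0x10000u + (((uint32_t)(hi - 0xD800) << 10) | (uint32_t)(lo - 0xDC00));
}

/* Count code points in n UTF-16 units. A UCS-2-style consumer would instead
 * treat every 16-bit unit as a separate character. */
static size_t utf16_count_codepoints(const uint16_t *s, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (is_high_surrogate(s[i]) && i + 1 < n && is_low_surrogate(s[i + 1]))
            i++;  /* a valid pair encodes a single code point */
        count++;
    }
    return count;
}
```

Code that counts units rather than code points sees that pair as two characters, neither of which is meaningful on its own.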
On 4 November 2012 06:10, Jason McKesson <korval2 at gmail.com> wrote:
> On 11/3/2012 8:56 AM, Jookia wrote:
>> Recently I've been trying to get my application to run on Windows. At
>> first I found that it didn't load files properly when they had a certain
>> phrase in them that's meant to break things that don't support Unicode.
>> Long story short, I wrote this:
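>> #include <physfs.h>
>> #include <stdio.h>
>>
>> int main(int argc, char** argv)
>> {
>>     PHYSFS_init(argv[0]);  /* required before any other PhysFS call */
>>     PHYSFS_mount(PHYSFS_getBaseDir(), "", 0);
>>     PHYSFS_File* file = PHYSFS_openRead("𝓲");
>>     printf("%s\n", PHYSFS_getLastError());
>>     if (file != NULL)
>>         PHYSFS_close(file);
>>     PHYSFS_deinit();
>>     return 0;
>> }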
>> (If you see a box, please try copying this into your IDE. MSVC displays it
>> On Linux it works. It writes 'No such file or directory' to my console.
>> On Wine + MinGW (I know, I know; I'll explain in a moment why I don't think
>> they're at fault, and I haven't got PhysFS working on Windows itself yet),
>> it returns 'Invalid name.' What?
>> Digging deeper into the code, I found that doPlatformExists in windows.c
>> fails. I dumped the UTF-16 string to a file and decoded it with iconv, and
>> it turns out my lovely character has been turned into a question mark,
>> which happens to be an invalid name.
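A question mark is what a UCS-2 conversion produces when it substitutes a placeholder for anything outside the Basic Multilingual Plane. The sketch below models that narrowing step, assuming the placeholder is '?' (which matches the dumped path); `ucs2_narrow` and `is_reserved_in_windows_name` are hypothetical helpers, not PhysFS code. '?' is among the characters Windows reserves in file names, which is consistent with the 'Invalid name.' error.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed placeholder; the '?' in the dumped path suggests this value. */
#define BOGUS_CODEPOINT ((uint32_t)'?')

/* UCS-2-style narrowing: a code point above U+FFFF has no single 16-bit
 * form, so it is replaced instead of being split into a surrogate pair. */
static uint16_t ucs2_narrow(uint32_t cp)
{
    assert(cp <= 0x10FFFF);
    if (cp > 0xFFFF)
        cp = BOGUS_CODEPOINT;
    return (uint16_t)cp;
}

/* Some of the printable characters Windows reserves in a file name
 * component (control characters are also disallowed, not checked here). */
static int is_reserved_in_windows_name(uint16_t u)
{
    return u == '<' || u == '>' || u == ':' || u == '"' || u == '/' ||
           u == '\\' || u == '|' || u == '?' || u == '*';
}
```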
>> Here is the code changed so that it bypasses the Unicode conversions:
>> static int doPlatformExists(LPWSTR wpath)
>> {
>>     /* Debugging hack: fall back to a hard-coded UTF-16 path. */
>>     LPWSTR newpath = L"Z:\\home\\jookia\\Staging\\test-𝓲";
>>     if (pGetFileAttributesW(wpath) == PHYSFS_INVALID_FILE_ATTRIBUTES)
>>         wpath = newpath;
>>
>>     BAIL_IF_MACRO(
>>         pGetFileAttributesW(wpath) == PHYSFS_INVALID_FILE_ATTRIBUTES,
>>         winApiStrError(), 0
>>     );
>>     return 1;
>> } /* doPlatformExists */
>> With that change, the program writes 'File not found.' to my console.
>> It seems the error is that utf8ToUcs2 doesn't account for surrogates:
>> /* !!! BLUESKY: UTF-16 surrogates? */
>> if (cp > 0xFFFF)
>>     cp = UNICODE_BOGUS_CHAR_CODEPOINT;
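Where the quoted check replaces any code point above 0xFFFF, a UTF-16 encoder would instead split it into a surrogate pair: subtract 0x10000, put the top 10 bits in a high surrogate and the bottom 10 bits in a low surrogate. A minimal sketch of that per-code-point step follows; `utf16_encode` is a hypothetical name, not a PhysFS function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode scalar value as UTF-16. Returns the number of 16-bit
 * units written to out: 1 inside the BMP, 2 for a surrogate pair. */
static size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    /* Surrogate code points and values past U+10FFFF are not encodable. */
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    if (cp <= 0xFFFF) {
        out[0] = (uint16_t)cp;
        return 1;
    }
    cp -= 0x10000;                               /* now a 20-bit value */
    out[0] = (uint16_t)(0xD800 + (cp >> 10));    /* high 10 bits */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* low 10 bits */
    return 2;
}
```

For U+1D4F2, 0x1D4F2 - 0x10000 = 0xD4F2, which splits into 0x35 and 0xF2, giving the pair 0xD835 0xDCF2.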
>> I know UCS-2 technically doesn't account for surrogates, but Windows uses
>> UTF-16. Commenting out that check produces a garbled path, since the code
>> point isn't converted properly, but it does bring up 'File not found.'
>> So... are there any plans to fix this? Are patches welcome?
> UCS-2 doesn't "technically" do anything. It's two bytes per codepoint and
> stops at codepoint 0xFFFF. Changing utf8ToUcs2 to use surrogate pairs
> would be terrible, since that's not what UCS-2 is.
> Instead, you want a utf8ToUtf16 function.
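A minimal sketch of such a function, assuming NUL-terminated input and an output buffer sized in 16-bit units. The names `utf8_next` and `utf8_to_utf16`, and the choice of U+FFFD for malformed input, are assumptions for illustration rather than PhysFS's API. It decodes each UTF-8 sequence, rejecting overlong forms, surrogate code points and values past U+10FFFF, then emits either one unit or a surrogate pair.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: malformed input becomes U+FFFD REPLACEMENT CHARACTER. */
#define REPLACEMENT_CHAR 0xFFFDu

/* Decode one UTF-8 sequence at *src and advance *src past the bytes used.
 * Overlong forms, surrogates, values above U+10FFFF and truncated
 * sequences all yield REPLACEMENT_CHAR. Caller ensures **src != 0. */
static uint32_t utf8_next(const unsigned char **src)
{
    static const uint32_t min_for_extra[4] = { 0x0, 0x80, 0x800, 0x10000 };
    const unsigned char *s = *src;
    uint32_t cp;
    int extra;

    if (s[0] < 0x80)                { cp = s[0];        extra = 0; }
    else if ((s[0] & 0xE0) == 0xC0) { cp = s[0] & 0x1F; extra = 1; }
    else if ((s[0] & 0xF0) == 0xE0) { cp = s[0] & 0x0F; extra = 2; }
    else if ((s[0] & 0xF8) == 0xF0) { cp = s[0] & 0x07; extra = 3; }
    else { *src = s + 1; return REPLACEMENT_CHAR; }  /* invalid lead byte */

    for (int i = 1; i <= extra; i++) {
        /* A missing continuation byte (including the terminating NUL)
         * ends the sequence early without reading past it. */
        if ((s[i] & 0xC0) != 0x80) {
            *src = s + i;
            return REPLACEMENT_CHAR;
        }
        cp = (cp << 6) | (uint32_t)(s[i] & 0x3F);
    }
    *src = s + 1 + extra;

    if (cp < min_for_extra[extra] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return REPLACEMENT_CHAR;
    return cp;
}

/* Convert NUL-terminated UTF-8 to NUL-terminated UTF-16, writing at most
 * dst_units 16-bit units including the terminator. */
static void utf8_to_utf16(const char *src, uint16_t *dst, size_t dst_units)
{
    const unsigned char *s = (const unsigned char *)src;
    size_t room;

    assert(dst != NULL && dst_units > 0);
    room = dst_units - 1;  /* reserve one unit for the terminator */

    while (*s != 0) {
        uint32_t cp = utf8_next(&s);
        if (cp <= 0xFFFF) {
            if (room < 1)
                break;
            *dst++ = (uint16_t)cp;
            room -= 1;
        } else {
            if (room < 2)
                break;  /* stop rather than emit half a surrogate pair */
            cp -= 0x10000;
            *dst++ = (uint16_t)(0xD800 + (cp >> 10));
            *dst++ = (uint16_t)(0xDC00 + (cp & 0x3FF));
            room -= 2;
        }
    }
    *dst = 0;
}
```

Stopping instead of writing half a pair keeps the output valid UTF-16 when the buffer is too small; a real implementation might also report the truncation to the caller.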
> physfs mailing list
> physfs at icculus.org