I'd just like to note (from personal experience, so take it for what it's worth) that while Windows supports UTF-16 *in theory*, in practice it does not handle surrogate pairs properly and therefore behaves more like UCS-2.<br><br>I've run into this on multiple occasions, including while doing C# development (forced by the university, I might add).<br>
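<br>To make concrete what "handling pairs" means, here is a minimal sketch (the function name is hypothetical, not from PhysFS or the Windows API). It counts code points in a UTF-16 buffer, treating a high surrogate followed by a low surrogate as one character. Code that instead treats every 16-bit unit as a character is effectively doing UCS-2.<br>

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: count code points in a UTF-16 buffer of `len`
   16-bit code units. A high surrogate (0xD800-0xDBFF) immediately
   followed by a low surrogate (0xDC00-0xDFFF) counts as ONE code point.
   Unpaired surrogates are counted individually rather than rejected. */
static size_t utf16_count_codepoints(const uint16_t *s, size_t len)
{
    size_t count = 0;
    size_t i = 0;

    while (i < len) {
        if (s[i] >= 0xD800 && s[i] <= 0xDBFF &&
            i + 1 < len &&
            s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
            i += 2;   /* well-formed surrogate pair */
        } else {
            i += 1;   /* BMP character or lone surrogate */
        }
        count++;
    }
    return count;
}
```

For instance, assuming the test character is U+1D4F2, it is stored as the two units 0xD835 0xDCF2, which this counts as one character while a UCS-2 style loop would see two.<br>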
<br><div class="gmail_quote">On 4 November 2012 06:10, Jason McKesson <span dir="ltr"><<a href="mailto:korval2@gmail.com" target="_blank">korval2@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="HOEnZb"><div class="h5">On 11/3/2012 8:56 AM, Jookia wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Hello!<br>
<br>
Recently I've been trying to get my application to run on Windows. At first I found that it didn't load files properly when they had a certain phrase in them that's meant to break things that don't support Unicode properly.<br>
<br>
Long story short, I wrote this:<br>
<br>
#include <physfs.h><br>
#include <stdio.h><br>
<br>
int main(int argc, char** argv)<br>
{<br>
PHYSFS_init(argv[0]);<br>
PHYSFS_mount(PHYSFS_getBaseDir(), "", 0);<br>
PHYSFS_File* file = PHYSFS_openRead("𝓲");<br>
<br>
printf("%s\n", PHYSFS_getLastError());<br>
<br>
return 0;<br>
}<br>
<br>
(If you see a box please try and copy this in your IDE. MSVC displays it fine.)<br>
<br>
On Linux it works. It writes 'No such file or directory' to my console. Fantastic!<br>
<br>
On Wine + MinGW (I know, I know, I'll explain in a tick why I don't think they're at fault. I haven't got PhysFS to work in Windows yet.) it returns 'Invalid name.'. What?<br>
<br>
Digging deeper into the code, I found that windows.c's doPlatformExists fails. After manually printing out the UTF-16 string to a file and then using iconv to read it, it turns out my lovely character has turned into a question mark, which happens to be an invalid name.<br>
<br>
Changing the code up so it bypasses the Unicode conversions:<br>
<br>
static int doPlatformExists(LPWSTR wpath)<br>
{<br>
LPWSTR newpath = L"Z:\\home\\jookia\\Staging\\test-𝓲";<br>
<br>
if(pGetFileAttributesW(wpath) == PHYSFS_INVALID_FILE_ATTRIBUTES)<br>
{<br>
wpath = newpath;<br>
}<br>
<br>
BAIL_IF_MACRO<br>
(<br>
pGetFileAttributesW(wpath) == PHYSFS_INVALID_FILE_ATTRIBUTES,<br>
winApiStrError(), 0<br>
);<br>
return(1);<br>
} /* doPlatformExists */<br>
<br>
This makes the program write 'File not found.' to my console.<br>
<br>
It seems the error is in utf8ToUcs2 not accounting for surrogates:<br>
<br>
<br>
/* !!! BLUESKY: UTF-16 surrogates? */<br>
if (cp > 0xFFFF)<br>
cp = UNICODE_BOGUS_CHAR_CODEPOINT;<br>
<br>
I know UCS-2 technically doesn't account for surrogates, but we're using UTF-16 in Windows. Commenting out the code will make a weird path due to it not being converted properly, but it will bring up a 'File not found.'<br>
<br>
So... Are there any plans to fix this? Are patches welcome?<br>
</blockquote></div></div>
UCS-2 doesn't "technically" do anything. It's two bytes per codepoint and stops at codepoint 0xFFFF. Changing utf8ToUcs2 to use surrogate pairs would be terrible, since that's not what UCS-2 is.<br>
<br>
Instead, you want a utf8ToUtf16 function.<div class="HOEnZb"><div class="h5"><br>
_______________________________________________<br>
physfs mailing list<br>
<a href="mailto:physfs@icculus.org" target="_blank">physfs@icculus.org</a><br>
<a href="http://icculus.org/mailman/listinfo/physfs" target="_blank">http://icculus.org/mailman/listinfo/physfs</a><br>
</div></div></blockquote></div><br>
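<br>For what it's worth, the separate utf8ToUtf16 function suggested in the quoted reply could look roughly like the sketch below. This is not PhysFS code: the function names, the buffer-size convention and the choice of U+FFFD for malformed input are all my assumptions, and validation is deliberately minimal (overlong encodings are not rejected).<br>

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: malformed input is replaced with U+FFFD. */
#define SKETCH_REPLACEMENT_CHAR 0xFFFDu

/* Decode one code point from NUL-terminated UTF-8 at *src and advance *src.
   Only the lead byte and continuation bytes are checked. */
static uint32_t sketch_utf8_next(const unsigned char **src)
{
    const unsigned char *p = *src;
    uint32_t cp;
    int extra;

    if (p[0] < 0x80)                { cp = p[0];        extra = 0; }
    else if ((p[0] & 0xE0) == 0xC0) { cp = p[0] & 0x1F; extra = 1; }
    else if ((p[0] & 0xF0) == 0xE0) { cp = p[0] & 0x0F; extra = 2; }
    else if ((p[0] & 0xF8) == 0xF0) { cp = p[0] & 0x07; extra = 3; }
    else { *src = p + 1; return SKETCH_REPLACEMENT_CHAR; }

    p++;
    while (extra-- > 0) {
        /* A NUL or any non-continuation byte ends the sequence early. */
        if ((*p & 0xC0) != 0x80) { *src = p; return SKETCH_REPLACEMENT_CHAR; }
        cp = (cp << 6) | (uint32_t)(*p++ & 0x3F);
    }
    *src = p;
    return cp;
}

/* Convert NUL-terminated UTF-8 to NUL-terminated UTF-16.
   dst_len is the capacity of dst in 16-bit units, terminator included.
   Returns the number of units written, not counting the terminator.
   Output is truncated (never splitting a pair) if dst is too small. */
static size_t sketch_utf8_to_utf16(const char *src, uint16_t *dst, size_t dst_len)
{
    const unsigned char *p = (const unsigned char *)src;
    size_t n = 0;

    if (dst_len == 0)
        return 0;

    while (*p != '\0') {
        uint32_t cp = sketch_utf8_next(&p);

        /* Out-of-range values and encoded surrogates are not valid. */
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            cp = SKETCH_REPLACEMENT_CHAR;

        if (cp > 0xFFFF) {
            if (n + 2 >= dst_len) break;       /* room for pair + NUL? */
            cp -= 0x10000;
            dst[n++] = (uint16_t)(0xD800 + (cp >> 10));    /* high surrogate */
            dst[n++] = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* low surrogate */
        } else {
            if (n + 1 >= dst_len) break;       /* room for unit + NUL? */
            dst[n++] = (uint16_t)cp;
        }
    }
    dst[n] = 0;
    return n;
}
```

The only difference from a UCS-2 conversion is the `cp > 0xFFFF` branch: instead of substituting a bogus character, it emits two units. That leaves utf8ToUcs2 untouched for callers that really want UCS-2.<br>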