I&#39;d just like to note that, in my personal experience (take it for what it&#39;s worth), while Windows supports UTF-16 *in theory*, in practice it does not work properly with surrogate pairs and therefore behaves more like UCS-2.<br><br>I&#39;ve noticed this on multiple occasions, including during C# development (forced by the university, I might add).<br>
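<br>To make the UCS-2 versus UTF-16 distinction concrete, here is a minimal sketch of a pair-aware code point counter. The function name utf16_count_codepoints is hypothetical and not part of any Windows or PhysFS API; it only illustrates what software has to do to treat a surrogate pair as a single character.<br>

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: counts code points in a UTF-16 buffer of `len`
   code units. A high surrogate (0xD800..0xDBFF) immediately followed by
   a low surrogate (0xDC00..0xDFFF) counts as one code point. A lone
   surrogate is counted as one code point on its own. */
size_t utf16_count_codepoints(const uint16_t *s, size_t len)
{
    size_t count = 0;
    size_t i = 0;

    while (i < len) {
        if (s[i] >= 0xD800 && s[i] <= 0xDBFF &&
            i + 1 < len &&
            s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
            i += 2; /* valid surrogate pair: two units, one code point */
        } else {
            i += 1; /* BMP character or unpaired surrogate */
        }
        count++;
    }
    return count;
}
```

For the three units 0x0061 0xD835 0xDCF2 (the letter a followed by U+1D4F2), this returns 2, whereas code that treats every 16-bit unit as a character, UCS-2 style, would report 3.<br>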

<br><div class="gmail_quote">On 4 November 2012 06:10, Jason McKesson <span dir="ltr">&lt;<a href="mailto:korval2@gmail.com" target="_blank">korval2@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div class="HOEnZb"><div class="h5">On 11/3/2012 8:56 AM, Jookia wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Hello!<br>

<br>

Recently I&#39;ve been trying to get my application to run on Windows. At first I found that it didn&#39;t load files properly when they had a certain phrase in them that&#39;s meant to break things that don&#39;t support Unicode properly.<br>


<br>

Long story short, I wrote this:<br>

<br>

#include &lt;physfs.h&gt;<br>

#include &lt;stdio.h&gt;<br>

<br>

int main(int argc, char** argv)<br>

{<br>

  PHYSFS_init(argv[0]);<br>

  PHYSFS_mount(PHYSFS_getBaseDir(), &quot;&quot;, 0);<br>

  PHYSFS_File* file = PHYSFS_openRead(&quot;𝓲&quot;);<br>

<br>

  printf(&quot;%s\n&quot;, PHYSFS_getLastError());<br>

<br>

  return 0;<br>

}<br>

<br>

(If you see a box, please try copying this into your IDE. MSVC displays it fine.)<br>

<br>

On Linux it works. It writes &#39;No such file or directory&#39; to my console. Fantastic!<br>

<br>

On Wine + MinGW (I know, I know, I&#39;ll explain in a tick why I don&#39;t think they&#39;re at fault. I haven&#39;t got PhysFS to work in Windows yet.) it returns &#39;Invalid name.&#39;. What?<br>

<br>

Digging deeper into the code, I found that windows.c&#39;s doPlatformExists fails. After manually printing out the UTF-16 string to a file and then using iconv to read it, it turns out my lovely character has turned into a question mark, which happens to be an invalid name.<br>


<br>

Changing the code up so it bypasses the Unicode conversions:<br>

<br>

static int doPlatformExists(LPWSTR wpath)<br>

{<br>

    LPWSTR newpath = L&quot;Z:\\home\\jookia\\Staging\\test-𝓲&quot;;<br>

<br>

    if(pGetFileAttributesW(wpath) == PHYSFS_INVALID_FILE_ATTRIBUTES)<br>

    {<br>

        wpath = newpath;<br>

    }<br>

<br>

    BAIL_IF_MACRO<br>

    (<br>

        pGetFileAttributesW(wpath) == PHYSFS_INVALID_FILE_ATTRIBUTES,<br>

        winApiStrError(), 0<br>

    );<br>

    return(1);<br>

} /* doPlatformExists */<br>

<br>

This makes the program write &#39;File not found.&#39; to my console.<br>

<br>

It seems the error is in utf8ToUcs2 not accounting for surrogates:<br>

<br>

<br>

        /* !!! BLUESKY: UTF-16 surrogates? */<br>

        if (cp &gt; 0xFFFF)<br>

            cp = UNICODE_BOGUS_CHAR_CODEPOINT;<br>

<br>

I know UCS-2 technically doesn&#39;t account for surrogates, but we&#39;re using UTF-16 on Windows. Commenting out that check produces a weird path, because the code point still isn&#39;t converted properly, but it does bring up &#39;File not found.&#39;<br>


<br>

So... Are there any plans to fix this? Are patches welcome?<br>

</blockquote></div></div>

UCS-2 doesn&#39;t &quot;technically&quot; do anything. It&#39;s two bytes per codepoint and stops at codepoint 0xFFFF.  Changing utf8ToUcs2 to use surrogate pairs would be terrible, since that&#39;s not what UCS-2 is.<br>


<br>

Instead, you want a utf8ToUtf16 function.<div class="HOEnZb"><div class="h5"><br>


</div></div></blockquote></div><br>
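<br>For what it is worth, the utf8ToUtf16 function suggested above could look roughly like the sketch below. This is not PhysFS code: the names sketch_utf8_decode, sketch_utf8_to_utf16 and SKETCH_BOGUS_CODEPOINT are hypothetical, the bogus value of 0x3F is an assumption based on the question mark seen in the converted string, and validation is deliberately minimal (overlong encodings are not rejected).<br>

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: '?' stands in for any input that cannot be decoded. */
#define SKETCH_BOGUS_CODEPOINT 0x3F

/* Hypothetical helper: decodes one code point from UTF-8 and advances
   *src past the bytes it consumed. Never reads past a NUL terminator. */
uint32_t sketch_utf8_decode(const unsigned char **src)
{
    const unsigned char *s = *src;
    uint32_t cp;
    int extra;

    if (s[0] < 0x80)                { cp = s[0];        extra = 0; }
    else if ((s[0] & 0xE0) == 0xC0) { cp = s[0] & 0x1F; extra = 1; }
    else if ((s[0] & 0xF0) == 0xE0) { cp = s[0] & 0x0F; extra = 2; }
    else if ((s[0] & 0xF8) == 0xF0) { cp = s[0] & 0x07; extra = 3; }
    else { *src = s + 1; return SKETCH_BOGUS_CODEPOINT; }

    s++;
    while (extra-- > 0) {
        /* Each continuation byte must look like 10xxxxxx. */
        if ((*s & 0xC0) != 0x80) { *src = s; return SKETCH_BOGUS_CODEPOINT; }
        cp = (cp << 6) | (uint32_t)(*s++ & 0x3F);
    }
    *src = s;

    /* Reject values outside Unicode and encoded surrogates. */
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return SKETCH_BOGUS_CODEPOINT;
    return cp;
}

/* Hypothetical converter: NUL-terminated UTF-8 in, NUL-terminated UTF-16
   out. `cap` is the capacity of dst in 16-bit units, terminator included.
   Returns the number of units written, not counting the terminator.
   Output is truncated at a code point boundary when dst is too small. */
size_t sketch_utf8_to_utf16(const char *src, uint16_t *dst, size_t cap)
{
    const unsigned char *s = (const unsigned char *)src;
    size_t n = 0;

    if (cap == 0)
        return 0;

    while (*s != 0) {
        uint32_t cp = sketch_utf8_decode(&s);

        if (cp > 0xFFFF) {
            /* Needs a surrogate pair: two units plus room for the NUL. */
            if (n + 3 > cap)
                break;
            cp -= 0x10000;
            dst[n++] = (uint16_t)(0xD800 + (cp >> 10));   /* high surrogate */
            dst[n++] = (uint16_t)(0xDC00 + (cp & 0x3FF)); /* low surrogate */
        } else {
            if (n + 2 > cap)
                break;
            dst[n++] = (uint16_t)cp;
        }
    }
    dst[n] = 0;
    return n;
}
```

Fed the four UTF-8 bytes F0 9D 93 B2 (the character from the test program), this writes the two units 0xD835 0xDCF2 instead of replacing the character with a question mark. Keeping it as a separate function leaves utf8ToUcs2 strictly UCS-2, as argued in the quoted reply.<br>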