We simply use the platform default encoding, since some builds may rely on this behavior. If you make heavy use of non-ASCII characters, I’d suggest forcing your build to use UTF-8.
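To illustrate why relying on the platform default can bite, here is a minimal sketch in Python. It is not taken from the build tool itself; the sample string is made up, and cp1252 merely stands in for whatever default a given machine happens to have.

```python
# Hypothetical value containing a non-ASCII character.
text = "café"

# Bytes as a tool would write them when encoding with UTF-8.
utf8_bytes = text.encode("utf-8")

# Reading the bytes back with a different encoding (cp1252 is an assumed
# stand-in for a platform default) silently garbles the text.
decoded_with_cp1252 = utf8_bytes.decode("cp1252")

# Reading them back with the same explicit encoding round-trips cleanly.
decoded_with_utf8 = utf8_bytes.decode("utf-8")

print(repr(decoded_with_cp1252))
print(repr(decoded_with_utf8))
```

The mismatch raises no error, which is why it tends to go unnoticed until the output is inspected on a machine with a different default.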
UTF-8 is not a character set; Unicode is.
I think “any UTF-8 character except NUL, CR and LF” means exactly UTF-8-encoded Unicode characters.
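One way to read that wording, sketched in Python. The function name `is_valid_value` is hypothetical, and this is only my interpretation of the phrase, not how any particular tool checks it: the bytes must decode as UTF-8, and the decoded text must not contain NUL, CR or LF.

```python
def is_valid_value(raw: bytes) -> bool:
    """Hypothetical check: valid UTF-8 with no NUL, CR or LF characters."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # The bytes are not a valid UTF-8 encoding of any Unicode text.
        return False
    return not any(ch in "\x00\r\n" for ch in text)


# The character (a Unicode code point) and its UTF-8 encoding (bytes)
# are different things: one code point, two bytes here.
ch = "é"
code_point = ord(ch)
encoded = ch.encode("utf-8")

print(hex(code_point), encoded)
print(is_valid_value(encoded))
print(is_valid_value(b"a\nb"))
```

Under this reading, the same "é" stored as the single Latin-1 byte `0xE9` would be rejected, because that byte on its own is not valid UTF-8.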
I still think it's a bug.