Microsoft Windows [Version 10.0.18363.592]
Impact
This issue also affects reading console input through the Universal C Runtime (_read, getchar, fread, scanf, etc.). Using _cgets_s works around the issue only because it calls ReadConsoleW instead of ReadFile. This is also reported against the UCRT on Developer Community: "_read() cannot read UTF-8 but _cgets_s() can".
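The ReadConsoleW workaround mentioned above can be sketched roughly as follows. This is a minimal illustration, not the UCRT's actual implementation: read_console_line_utf8 and utf16_to_utf8 are hypothetical helper names, the 256-character buffer is an arbitrary choice, and the conversion is hand-written so that it can be checked without a console (on Windows, WideCharToMultiByte with CP_UTF8 would be the usual choice).

```cpp
#include <cstddef>
#include <string>

#ifdef _WIN32
#include <Windows.h>
#endif

// Hypothetical helper: convert UTF-16 code units to UTF-8.
// Unpaired surrogates are replaced with U+FFFD.
std::string utf16_to_utf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()
            && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF)
        {
            // Combine a high/low surrogate pair into one code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        }
        else if (cp >= 0xD800 && cp <= 0xDFFF)
        {
            cp = 0xFFFD; // unpaired surrogate
        }

        if (cp < 0x80)
        {
            out += static_cast<char>(cp);
        }
        else if (cp < 0x800)
        {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else if (cp < 0x10000)
        {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else
        {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

#ifdef _WIN32
// Hypothetical helper: read one line from the console as UTF-16 via
// ReadConsoleW and return it as UTF-8. Returns false if the call fails,
// which includes the case where stdin is redirected to a file or pipe.
bool read_console_line_utf8(std::string& line)
{
    const HANDLE console_stdin = GetStdHandle(STD_INPUT_HANDLE);
    wchar_t buffer[256];
    DWORD chars_read = 0;
    if (!ReadConsoleW(console_stdin, buffer, 256, &chars_read, nullptr))
    {
        return false;
    }
    // wchar_t is a 16-bit UTF-16 code unit on Windows.
    line = utf16_to_utf8(std::u16string(buffer, buffer + chars_read));
    return true;
}
#endif
```

Because ReadConsoleW only accepts console handles, a caller would presumably still need ReadFile for redirected input, where the report shows UTF-8 already arriving intact.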
When using ReadFile to read from a console handle, UTF-8 input is not returned correctly. Using ReadFile on other types of handles (files, pipes) reads UTF-8 without issue. SetConsoleCP and SetConsoleOutputCP do not appear to affect this behavior.
C:\Users\stwish\source\read_utf8>type win32_test.cpp
#include <Windows.h>
#include <stdio.h>

int main()
{
    SetConsoleCP(65001);
    SetConsoleOutputCP(65001);
    const HANDLE console_stdin = GetStdHandle(STD_INPUT_HANDLE);
    const size_t buf_count = 20;
    char buffer[buf_count]{};
    DWORD num_read;
    BOOL result = ReadFile(
        console_stdin,
        buffer,
        buf_count,
        &num_read,
        nullptr
    );
    printf("ReadFile returned '%d'\n", result);
    for (int i = 0; i < 20; i++)
    {
        printf("%02x ", (unsigned char)buffer[i]);
    }
    return 0;
}
C:\Users\stwish\source\read_utf8>cl /nologo /EHsc /MT win32_test.cpp /Zi
win32_test.cpp
C:\Users\stwish\source\read_utf8>win32_test.exe
我是中文字符
ReadFile returned '1'
00 00 00 00 00 00 0d 0a 00 00 00 00 00 00 00 00 00 00 00 00
C:\Users\stwish\source\read_utf8>echo 我是中文字符 | win32_test.exe
ReadFile returned '1'
e6 88 91 e6 98 af e4 b8 ad e6 96 87 e5 ad 97 e7 ac a6 20 0d
C:\Users\stwish\source\read_utf8>type input.txt
我是中文字符
C:\Users\stwish\source\read_utf8>type input.txt | win32_test.exe
ReadFile returned '1'
e6 88 91 e6 98 af e4 b8 ad e6 96 87 e5 ad 97 e7 ac a6 00 00
Expected behavior
Running win32_test.exe and entering '我是中文字符' on the console should return e6 88 91 e6 98 af e4 b8 ad e6 96 87 e5 ad 97 e7 ac a6 0d 0a, as this is the UTF-8 representation of that string, plus CR LF.
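The expected byte sequence can be double-checked by encoding the six code points by hand. The sketch below is illustrative only: encode_utf8, hex_dump and expected_console_bytes are hypothetical names, and the code point values (U+6211, U+662F, U+4E2D, U+6587, U+5B57, U+7B26) are the characters of the test string.

```cpp
#include <cstdio>
#include <string>

// Encode one Unicode scalar value as UTF-8.
// Assumes cp is valid (not a surrogate, at most U+10FFFF).
std::string encode_utf8(char32_t cp)
{
    std::string out;
    if (cp < 0x80)
    {
        out += static_cast<char>(cp);
    }
    else if (cp < 0x800)
    {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    else if (cp < 0x10000)
    {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    else
    {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Format bytes as two-digit lowercase hex values separated by spaces.
std::string hex_dump(const std::string& bytes)
{
    std::string out;
    char tmp[4];
    for (unsigned char b : bytes)
    {
        std::snprintf(tmp, sizeof tmp, "%02x", static_cast<unsigned>(b));
        if (!out.empty())
        {
            out += ' ';
        }
        out += tmp;
    }
    return out;
}

// The bytes the report expects ReadFile to return: the UTF-8 encoding
// of the six-character test string followed by CR LF.
std::string expected_console_bytes()
{
    const char32_t text[] = {0x6211, 0x662F, 0x4E2D, 0x6587, 0x5B57, 0x7B26};
    std::string bytes;
    for (char32_t cp : text)
    {
        bytes += encode_utf8(cp);
    }
    bytes += "\r\n";
    return bytes;
}
```

Each of the six characters takes three bytes in UTF-8, so the text is 18 bytes and the full line with CR LF is 20 bytes, which exactly fills the 20-byte buffer used in win32_test.cpp.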
Actual behavior
Running win32_test.exe and entering '我是中文字符' on the console returns 6 null characters followed by CR LF, yet ReadFile still reports that the read operation succeeded.