Source code encoding

UTF-8 or UTF-16

UTF-8 and UTF-16 are two different encodings of the Unicode standard. Microsoft Windows chose UTF-16 for its file system and native wide-character APIs, and Visual Studio encodes wide string literals as UTF-16, while GCC and macOS default to UTF-8.
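To make the difference concrete, here is a minimal sketch that encodes a single code point both ways. The function names encode_utf8 and encode_utf16 are made up for this illustration, and the code assumes its input is a valid code point (no surrogates, nothing above U+10FFFF).

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Encode one code point as UTF-8 (1 to 4 bytes).
// Assumes cp is a valid Unicode scalar value; no validation is done.
std::string encode_utf8(char32_t cp) {
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Encode one code point as UTF-16 (1 or 2 sixteen-bit units).
std::vector<std::uint16_t> encode_utf16(char32_t cp) {
    std::vector<std::uint16_t> out;
    if (cp < 0x10000) {
        out.push_back(static_cast<std::uint16_t>(cp));
    } else {
        cp -= 0x10000;  // split the remaining 20 bits into a surrogate pair
        out.push_back(static_cast<std::uint16_t>(0xD800 | (cp >> 10)));
        out.push_back(static_cast<std::uint16_t>(0xDC00 | (cp & 0x3FF)));
    }
    return out;
}
```

For U+4E2D (中), encode_utf8 gives the three bytes E4 B8 AD, while encode_utf16 gives the single unit 4E2D.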

For this reason, in Windows programming, "Unicode" usually means storing strings in wide-character (wchar_t*) arrays.

Let's see the code below:

const char* string = "中文";
const wchar_t* wstring = L"中文";

On Windows, when this code is saved as UTF-8 without a BOM ("set nobomb" in Vim), the following happens:

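Visual Studio reads the source string "中文" using the local machine's code page (GB2312 if your Region and Language setting is Chinese) and stores it in the string variable in that same code page.
Visual Studio reads the source string L"中文" using the local machine's code page, then converts it to Unicode (UTF-16) and stores it in the wstring variable.

Because the file is actually UTF-8, the bytes are misinterpreted in both cases, so wstring typically ends up holding the wrong characters.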
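One way to see what the compiler actually stored is to dump the bytes of the narrow string. The helper below is only a sketch, and the name hex_bytes is invented for this example. For "中文", the output E4 B8 AD E6 96 87 means the bytes are UTF-8, while D6 D0 CE C4 means they are GB2312.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <string>

// Return the bytes of a NUL-terminated narrow string as
// space-separated uppercase hex, e.g. "E4 B8 AD".
std::string hex_bytes(const char* s) {
    std::string out;
    char buf[4];
    const std::size_t n = std::strlen(s);
    for (std::size_t i = 0; i < n; ++i) {
        // Go through unsigned char so bytes >= 0x80 are not sign-extended.
        const unsigned byte = static_cast<unsigned char>(s[i]);
        std::snprintf(buf, sizeof buf, "%02X", byte);
        if (i != 0) out += ' ';
        out += buf;
    }
    return out;
}
```

Calling hex_bytes(string) on the variable from the snippet above and printing the result shows which encoding ended up in the binary.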

When the file is saved with a BOM ("set bomb" in Vim), the following happens:

Visual Studio reads the source string "中文" as UTF-8, then converts it to the local machine's code page and stores it in the string variable.
Visual Studio reads the source string L"中文" as UTF-8, then converts it to Unicode (UTF-16) and stores it in the wstring variable.
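The BOM that makes this difference is just the three bytes EF BB BF at the very start of the file. A minimal sketch for checking whether a file's contents start with it follows. The name has_utf8_bom is hypothetical, and the function assumes the whole file has already been read into a std::string.

```cpp
#include <string>

// True if the buffer starts with the UTF-8 byte order mark EF BB BF.
bool has_utf8_bom(const std::string& contents) {
    return contents.size() >= 3 &&
           contents.compare(0, 3, "\xEF\xBB\xBF") == 0;
}
```

Reading a source file in binary mode and passing its contents to has_utf8_bom tells you which of the two cases above applies to that file.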

Conclusion

On Windows, the encoding of a char* string is not guaranteed, while a wchar_t* string always uses Unicode (UTF-16).

Reference

http://www.utf8everywhere.org/ A very good article!
http://www.public-software-group.org/utf8proc
http://puszcza.gnu.org.ua/software/microutf8/
