source code encoding

Wu Jie edited this page Mar 30, 2014 · 1 revision

utf-8 or utf-16

UTF-8 and UTF-16 are two different encodings of the Unicode standard. Microsoft Windows chose UTF-16 as its Unicode encoding (for file system names and the wide-character APIs), while GCC and Mac default to UTF-8.
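To make the difference concrete, here is a minimal sketch that stores the two characters "中文" (U+4E2D, U+6587) in both encodings and compares their sizes. The names kUtf8, kUtf16, utf8_byte_count, utf16_byte_count and check_sizes are made up for this illustration. The characters are written as escape sequences so that the result does not depend on how the compiler decodes the source file, and the sketch assumes the usual 16-bit char16_t.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// "中文" spelled out explicitly: the UTF-8 form as raw bytes,
// the UTF-16 form as universal character names.
static const std::string    kUtf8  = "\xE4\xB8\xAD\xE6\x96\x87";
static const std::u16string kUtf16 = u"\u4E2D\u6587";

// Bytes needed by each encoding for the same two characters.
std::size_t utf8_byte_count()  { return kUtf8.size() * sizeof(char); }
std::size_t utf16_byte_count() { return kUtf16.size() * sizeof(char16_t); }

// Small self-check: three bytes per character in UTF-8,
// one 16-bit code unit per character in UTF-16.
void check_sizes() {
    assert(kUtf8.size() == 6);
    assert(kUtf16.size() == 2);
    assert(kUtf16[0] == 0x4E2D && kUtf16[1] == 0x6587);
}
```

Both strings hold the same text; only the byte layout differs, which is why a compiler has to know the encoding of a source file before it can interpret a string literal.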

For this reason, in Windows programming, "Unicode" usually means storing a string in a wide-character (wchar_t*) array.

Let's see the code below:

const char* string = "中文";
const wchar_t* wstring = L"中文";

On Windows, when this code is saved as UTF-8 without a BOM ("set nobomb" in Vim), the following happens:

  • Visual Studio reads the source string "中文" using the local-machine encoding (GB2312 if your Region and Language is set to Chinese) and stores it in the string variable in that same local-machine encoding.
  • Visual Studio reads the source string L"中文" using the local-machine encoding, then converts it and stores it in the wstring variable as Unicode (UTF-16).

When the file is saved with a BOM ("set bomb" in Vim), the following happens:

  • Visual Studio reads the source string "中文" as UTF-8 and stores it in the string variable in the local-machine encoding.
  • Visual Studio reads the source string L"中文" as UTF-8 and stores it in the wstring variable as Unicode (UTF-16).
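One way to check which of these cases you are in is to dump the bytes the compiler actually stored for the narrow literal. The following is a rough sketch, not a Visual Studio facility: hex_bytes, guess_encoding and check_dump are hypothetical helpers written for this page, and the classification only knows the two byte sequences for "中文" (E4 B8 AD E6 96 87 in UTF-8, D6 D0 CE C4 in GB2312).

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Format every byte of a narrow string as two uppercase hex digits,
// separated by spaces, so the stored encoding can be read off directly.
std::string hex_bytes(const std::string& s) {
    std::string out;
    char buf[4];
    for (unsigned char c : s) {
        std::snprintf(buf, sizeof buf, "%02X", static_cast<unsigned>(c));
        if (!out.empty()) out += ' ';
        out += buf;
    }
    return out;
}

// Hypothetical classifier: recognises only the two encodings of "中文"
// discussed here; anything else is reported as "unknown".
const char* guess_encoding(const std::string& s) {
    if (s == "\xE4\xB8\xAD\xE6\x96\x87") return "UTF-8";
    if (s == "\xD6\xD0\xCE\xC4")         return "GB2312";
    return "unknown";
}

// Small self-check using explicit byte sequences.
void check_dump() {
    assert(hex_bytes("\xD6\xD0\xCE\xC4") == "D6 D0 CE C4");
    assert(std::string(guess_encoding("\xE4\xB8\xAD\xE6\x96\x87")) == "UTF-8");
}
```

Calling hex_bytes(string) on the string variable from the snippet above shows which bytes ended up in the executable for a given combination of file encoding and compiler.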

Conclusion

  • On Windows, the encoding of a char* string is not guaranteed, while a wchar_t* string always uses Unicode (UTF-16).
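As a small illustration of the wide-character side, the sketch below lists the code units of a wide string. The helper names wide_units and check_units are invented for this example. The literal is written with universal character names (\u4E2D, \u6587), an assumption made here so that the result no longer depends on the encoding of the source file. Note that outside Windows wchar_t is commonly 32 bits wide, but for these two characters the code unit values are the same either way.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Return the numeric value of every code unit in a wide string.
std::vector<unsigned long> wide_units(const std::wstring& ws) {
    std::vector<unsigned long> units;
    for (wchar_t c : ws) {
        units.push_back(static_cast<unsigned long>(c));
    }
    return units;
}

// Small self-check: "中文" written with universal character names
// yields the code units 0x4E2D and 0x6587.
void check_units() {
    const std::vector<unsigned long> u = wide_units(L"\u4E2D\u6587");
    assert(u.size() == 2);
    assert(u[0] == 0x4E2D);
    assert(u[1] == 0x6587);
}
```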
