source code encoding
Wu Jie edited this page Mar 30, 2014
UTF-8 and UTF-16 are two different encodings of the Unicode standard. Microsoft Windows chose UTF-16 as its Unicode encoding for the filesystem and its wide-character APIs, while GCC and macOS default to UTF-8.
For this reason, in Windows programming, "Unicode" usually means storing strings in wide-character (wchar_t*) arrays.
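To make the difference between the two encodings concrete, here is a minimal sketch that encodes a single code point both ways. The function names `encode_utf8` and `encode_utf16` are hypothetical helpers written for this page, not part of any library. For 中 (U+4E2D), UTF-8 produces the three bytes E4 B8 AD, while UTF-16 produces the single 16-bit unit 4E2D.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-8 (1 to 4 bytes).
std::string encode_utf8(char32_t cp) {
    // Code points above U+10FFFF and surrogates are not valid input.
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Hypothetical helper: encode one code point as UTF-16 (1 or 2 code units).
std::u16string encode_utf16(char32_t cp) {
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    std::u16string out;
    if (cp < 0x10000) {
        out += static_cast<char16_t>(cp);
    } else {
        // Code points beyond the BMP are split into a surrogate pair.
        cp -= 0x10000;
        out += static_cast<char16_t>(0xD800 + (cp >> 10));
        out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
    }
    return out;
}
```

Calling `encode_utf8(0x4E2D)` and `encode_utf16(0x4E2D)` side by side shows why the same text occupies different bytes depending on the encoding.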
Let's see the code below:
const char* string = "中文";
const wchar_t* wstring = L"中文";
On Windows, when this code is saved as UTF-8 without a BOM (":set nobomb" in Vim), the following happens:
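- Visual Studio reads the source string "中文" using the local machine's encoding (GB2312 if your Region and Language setting is Chinese) and stores it in the string variable as if it were in that encoding.
- Visual Studio reads the source string L"中文" using the local machine's encoding, converts it, and stores it in the wstring variable as Unicode (UTF-16). Because the file is really UTF-8, the bytes are decoded with the wrong encoding, so wstring can end up holding the wrong characters.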
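One way to check what the compiler actually stored is to dump the bytes of each literal. The sketch below uses two hypothetical helpers, `hex_bytes` and `hex_units`, written for this page. For reference, "中文" is D6 D0 CE C4 in GB2312 and E4 B8 AD E6 96 87 in UTF-8, and its code points are U+4E2D and U+6587.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical helper: format the bytes of a narrow string as hex, e.g. "E4 B8 AD".
std::string hex_bytes(const char* s) {
    assert(s != nullptr);
    std::string out;
    const std::size_t n = std::strlen(s);
    for (std::size_t i = 0; i < n; ++i) {
        char buf[4];
        // Go through unsigned char so bytes >= 0x80 are not sign-extended.
        std::snprintf(buf, sizeof buf, "%02X",
                      static_cast<unsigned>(static_cast<unsigned char>(s[i])));
        if (i != 0) out += ' ';
        out += buf;
    }
    return out;
}

// Hypothetical helper: format the code units of a wide string as hex, e.g. "4E2D 6587".
std::string hex_units(const wchar_t* s) {
    assert(s != nullptr);
    std::string out;
    for (const wchar_t* p = s; *p != 0; ++p) {
        char buf[17];
        std::snprintf(buf, sizeof buf, "%04lX", static_cast<unsigned long>(*p));
        if (p != s) out += ' ';
        out += buf;
    }
    return out;
}
```

Printing `hex_bytes(string)` and `hex_units(wstring)` for the two variables above, once per source-file encoding, makes it visible which of the cases in the list applies on a given machine.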
When the file is saved with a BOM (":set bomb" in Vim), the following happens:
- Visual Studio reads the source string "中文" as UTF-8, converts it, and stores it in the string variable in the local machine's encoding.
- Visual Studio reads the source string L"中文" as UTF-8, converts it, and stores it in the wstring variable as Unicode (UTF-16).
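The BOM that Vim's "bomb" option writes for a UTF-8 file is the three bytes EF BB BF at the very start of the file. As a rough sketch of how a tool could tell the two cases apart, the hypothetical helper `has_utf8_bom` below checks a buffer for that prefix. It is an illustration only, not how Visual Studio itself is implemented.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: true if the buffer starts with the UTF-8 BOM (EF BB BF).
bool has_utf8_bom(const char* data, std::size_t size) {
    assert(data != nullptr || size == 0);
    return size >= 3 &&
           static_cast<unsigned char>(data[0]) == 0xEF &&
           static_cast<unsigned char>(data[1]) == 0xBB &&
           static_cast<unsigned char>(data[2]) == 0xBF;
}
```

Reading the first few bytes of a source file and passing them to `has_utf8_bom` tells you whether the file falls under the "set bomb" or the "set nobomb" case described above.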
Conclusion
- On Windows, the encoding of a char* string is not guaranteed, while a wchar_t* string always uses Unicode (UTF-16).
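If the source-file encoding cannot be controlled, one option is to keep the source pure ASCII so that the result no longer depends on how the compiler decodes the file. The sketch below is an assumption about how one might do that, not something the original page prescribes. It uses universal character names for the wide literals and explicit byte escapes for the narrow one. The names `kUtf16`, `kWide`, `kUtf8`, and `check_literals` are made up for this example.

```cpp
#include <cassert>

// Universal character names are plain ASCII in the source file.
const char16_t kUtf16[] = u"\u4E2D\u6587";  // char16_t literals hold UTF-16 code units
const wchar_t  kWide[]  = L"\u4E2D\u6587";  // UTF-16 on Windows, typically UTF-32 elsewhere

// Spelling out the bytes pins the narrow string to UTF-8.
const char kUtf8[] = "\xE4\xB8\xAD\xE6\x96\x87";

static_assert(sizeof(kUtf16) == 3 * sizeof(char16_t), "two code units plus terminator");
static_assert(sizeof(kUtf8) == 7, "six UTF-8 bytes plus terminator");

// Sanity check that the literals contain the expected values for "中文".
void check_literals() {
    assert(kUtf16[0] == 0x4E2D && kUtf16[1] == 0x6587);
    assert(kWide[0] == 0x4E2D && kWide[1] == 0x6587);
    assert(static_cast<unsigned char>(kUtf8[0]) == 0xE4);
}
```

The trade-off is readability: the escaped forms are harder to read than "中文" written directly, so a comment next to each literal helps.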