Use UTF-8 encoding for Windows? #652
An excellent case for UTF-8 is made at https://utf8everywhere.org/ (found thanks to microsoft/cppwinrt#269). There is also an excellent discussion at https://softwareengineering.stackexchange.com/questions/102205/should-utf-16-be-considered-harmful
General solution:
How to convert the database file name:
The test suite was updated to check the result of
I am a bit concerned by the thread at https://sqlite.1065341.n5.nabble.com/Filename-encoding-on-Unix-platforms-td102210.html, especially the comment by Richard Hipp at https://sqlite.1065341.n5.nabble.com/Filename-encoding-on-Unix-platforms-td102210.html#a102215. My concern is that using various forms of conversion to deal with locale-specific or platform-specific character encodings, especially for the database file name, may lead to an unpleasant surprise down the road, when someone least expects it and is least prepared for it. Further analysis may be needed to minimize the risk of such a surprise on Windows.
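To make the file name concern a bit more concrete, here is a minimal sketch of the conversion step, written in Python purely for illustration. It relies on two facts: sqlite3_open_v2() interprets its filename argument as UTF-8, and Windows wide-character APIs deliver names as 16-bit code units that are not guaranteed to be valid UTF-16. The helper windows_filename_to_utf8 is hypothetical and not part of this project; it decodes strictly, so a name containing an unpaired surrogate raises an error instead of being silently altered into a different file name.

```python
def windows_filename_to_utf8(wide_name: bytes) -> bytes:
    """Hypothetical helper: convert a file name given as UTF-16LE code units
    (the form used by Windows wide-character APIs) into the UTF-8 bytes that
    sqlite3_open_v2() expects for its filename argument.

    Decoding is strict on purpose: an unpaired surrogate has no UTF-8 form,
    so UnicodeDecodeError is raised rather than opening a different file."""
    return wide_name.decode("utf-16-le").encode("utf-8")


print(windows_filename_to_utf8("données.db".encode("utf-16-le")))
# prints b'donn\xc3\xa9es.db'

try:
    # 0xDC00 is a lone low surrogate: it can occur in a Windows file name,
    # but it is not valid UTF-16, so it cannot be represented in UTF-8.
    windows_filename_to_utf8(b"\x00\xdc" + "x.db".encode("utf-16-le"))
except UnicodeDecodeError as err:
    print("refusing to convert:", err)
```

Failing loudly here is a deliberate design choice: a lossy conversion (for example, replacing bad code units with U+FFFD) would "work" but could open or create a file other than the one the caller named.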
I discovered that the Windows platform version uses the UTF-16le internal database encoding, while the other platform versions use the UTF-8 database encoding. The results of using the HEX function on TEXT string values indicate that Android/iOS WebKit Web SQL uses the UTF-8 encoding as well. I found the following official descriptions:
It is very clear in those and other places that the necessary conversions are done automatically and that there should be no difference between UTF-8 and UTF-16 database encoding at the API level. However, I discovered some hidden gotchas:
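The HEX check described above can be sketched with Python's built-in sqlite3 module, used here as a stand-in for the plugin (an assumption for illustration). The hypothetical helper encoding_and_hex opens a fresh in-memory database, optionally requests a text encoding with PRAGMA encoding before anything is written, and returns the reported encoding, the HEX of a sample TEXT literal ('é', chosen arbitrarily), and the literal as read back through the API.

```python
import sqlite3


def encoding_and_hex(encoding=None):
    """Open a fresh in-memory database, optionally request a text encoding,
    and return (reported encoding, HEX of a TEXT literal, literal read back)."""
    conn = sqlite3.connect(":memory:")
    try:
        if encoding is not None:
            # Only takes effect because nothing has been written yet.
            conn.execute(f"PRAGMA encoding = '{encoding}'")
        enc = conn.execute("PRAGMA encoding").fetchone()[0]
        # HEX() reports the bytes of the TEXT value in the database encoding;
        # the plain column is converted back to a str by the API.
        hex_value, text_value = conn.execute("SELECT hex('é'), 'é'").fetchone()
        return enc, hex_value, text_value
    finally:
        conn.close()


print(encoding_and_hex())            # default encoding
print(encoding_and_hex("UTF-16le"))  # the encoding the Windows version reportedly uses
```

With a standard SQLite build, the first call should report UTF-8 with C3A9 and the second UTF-16le with E900, while the 'é' read back is the same in both cases. That is the "no difference at the API level" behavior, with the storage difference visible only through HEX.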
From some research, I found that it is generally more efficient to store the data in UTF-8 format:
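SQLite stores TEXT payloads in the database encoding, so one rough way to compare the two options is to count how many bytes a given string needs in each. This is a minimal sketch; the stored_text_sizes helper and the sample strings are illustrative assumptions, and record headers and page overhead are ignored.

```python
def stored_text_sizes(text):
    """Return the payload size in bytes of `text` in a UTF-8 database
    versus a UTF-16le database (record and page overhead not included)."""
    return {
        "UTF-8": len(text.encode("utf-8")),
        "UTF-16le": len(text.encode("utf-16-le")),
    }


for sample in ["hello world", "données", "日本語"]:
    print(sample, stored_text_sizes(sample))
```

Mostly-ASCII text takes about half the space in UTF-8, while CJK text such as 日本語 is actually smaller in UTF-16, so how much is saved depends on the data being stored.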
For the reasons above, I think it would be beneficial to change the Windows version to use the UTF-8 encoding by default. (The easy way is to execute PRAGMA encoding = 'UTF-8' right after opening the database.) The user can still choose a different internal database encoding with PRAGMA encoding, as long as this is done before any data is written, since SQLite ignores attempts to change the encoding of a database after it has been created. (See https://www.sqlite.org/pragma.html#pragma_encoding.)
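A minimal sketch of the "PRAGMA encoding right after opening" approach, again using Python's sqlite3 module rather than the plugin's own open path (an assumption). The helper open_db_utf8_default is hypothetical. The demo shows both sides of the caveat: a new database opened through the helper ends up as UTF-8, while a database that was already created as UTF-16le keeps that encoding because the request is ignored.

```python
import os
import sqlite3
import tempfile


def open_db_utf8_default(path):
    """Hypothetical helper: open a database and request UTF-8 text encoding
    before any other statement runs on the connection."""
    conn = sqlite3.connect(path)
    # For a new, empty database this fixes the encoding to UTF-8. For an
    # existing database the encoding stored in the file wins and this
    # request has no effect.
    conn.execute("PRAGMA encoding = 'UTF-8'")
    return conn


def current_encoding(conn):
    return conn.execute("PRAGMA encoding").fetchone()[0]


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        # Case 1: a brand-new database opened through the helper.
        new_path = os.path.join(tmp, "new.db")
        conn = open_db_utf8_default(new_path)
        conn.execute("CREATE TABLE t (x TEXT)")
        new_enc = current_encoding(conn)
        conn.close()

        # Case 2: a database already created as UTF-16le
        # (simulated here, e.g. by an older version of an app).
        old_path = os.path.join(tmp, "old.db")
        conn = sqlite3.connect(old_path)
        conn.execute("PRAGMA encoding = 'UTF-16le'")
        conn.execute("CREATE TABLE t (x TEXT)")
        conn.commit()
        conn.close()

        conn = open_db_utf8_default(old_path)
        old_enc = current_encoding(conn)
        conn.close()
    return new_enc, old_enc


print(demo())
```

This is why the choice has to be made before any data is written: an existing UTF-16le database keeps its encoding no matter what is requested when it is reopened.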
ADDITIONAL IMPORTANT READING: https://sqlite.1065341.n5.nabble.com/UTF-16-API-a-second-class-citizen-td46048.html links to https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/, which looks like essential reading for all serious SQLite users.