
Surrogate pairs exist only in UTF-16, where they encode code points that require more than 16 bits (anything above U+FFFF). UTF-8 has no need for them because it's already a variable-width encoding.
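
To make that concrete, here is a rough Python sketch of how UTF-16 splits a code point above U+FFFF into a surrogate pair, next to what UTF-8 does with the same code point. The helper name `to_surrogate_pair` is my own, just for illustration; the arithmetic follows the usual UTF-16 definition.

```python
def to_surrogate_pair(cp: int):
    """Split a code point above U+FFFF into a UTF-16 (high, low) surrogate pair.

    Hypothetical helper for illustration; not part of any library.
    """
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only U+10000..U+10FFFF need a surrogate pair")
    offset = cp - 0x10000            # leaves a 20-bit value
    high = 0xD800 + (offset >> 10)   # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits -> low surrogate
    return high, low


cp = 0x1F600  # a code point outside the 16-bit range
high, low = to_surrogate_pair(cp)
print(f"UTF-16: {high:04X} {low:04X}")

# Cross-check against Python's own encoder (big-endian, no BOM).
utf16 = chr(cp).encode("utf-16-be")
print("UTF-16 bytes:", utf16.hex(" "))

# UTF-8 just uses a longer byte sequence; no surrogates are involved.
utf8 = chr(cp).encode("utf-8")
print("UTF-8 bytes: ", utf8.hex(" "))
```

The point of the cross-check is that the two 16-bit units computed by hand are exactly what the built-in UTF-16 encoder emits, while the UTF-8 output never contains anything surrogate-related.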

If no code point needed more than 16 bits, UTF-8 would need at most 3 bytes per code point and UTF-16 wouldn't need surrogate pairs. Well, actually, UTF-16 probably wouldn't exist at all, because UCS-2 would have been enough for everybody.
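
A quick way to see the 3-byte ceiling: a minimal Python sketch that compares UTF-8 byte counts and UTF-16 code unit counts on either side of the U+FFFF boundary. `utf8_len` is a made-up helper that mirrors the standard UTF-8 range boundaries; the sample code points are arbitrary picks.

```python
def utf8_len(cp: int) -> int:
    """Bytes UTF-8 needs for a code point (hypothetical helper, ranges per UTF-8)."""
    if cp < 0x80:
        return 1
    if cp < 0x800:
        return 2
    if cp < 0x10000:
        return 3   # everything that fits in 16 bits stops here
    return 4       # only code points above U+FFFF need a fourth byte


for cp in (0x41, 0xE9, 0x20AC, 0xFFFF, 0x10000, 0x10FFFF):
    ch = chr(cp)
    utf8_bytes = len(ch.encode("utf-8"))
    utf16_units = len(ch.encode("utf-16-be")) // 2  # 2 bytes per code unit
    assert utf8_bytes == utf8_len(cp)               # sanity check vs. the real encoder
    print(f"U+{cp:04X}: {utf8_bytes} UTF-8 byte(s), {utf16_units} UTF-16 unit(s)")
```

Up to U+FFFF every code point is a single UTF-16 unit and at most 3 UTF-8 bytes; the fourth UTF-8 byte and the second UTF-16 unit both show up only from U+10000 onward.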



