
Possible Issues with natsort.humansorted or ns.LOCALE

Seth Morton edited this page Apr 19, 2023 · 6 revisions

In addition to modifying how characters are sorted, ns.LOCALE will take into account locale-dependent thousands separators (and locale-dependent decimal separators if ns.FLOAT is enabled). This means that if you are in a locale that uses commas as the thousands separator, a number like 123,456 will be interpreted as 123456. If this is not what you want, consider using ns.LOCALEALPHA, which enables locale-aware sorting only for non-numbers (similarly, ns.LOCALENUM enables locale-aware sorting only for numbers).
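To check which separators the active locale would supply, you can inspect locale.localeconv() from the standard library. This is a minimal sketch: the helper name numeric_separators is made up for illustration, and the values natsort ends up using may differ from these raw ones on some platforms.

```python
import locale


def numeric_separators():
    """Return the thousands and decimal separators reported by the active locale."""
    conv = locale.localeconv()
    return {
        "thousands_sep": conv["thousands_sep"],
        "decimal_point": conv["decimal_point"],
    }
```

If thousands_sep comes back as ',' after you set your locale, that is the situation in which ns.LOCALE would be expected to read 123,456 as 123456.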

Regenerate Key With natsort_keygen() After Changing Locale

When natsort_keygen() is called it returns a key function that hard-codes the provided settings. This means that the key returned when ns.LOCALE is used contains the settings specified by the locale loaded at the time the key is generated. If you change the locale, you should regenerate the key to account for the new locale.
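One way to avoid using a stale key is to rebuild it whenever the process-wide locale setting has changed. The sketch below assumes you pass in your own factory callable; the class name LocaleKeyCache is hypothetical and is not part of natsort.

```python
import locale


class LocaleKeyCache:
    """Rebuild a sort key whenever the process-wide locale has changed."""

    def __init__(self, factory):
        self._factory = factory  # callable that builds a fresh key function
        self._locale = None      # locale string seen when the key was built
        self._key = None

    def get(self):
        # Calling setlocale() without a second argument only queries the setting.
        current = locale.setlocale(locale.LC_ALL)
        if self._key is None or current != self._locale:
            self._key = self._factory()
            self._locale = current
        return self._key
```

With natsort, the factory could be something like lambda: natsort_keygen(alg=ns.LOCALE), and you would sort with sorted(data, key=cache.get()).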

Corollary: Do Not Reuse a Key From natsort_keygen() After Changing Locale

If you change locale, the old function will not work as expected. The locale library works with a global state. When natsort_keygen() is called it does the best job that it can to make the returned function as static as possible and independent of the global state, but the locale.strxfrm() function must access this global state to work; therefore, if you change locale and use ns.LOCALE then you should discard the old key.
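The dependence on global state can be seen with the standard library alone. In this sketch (the helper name sort_under_locale is an assumption for illustration), the key is always the same locale.strxfrm function object, yet the ordering it produces is decided by whichever locale is active when it is called.

```python
import locale


def sort_under_locale(items, locale_name):
    """Sort items with locale.strxfrm while locale_name is temporarily active."""
    previous = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
        # locale.strxfrm is the same function object every time; its result
        # depends on whichever locale is active at the moment it is called.
        return sorted(items, key=locale.strxfrm)
    finally:
        # Restore the previous collation locale so callers are not affected.
        locale.setlocale(locale.LC_COLLATE, previous)
```

Calling it with "C" and then with a locale such as 'en_US.UTF-8' (if that locale is installed on your system) will typically give different orders for mixed-case input, even though the key function is identical.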

NOTE: If you use PyICU then you may be able to reuse keys after changing locale.

The locale Module From the StdLib Has Issues

natsort will use PyICU for humansorted() or ns.LOCALE if it is installed. If not, it will fall back on the locale library from the Python stdlib. If you do not have PyICU installed, please keep the following known problems and issues in mind.

NOTE: Remember, if you have PyICU installed you shouldn't need to worry about any of these.
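If you are unsure which situation applies to you, a quick check is whether the icu module that PyICU provides can be imported. This is a sketch only: it tests importability and does not inspect what natsort itself selects internally, and the helper name pyicu_available is made up for illustration.

```python
import importlib.util


def pyicu_available():
    """Return True if the icu module provided by PyICU can be imported."""
    return importlib.util.find_spec("icu") is not None
```

If this returns False, the caveats about the stdlib locale module apply to your environment.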

I have found that unless you explicitly set a locale, the sorted order may not be what you expect. Setting it is straightforward (the example below uses 'en_US.UTF-8', but you should use your own locale):

>>> import locale
>>> locale.setlocale(locale.LC_ALL, 'en_US.UTF-8')
'en_US.UTF-8'

The locale Module Is Broken on Mac OS X

This is not Python's fault but the operating system's: the locale library on OS X (and possibly some other BSD systems) is broken. See the following links:

Of course, installing PyICU fixes this, but if you cannot or do not want to install it, there is some hope.

  1. As of natsort version 4.0.0, natsort is configured to compensate for a broken locale library. When sorting non-numbers it will handle case as you expect, but it will still not be able to handle non-ASCII characters properly. Additionally, it has a built-in lookup table of thousands separators that are incorrect on OS X/BSD (but it is possible that the table is not complete; please file an issue if you find that it is missing an entry).
  2. Use a "*.ISO8859-1" locale (e.g. 'en_US.ISO8859-1') rather than a "*.UTF-8" locale. I have found that these have fewer issues than "UTF-8", but your mileage may vary.
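The second suggestion can be sketched as a small helper that tries a list of locale names in order of preference and activates the first one the system accepts. The function name set_first_available_locale is hypothetical, and the locale names you pass in are your own choice.

```python
import locale


def set_first_available_locale(candidates):
    """Activate the first locale in candidates that this system supports."""
    for name in candidates:
        try:
            # setlocale() returns the name of the locale it activated.
            return locale.setlocale(locale.LC_ALL, name)
        except locale.Error:
            continue  # not installed here; try the next candidate
    raise locale.Error("none of the candidate locales are available")
```

For example, set_first_available_locale(['en_US.ISO8859-1', 'en_US.UTF-8']) would prefer the ISO8859-1 variant and fall back to UTF-8 only if the former is unavailable.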