Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-32002: Refactor C locale coercion tests #4369

Merged

Conversation

ncoghlan
Copy link
Contributor

@ncoghlan ncoghlan commented Nov 11, 2017

Exactly which locale requests will end up giving
you the "C" locale is actually platform dependent.

A blank locale and "POSIX" will translate to "C"
on most Linux distros, but may not do so on other
platforms, so this adjusts the way the tests are
structured accordingly.

This may also prove sufficient to fix a current
test failure on Cygwin (hence the issue reference)

https://bugs.python.org/issue32002

@@ -15,27 +15,36 @@
interpreter_requires_environment,
)

# Set the list of ways we expect to be able to ask for the "C" locale
EXPECTED_C_LOCALE_EQUIVALENTS = ["C", "invalid.ascii"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think "" should be included in this list otherwise the case where all the locale envt variables are set to "" may not be tested. On Android it is only the tests that have coerce_c_locale="warn" that must be skipped when all locale envt variables are set to "".

There is currently no test run for when none of these variables is defined and AFAIK it seems correct since it is equivalent to having all locale envt variables set to "". But since it is such a common case, testing with all locale envt variables set to "" should be mandatory for all platforms.

Maybe EXPECTED_C_LOCALE_EQUIVALENTS could be replaced with LOCALES_TO_TEST.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense. If we go that way, then what I'd actually suggest we do is give the empty locale a dedicated test case, separate from the C locale equivalents.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And a dedicated test case for the empty locale would be more explicit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to a dedicated test case for empty locale.

@@ -45,7 +54,7 @@
# There's no reliable cross-platform way of checking locale alias
# lists, so the only way of knowing which of these locales will work
# is to try them with locale.setlocale(). We do that in a subprocess
# to avoid altering the locale of the test runner.
# in setupModule() below to avoid altering the locale of the test runner.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/setupModule/setUpModule/

@@ -215,7 +224,8 @@ def _check_child_encoding_details(self,
class LocaleConfigurationTests(_LocaleHandlingTestCase):
# Test explicit external configuration via the process environment

def setUpClass():
@classmethod
def setUpClass(cls):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the comment below:
s/setupModule/setUpModule/

@ncoghlan
Copy link
Contributor Author

ncoghlan commented Nov 12, 2017

OK, I've factored out the default case to its own subtest. @embray, would you mind trying this patch out on Cygwin and seeing how it breaks?

I'm thinking the right order to do things will be to get #4334 merged for Android, rebase this test case change on top of that one, and then figure out a reasonable way to make the expected behaviour in the default locale automatically adjust itself (without making the test case tautological).

Copy link
Contributor

@xdegaye xdegaye left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests are successfull on Android when the following changes are made.

#
# Options for dealing with this:
# * Don't set PYTHON_COERCE_C_LOCALE on such platforms (e.g. Windows doesn't)
# * Don't set the PYTHON_COERCE_C_LOCALE preprocessor definition on
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/PYTHON_COERCE_C_LOCALE/PY_COERCE_C_LOCALE/

# Check behaviour for the default locale
with self.subTest(default_locale=True,
PYTHONCOERCECLOCALE=coerce_c_locale):
var_dict = base_var_dict.copy()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

base_var_dict can be used directly without a copy here.


# Check behaviour for the default locale
with self.subTest(default_locale=True,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This block can only be run if the following conditional is true, otherwise some test results may not match the expected results:

if set(base_var_dict).isdisjoint(extra_vars):

For clarity, this statement written above may be moved down, when checking for explicitly configured locales:

base_var_dict.update(extra_vars)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The expectation is that the test case that passes in extra_vars will also set expected_warnings and coercion_expected appropriately.

The current use case is the tests that sets LC_ALL='C', and expects that to both disable coercion, and potentially result in a warning (for the PYTHONCOERCECLOCALE=warn case).

@bedevere-bot
Copy link

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

And if you don't make the requested changes, you will be put in the comfy chair!

@embray
Copy link
Contributor

embray commented Nov 13, 2017

I also think that this is one of those cases where seemingly tautological test cases may still be worth running, due to non-trivial interactions between the interpreter, the environment, libc, libintl, etc...

@ncoghlan
Copy link
Contributor Author

I'm actually starting to wonder if we should put together a configure check that reports back the result of:

  • setting LC_ALL, LC_CTYPE, and LANG to the empty string
  • calling setlocale(LC_CTYPE, "");
  • calling setlocale(LC_CTYPE, NULL);

That way we could query sysconfig to find out what the build platform did. My main concern with that idea is that it could get complicated when it came to cross-compilation support, so plan B would be to write a short Programs/_querydefaultlocale.c test helper that did the same thing.

@embray
Copy link
Contributor

embray commented Nov 13, 2017

This test still fails on Cygwin because it assumes that by default the locale will be "C" and that coercion will happen (which results in the LC_CTYPE environment variable getting set). So these failures look something like:

======================================================================
FAIL: test_PYTHONCOERCECLOCALE_not_set (test.test_c_locale_coercion.LocaleCoercionTests) (default_locale=True, PYTHONCOERCECLOCALE=None)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/embray/src/python/cpython/Lib/test/test_c_locale_coercion.py", line 325, in _check_c_locale_coercion
    coercion_expected)
  File "/home/embray/src/python/cpython/Lib/test/test_c_locale_coercion.py", line 221, in _check_child_encoding_details
    self.assertEqual(encoding_details, expected_details)
AssertionError: {'fse[124 chars]slashreplace', 'lang': '', 'lc_ctype': '', 'lc_all': ''} != {'fse[124 chars]slashreplace', 'lang': '', 'lc_ctype': 'c.utf-8', 'lc_all': ''}
  {'fsencoding': 'utf-8',
   'lang': '',
   'lc_all': '',
-  'lc_ctype': '',
+  'lc_ctype': 'c.utf-8',
?               +++++++

   'stderr_info': 'utf-8:backslashreplace',
   'stdin_info': 'utf-8:surrogateescape',
   'stdout_info': 'utf-8:surrogateescape'}

In #4361 I just skipped the cases didn't explicitly set one of the LC_ vars to "C" or "POSIX" since otherwise the assumptions in the test of when coercion should occur didn't hold up.

@embray
Copy link
Contributor

embray commented Nov 13, 2017

@ncoghlan I think the configure-time check is a good idea. It could still be disabled when cross-compiling, in which case a fallback is needed. But it might be useful to have as a first pass, especially for use by the test suite.

@embray
Copy link
Contributor

embray commented Nov 13, 2017

On problem I have noticed, at least on Cygwin, is that setlocale() actually behaves differently depending on whether or not one compiles with libintl 😢. That's because libintl has its own wrapper around setlocale() with different behavior.

@xdegaye
Copy link
Contributor

xdegaye commented Nov 13, 2017

I'm actually starting to wonder if we should put together a configure check

I think we should limit the use of configure for setting the platform type instead and hard code the specific expectations for a platform in test_c_locale_coercion.py. The platform type is needed elsewhere.

For example Android had an entry in the sysconfig dictionary at one time (issue 27442) which has been replaced later with the new sys.getandroidapilevel() function (issue 28740) to modify _bootlocale.py that needs to know very early in the startup sequence whether the platform is Android (issue 28596). Knowing that the platform is Android is also needed in many tests and in subprocess.py because its shell is /system/bin/sh.

@xdegaye
Copy link
Contributor

xdegaye commented Nov 13, 2017

This test still fails on Cygwin because it assumes that by default the locale will be "C"

@embray what are the test results if you replace if test.support.is_android: with if True: in test_c_locale_coercion.py with the current implementation of this PR already applied ?

@embray
Copy link
Contributor

embray commented Nov 13, 2017

@xdegaye That fails all the same on Cygwin. The difference between Cygwin and Android here are different. On Cygwin the meaning of the "C" locale is as in POSIX--it means ASCII encoding. The only issue on Cygwin is that the system default locale is "C.UTF-8" instead of just "C" as the tests currently assume.

In other words, _Py_LegacyLocaleDetected() is returning 0 on Cygwin in cases where the tests assume it returns 1.

Exactly which locale requests will end up giving
you the "C" locale is actually platform dependent.

A blank locale and "POSIX" will translate to "C"
on most Linux distros, but may not do so on other
platforms, so this adjusts the way the tests are
structured accordingly.

This may also prove sufficient to fix a current
test failure on Cygwin (hence the issue reference)
@ncoghlan ncoghlan force-pushed the bpo-32002-refactor-locale-coercion-test-cases branch from 90b8bb5 to 05abe72 Compare December 9, 2017 08:41
@ncoghlan
Copy link
Contributor Author

ncoghlan commented Dec 9, 2017

@xdegaye @embray I'm going to merge this once the current CI run finishes (so I can build a patch for https://bugs.python.org/issue30672 on top of it), but I'll leave the tracker issue as "pending" until you've confirmed that this works as intended on Cygwin and Android

@ncoghlan
Copy link
Contributor Author

ncoghlan commented Dec 9, 2017

Although it would help not to overwrite the Android fixes when rebasing the patch :)

So change of plan: I'll hold off on merging until you've both given the thumbs up. If it looks good, @xdegaye can merge it, otherwise I'll deal with it after I get back from NZ.

@xdegaye
Copy link
Contributor

xdegaye commented Dec 13, 2017

@ncoghlan, @embray

The test_c_locale_coercion.log file is the output of running test_c_locale_coercion on Android.

test_c_locale_coercion is successful when the android-bpo-32002.diff.txt patch is applied on top of Nick last commit (It has a .txt extension to trick github into accepting it as an attachment).

@ncoghlan
Copy link
Contributor Author

ncoghlan commented Dec 14, 2017 via email

# this code path is run with environ['LC_ALL'] == 'C', then
# LEGACY_LOCALE_WARNING is printed.
if (test.support.is_android and
_expected_warnings == [CLI_COERCION_WARNING]):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to study this a bit more carefully, but I believe this condition may apply for Cygwin as well (or anywhere that the default locale has UTF-8 encoding).

@embray
Copy link
Contributor

embray commented Dec 15, 2017

I still get the following failures on Cygwin:

======================================================================
FAIL: test_LC_ALL_set_to_C (test.test_c_locale_coercion.LocaleCoercionTests) (default_locale=True, PYTHONCOERCECLOCALE='warn')
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/embray/src/python/cpython/Lib/test/test_c_locale_coercion.py", line 344, in _check_c_locale_coercion
    _coercion_expected)
  File "/home/embray/src/python/cpython/Lib/test/test_c_locale_coercion.py", line 235, in _check_child_encoding_details
    self.assertEqual(stderr_lines, expected_warnings)
AssertionError: Lists differ: ['Python runtime initialized with LC_CTYPE[192 chars]ed.'] != []

First list contains 1 additional elements.
First extra element 0:
'Python runtime initialized with LC_CTYPE=C (a locale with default ASCII encoding), which may cause Unicode compatibility problems. Using C.UTF-8, C.utf8, or UTF-8 (if available) as alternative Unicode-compatible locales is recommended.'

+ []
- ['Python runtime initialized with LC_CTYPE=C (a locale with default ASCII '
-  'encoding), which may cause Unicode compatibility problems. Using C.UTF-8, '
-  'C.utf8, or UTF-8 (if available) as alternative Unicode-compatible locales is '
-  'recommended.']

======================================================================
FAIL: test_PYTHONCOERCECLOCALE_set_to_zero (test.test_c_locale_coercion.LocaleCoercionTests) (default_locale=True, PYTHONCOERCECLOCALE='0')
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/embray/src/python/cpython/Lib/test/test_c_locale_coercion.py", line 344, in _check_c_locale_coercion
    _coercion_expected)
  File "/home/embray/src/python/cpython/Lib/test/test_c_locale_coercion.py", line 232, in _check_child_encoding_details
    self.assertEqual(encoding_details, expected_details)
AssertionError: {'fsencoding': 'utf-8', 'stdin_info': 'utf-8:surrogateesc[123 chars]: ''} != {'fsencoding': 'ascii', 'stdin_info': 'ascii:surrogateesc[123 chars]: ''}
- {'fsencoding': 'utf-8',
?                 ^^^^^

+ {'fsencoding': 'ascii',
?                 ^^^^^

   'lang': '',
   'lc_all': '',
   'lc_ctype': '',
-  'stderr_info': 'utf-8:backslashreplace',
?                  ^^^^^

+  'stderr_info': 'ascii:backslashreplace',
?                  ^^^^^

-  'stdin_info': 'utf-8:surrogateescape',
?                 ^^^^^

+  'stdin_info': 'ascii:surrogateescape',
?                 ^^^^^

-  'stdout_info': 'utf-8:surrogateescape'}
?                  ^^^^^

+  'stdout_info': 'ascii:surrogateescape'}
?                  ^^^^^

But +1 to going ahead and merging, if you want, as this is significantly cleaned up, and I can look into addressing the final Cygwin-related bits on top of this.

@ncoghlan ncoghlan merged commit 9c19b02 into python:master Dec 16, 2017
@ncoghlan
Copy link
Contributor Author

Merged! Thanks for the reviews and updates, folks :)

@ncoghlan ncoghlan deleted the bpo-32002-refactor-locale-coercion-test-cases branch March 30, 2018 07:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants