load_schema() is not thread-safe #3799

Closed
1 task done
tstadel opened this issue Jan 2, 2023 · 0 comments
Labels
P3 (Low priority, leave it in the backlog), topic:pipeline, type:bug (Something isn't working), wontfix (This will not be worked on)

@tstadel
Member

tstadel commented Jan 2, 2023

Describe the bug
When loading multiple pipelines in parallel in the same fresh Haystack environment (one in which the schema file has not been generated yet), load_schema() in haystack.nodes._json_schema fails because of a race condition. Both threads pass the check

if not os.path.exists(schema_file_path):

enter the if block, and try to write a new schema file in update_json_schema. One thread finishes writing first; the other then overwrites the file but has not finished flushing it to disk. The first thread then tries to read this half-written (here, empty) file and fails with the following error:

Error message

/usr/local/lib/python3.8/dist-packages/haystack/pipelines/config.py:245: in validate_config
    validate_schema(pipeline_config=pipeline_config, strict_version_check=strict_version_check, extras=extras)
/usr/local/lib/python3.8/dist-packages/haystack/pipelines/config.py:302: in validate_schema
    schema = load_schema()
/usr/local/lib/python3.8/dist-packages/haystack/nodes/_json_schema.py:426: in load_schema
    return json.load(schema_file)
/usr/lib/python3.8/json/__init__.py:293: in load
    return loads(fp.read(),
/usr/lib/python3.8/json/__init__.py:357: in loads
    return _default_decoder.decode(s)
/usr/lib/python3.8/json/decoder.py:337: in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <json.decoder.JSONDecoder object at 0x7fe02b3712e0>, s = '', idx = 0

    def raw_decode(self, s, idx=0):
        """Decode a JSON document from ``s`` (a ``str`` beginning with
        a JSON document) and return a 2-tuple of the Python
        representation and the index in ``s`` where the document ended.
    
        This can be used to decode a JSON document from a string that may
        have extraneous data at the end.
    
        """
        try:
            obj, end = self.scan_once(s, idx)
        except StopIteration as err:
>           raise JSONDecodeError("Expecting value", s, err.value) from None
E           json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

/usr/lib/python3.8/json/decoder.py:355: JSONDecodeError
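One way to close the race described above, sketched under stated assumptions rather than as Haystack's actual fix (the issue was closed as wontfix): write the schema to a temporary file in the same directory and then swap it into place with os.replace(), which Python documents as an atomic rename on POSIX when source and destination are on the same filesystem. Concurrent writers may still both generate the schema, but a reader only ever sees a complete file. The names write_json_atomically, load_schema_safely, and build_schema are hypothetical; build_schema stands in for whatever update_json_schema computes.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any, Callable, Dict


def write_json_atomically(path: Path, data: Dict[str, Any]) -> None:
    """Write JSON to a temp file next to `path`, then atomically swap it in."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # The temp file must be on the same filesystem for os.replace to be atomic.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(data, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the bytes hit the disk before the swap
        os.replace(tmp_name, path)
    except BaseException:
        # Remove the temp file if anything failed before the swap completed.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


def load_schema_safely(path: Path, build_schema: Callable[[], Dict[str, Any]]) -> Dict[str, Any]:
    """Hypothetical variant of the check-then-write pattern in load_schema()."""
    if not path.exists():
        # Several threads or processes can still get here at once, but each one
        # writes a complete file and swaps it in, so no reader sees a partial file.
        write_json_atomically(path, build_schema())
    with path.open(encoding="utf-8") as f:
        return json.load(f)
```

On Windows, os.replace() may raise PermissionError if another process has the target file open, so a cross-process lock may still be needed there.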

Expected behavior
Running multiple tests in parallel in one Haystack environment works without requiring an initial setup step.

Additional context
We can work around the issue by adding a setup step/fixture to all tests that forces Haystack to write the schema and silently catches the race-condition error.
The race condition can also occur outside of test environments, for example when two pipelines are loaded in two different processes at the same time.
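The workaround above can be sketched as a small retry helper plus a session-scoped pytest fixture. This is an illustration under assumptions: load_with_retry is a hypothetical helper, and the fixture (shown in comments) assumes Haystack 1.x, where load_schema() lives in the private module haystack.nodes._json_schema as seen in the traceback. Retrying, rather than only swallowing the JSONDecodeError, is a small variation on the workaround described here, so that the fixture does not return while another worker is still writing the file.

```python
import json
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def load_with_retry(load: Callable[[], T], attempts: int = 5, delay: float = 0.05) -> T:
    """Call `load`, retrying when it hits a half-written (e.g. empty) JSON file."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return load()
        except json.JSONDecodeError:
            # Another worker is probably still writing the schema file.
            if attempt == attempts:
                raise
            time.sleep(delay)
    raise RuntimeError("unreachable")


# Example conftest.py usage (assumes pytest and Haystack 1.x are installed):
#
# import pytest
# from haystack.nodes._json_schema import load_schema
#
# @pytest.fixture(scope="session", autouse=True)
# def haystack_schema():
#     load_with_retry(load_schema)
```

With pytest-xdist, session-scoped fixtures run once per worker process, so the workers can still race each other; that is why the fixture has to tolerate the decode error at all.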

To Reproduce
Run multiple tests that load a pipeline in parallel.
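The failure mode can also be reproduced deterministically without Haystack. This sketch assumes the schema is written by opening the target file in "w" mode (which truncates it immediately) and that the new content is still sitting in Python's write buffer when a second reader opens the file; read_while_being_written is a hypothetical name. The reader gets an empty string, matching s = '' in the traceback, and json.loads raises the same "Expecting value" error.

```python
import json
from pathlib import Path


def read_while_being_written(path: Path) -> str:
    """Return what a second reader sees while a writer is mid-write."""
    path.write_text(json.dumps({"schema": "complete"}))
    with path.open("w") as writer:  # "w" truncates the existing file right away
        # Small writes stay in Python's buffer until flush/close.
        writer.write(json.dumps({"schema": "new"}))
        with path.open() as reader:
            # The reader sees the truncated, still-empty file.
            return reader.read()
```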

FAQ Check

tstadel added the type:bug and topic:pipeline labels on Jan 2, 2023
masci added the P3 label on Jan 25, 2023
masci added the wontfix label on Dec 18, 2023
masci closed this as completed on Dec 18, 2023
Projects
None yet
Development

No branches or pull requests

2 participants