load_schema()
is not thread-safe
#3799
Labels
P3
Low priority, leave it in the backlog
topic:pipeline
type:bug
Something isn't working
wontfix
This will not be worked on
Describe the bug
When loading multiple pipelines in parallel with the same fresh Haystack environment,
load_schema()
ofhaystack.nodes._json_schema
breaks due to race conditions. Both threads passhaystack/haystack/nodes/_json_schema.py
Line 414 in 6cd0e33
update_json_schema
. While one of them is faster the other one overrides the file, but is not finished with flushing to disk. Number one tries to read the "being-written" file and errors out with the following error:Error message
Expected behavior
Running multiple tests in parallel in one Haystack environment works without initial setup set.
Additional context
We can work around that issue by adding a setup step/fixture to all tests that forces Haystack to write the schema and silently catch the race condition error.
The race condition could also happen outside of test environments, if you for example try to load two pipelines in two different processes at the same time.
To Reproduce
Run multiple tests that load a pipeline in parallel.
FAQ Check
The text was updated successfully, but these errors were encountered: