FileNotFoundError: [Errno 2] No such file or directory: 'roberta-base-squad2/processor_config.json' #115

thaisnang · 2020-05-18T15:38:02Z

For some reason, the config file is not getting dumped in the folder. I have tried changing the folder permissions, but that did not help.

tanaysoni · 2020-05-19T07:39:48Z

Hi @thaisnang, could you provide more details on what code/tutorial you're running and the full error stack trace that you get?

thaisnang · 2020-05-19T08:53:09Z

I ran Tutorial 1 with FARMReader. It ran fine the first time. Then I tried again, this time with the offline model (the same model, RoBERTa base). Since then it always gives the following:
05/19/2020 14:16:31 - INFO - elasticsearch - PUT https://localhost:9200/document [status:400 request:0.004s]
05/19/2020 14:16:31 - INFO - haystack.indexing.io - Found data stored in data/article_txt_got. Delete this first if you really want to fetch new data.
05/19/2020 14:16:31 - INFO - elasticsearch - POST https://localhost:9200/_count [status:200 request:0.536s]

05/19/2020 14:16:52 - INFO - elasticsearch - POST https://localhost:9200/_bulk [status:200 request:1.665s]
05/19/2020 14:16:53 - INFO - elasticsearch - POST https://localhost:9200/_bulk [status:200 request:0.399s]
05/19/2020 14:16:53 - INFO - haystack.indexing.io - Wrote 517 docs to DB
05/19/2020 14:16:53 - INFO - farm.utils - device: cuda n_gpu: 1, distributed training: False, automatic mixed precision training: None
05/19/2020 14:17:04 - WARNING - farm.modeling.language_model - Could not automatically detect from language model name what language it is.
We guess it's an ENGLISH model ...
If not: Init the language model by supplying the 'language' param.
Traceback (most recent call last):
  File "Tutorial1_Basic_QA_Pipeline.py", line 123, in <module>
    reader = FARMReader(model_name_or_path="roberta-base-squad2", use_gpu=True)
  File "/home/imsai/.local/lib/python3.6/site-packages/haystack/reader/farm.py", line 86, in __init__
    doc_stride=doc_stride, num_processes=num_processes)
  File "/home/imsai/.local/lib/python3.6/site-packages/farm/infer.py", line 194, in load
    processor = Processor.load_from_dir(model_name_or_path)
  File "/home/imsai/.local/lib/python3.6/site-packages/farm/data_handler/processor.py", line 182, in load_from_dir
    config = json.load(open(processor_config_file))
FileNotFoundError: [Errno 2] No such file or directory: 'roberta-base-squad2/processor_config.json'

I tried with the Transformers reader as well; it gives the following:

05/19/2020 14:22:30 - INFO - elasticsearch - PUT https://localhost:9200/document [status:400 request:0.036s]
05/19/2020 14:22:30 - INFO - haystack.indexing.io - Found data stored in data/article_txt_got. Delete this first if you really want to fetch new data.
05/19/2020 14:22:30 - INFO - elasticsearch - POST https://localhost:9200/_count [status:200 request:0.004s]
05/19/2020 14:22:30 - INFO - haystack.indexing.io - Skip writing documents since DB already contains 517 docs ... (Disable only_empty_db, if you want to add docs anyway.)
05/19/2020 14:22:38 - INFO - elasticsearch - POST https://localhost:9200/document/_search [status:200 request:0.318s]
05/19/2020 14:22:38 - INFO - haystack.retriever.elasticsearch - Got 10 candidates from retriever
05/19/2020 14:22:38 - INFO - haystack.finder - Reader is looking for detailed answer in 362347 chars ...
convert squad examples to features: 100%|██████████| 1/1 [00:00<00:00, 7.14it/s]
add example index and unique id: 100%|██████████| 1/1 [00:00<00:00, 5629.94it/s]
Traceback (most recent call last):
  File "Tutorial1_Basic_QA_Pipeline.py", line 140, in <module>
    prediction = finder.get_answers(question="Who is the father of Arya Stark?", top_k_retriever=10, top_k_reader=5)
  File "/home/imsai/.local/lib/python3.6/site-packages/haystack/finder.py", line 45, in get_answers
    top_k=top_k_reader)
  File "/home/imsai/.local/lib/python3.6/site-packages/haystack/reader/transformers.py", line 77, in predict
    predictions = self.model(query, topk=self.n_best_per_passage)
  File "/home/imsai/.local/lib/python3.6/site-packages/transformers/pipelines.py", line 1042, in __call__
    for s, e, score in zip(starts, ends, scores)
  File "/home/imsai/.local/lib/python3.6/site-packages/transformers/pipelines.py", line 1042, in <listcomp>
    for s, e, score in zip(starts, ends, scores)
KeyError: 0

thaisnang · 2020-05-19T08:59:01Z

Note: I did not download RoBERTa separately; I just renamed the files from the cache that was downloaded automatically. I am sure I renamed them properly. Hopefully this is not affecting it.

tanaysoni · 2020-05-19T13:12:23Z

Hi @thaisnang, by default, the models are cached and are not re-downloaded on every execution. If that doesn't fit your workflow, I'd like to know more about how you plan to use the save (offline) functionality.

Here's how you can save the model of a FARMReader:

reader.inferencer.save("path-to-save")

and load it again by supplying the path:

reader = FARMReader(model_name_or_path="path-to-save")
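Judging from the traceback above, when `model_name_or_path` matches a folder that exists locally, FARM tries to read `processor_config.json` from that folder. Below is a minimal sketch for checking a folder before passing it to `FARMReader`. The helper names `missing_model_files` and `describe_model_path` are hypothetical and not part of haystack or FARM, and `processor_config.json` is the only required file confirmed by the traceback; a saved model may need more files than that.

```python
import os
from typing import List, Sequence

# Assumption: only processor_config.json is confirmed by the traceback in this
# thread. Extend this tuple if your FARM version expects additional files.
REQUIRED_FILES = ("processor_config.json",)


def missing_model_files(model_dir: str, required: Sequence[str] = REQUIRED_FILES) -> List[str]:
    """Return the names in `required` that are not regular files inside `model_dir`."""
    return [
        name
        for name in required
        if not os.path.isfile(os.path.join(model_dir, name))
    ]


def describe_model_path(model_name_or_path: str) -> str:
    """Report whether the string names an existing local folder and what it lacks."""
    if not os.path.isdir(model_name_or_path):
        return "no local directory with this name"
    missing = missing_model_files(model_name_or_path)
    if missing:
        return "local directory, missing: " + ", ".join(missing)
    return "local directory with all required files"


if __name__ == "__main__":
    # "path-to-save" is the placeholder path used in the comment above.
    print(describe_model_path("path-to-save"))
```

If the check reports a missing file for a folder that was assembled by hand from cache files, saving the model with `reader.inferencer.save(...)` as shown above is likely the simpler route.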

thaisnang · 2020-05-19T13:45:55Z

Actually, I saw the model downloading again when I ran it the second time. So I thought that, instead of downloading it on every execution, I could copy the cached model, rename the files properly, and use it as an offline model. That's what I did. It should not interfere with loading, right?

thaisnang · 2020-05-19T14:58:38Z

OK, I downloaded it again, and this time the model was not re-downloaded; it used the cached model.
The model was saved as well. Thanks.

tanaysoni self-assigned this May 19, 2020

thaisnang closed this as completed May 19, 2020

masci pushed a commit that referenced this issue Nov 27, 2023

fix: draw function vs init parameters (#115)

e2f5187

* fix draw * stray print
