Pipeline's YAML: syntax validation #2226

Merged on Mar 15, 2022
177 commits
c51480f
Add BasePipeline.validate_config, BasePipeline.validate_yaml, and som…
ZanSara Feb 21, 2022
38246af
Update Documentation & Code Style
github-actions[bot] Feb 21, 2022
0183a94
Rephrase docstring
ZanSara Feb 21, 2022
3c225ba
Update Documentation & Code Style
github-actions[bot] Feb 21, 2022
64cb96b
Make error composition work properly
ZanSara Feb 21, 2022
c12d595
Merge branch 'yaml_validation' of github.com:deepset-ai/haystack into…
ZanSara Feb 21, 2022
c8a410b
Update Documentation & Code Style
github-actions[bot] Feb 21, 2022
2ca5afc
Clarify typing
ZanSara Feb 21, 2022
6dd894c
Merge branch 'yaml_validation' of github.com:deepset-ai/haystack into…
ZanSara Feb 21, 2022
e239543
Help mypy a bit more
ZanSara Feb 21, 2022
7fbb9d5
Update Documentation & Code Style
github-actions[bot] Feb 21, 2022
282be4a
Enable autogenerated docs for Milvus1 and 2 separately
ZanSara Feb 24, 2022
74648f0
Revert "Enable autogenerated docs for Milvus1 and 2 separately"
ZanSara Feb 24, 2022
eca1835
Merge branch 'master' into yaml_validation
ZanSara Feb 24, 2022
47714cd
Update Documentation & Code Style
github-actions[bot] Feb 24, 2022
4a17b31
Re-enable 'additionalProperties: False'
ZanSara Feb 24, 2022
90f8e82
Add pipeline.type to JSON Schema, was somehow forgotten
ZanSara Feb 24, 2022
25d9cf5
Disable additionalProperties on the pipeline properties too
ZanSara Feb 24, 2022
a265bf7
Fix json-schemas for 1.1.0 and 1.2.0 (should not do it again in the f…
ZanSara Feb 24, 2022
6e1a5fd
Merge branch 'yaml_validation' of github.com:deepset-ai/haystack into…
ZanSara Feb 24, 2022
57580cb
Cal super in PipelineValidationError
ZanSara Feb 24, 2022
ec4422a
Improve _read_pipeline_config_from_yaml's error handling
ZanSara Feb 24, 2022
4f7efa5
Fix generate_json_schema.py to include document stores
ZanSara Feb 24, 2022
05760d5
Fix json schemas (retro-fix 1.1.0 again)
ZanSara Feb 24, 2022
d722e15
Improve custom errors printing, add link to docs
ZanSara Feb 24, 2022
f19d7be
Add function in BaseComponent to list its subclasses in a module
ZanSara Feb 24, 2022
a05965c
Make some document stores base classes abstract
ZanSara Feb 24, 2022
ff839ff
Add marker 'integration' in pytest flags
ZanSara Feb 24, 2022
0aee894
Slighly improve validation of pipelines at load
ZanSara Feb 24, 2022
53ac2ef
Adding tests for YAML loading and validation
ZanSara Feb 24, 2022
2efc524
Make custom_query Optional for validation issues
ZanSara Feb 24, 2022
fe55af9
Fix bug in _read_pipeline_config_from_yaml
ZanSara Feb 24, 2022
0e151c2
Improve error handling in BasePipeline and Pipeline and add DAG check
ZanSara Feb 25, 2022
3c0c626
Move json schema generation into haystack/nodes/_json_schema.py (usef…
ZanSara Feb 25, 2022
04bec37
Simplify errors slightly
ZanSara Feb 25, 2022
44b906c
Add some YAML validation tests
ZanSara Feb 25, 2022
d2025cb
Remove load_from_config from BasePipeline, it was never used anyway
ZanSara Feb 25, 2022
c2b0896
Improve tests
ZanSara Feb 25, 2022
23d430d
Include json-schemas in package
ZanSara Feb 25, 2022
a687244
Fix conftest imports
ZanSara Feb 25, 2022
2990277
Make BasePipeline abstract
ZanSara Feb 25, 2022
aa5c9a8
Improve mocking by making the test independent from the YAML version
ZanSara Feb 25, 2022
699fe30
Merge branch 'master' into yaml_validation
ZanSara Feb 25, 2022
dcd64d6
Add exportable_to_yaml decorator to forget about set_config on mock n…
ZanSara Feb 25, 2022
58fbf08
Merge branch 'yaml_validation' of github.com:deepset-ai/haystack into…
ZanSara Feb 25, 2022
70e4c51
Fix mypy errors
ZanSara Feb 25, 2022
e7b3151
Comment out one monkeypatch
ZanSara Feb 25, 2022
98a9803
Fix typing again
ZanSara Feb 25, 2022
9ec74ed
Improve error message for validation
ZanSara Feb 28, 2022
7dbea65
Add required properties to pipelines
ZanSara Mar 1, 2022
5ed37df
Fix YAML version for REST API YAMLs to 1.2.0
ZanSara Mar 1, 2022
922e75f
Fix load_from_yaml call in load_from_deepset_cloud
ZanSara Mar 1, 2022
f8a984e
fix HaystackError.__getattr__
ZanSara Mar 1, 2022
8eb71e8
Add super().__init__()in most nodes and docstore, comment set_config
ZanSara Mar 1, 2022
4721843
Remove type from REST API pipelines
ZanSara Mar 1, 2022
cede763
Remove useless init from doc2answers
ZanSara Mar 1, 2022
2b5d3ef
Call super in Seq3SeqGenerator
ZanSara Mar 1, 2022
e7f6a16
Typo in deepsetcloud.py
ZanSara Mar 1, 2022
741846b
Fix rest api indexing error mismatch and mock version of JSON schema …
ZanSara Mar 1, 2022
2aac9f2
Working on pipeline tests
ZanSara Mar 3, 2022
4863f4f
Improve errors printing slightly
ZanSara Mar 3, 2022
770c614
Add back test_pipeline.yaml
ZanSara Mar 3, 2022
4342c8a
_json_schema.py supports different versions with identical schemas
ZanSara Mar 3, 2022
84317af
Add type to 0.7 schema for backwards compatibility
ZanSara Mar 3, 2022
4aa9aea
Merge branch 'master' into yaml_validation
ZanSara Mar 3, 2022
f78d839
Fix small bug in _json_schema.py
ZanSara Mar 3, 2022
a90bb21
Try alternative to generate json schemas on the CI
ZanSara Mar 3, 2022
16354a7
Update Documentation & Code Style
github-actions[bot] Mar 3, 2022
08bf24d
Make linux CI match autoformat CI
ZanSara Mar 3, 2022
0d90314
Fix super-init-not-called
ZanSara Mar 3, 2022
836138f
Merge branch 'yaml_validation' of github.com:deepset-ai/haystack into…
ZanSara Mar 3, 2022
e5e2b5d
Accidentally committed file
ZanSara Mar 3, 2022
37a3ffa
Update Documentation & Code Style
github-actions[bot] Mar 3, 2022
f53d6f1
fix test_summarizer_translation.py's import
ZanSara Mar 3, 2022
d0e1374
Mock YAML in a few suites, split and simplify test_pipeline_debug_and…
ZanSara Mar 3, 2022
7d6b696
Fix json schema for ray tests too
ZanSara Mar 3, 2022
693ef14
Merge branch 'yaml_validation' of github.com:deepset-ai/haystack into…
ZanSara Mar 3, 2022
314177f
Update Documentation & Code Style
github-actions[bot] Mar 3, 2022
3fa05ef
Merge branch 'master' into yaml_validation
ZanSara Mar 3, 2022
378ccfa
Merge branch 'yaml_validation' of github.com:deepset-ai/haystack into…
ZanSara Mar 3, 2022
8ba6575
Reintroduce validation
ZanSara Mar 3, 2022
8e3f8a7
Usa unstable version in tests and rest api
ZanSara Mar 3, 2022
b48e53e
Make unstable support the latest versions
ZanSara Mar 3, 2022
9fe3f80
Update Documentation & Code Style
github-actions[bot] Mar 3, 2022
dfb50a2
Remove needless fixture
ZanSara Mar 3, 2022
6b1f1a3
Remove needless fixture
ZanSara Mar 3, 2022
7fe7c20
Merge branch 'master' into yaml_validation
ZanSara Mar 3, 2022
0be3021
Make type in pipeline optional in the strings validation
ZanSara Mar 3, 2022
9084d3d
Merge branch 'yaml_validation' of github.com:deepset-ai/haystack into…
ZanSara Mar 3, 2022
69c5084
Fix schemas
ZanSara Mar 3, 2022
745ebd4
Fix string validation for pipeline type
ZanSara Mar 3, 2022
5ad6680
Improve validate_config_strings
ZanSara Mar 3, 2022
ace938d
Remove type from test p[ipelines
ZanSara Mar 3, 2022
019769c
Update Documentation & Code Style
github-actions[bot] Mar 3, 2022
a5ae5e1
Fix test_pipeline
ZanSara Mar 3, 2022
9f584f0
Removing more type from pipelines
ZanSara Mar 7, 2022
9a1f064
Merge branch 'yaml_validation' of github.com:deepset-ai/haystack into…
ZanSara Mar 7, 2022
4aecb69
Temporary CI patc
ZanSara Mar 7, 2022
9d46ba0
Fix issue with exportable_to_yaml never invoking the wrapped init
ZanSara Mar 7, 2022
f07d4c6
rm stray file
ZanSara Mar 7, 2022
70342c0
pipeline tests are green again
ZanSara Mar 7, 2022
8b1e136
Merge branch 'master' into yaml_validation
ZanSara Mar 7, 2022
59e8603
Linux CI now needs .[all] to generate the schema
ZanSara Mar 8, 2022
bb1bda6
Merge branch 'master' into yaml_validation
ZanSara Mar 8, 2022
cd8c8b0
Bugfixes, pipeline tests seems to be green
ZanSara Mar 8, 2022
9227f83
Merge branch 'yaml_validation' of github.com:deepset-ai/haystack into…
ZanSara Mar 8, 2022
4f8c3ee
Typo in version after merge
ZanSara Mar 8, 2022
29c2079
Implement missing methods in Weaviate
ZanSara Mar 8, 2022
6f7564f
Trying to avoid FAISS tests from running in the Milvus1 test suite
ZanSara Mar 8, 2022
5477a10
Fix some stray test paths and faiss index dumping
ZanSara Mar 8, 2022
a406a0c
Fix pytest markers list
ZanSara Mar 8, 2022
94d3c1f
Temporarily disable cache to be able to see tests failures
ZanSara Mar 8, 2022
283cad3
Fix pyproject.toml syntax
ZanSara Mar 8, 2022
e59e848
Use only tmp_path
ZanSara Mar 8, 2022
8330466
Merge branch 'master' into yaml_validation
ZanSara Mar 8, 2022
31e6874
Fix preprocessor signature after merge
ZanSara Mar 8, 2022
fe2fcbf
Fix faiss bug
ZanSara Mar 8, 2022
0868ad5
Fix Ray test
ZanSara Mar 8, 2022
9c4bffc
Fix documentation issue by removing quotes from faiss type
ZanSara Mar 9, 2022
4afe3f1
Merge branch 'master' into yaml_validation
ZanSara Mar 9, 2022
f1408e4
Update Documentation & Code Style
github-actions[bot] Mar 9, 2022
de4566a
use document properly in preprocessor tests
ZanSara Mar 9, 2022
63be81c
Update Documentation & Code Style
github-actions[bot] Mar 9, 2022
424c833
make preprocessor capable of handling documents
ZanSara Mar 9, 2022
861d270
import document
ZanSara Mar 9, 2022
ac5833b
Revert support for documents in preprocessor, do later
ZanSara Mar 9, 2022
a375b66
Fix bug in _json_schema.py that was breaking validation
ZanSara Mar 9, 2022
8a5be32
Merge branch 'master' into yaml_validation
ZanSara Mar 10, 2022
078f22b
re-enable cache
ZanSara Mar 10, 2022
8379f97
Update Documentation & Code Style
github-actions[bot] Mar 10, 2022
2c46b30
Simplify calling _json_schema.py from the CI
ZanSara Mar 10, 2022
4165670
Remove redundant ABC inheritance
ZanSara Mar 10, 2022
60ea88a
Ensure exportable_to_yaml works only on implementations
ZanSara Mar 10, 2022
2d12ecb
Rename subclass to class_ in Meta
ZanSara Mar 10, 2022
3e0bf77
Make run() and get_config() abstract in BasePipeline
ZanSara Mar 10, 2022
1eea0b6
Revert unintended change in preprocessor
ZanSara Mar 10, 2022
ce27da0
Move outgoing_edges_input_node check inside try block
ZanSara Mar 10, 2022
a4c0f37
Rename VALID_CODE_GEN_INPUT_REGEX into VALID_INPUT_REGEX
ZanSara Mar 10, 2022
8bfc579
Add check for a RecursionError on validate_config_strings
ZanSara Mar 10, 2022
9b558d4
Address usages of _pipeline_config in data silo and elasticsearch
ZanSara Mar 11, 2022
06eb04f
Rename _pipeline_config into _init_parameters
ZanSara Mar 11, 2022
70afc9a
Fix pytest marker and remove unused imports
ZanSara Mar 11, 2022
ac76c1f
Remove most redundant ABCs
ZanSara Mar 11, 2022
3a8a8e0
Rename _init_parameters into _component_configuration
ZanSara Mar 11, 2022
3a195eb
Remove set_config and type from _component_configuration's dict
ZanSara Mar 11, 2022
6bb5dff
Remove last instances of set_config and replace with super().__init__()
ZanSara Mar 11, 2022
3fb5248
Implement __init_subclass__ approach
ZanSara Mar 11, 2022
27f286f
Simplify checks on the existence of _component_configuration
ZanSara Mar 11, 2022
4e9ed19
Fix faiss issue
ZanSara Mar 11, 2022
a259a79
Dynamic generation of node schemas & weed out old schemas
ZanSara Mar 11, 2022
010f965
Add debatable test
ZanSara Mar 11, 2022
5492d3d
Add docstring to debatable test
ZanSara Mar 11, 2022
cbd3b40
Positive diff between schemas implemented
ZanSara Mar 11, 2022
9779220
Improve diff printing
ZanSara Mar 11, 2022
9334d86
Rename REST API YAML files to trigger IDE validation
ZanSara Mar 14, 2022
438261b
Fix typing issues
ZanSara Mar 14, 2022
15a1495
Fix more typing
ZanSara Mar 14, 2022
ae6d533
Typo in YAML filename
ZanSara Mar 14, 2022
007f9fd
Remove needless type:ignore
ZanSara Mar 14, 2022
1d3caee
Add tests
ZanSara Mar 14, 2022
52ea72c
Fix tests & validation feedback for accessory classes in custom nodes
ZanSara Mar 14, 2022
c4bf4f3
Refactor RAGeneratorType out
ZanSara Mar 14, 2022
3878aec
Fix broken import in conftest
ZanSara Mar 14, 2022
365a855
Improve source error handling
ZanSara Mar 14, 2022
a70fa02
Remove unused import in test_eval.py breaking tests
ZanSara Mar 14, 2022
f2e34f0
Fix changed error message in tests matches too
ZanSara Mar 14, 2022
4468fea
Normalize generate_openapi_specs.py and generate_json_schema.py in th…
ZanSara Mar 14, 2022
4124544
Fix path to generate_openapi_specs.py in autoformat.yml
ZanSara Mar 14, 2022
e400831
Update Documentation & Code Style
github-actions[bot] Mar 14, 2022
2d56a71
Add test for FAISSDocumentStore-like situations (superclass with init…
ZanSara Mar 14, 2022
90cfdcf
Update Documentation & Code Style
github-actions[bot] Mar 14, 2022
4301335
Fix indentation
ZanSara Mar 14, 2022
70cd8b2
Merge branch 'yaml_validation' of github.com:deepset-ai/haystack into…
ZanSara Mar 14, 2022
1475f3e
Remove commented set_config
ZanSara Mar 15, 2022
70b27ab
Store model_name_or_path in FARMReader to use in DistillationDataSilo
ZanSara Mar 15, 2022
f568459
Rename _component_configuration into _component_config
ZanSara Mar 15, 2022
b78a6c8
Update Documentation & Code Style
github-actions[bot] Mar 15, 2022
Refactor RAGeneratorType out
ZanSara committed Mar 14, 2022
commit c4bf4f34fdba656b94b418220f13662c78f8635a
25 changes: 3 additions & 22 deletions haystack/nodes/answer_generator/transformers.py
@@ -23,11 +23,6 @@
 logger = logging.getLogger(__name__)


-class RAGeneratorType(Enum):
-    TOKEN = (1,)
-    SEQUENCE = 2
-
-
 class RAGenerator(BaseGenerator):
     """
     Implementation of Facebook's Retrieval-Augmented Generator (https://arxiv.org/abs/2005.11401) based on
@@ -76,7 +71,7 @@ def __init__(
         model_name_or_path: str = "facebook/rag-token-nq",
         model_version: Optional[str] = None,
         retriever: Optional[DensePassageRetriever] = None,
-        generator_type: RAGeneratorType = RAGeneratorType.TOKEN,
+        generator_type: str = "token",
         top_k: int = 2,
         max_length: int = 200,
         min_length: int = 2,
@@ -94,7 +89,7 @@ def __init__(
                                    See https://huggingface.co/models for full list of available models.
         :param model_version: The version of model to use from the HuggingFace model hub. Can be tag name, branch name, or commit hash.
         :param retriever: `DensePassageRetriever` used to embedded passages for the docs passed to `predict()`. This is optional and is only needed if the docs you pass don't already contain embeddings in `Document.embedding`.
-        :param generator_type: Which RAG generator implementation to use? RAG-TOKEN or RAG-SEQUENCE
+        :param generator_type: Which RAG generator implementation to use ("token" or "sequence")
         :param top_k: Number of independently generated text to return
         :param max_length: Maximum length of generated text
         :param min_length: Minimum length of generated text
@@ -104,20 +99,6 @@ def __init__(
         :param use_gpu: Whether to use GPU. Falls back on CPU if no GPU is available.
         """
         super().__init__()
-        # save init parameters to enable export of component config as YAML
-        # self.set_config(
-        #     model_name_or_path=model_name_or_path,
-        #     model_version=model_version,
-        #     retriever=retriever,
-        #     generator_type=generator_type,
-        #     top_k=top_k,
-        #     max_length=max_length,
-        #     min_length=min_length,
-        #     num_beams=num_beams,
-        #     embed_title=embed_title,
-        #     prefix=prefix,
-        #     use_gpu=use_gpu,
-        # )

         self.model_name_or_path = model_name_or_path
         self.max_length = max_length
@@ -138,7 +119,7 @@ def __init__(

         self.tokenizer = RagTokenizer.from_pretrained(model_name_or_path)

-        if self.generator_type == RAGeneratorType.SEQUENCE:
+        if self.generator_type == "sequence":
             raise NotImplementedError("RagSequenceForGeneration is not implemented yet")
             # TODO: Enable when transformers have it. Refer https://github.com/huggingface/transformers/issues/7905
             # Also refer refer https://github.com/huggingface/transformers/issues/7829
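The commented-out `set_config(...)` call removed in this file used to record constructor arguments by hand so that a component could be exported to YAML. Commit titles in this PR mention an `__init_subclass__` approach and a `_component_config` attribute, but that implementation is not part of this commit. The following is only a minimal sketch of how such automatic recording could work, assuming a hypothetical `Component` base class and `ToyGenerator` subclass; it is not Haystack's actual `BaseComponent` code.

```python
import inspect
from functools import wraps


class Component:
    """Hypothetical base class (an assumption, not Haystack's BaseComponent)."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Only wrap constructors that this subclass defines itself.
        if "__init__" not in cls.__dict__:
            return
        original_init = cls.__init__
        signature = inspect.signature(original_init)

        @wraps(original_init)
        def recording_init(self, *args, **init_kwargs):
            # The most derived class runs first, so its arguments are the ones kept;
            # wrapped parent constructors reached via super().__init__() skip this step.
            if "_component_config" not in vars(self):
                bound = signature.bind(self, *args, **init_kwargs)
                bound.apply_defaults()
                # Drop the first bound argument (the instance itself).
                params = dict(list(bound.arguments.items())[1:])
                self._component_config = params
            original_init(self, *args, **init_kwargs)

        cls.__init__ = recording_init


class ToyGenerator(Component):
    """Hypothetical node with a few of the parameter names seen in the diff."""

    def __init__(
        self,
        model_name_or_path: str = "facebook/rag-token-nq",
        generator_type: str = "token",
        top_k: int = 2,
    ):
        super().__init__()
        self.top_k = top_k


if __name__ == "__main__":
    generator = ToyGenerator(top_k=5)
    print(generator._component_config)
```

With a scheme like this, a node only needs to call `super().__init__()`, which matches what remains in the diff after the manual `set_config` block is deleted.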
2 changes: 1 addition & 1 deletion test/conftest.py
@@ -378,7 +378,7 @@ def deepset_cloud_document_store(deepset_cloud_fixture):

 @pytest.fixture(scope="function")
 def rag_generator():
-    return RAGenerator(model_name_or_path="facebook/rag-token-nq", generator_type=RAGeneratorType.TOKEN, max_length=20)
+    return RAGenerator(model_name_or_path="facebook/rag-token-nq", generator_type="token", max_length=20)


 @pytest.fixture(scope="function")
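The commit does not say why `generator_type` moved from an enum to a plain string. One plausible reason, given that the PR is about validating pipeline YAML, is that plain strings serialize directly while enum members do not. The sketch below illustrates that difference with the standard library only. `LegacyGeneratorType` is a hypothetical stand-in for the removed enum, and `check_generator_type` is a hypothetical helper; the lines visible in the diff only compare against `"sequence"` and do not show any such validation.

```python
import json
from enum import Enum


class LegacyGeneratorType(Enum):
    # Hypothetical stand-in mirroring the removed RAGeneratorType.
    TOKEN = (1,)  # the trailing comma makes this value a tuple, as in the removed code
    SEQUENCE = 2


# The two values named in the updated docstring.
VALID_GENERATOR_TYPES = ("token", "sequence")


def check_generator_type(generator_type: str) -> str:
    """Hypothetical helper: accept only the two documented strings."""
    if generator_type not in VALID_GENERATOR_TYPES:
        raise ValueError(
            f"generator_type must be one of {VALID_GENERATOR_TYPES}, got {generator_type!r}"
        )
    return generator_type


def is_json_serializable(value) -> bool:
    """Return True if the value can be dumped as part of a JSON config."""
    try:
        json.dumps({"generator_type": value})
        return True
    except TypeError:
        return False


if __name__ == "__main__":
    print(is_json_serializable("token"))
    print(is_json_serializable(LegacyGeneratorType.TOKEN))
    print(check_generator_type("sequence"))
```

A check along these lines would also catch misspelled values such as `"Token"`, which a bare equality test against `"sequence"` would silently treat as the non-sequence case.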