Releases: deepset-ai/haystack

v2.2.4

04 Jul 14:42

Release Notes

v2.2.4

⚡️ Enhancement Notes

  • Added the apply_filter_policy function to standardize the application of filter policies across all document store-specific retrievers, allowing for consistent handling of initial and runtime filters based on the chosen policy (replace or merge).
  • Introduced a 'filter_policy' init parameter for both InMemoryBM25Retriever and InMemoryEmbeddingRetriever, allowing users to define how runtime filters should be applied with options to either 'replace' the initial filters or 'merge' them, providing greater flexibility in filtering query results.
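
The difference between the two policies can be illustrated with a small standalone sketch. It does not call Haystack: the names `PolicySketch` and `resolve_filters` are hypothetical, filters are simplified to flat field-to-value dictionaries, and "merge" is assumed here to mean a dictionary union in which runtime values win on conflicts. The library's actual merge logic may differ.

```python
from enum import Enum
from typing import Any, Dict, Optional


class PolicySketch(Enum):
    """Hypothetical stand-in for the two policies named in the notes."""
    REPLACE = "replace"
    MERGE = "merge"


def resolve_filters(
    policy: PolicySketch,
    init_filters: Optional[Dict[str, Any]],
    runtime_filters: Optional[Dict[str, Any]],
) -> Optional[Dict[str, Any]]:
    """Decide which filters a retriever would use for one query (illustrative only)."""
    # No runtime filters: fall back to whatever was set at init time.
    if runtime_filters is None:
        return init_filters
    # Assumption: 'merge' is a shallow union, runtime values override init values.
    if policy is PolicySketch.MERGE and init_filters:
        return {**init_filters, **runtime_filters}
    # 'replace' (or nothing to merge with): runtime filters are used as-is.
    return runtime_filters


init = {"lang": "en", "year": 2023}
runtime = {"year": 2024}
replaced = resolve_filters(PolicySketch.REPLACE, init, runtime)
merged = resolve_filters(PolicySketch.MERGE, init, runtime)
```

With `replace`, only the runtime filter on `year` survives; with `merge`, the init-time `lang` filter is kept alongside the overridden `year`.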

🐛 Bug Fixes

  • Meta handling of bytestreams in Azure OCR has been fixed.
  • Fixed some bugs when running a Pipeline that has Components with conditional outputs. Some branches that were expected not to run would run anyway, even if they received no inputs. Other branches would cause the Pipeline to get stuck waiting to run that branch, even if they received no inputs. The behaviour depended on whether the Component not receiving the input had an optional input or not.

v2.2.4-rc1

03 Jul 11:49
Pre-release

Release Notes

v2.2.4-rc1

⚡️ Enhancement Notes

  • Added the apply_filter_policy function to standardize the application of filter policies across all document store-specific retrievers, allowing for consistent handling of initial and runtime filters based on the chosen policy (replace or merge).
  • Introduced a 'filter_policy' init parameter for both InMemoryBM25Retriever and InMemoryEmbeddingRetriever, allowing users to define how runtime filters should be applied with options to either 'replace' the initial filters or 'merge' them, providing greater flexibility in filtering query results.

🐛 Bug Fixes

  • Fixed some bugs when running a Pipeline that has Components with conditional outputs. Some branches that were expected not to run would run anyway, even if they received no inputs. Other branches would cause the Pipeline to get stuck waiting to run that branch, even if they received no inputs. The behaviour depended on whether the Component not receiving the input had an optional input or not.

v2.2.3

17 Jun 12:25

Release Notes

v2.2.3

🐛 Bug Fixes

  • Pin numpy<2 to avoid breaking changes that cause several core integrations to fail. Pin tenacity too (8.4.0 is broken).

⚡️ Enhancement Notes

  • Export ChatPromptBuilder in builders module

v2.2.2

17 Jun 08:43

Release Notes

v2.2.2

🐛 Bug Fixes

  • Add missing metrics column in DataFrame returned by EvaluationRunResult.score_report()

v2.2.2-rc1

13 Jun 14:36
Pre-release

Release Notes

v2.2.2-rc1

🐛 Bug Fixes

  • Add missing metrics column in DataFrame returned by EvaluationRunResult.score_report()

v2.2.1

06 Jun 08:16

Release Notes

v2.2.1

⬆️ Upgrade Notes

  • trafilatura must now be manually installed with pip install trafilatura to use the HTMLToDocument Component.

⚡️ Enhancement Notes

  • Remove trafilatura as direct dependency and make it a lazily imported one
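
A lazily imported dependency is one that is only loaded when the feature needing it is actually used. The following is a generic sketch of that pattern using the standard library; `require_optional` is a hypothetical helper written for illustration and is not Haystack's own mechanism.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, install_hint: str) -> ModuleType:
    """Import a module on first use and fail with an actionable message if it is missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Re-raise with the install command so the user knows how to fix it.
        raise ImportError(
            f"Could not import '{module_name}'. Run '{install_hint}'."
        ) from exc


def convert_html(html: str) -> str:
    """Hypothetical converter: the optional dependency is resolved only when called."""
    trafilatura = require_optional("trafilatura", "pip install trafilatura")
    return trafilatura.extract(html) or ""
```

Importing the module that defines `convert_html` succeeds even without trafilatura installed; the error only surfaces when the function is called.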

v1.26.2

07 Jun 08:36

Release Notes

v1.26.2

🐛 Bug Fixes

  • Export fetch_archive_from_http in utils/__init__.py

v2.2.1-rc1

05 Jun 16:10

Release Notes

v2.2.1-rc1

⬆️ Upgrade Notes

  • trafilatura must now be manually installed with pip install trafilatura to use the HTMLToDocument Component.

⚡️ Enhancement Notes

  • Remove trafilatura as direct dependency and make it a lazily imported one

v1.26.1

05 Jun 13:32

Release Notes

v1.26.1

🚀 New Features

  • Add previously removed fetch_archive_from_http util function to fetch zip and gzip archives from url

v1.26.0

04 Jun 14:08
4188bf9

Release Notes

v1.26.0

Prelude

We are announcing that Haystack 1.26 is the final minor release for Haystack 1.x. Although we will continue to release bug fixes for this version, we will neither be adding nor removing any functionalities. Instead, we will focus our efforts on Haystack 2.x. Haystack 1.26 will reach its end-of-life on March 11, 2025.

The utility functions fetch_archive_from_http, build_pipeline and add_example_data were removed from Haystack.

This release changes the PDFToTextConverter so that it doesn't support PyMuPDF anymore. The converter will always assume xpdf is used by default.

⬆️ Upgrade Notes

  • We recommend replacing calls to the fetch_archive_from_http function with other tools available in Python or in the operating system of use.
  • To keep using PyMuPDF, you must create a custom node. You can use the previous Haystack version for inspiration.
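
As one possible replacement for fetch_archive_from_http, the standard library alone can download and unpack an archive. This is a minimal sketch, not a drop-in equivalent: `fetch_and_unpack` is a hypothetical name, the archive format is inferred from the file extension in the URL, and the archive is assumed to come from a trusted source.

```python
import shutil
import tempfile
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def fetch_and_unpack(url: str, output_dir: str) -> list[str]:
    """Download an archive (e.g. .zip or .tar.gz) and unpack it into output_dir."""
    target = Path(output_dir)
    target.mkdir(parents=True, exist_ok=True)
    # Keep the original file name so shutil can infer the format from its extension.
    archive_name = Path(urlparse(url).path).name
    with tempfile.TemporaryDirectory() as tmp:
        archive_path = Path(tmp) / archive_name
        with urllib.request.urlopen(url) as response, open(archive_path, "wb") as fh:
            shutil.copyfileobj(response, fh)
        # Assumption: the archive is trusted; untrusted archives need path validation.
        shutil.unpack_archive(str(archive_path), str(target))
    # Return the names of the top-level entries now present in output_dir.
    return sorted(p.name for p in target.iterdir())
```

Command-line tools such as curl or wget combined with unzip or tar are an equally valid choice outside Python.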

⚡️ Enhancement Notes

  • Add raise_on_failure flag to BaseConverter class so that big processes can optionally continue without breaking from exceptions.

  • Support for Llama3 models on AWS Bedrock.

  • Support for MistralAI and new Claude 3 models on AWS Bedrock.

  • Upgrade Transformers to the latest version 4.37.2. This version adds support for the Phi-2 and Qwen2 models and improves support for quantization.

  • Upgrade transformers to version 4.39.3 so that Haystack can support the new Cohere Command R models.

  • Add support for latest OpenAI embedding models text-embedding-3-large and text-embedding-3-small.

  • API_BASE can now be passed as an optional parameter in the getting_started sample. Only the openai provider is supported in this set of changes. PromptNode and PromptModel were enhanced to allow passing of this parameter. This allows RAG against a local endpoint (e.g., http://localhost:1234/v1), so long as it is OpenAI compatible (such as LM Studio).

    Logging in the getting started sample was made more verbose, to make it easier for people to see what was happening under the covers.

  • Added new option split_by="page" to the preprocessor so we can chunk documents by page break.

  • Review and update context windows for OpenAI GPT models.

  • Support gated repos for Huggingface inference.

  • Add a check to verify that the embedding dimension set in the FAISS Document Store and retriever are equal before running embedding calculations.
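
To illustrate what chunking by page break (the split_by="page" option above) amounts to, here is a standalone sketch. It assumes pages are separated by the form feed character `\f`, which text converters commonly emit between pages; `split_by_page` is a hypothetical helper and not the preprocessor's implementation.

```python
def split_by_page(text: str) -> list[str]:
    """Split text into one chunk per page, assuming '\\f' marks each page break."""
    pages = text.split("\f")
    # Drop empty pages (e.g. a trailing form feed) and trim surrounding whitespace.
    return [page.strip() for page in pages if page.strip()]


chunks = split_by_page("Page one.\fPage two.\f")
```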

🐛 Bug Fixes

  • Fixed a Pipeline run error when using the FileTypeClassifier with the raise_on_error: True option. Instead of returning an unexpected NoneType, we route the file to a dead-end edge.

  • Ensure that the crawled files are downloaded to the output_dir directory, as specified in the Crawler constructor. Previously, some files were incorrectly downloaded to the current working directory.

  • Fixes SearchEngineDocumentStore.get_metadata_values_by_key method to make use of self.index if no index is provided.

  • Fixes OutputParser usage in PromptTemplate after making invocation context immutable in #7510.

  • When using a Pipeline with a JoinNode (e.g. JoinDocuments) all information from the previous nodes was lost other than a few select fields (e.g. documents). This was due to the JoinNode not properly passing on the information from the previous nodes. This has been fixed and now all information from the previous nodes is passed on to the next node in the pipeline.

    For example, the following pipeline rewrites the query during execution and combines this with a hybrid retrieval setup that requires a JoinDocuments node. Specifically, the first prompt node rewrites the query to fix all spelling errors, and this new query is used for retrieval. The JoinDocuments node now passes on the rewritten query so it can be used by the QAPromptNode node, whereas before it would pass on the original query.

    ```python
    from haystack import Pipeline
    from haystack.nodes import BM25Retriever, EmbeddingRetriever, PromptNode, Shaper, JoinDocuments, PromptTemplate
    from haystack.document_stores import InMemoryDocumentStore

    document_store = InMemoryDocumentStore(use_bm25=True)
    dicts = [{"content": "The capital of Germany is Berlin."}, {"content": "The capital of France is Paris."}]
    document_store.write_documents(dicts)

    # First prompt node: rewrites the query with spelling errors fixed.
    query_prompt_node = PromptNode(
        model_name_or_path="gpt-3.5-turbo",
        api_key="",
        default_prompt_template=PromptTemplate("You are a spell checker. Given a user query return the same query with all spelling errors fixed.\nUser Query: {query}\nSpell Checked Query:")
    )
    # Turns the list of results from the prompt node into a single query string.
    shaper = Shaper(
        func="join_strings",
        inputs={"strings": "results"},
        outputs=["query"],
    )
    qa_prompt_node = PromptNode(
        model_name_or_path="gpt-3.5-turbo",
        api_key="",
        default_prompt_template=PromptTemplate("Answer the user query. Query: {query}")
    )
    sparse_retriever = BM25Retriever(
        document_store=document_store,
        top_k=2
    )
    dense_retriever = EmbeddingRetriever(
        document_store=document_store,
        embedding_model="intfloat/e5-base-v2",
        model_format="sentence_transformers",
        top_k=2
    )
    document_store.update_embeddings(dense_retriever)

    pipeline = Pipeline()
    pipeline.add_node(component=query_prompt_node, name="QueryPromptNode", inputs=["Query"])
    pipeline.add_node(component=shaper, name="ListToString", inputs=["QueryPromptNode"])
    pipeline.add_node(component=sparse_retriever, name="BM25", inputs=["ListToString"])
    pipeline.add_node(component=dense_retriever, name="Embedding", inputs=["ListToString"])
    pipeline.add_node(
        component=JoinDocuments(join_mode="concatenate"), name="Join", inputs=["BM25", "Embedding"]
    )
    pipeline.add_node(component=qa_prompt_node, name="QAPromptNode", inputs=["Join"])

    out = pipeline.run(query="What is the captial of Grmny?", debug=True)
    print(out["invocation_context"])
    # Before Fix
    # {'query': 'What is the captial of Grmny?',  <-- Original Query!!
    #  'results': ['The capital of Germany is Berlin.'],
    #  'prompts': ['Answer the user query. Query: What is the captial of Grmny?'],  <-- Original Query!!
    # After Fix
    # {'query': 'What is the capital of Germany?',  <-- Rewritten Query!!
    #  'results': ['The capital of Germany is Berlin.'],
    #  'prompts': ['Answer the user query. Query: What is the capital of Germany?'],  <-- Rewritten Query!!
    ```

  • When passing empty inputs (such as query="") to PromptNode, the node would raise an error. This has been fixed.

  • Changed the dummy vector used internally in the Pinecone Document Store. A recent change to the Pinecone API no longer allows vectors filled with zeros, which is what the previous dummy vector was.

  • The types of metadata values accepted by RouteDocuments were unnecessarily restricted to string types. This caused validation errors (for example, when loading from a YAML file) if a user tried to use a boolean type. Boolean and int types are now accepted as valid types for metadata_values.

  • Fixed a bug that made it impossible to write Documents to Weaviate when some of the fields were empty lists (e.g. split_overlap for preprocessed documents).