Releases: deepset-ai/haystack
v2.11.2
Release Notes
Enhancement Notes
- Refactored the processing of streaming chunks from OpenAI to simplify logic.
- Added tests to ensure expected behavior when handling streaming chunks with include_usage=True.
Bug Fixes
- Fixed issue with MistralChatGenerator not returning a finish_reason when using streaming. Fixed by adjusting how we look for the finish_reason when processing streaming chunks. Now, the last non-None finish_reason is used to handle differences between OpenAI and Mistral.
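The "last non-None finish_reason" rule described above can be sketched as follows. This is a minimal illustration, not Haystack's implementation: the chunk layout (plain dicts with a finish_reason key) and the helper name last_finish_reason are assumptions made for the example.

```python
from typing import Optional


def last_finish_reason(chunks: list) -> Optional[str]:
    """Return the last non-None finish_reason seen across streaming chunks.

    Assumed chunk layout (hypothetical): {"content": str, "finish_reason": str | None}.
    """
    reason = None
    for chunk in chunks:
        value = chunk.get("finish_reason")
        if value is not None:
            # Keep overwriting so the final non-None value wins.
            reason = value
    return reason


# The finish_reason does not have to sit on the very last chunk: here it is
# followed by a trailing chunk that carries no finish_reason at all.
chunks = [
    {"content": "Hel", "finish_reason": None},
    {"content": "lo", "finish_reason": "stop"},
    {"content": "", "finish_reason": None},
]
print(last_finish_reason(chunks))
```

Scanning every chunk rather than reading only the final one is what makes the result independent of where a provider places the finish_reason in the stream.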
v2.11.1
Release Notes
Bug Fixes
- Add dataframe to legacy fields for the Document dataclass. This fixes a bug where Document.from_dict() in haystack-ai>=2.11.0 could not properly deserialize a Document dictionary obtained with document.to_dict(flatten=False) in haystack-ai<=2.10.0.
v2.11.1-rc1
v2.11.0
⭐️ Highlights
Faster Imports
With lazy importing, importing individual components now requires 50% less CPU time on average. Overall import performance has also significantly improved: for example, import haystack now consumes only 2-5% of the CPU time it previously did.
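As a general illustration of lazy importing, the sketch below defers executing a module until one of its attributes is first accessed, using the standard library's importlib.util.LazyLoader. The release notes do not say which mechanism Haystack uses, so treat this as an example of the technique rather than the project's code; the helper name lazy_import is hypothetical.

```python
import importlib.util
import sys


def lazy_import(name: str):
    """Return a module object whose code runs only on first attribute access."""
    spec = importlib.util.find_spec(name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    # With LazyLoader, exec_module only arranges for deferred execution.
    loader.exec_module(module)
    return module


# The import cost is paid here, on first use, not at the lazy_import call.
colorsys = lazy_import("colorsys")
print(colorsys.rgb_to_hsv(1.0, 0.0, 0.0))
```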
Extended Async Run Support
As of this release, all chat generators and retrievers in the core package now include a run_async method, enabling asynchronous execution at the component level. When used in an AsyncPipeline, this method runs automatically, providing native async capabilities.
New MSGToDocument Component
Use MSGToDocument to convert Microsoft Outlook .msg files into Haystack documents. This component extracts the email metadata (such as sender, recipients, CC, BCC, subject) and body content and converts any file attachments into ByteStream objects.
Turn off Validation for Pipeline Connections
Set connection_type_validation to False when initializing a Pipeline to disable type validation for pipeline connections. This lets you connect any edges and bypass errors you might otherwise get, for example, when you connect an Optional[str] output to a str input.
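To see why an Optional[str] output is rejected by a str input under strict checking, here is a simplified compatibility check. It is a sketch under stated assumptions: the functions is_compatible and can_connect are hypothetical and far simpler than a real pipeline's validation logic.

```python
from typing import Any, Optional, Union, get_args, get_origin


def is_compatible(sender, receiver) -> bool:
    """Simplified check: can every value of type `sender` be accepted by `receiver`?"""
    if receiver is Any or sender == receiver:
        return True
    if get_origin(sender) is Union:
        # Every member of the sender union must be accepted by the receiver.
        return all(is_compatible(member, receiver) for member in get_args(sender))
    if get_origin(receiver) is Union:
        # A plain sender type only needs to match one member of the receiver union.
        return any(is_compatible(sender, member) for member in get_args(receiver))
    return False


def can_connect(sender, receiver, validate: bool = True) -> bool:
    """With validation off, any connection is allowed (hypothetical helper)."""
    return True if not validate else is_compatible(sender, receiver)


print(can_connect(Optional[str], str))                  # strict check rejects: None is not a str
print(can_connect(Optional[str], str, validate=False))  # validation disabled: allowed
```

Disabling validation trades an early connection-time error for a possible runtime failure if the sender really does emit None.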
⬆️ Upgrade Notes
- The ExtractedTableAnswer dataclass and the dataframe field in the Document dataclass, deprecated in Haystack 2.10.0, have now been removed. pandas is no longer a required dependency for Haystack, making the installation lighter. If a component you use requires pandas, an informative error will be raised, prompting you to install it. For details and motivation, see the GitHub discussion #8688.
- Starting from Haystack 2.11.0, Python 3.8 is no longer supported. Python 3.8 reached its end of life in October 2024.
- The AzureOCRDocumentConverter no longer produces Document objects with the deprecated dataframe field.
  Am I affected?
  - If your workflow relies on the dataframe field in Document objects generated by AzureOCRDocumentConverter, you are affected.
  - If you saw a DeprecationWarning in Haystack 2.10 when initializing a Document with a dataframe, this change will now remove that field entirely.
  How to handle the change:
  - Instead of storing detected tables as a dataframe, AzureOCRDocumentConverter now represents tables as CSV-formatted text in the content field of the Document.
  - Update your processing logic to handle CSV-formatted tables instead of a dataframe. If needed, you can convert the CSV text back into a dataframe using pandas.read_csv().
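Handling a CSV-formatted table, as the upgrade note above suggests, might look like the sketch below. The content string is made-up sample data standing in for a Document's content field, and the standard library csv module is used so the example runs without pandas; the pandas route named in the note is shown as a comment.

```python
import csv
import io

# Hypothetical CSV text, standing in for the content field of a Document
# that holds a detected table.
content = "item,quantity\napple,3\npear,5\n"


def parse_table(csv_text: str) -> list:
    """Parse CSV-formatted table text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(csv_text)))


rows = parse_table(content)
total = sum(int(row["quantity"]) for row in rows)
print(rows[0]["item"], total)

# With pandas installed, the same text can be loaded into a DataFrame:
#   import pandas
#   df = pandas.read_csv(io.StringIO(content))
```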
🚀 New Features
- Add a new MSGToDocument component to convert .msg files into Haystack Document objects.
  - Extracts email metadata (e.g. sender, recipients, CC, BCC, subject) and body content into a Document.
  - Converts attachments into ByteStream objects which can be passed onto a FileTypeRouter + relevant converters.
- We've introduced a new connection_type_validation parameter to control type compatibility checks in pipeline connections. It can be set to True (default) or False; when set to False, no type checks are done and any connection is allowed.
- Add run_async method to HuggingFaceAPIChatGenerator. This method relies internally on the AsyncInferenceClient from huggingface to generate chat completions and supports the same parameters as the run method. It returns a coroutine that can be awaited.
- Add run_async method to OpenAIChatGenerator. This method internally uses the async version of the OpenAI client to generate chat completions and supports the same parameters as the run method. It returns a coroutine that can be awaited.
- The InMemoryDocumentStore and the associated InMemoryBM25Retriever and InMemoryEmbeddingRetriever retrievers now support async mode.
- Add run_async method to DocumentWriter. This method supports the same parameters as the run method and relies on the DocumentStore to implement write_documents_async. It returns a coroutine that can be awaited.
- Add run_async method to AzureOpenAIChatGenerator. This method uses AsyncAzureOpenAI to generate chat completions and supports the same parameters as the run method. It returns a coroutine that can be awaited.
- Sentence Transformers components now support ONNX and OpenVINO backends through the "backend" parameter. Supported backends are torch (default), onnx, and openvino. Refer to the Sentence Transformers documentation for more information.
- Add run_async method to HuggingFaceLocalChatGenerator. This method internally uses ThreadPoolExecutor to return coroutines that can be awaited.
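The ThreadPoolExecutor approach mentioned in the last item can be sketched with plain asyncio. EchoGenerator is a hypothetical stand-in class, not a Haystack component; it only shows how a blocking run method can be exposed as an awaitable run_async.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


class EchoGenerator:
    """Hypothetical component with a blocking run() and an awaitable run_async()."""

    def __init__(self) -> None:
        self._executor = ThreadPoolExecutor(max_workers=1)

    def run(self, prompt: str) -> dict:
        # Stand-in for blocking work such as local model inference.
        return {"replies": [prompt.upper()]}

    async def run_async(self, prompt: str) -> dict:
        # Run the blocking call in a worker thread so the event loop stays free.
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(self._executor, self.run, prompt)

    def close(self) -> None:
        self._executor.shutdown()


async def main() -> list:
    generator = EchoGenerator()
    try:
        # gather() returns results in the order the awaitables were passed.
        return await asyncio.gather(generator.run_async("a"), generator.run_async("b"))
    finally:
        generator.close()


results = asyncio.run(main())
print(results)
```

Components backed by a natively asynchronous client would await that client directly instead; the executor pattern fits the case where only a blocking API is available.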
⚡️ Enhancement Notes
- Improved AzureDocumentEmbedder to handle embedding generation failures gracefully. Errors are logged, and processing continues with the remaining batches.
- In the FileTypeRouter add explicit support for classifying .msg files with mimetype "application/vnd.ms-outlook" since the mimetypes module returns None for .msg files by default.
- Added the store_full_path init variable to XLSXToDocument to allow users to toggle whether to store the full path of the source file in the meta of the Document. This is set to False by default to increase privacy.
- Increased default timeout for Mermaid server to 30 seconds. Mermaid server is used to draw Pipelines. Exposed the timeout as a parameter for the Pipeline.show and Pipeline.draw methods. This allows users to customize the timeout as needed.
- Optimize import times through extensive use of lazy imports across packages. Importing one component of a certain package no longer leads to importing all components of the same package. For example, importing OpenAIChatGenerator no longer imports AzureOpenAIChatGenerator.
- Haystack now officially supports Python 3.13. Some components and integrations may not yet be compatible. Specifically, the NamedEntityExtractor does not work with Python 3.13 when using the spacy backend. Additionally, you may encounter issues installing openai-whisper, which is required by the LocalWhisperTranscriber component, if you use uv or poetry for installation. In this case, we recommend using pip for installation.
- EvaluationRunResult can now output the results in JSON, a pandas DataFrame or in a CSV file.
- Update ListJoiner to only optionally need list_type to be passed. By default it uses type List which acts like List[Any].
  - This allows the ListJoiner to combine any incoming lists into a single flattened list.
  - Users can still pass list_type if they would like to have stricter type validation in their pipelines.
- Added PDFMinerToDocument functionality to detect and report undecoded CID characters in PDF text extraction, helping users identify potential text extraction quality issues when processing PDFs with non-standard fonts.
- Simplified the serialization code for better readability and maintainability.
  - Updated deserialization to allow users to omit the typing. prefix for standard typing library types (e.g., List[str] instead of typing.List[str]).
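The explicit .msg handling in FileTypeRouter noted above can be pictured as a small override table consulted before the mimetypes module. This is a sketch of the idea only; the names CUSTOM_MIME_TYPES and guess_mime_type are hypothetical and not taken from Haystack.

```python
import mimetypes
from pathlib import Path
from typing import Optional

# Explicit overrides for extensions the mimetypes module may not know.
CUSTOM_MIME_TYPES = {".msg": "application/vnd.ms-outlook"}


def guess_mime_type(path: str) -> Optional[str]:
    """Return a MIME type, preferring explicit overrides over mimetypes guesses."""
    extension = Path(path).suffix.lower()
    if extension in CUSTOM_MIME_TYPES:
        return CUSTOM_MIME_TYPES[extension]
    return mimetypes.guess_type(path)[0]


print(guess_mime_type("inbox/report.msg"))
print(guess_mime_type("notes.txt"))
```

Checking the override table first keeps the result stable across platforms, since what mimetypes knows can depend on system configuration.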
⚠️ Deprecation Notes
- The use of pandas DataFrame in EvaluationRunResult is now optional and the methods score_report, to_pandas and comparative_individual_scores_report are deprecated and will be removed in the next Haystack release.
🐛 Bug Fixes
- In the ChatMessage.to_openai_dict_format utility method, include the name field in the returned dictionary, if present. Previously, the name field was erroneously skipped.
- Pipelines with components that return plain pandas dataframes failed. The comparison of socket values is now 'is not' instead of '!=' to avoid errors with dataframes.
- Make sure that OpenAIChatGenerator sets additionalProperties: False in the tool schema when tool_strict is set to True.
- Fix a bug where the output_type of a ConditionalRouter was not being serialized correctly. This would cause the router to work incorrectly after being serialized and deserialized.
- Fixed accumulation of a tool's arguments when streaming with an [OpenAIChatGenerator](https://d...
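For the tool_strict fix listed above, the shape of a strict tool definition can be sketched as follows. The helper to_openai_tool is hypothetical and written for this example; only the resulting dictionary layout (a function entry with strict set to True and additionalProperties set to False in its parameters) reflects the OpenAI tool format.

```python
import copy


def to_openai_tool(name: str, description: str, parameters: dict, strict: bool = False) -> dict:
    """Build an OpenAI-style tool definition (hypothetical helper)."""
    # Copy so the caller's schema is not mutated.
    params = copy.deepcopy(parameters)
    function = {"name": name, "description": description, "parameters": params}
    if strict:
        function["strict"] = True
        # Strict mode requires the schema to forbid undeclared properties.
        params["additionalProperties"] = False
    return {"type": "function", "function": function}


schema = {
    "type": "object",
    "properties": {"city": {"type": "string"}},
    "required": ["city"],
}
tool = to_openai_tool("get_weather", "Look up the weather for a city.", schema, strict=True)
print(tool["function"]["parameters"]["additionalProperties"])
```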
v2.11.0-rc3
v2.11.0-rc2
v2.10.3
Release Notes
Bug Fixes
- Fixed accumulation of a tool's arguments when streaming with an OpenAIChatGenerator
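When tool calls are streamed, the JSON arguments arrive as string fragments that must be concatenated per tool call before parsing. The sketch below shows that accumulation under an assumed, simplified delta layout (dicts with index, name and arguments keys); it is an illustration, not the code that was fixed.

```python
import json


def accumulate_tool_calls(deltas: list) -> list:
    """Merge streamed tool-call fragments into complete calls, keyed by index."""
    calls = {}
    for delta in deltas:
        call = calls.setdefault(delta["index"], {"name": "", "arguments": ""})
        if delta.get("name"):
            call["name"] = delta["name"]
        # Argument fragments are partial JSON; append them in arrival order.
        call["arguments"] += delta.get("arguments") or ""
    return [
        {"name": call["name"], "arguments": json.loads(call["arguments"])}
        for _, call in sorted(calls.items())
    ]


# Hypothetical deltas for a single tool call split across three chunks.
deltas = [
    {"index": 0, "name": "get_weather", "arguments": ""},
    {"index": 0, "arguments": '{"city": '},
    {"index": 0, "arguments": '"Berlin"}'},
]
print(accumulate_tool_calls(deltas))
```

Parsing only after the stream ends matters because an individual fragment is usually not valid JSON on its own.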
v2.10.3-rc1
v2.10.2
Release Notes
Bug Fixes
- Pipelines with components that return plain pandas dataframes failed. The comparison of socket values is now 'is not' instead of '!=' to avoid errors with dataframes.
v2.10.2-rc1
