feat: implement serialization for InMemoryDocumentStore #7887

Open
davidberenstein1957 opened this issue Jun 18, 2024 · 3 comments · May be fixed by #7888
Labels
type:feature New feature or request

Comments


davidberenstein1957 commented Jun 18, 2024

Is your feature request related to a problem? Please describe.
InMemoryDocumentStore is really nice for showcasing demos, and persisting it between runs would be relatively easy with a to_disk and a from_disk method. I wrote a simple custom version for my own use.

Describe the solution you'd like

# Copyright 2024-present, David Berenstein, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import json
from pathlib import Path
from typing import Any, Dict

from haystack import Document
from haystack.document_stores.in_memory import InMemoryDocumentStore

class Database(InMemoryDocumentStore):
    def to_disk(self, path: str) -> None:
        """Write the database and its data to disk as a JSON file."""
        data: Dict[str, Any] = self.to_dict()
        # Store the documents next to the store's own serialized parameters.
        data["documents"] = [doc.to_dict(flatten=False) for doc in self.storage.values()]
        with open(path, "w", encoding="utf-8") as f:
            json.dump(data, f)

    @classmethod
    def from_disk(cls, path: str) -> "Database":
        """Load the database and its data from a JSON file on disk.

        Returns an empty database if the file is missing or cannot be loaded.
        """
        if not Path(path).exists():
            return cls()
        try:
            with open(path, "r", encoding="utf-8") as f:
                data = json.load(f)
            cls_object = cls.from_dict(data)
            # Document.from_dict is the counterpart of Document.to_dict.
            cls_object.write_documents([Document.from_dict(doc) for doc in data["documents"]])
            return cls_object
        except Exception:
            # Best-effort load: fall back to an empty database on any error.
            return cls()
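
To make the round trip concrete without needing Haystack installed, here is a minimal, self-contained sketch of the same pattern. `TinyDoc`, `TinyStore`, and `round_trip_demo` are hypothetical stand-ins invented for illustration, not Haystack classes; they mimic only the pieces the snippet above relies on (a `to_dict`/`from_dict` pair for the store's settings, a `storage` mapping, and `write_documents`). The demo also shows one consequence of the broad `except`: a corrupted file and a missing file both silently yield an empty store.

```python
import json
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Any, Dict, List


@dataclass
class TinyDoc:
    """Hypothetical stand-in for a document: id, text, and metadata."""
    id: str
    content: str
    meta: Dict[str, Any] = field(default_factory=dict)


class TinyStore:
    """Hypothetical stand-in for an in-memory store (not a Haystack class)."""

    def __init__(self, language: str = "en") -> None:
        self.language = language
        self.storage: Dict[str, TinyDoc] = {}

    def write_documents(self, docs: List[TinyDoc]) -> None:
        for doc in docs:
            self.storage[doc.id] = doc

    def to_dict(self) -> Dict[str, Any]:
        # Only the constructor settings; documents are added in to_disk.
        return {"init_parameters": {"language": self.language}}

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "TinyStore":
        return cls(**data["init_parameters"])

    def to_disk(self, path) -> None:
        data = self.to_dict()
        data["documents"] = [asdict(doc) for doc in self.storage.values()]
        with open(path, "w", encoding="utf-8") as f:
            json.dump(data, f)

    @classmethod
    def from_disk(cls, path) -> "TinyStore":
        if not Path(path).exists():
            return cls()
        try:
            with open(path, "r", encoding="utf-8") as f:
                data = json.load(f)
            store = cls.from_dict(data)
            store.write_documents([TinyDoc(**doc) for doc in data["documents"]])
            return store
        except Exception:
            # Same best-effort fallback as the snippet above.
            return cls()


def round_trip_demo() -> Dict[str, Any]:
    """Save a store, reload it, then try a corrupt and a missing file."""
    with tempfile.TemporaryDirectory() as tmp:
        good = Path(tmp) / "store.json"
        store = TinyStore(language="de")
        store.write_documents([TinyDoc("1", "hallo"), TinyDoc("2", "welt", {"lang": "de"})])
        store.to_disk(good)
        restored = TinyStore.from_disk(good)

        bad = Path(tmp) / "broken.json"
        bad.write_text("{not json", encoding="utf-8")
        corrupt = TinyStore.from_disk(bad)
        missing = TinyStore.from_disk(Path(tmp) / "missing.json")

        return {
            "restored": len(restored.storage),
            "language": restored.language,
            "corrupt": len(corrupt.storage),
            "missing": len(missing.storage),
        }
```

Calling `round_trip_demo()` returns the number of documents recovered in each case along with the restored setting. Whether a silent empty store is the right response to a corrupted file is a design choice; for a demo helper it is convenient, but a library method might prefer to raise.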

Describe alternatives you've considered
N.A.

Additional context
N.A.

@anakin87 anakin87 added the type:feature New feature or request label Jun 18, 2024
davidberenstein1957 commented

Would love to work on a potential PR too.

silvanocerza commented

@davidberenstein1957 that would be great!

These methods could be added directly to InMemoryDocumentStore really, no need for a wrapper. 😁

I would be a bit more specific and call them save_to_disk and load_from_disk maybe. 🤔


davidberenstein1957 commented Jun 18, 2024

> @davidberenstein1957 that would be great!
>
> These methods could be added directly to InMemoryDocumentStore really, no need for a wrapper. 😁
>
> I would be a bit more specific and call them save_to_disk and load_from_disk maybe. 🤔

Yes, it was a straight copy-paste from code I was using internally.
