
Safely maintaining a rolling history of large documents #8244

Closed
jpike88 opened this issue Dec 1, 2020 · 6 comments

Comments

jpike88 (Contributor) commented Dec 1, 2020

Got a question...

Up until now I've had an 'autosave' bucket that I dump a large JSON document (anywhere from 30 to 300 MB) into from time to time, under the key 'autosave'. I just upsert when a new autosave is done, so as not to bloat storage. I've noticed that in rare cases the autosave process can fail in some way, and for some reason on a refresh the autosave fails to complete correctly and data loss can occur (I'm still not sure how this happens, but it's a rare occurrence).
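
For reference, the upsert is basically the standard get-then-put pattern. A simplified sketch (the real code has more error handling, and the `state` shape here is just illustrative):

```js
// Simplified sketch of the current autosave upsert: fetch the existing
// revision if there is one, then overwrite the single 'autosave' doc.
const db = new PouchDB('autosave');

async function autosave(state) {
  const doc = { _id: 'autosave', state };
  try {
    const existing = await db.get('autosave');
    doc._rev = existing._rev; // update in place rather than creating new docs
  } catch (err) {
    if (err.status !== 404) throw err; // 404 just means this is the first save
  }
  return db.put(doc); // can still throw a 409 if a concurrent write lands first
}
```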

I want to upgrade this by keeping a sort of rolling history of autosaves. Is there a benefit to using 10 buckets with one document each, versus 10 documents in a single bucket? How does each choice affect the risk of data corruption or failure, and is there a solid failsafe approach here? And should such an approach be recommended in the PouchDB documentation?

dheimoz commented Dec 1, 2020

Hello @jpike88, intriguing case you have here.

My first question would be: do you need replication to CouchDB or Cloudant? If so, you may want to read these carefully:
apache/couchdb#1200
apache/couchdb#1253

Long story short, it is not recommended to store, transfer, and replicate large documents through the CouchDB replication protocol. I have had my share of nightmares with large images (between 5 and 20 MB) between PouchDB and Cloudant. It is not very performant and can block access to Cloudant if several users are replicating at the same time (lots of 409 errors). You also have to take into account that when you "delete" a document in PouchDB/CouchDB, the revisions are kept. You can recover a little disk space with compaction, but it is minimal.

In the event you do not need or plan for replication and only need to store attachments or files, you might consider storing them directly in IndexedDB. You do not need to worry too much about free space:
https://web.dev/storage-for-the-web/#check
Chrome allows the browser to use up to 80% of total disk space. An origin can use up to 60% of the total disk space.
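
You can check what is available at runtime with the StorageManager API (a small sketch based on that article):

```js
// Ask the browser how much storage this origin is using and may use.
if (navigator.storage && navigator.storage.estimate) {
  navigator.storage.estimate().then(({ usage, quota }) => {
    const pct = ((usage / quota) * 100).toFixed(1);
    console.log(`Using ${usage} of ${quota} bytes (${pct}%)`);
  });
}
```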

There are a couple of libraries I recommend that can complement PouchDB: localForage (https://localforage.github.io/localForage/) and idb (https://github.com/jakearchibald/idb).

Both of them can store files in IndexedDB. When I need replication and attachments, I save the file with localForage under a key, add that key to a document in PouchDB for offline access, and then, if I have an internet connection, upload the file to Cloudinary and save the key in PouchDB.
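
Roughly, that flow looks like this (a sketch; `saveFile` and the key scheme are just illustrative, and the Cloudinary upload is left out):

```js
import localforage from 'localforage';
import PouchDB from 'pouchdb';

const db = new PouchDB('docs');

// Sketch: the blob lives in IndexedDB via localForage's own store,
// and the PouchDB document only carries the key that points at it.
async function saveFile(docId, blob) {
  const fileKey = `file:${docId}`;
  await localforage.setItem(fileKey, blob); // blob stays local, offline-first
  await db.put({ _id: docId, fileKey });    // only the small key document replicates
  // When online: upload the blob (e.g. to Cloudinary) and store the
  // remote reference back on the same document.
}
```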

With all that being said, if you really need to store attachments directly in PouchDB, I suggest you first make sure they are in Blob format and save them individually.
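
For example (a minimal sketch; the document and attachment names are arbitrary):

```js
// Save one JSON payload as an individual Blob attachment
// (new document, so no rev argument is needed).
async function saveAttachment(db, payload) {
  const blob = new Blob([JSON.stringify(payload)], { type: 'application/json' });
  return db.putAttachment('autosave-doc', 'data.json', blob, 'application/json');
}
```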

Hope all of this makes sense for you.

jpike88 (Contributor) commented Dec 2, 2020

I should also add:

The files are actually just JSON, so they've naturally worked as documents. Is there a big performance or reliability difference between storing them in their natural JSON form versus storing them as attachments?

dheimoz commented Dec 2, 2020

Hello @jpike88, I am terribly sorry. I did not catch that you specifically pointed out that the actual documents were that size. Personally speaking, I would not work with JSON documents larger than 2 MB; I would rather have several KB-sized documents than a single 1 MB document. This Stack Overflow question was shared with me a long time ago:

https://stackoverflow.com/questions/47692745/is-20-mb-document-size-limit-in-couchdb-inclusive-of-attachment-size

To sum up, you can store larger payloads as attachments, but it is not recommended. I would store the large JSON directly in IndexedDB.
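
For illustration, keeping individual documents small could look something like this (a rough sketch, not something from this thread; the chunk size and ID scheme are arbitrary):

```js
// Split one large array-valued payload into many small chunk documents
// plus a tiny metadata doc, written in a single bulkDocs call.
async function saveInChunks(db, saveId, items, chunkSize = 200) {
  const docs = [];
  for (let i = 0; i < items.length; i += chunkSize) {
    docs.push({
      _id: `${saveId}:chunk:${i / chunkSize}`,
      items: items.slice(i, i + chunkSize),
    });
  }
  docs.push({ _id: `${saveId}:meta`, chunkCount: docs.length });
  return db.bulkDocs(docs);
}
```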

daleharvey (Member) commented:

We don't actually store JSON natively in IndexedDB at the moment; there were problems with deeply nested objects, so we end up stringifying documents via https://github.com/nolanlawson/vuvuzela, although that's code I would like to remove.

One thing to be very aware of is that we store a number of copies of every object saved. That number can be configured via revs_limit (https://pouchdb.com/api.html#create_database) but is global to the database and defaults to ~1000, so if you autosave 1000 times there will be 1000 copies around. This isn't the case for attachments.
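
For example (a sketch using the options from the linked docs):

```js
// Keep only the latest entry in revision history and compact on every
// write, so repeated autosaves don't pile up old copies.
const db = new PouchDB('autosaves', {
  revs_limit: 1,
  auto_compaction: true,
});
```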

jpike88 (Contributor) commented Dec 5, 2020

@daleharvey thanks for your insight. Do you have a recommendation for my particular scenario? I already have the max copies (revs_limit) set to 1, as I don't really need that feature. Is there a change in risk if I don't use PouchDB for this particular large-JSON autosave use case? (I still use PouchDB for plenty of other things and it works great.)

github-actions bot commented Feb 4, 2021

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 7 days
