
Safely maintaining a rolling history of large documents #8244

Closed
jpike88 opened this issue Dec 1, 2020 · 6 comments

Comments

jpike88 (Contributor) commented Dec 1, 2020

Got a question...

Up until now I've had an 'autosave' bucket that I dump a large JSON document (anywhere from 30 to 300 MB) into from time to time, under the key 'autosave'. I just upsert when a new autosave is done, so as not to bloat storage. I've noticed that in rare cases the autosave process can fail in some way, and for some reason on a refresh the autosave fails to complete correctly and data loss can occur (I'm still not sure how this happens, but it's a rare occurrence).
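
For reference, the upsert is basically the standard get-then-put pattern. A simplified sketch (the real code has more error handling, and the `state` shape here is just illustrative):

```js
// Simplified sketch of the current autosave upsert: fetch the existing
// revision if there is one, then overwrite the single 'autosave' doc.
const db = new PouchDB('autosave');

async function autosave(state) {
  const doc = { _id: 'autosave', state };
  try {
    const existing = await db.get('autosave');
    doc._rev = existing._rev; // update in place rather than creating new docs
  } catch (err) {
    if (err.status !== 404) throw err; // 404 just means this is the first save
  }
  return db.put(doc); // can still throw a 409 if a concurrent write lands first
}
```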

I want to upgrade this by keeping a sort of rolling history of autosaves. Is there a benefit to using 10 buckets with one document each, versus 10 documents in a single bucket? How does each choice affect the risk of data corruption or failure, and is there a solid failsafe approach here? And should such an approach be recommended in the PouchDB documentation?

dheimoz commented Dec 1, 2020

Hello @jpike88, intriguing case you have here.

My first question would be: do you need replication to CouchDB or Cloudant? If so, you may want to read these carefully:
apache/couchdb#1200
apache/couchdb#1253

Long story short, it is not recommended to store, transfer, and replicate large documents through the CouchDB replication protocol. I have had my share of nightmares with large images (between 5 and 20 MB) between PouchDB and Cloudant. It is not very performant and can block access to Cloudant if several users are replicating at the same time (lots of 409 errors). You also have to take into account that when you "delete" a document in PouchDB/CouchDB, the revisions are kept. You can recover a little disk space with compaction, but it is minimal.

In the event you do not need or plan for replication and only need to store attachments or files, you might consider storing them directly in IndexedDB. You do not need to worry too much about free space:
https://web.dev/storage-for-the-web/#check
Chrome allows the browser to use up to 80% of total disk space. An origin can use up to 60% of the total disk space.
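
You can check what is available at runtime with the StorageManager API (a small sketch based on that article):

```js
// Ask the browser how much storage this origin is using and may use.
if (navigator.storage && navigator.storage.estimate) {
  navigator.storage.estimate().then(({ usage, quota }) => {
    const pct = ((usage / quota) * 100).toFixed(1);
    console.log(`Using ${usage} of ${quota} bytes (${pct}%)`);
  });
}
```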

There are a couple of libraries I recommend that can complement PouchDB: localForage (https://localforage.github.io/localForage/) and idb (https://github.com/jakearchibald/idb).

Both of them can store files in IndexedDB. When I need replication and attachments, I save the file with localForage under a key, add that key to a document in PouchDB for offline access, and then, if I have an internet connection, upload the file to Cloudinary and save the key in PouchDB.
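
Roughly, that flow looks like this (a sketch; `saveFile` and the key scheme are just illustrative, and the Cloudinary upload is left out):

```js
import localforage from 'localforage';
import PouchDB from 'pouchdb';

const db = new PouchDB('docs');

// Sketch: the blob lives in IndexedDB via localForage's own store,
// and the PouchDB document only carries the key that points at it.
async function saveFile(docId, blob) {
  const fileKey = `file:${docId}`;
  await localforage.setItem(fileKey, blob); // blob stays local, offline-first
  await db.put({ _id: docId, fileKey });    // only the small key document replicates
  // When online: upload the blob (e.g. to Cloudinary) and store the
  // remote reference back on the same document.
}
```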

With all that being said, if you really need to store attachments directly in PouchDB, I suggest you first make sure they are in Blob format and save them individually.
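
For example (a minimal sketch; the document and attachment names are arbitrary):

```js
// Save one JSON payload as an individual Blob attachment
// (new document, so no rev argument is needed).
async function saveAttachment(db, payload) {
  const blob = new Blob([JSON.stringify(payload)], { type: 'application/json' });
  return db.putAttachment('autosave-doc', 'data.json', blob, 'application/json');
}
```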

Hope all of this makes sense for you.

jpike88 (Contributor) commented Dec 2, 2020

I should also add:

The files are actually just JSON, so they've naturally worked as documents. Is there a big performance or reliability difference between storing them in their natural JSON form versus storing them as attachments?

dheimoz commented Dec 2, 2020

Hello @jpike88, I am terribly sorry. I did not catch that you specifically pointed out that the actual documents were that size. Personally speaking, I would not work with JSON documents larger than 2 MB; I would rather have several KB-sized documents than a single 1 MB document. This Stack Overflow question was shared with me a long time ago:

https://stackoverflow.com/questions/47692745/is-20-mb-document-size-limit-in-couchdb-inclusive-of-attachment-size

To sum up, you can store larger payloads as attachments, but it is not recommended. I would store the large JSON directly in IndexedDB.
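
For illustration, keeping individual documents small could look something like this (a rough sketch, not something from this thread; the chunk size and ID scheme are arbitrary):

```js
// Split one large array-valued payload into many small chunk documents
// plus a tiny metadata doc, written in a single bulkDocs call.
async function saveInChunks(db, saveId, items, chunkSize = 200) {
  const docs = [];
  for (let i = 0; i < items.length; i += chunkSize) {
    docs.push({
      _id: `${saveId}:chunk:${i / chunkSize}`,
      items: items.slice(i, i + chunkSize),
    });
  }
  docs.push({ _id: `${saveId}:meta`, chunkCount: docs.length });
  return db.bulkDocs(docs);
}
```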

daleharvey (Member) commented:

We don't actually store JSON natively in IndexedDB at the moment; there were problems with deeply nested objects, so we end up stringifying documents via https://github.com/nolanlawson/vuvuzela, although that's code I would like to remove.

One thing to be very aware of is that we store a number of copies of every object saved. That number can be configured via revs_limit (https://pouchdb.com/api.html#create_database) but is global to the database and defaults to ~1000, so if you autosave 1000 times there will be 1000 copies around. This isn't the case for attachments.
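
For example (a sketch using the options from the linked docs):

```js
// Keep only the latest entry in revision history and compact on every
// write, so repeated autosaves don't pile up old copies.
const db = new PouchDB('autosaves', {
  revs_limit: 1,
  auto_compaction: true,
});
```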

jpike88 (Contributor) commented Dec 5, 2020

@daleharvey thanks for your insight. Do you have a recommendation for my particular scenario? I already have the max copies (revs_limit) set to 1, as I don't really need that feature. Is there a change in risk if I don't use PouchDB for this particular large-JSON autosave use case? (I still use PouchDB for plenty of other things and it works great.)

github-actions bot commented Feb 4, 2021

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 7 days
