
Streaming API for attachment data #1540

Open
wohali opened this issue Aug 7, 2018 · 5 comments

wohali (Member) commented Aug 7, 2018

@nolanlawson:

It would be nice to have a more efficient method of replicating attachments to/from Couch. Currently we use multipart for uploads and GET /db/doc/att for downloading (see pouchdb/pouchdb#3964 (comment) for why). It'd be nice to be able to stream and restart attachment requests.
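
For illustration only, here is a minimal sketch of what "restart attachment requests" can already look like over plain HTTP, using a Range header against the standard attachment endpoint (CouchDB can answer ranged attachment reads with 206 Partial Content). The function name, URL, and byte bookkeeping below are placeholders, not a proposed API:

```typescript
// Sketch: resume an interrupted attachment download with an HTTP Range request.
// The attachment URL and any credential handling are placeholders.
async function resumeAttachmentDownload(
  attachmentUrl: string,      // e.g. "http://127.0.0.1:5984/db/doc/att" (placeholder)
  bytesAlreadyFetched: number // how much was received before the connection dropped
): Promise<Uint8Array> {
  const res = await fetch(attachmentUrl, {
    headers: { Range: `bytes=${bytesAlreadyFetched}-` },
  });
  if (res.status !== 206) {
    // Server ignored the Range header (or errored); the caller must start over.
    throw new Error(`expected 206 Partial Content, got ${res.status}`);
  }
  // The caller appends these bytes to what it already has.
  return new Uint8Array(await res.arrayBuffer());
}
```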

Emerging browser spec for background uploads/downloads: https://github.com/WICG/background-fetch
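
As a rough sketch of what that spec exposes (through a service worker registration, in browsers that implement it); the fetch id, URL, and options below are made up for illustration and have nothing CouchDB-specific about them:

```typescript
// Sketch: queue an attachment download via the Background Fetch proposal.
// Runs in a page controlled by a service worker; browser support is limited.
async function queueBackgroundAttachmentFetch(attachmentUrl: string) {
  const registration = await navigator.serviceWorker.ready;
  // backgroundFetch is not in the default TypeScript lib typings yet, hence the cast.
  const bgFetch = await (registration as any).backgroundFetch.fetch(
    "couchdb-attachment",                              // developer-chosen fetch id (placeholder)
    [attachmentUrl],                                   // requests to fetch in the background
    { title: "CouchDB attachment", downloadTotal: 0 }  // 0 = total size unknown
  );
  console.log("queued background fetch:", bgFetch.id);
}
```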

/cc @daleharvey @janl

cluxter commented Oct 8, 2018

In an ideal situation, I would like to be able to:

  1. upload attachments of unlimited size, i.e. limited only by the file system, not by the CouchDB storage layer (so nothing like [DISCUSS] Validate new document writes against max_http_request_size #1253);
  2. have smooth replication of these attachments between CouchDB instances, i.e. replicating huge attachments won't clog up CouchDB in any way (which doesn't mean the replication wouldn't be slowed down, obviously; we don't have unlimited bandwidth).

This desire implies that:

  1. being able to store huge attachments in a database is not seen as bad practice. I'm certain some people will come up and say "Hey, ending up storing files of thousands of gigabytes in a database is silly; it means your storage design is wrong, so go fix that instead of using CouchDB as a file system." Well, in 10 or 15 years, files of hundreds of gigabytes might be normal for some activities, and I would like CouchDB to be able to scale by design, not just because of whatever hardware becomes available over time. The idea is not to use CouchDB as a file system, but to have one place in which all the data of a software system can fit. I don't like the idea of having to use one storage system for small files (CouchDB) and another storage system for big files, especially when the size limit on files is arbitrary and depends on the bandwidth/CPU available (or some vague notion of it). Basically, putting a maximum size limit on attachments means we don't want to deal with this issue and are leaving it for another system to fix. Or worse: we make people believe that they can use attachments but... not really, actually.
  2. we need a strong, resilient, and reliable replication system which can operate under bad conditions. This would align with the strong resiliency CouchDB already offers with regard to unexpected shutdowns. My instinct tells me that a P2P system similar to Kazaa/eMule/BitTorrent (I'm looking at the multi-source P2P paradigm, not the protocols per se) would be ideal because it's fast, efficient, and resilient. But maybe that is not well suited to CouchDB. Or maybe we are using this already (not what I understood so far, though). I'm pretty sure this would require a lot of work, but I would at least like to know that it's somewhere on the long-term roadmap.

Now, this is my personal vision of what CouchDB should look like, and maybe it's not shared by many other people. Or maybe it is. Please don't hesitate to (respectfully and constructively) criticize my views and argue with them; I'm eager to learn more about why this should or should not be done.

wohali (Member, Author) commented Oct 8, 2018

@cluxter Right now, large attachments (>16MB of attachments per JSON document) aren't a first-order design scenario for CouchDB internal storage or so-called "internal replication" between nodes in a cluster. That needs to be resolved before thinking about any sort of "external" replication enhancements that specifically address large files.

The people who get to make that decision are the people who actually develop CouchDB. If you're an Erlang developer and think you have the chops to tackle this, we'd love to see your patches.

@wohali wohali added this to In Discussion in Roadmap Jul 11, 2019
@wohali wohali moved this from Proposed for 3.x to Proposed (backlog) in Roadmap Jul 11, 2019
anuragvohraec commented

> @cluxter Right now, large attachments (>16MB of attachments per JSON document) aren't a first-order design scenario for CouchDB internal storage or so-called "internal replication" between nodes in a cluster. That needs to be resolved before thinking about any sort of "external" replication enhancements that specifically address large files.
>
> The people who get to make that decision are the people who actually develop CouchDB. If you're an Erlang developer and think you have the chops to tackle this, we'd love to see your patches.

Is there any progress on this?
Or any roadmap in this direction?

Couldn't agree more with these requirements.
In today's world, streaming should be a first-class feature of any database.

VladimirCores commented

I'm interested in the progress of this feature.

nkev commented Nov 17, 2020

I wonder if attitudes regarding the use of CouchDB as a replicating-streaming server will improve in V4 with FoundationDB as the underlying engine...
