
Streaming API for attachment data #1540

Open
wohali opened this issue Aug 7, 2018 · 5 comments

wohali (Member) commented Aug 7, 2018

@nolanlawson:

It would be nice to have a more efficient method of replicating attachments to/from Couch. Currently we use multipart for uploads and GET /db/doc/att for downloading (see pouchdb/pouchdb#3964 (comment) for why). It'd be nice to be able to stream and restart attachment requests.
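
For illustration only, here is a minimal sketch of what "restart attachment requests" can already look like over plain HTTP, using a Range header against the standard attachment endpoint (CouchDB can answer ranged attachment reads with 206 Partial Content). The function name, URL, and byte bookkeeping below are placeholders, not a proposed API:

```typescript
// Sketch: resume an interrupted attachment download with an HTTP Range request.
// The attachment URL and any credential handling are placeholders.
async function resumeAttachmentDownload(
  attachmentUrl: string,      // e.g. "http://127.0.0.1:5984/db/doc/att" (placeholder)
  bytesAlreadyFetched: number // how much was received before the connection dropped
): Promise<Uint8Array> {
  const res = await fetch(attachmentUrl, {
    headers: { Range: `bytes=${bytesAlreadyFetched}-` },
  });
  if (res.status !== 206) {
    // Server ignored the Range header (or errored); the caller must start over.
    throw new Error(`expected 206 Partial Content, got ${res.status}`);
  }
  // The caller appends these bytes to what it already has.
  return new Uint8Array(await res.arrayBuffer());
}
```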

Emerging browser spec for background uploads/downloads: https://github.com/WICG/background-fetch
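
As a rough sketch of what that spec exposes (through a service worker registration, in browsers that implement it); the fetch id, URL, and options below are made up for illustration and have nothing CouchDB-specific about them:

```typescript
// Sketch: queue an attachment download via the Background Fetch proposal.
// Runs in a page controlled by a service worker; browser support is limited.
async function queueBackgroundAttachmentFetch(attachmentUrl: string) {
  const registration = await navigator.serviceWorker.ready;
  // backgroundFetch is not in the default TypeScript lib typings yet, hence the cast.
  const bgFetch = await (registration as any).backgroundFetch.fetch(
    "couchdb-attachment",                              // developer-chosen fetch id (placeholder)
    [attachmentUrl],                                   // requests to fetch in the background
    { title: "CouchDB attachment", downloadTotal: 0 }  // 0 = total size unknown
  );
  console.log("queued background fetch:", bgFetch.id);
}
```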

/cc @daleharvey @janl

cluxter commented Oct 8, 2018

In an ideal situation, I would like to be able to:

  1. upload attachments of unlimited size, i.e. limited only by the file system, not by the CouchDB storage layer (so nothing like [DISCUSS] Validate new document writes against max_http_request_size #1253);
  2. have smooth replication of these attachments between CouchDB instances, i.e. replicating huge attachments won't clog up CouchDB in any way (which doesn't mean the replication wouldn't be slowed down, obviously; we don't have unlimited bandwidth).

This desire implies that:

  1. being able to store huge attachments in a database is not seen as bad practice. I'm certain some people will come up and say "Hey, ending up storing files of thousands of gigabytes in a database is silly; it means your storage design is wrong, so go fix that instead of using CouchDB as a file system." Well, in 10 or 15 years, files of hundreds of gigabytes might be normal for some activities, and I would like CouchDB to be able to scale by design, not just because of whatever hardware becomes available over time. The idea is not to use CouchDB as a file system, but to have one place in which all the data of a software system can fit. I don't like the idea of having to use one storage system for small files (CouchDB) and another storage system for big files, especially when the size limit on files is arbitrary and depends on the bandwidth/CPU available (or some vague notion of it). Basically, putting a maximum size limit on attachments means we don't want to deal with this issue and are leaving it for another system to fix. Or worse: we make people believe that they can use attachments but... not really, actually.
  2. we need a strong, resilient, and reliable replication system which can operate under bad conditions. This would align with the strong resiliency CouchDB already offers with regard to unexpected shutdowns. My instinct tells me that a P2P system similar to Kazaa/eMule/BitTorrent (I'm looking at the multi-source P2P paradigm, not the protocols per se) would be ideal because it's fast, efficient, and resilient. But maybe that is not well suited to CouchDB. Or maybe we are using this already (not what I understood so far, though). I'm pretty sure this would require a lot of work, but I would at least like to know that it's somewhere on the long-term roadmap.

Now, this is my personal vision of what CouchDB should look like, and maybe it's not shared by many other people. Or maybe it is. Please don't hesitate to (respectfully and constructively) criticize my views and argue with them; I'm eager to learn more about why this should or should not be done.

wohali (Member, Author) commented Oct 8, 2018

@cluxter Right now, large attachments (>16MB of attachments per JSON document) aren't a first-order design scenario for CouchDB internal storage or so-called "internal replication" between nodes in a cluster. That needs to be resolved before thinking about any sort of "external" replication enhancements that specifically address large files.

The people who get to make that decision are the people who actually develop CouchDB. If you're an Erlang developer and think you have the chops to tackle this, we'd love to see your patches.

@wohali wohali added this to In Discussion in Roadmap Jul 11, 2019
@wohali wohali moved this from Proposed for 3.x to Proposed (backlog) in Roadmap Jul 11, 2019
anuragvohraec commented

> @cluxter Right now, large attachments (>16MB of attachments per JSON document) aren't a first-order design scenario for CouchDB internal storage or so-called "internal replication" between nodes in a cluster. That needs to be resolved before thinking about any sort of "external" replication enhancements that specifically address large files.
>
> The people who get to make that decision are the people who actually develop CouchDB. If you're an Erlang developer and think you have the chops to tackle this, we'd love to see your patches.

Is there any progress on this?
Or any roadmap in this direction?

Couldn't agree more with these requirements.
In today's world, streaming should be a first-class feature of any database.

VladimirCores commented

I'm interested in the progress of this feature.

nkev commented Nov 17, 2020

I wonder if attitudes regarding the use of CouchDB as a replicating-streaming server will improve in V4 with FoundationDB as the underlying engine...
