🚀 Feature: Bulk Document Creation #3051

Open · 2 tasks done
Shadowfita opened this issue Apr 2, 2022 · 23 comments
Labels: enhancement (New feature or request), product / databases (Fixes and upgrades for the Appwrite Database)

Comments

@Shadowfita commented Apr 2, 2022

🔖 Feature description

Create a "createDocuments" POST endpoint that takes an array of documents.

🎤 Pitch

In my project, I am trying to insert 12,000 documents in one go. Making 12,000 separate external createDocument API calls to achieve this is inefficient.

To remedy this, there should be a "createDocuments" endpoint that accepts an array of documents. That would drastically cut the number of external calls required and let the stack process the creation of a large number of documents internally, quickly and efficiently.

A potential workaround would be creating a function that does it locally, but I don't believe this is a proper solution.
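
For illustration only, a rough sketch of how such an endpoint might be consumed from the Node.js server SDK. Note that createDocuments does not exist today; the method name, signature, and payload shape below are hypothetical:

```js
// Hypothetical sketch of the requested bulk endpoint. createDocuments() does
// NOT exist in node-appwrite today; its name, signature, and payload shape
// are illustrative only.
const { Client, Databases, ID } = require('node-appwrite');

const client = new Client()
  .setEndpoint('https://<YOUR_ENDPOINT>/v1')
  .setProject('<PROJECT_ID>')
  .setKey('<API_KEY>');

const databases = new Databases(client);

async function bulkCreate(rows) {
  // One external call carrying the whole batch, instead of one call per row.
  await databases.createDocuments('<DATABASE_ID>', '<COLLECTION_ID>',
    rows.map((row) => ({ documentId: ID.unique(), data: row }))
  );
}
```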

👀 Have you spent some time to check if this issue has been raised before?

  • I checked and didn't find a similar issue

🏢 Have you read the Code of Conduct?

  • I have read the Code of Conduct

@Shadowfita (Author)

This is not achievable with functions due to the 8192 character limit.

@Shadowfita (Author)

I have achieved an okay workaround by creating a .json file inside a bucket that contains an array of documents, which is then read by a function that inserts the required documents.

@Shadowfita (Author)

> I have achieved an okay workaround by creating a .json file inside a bucket that contains an array of documents, which is then read by a function that inserts the required documents.

It's currently taking about 46 seconds to process 15,863 simple JSON objects.

@eldadfux added the product / databases (Fixes and upgrades for the Appwrite Database) label on Apr 16, 2022
@zcoderr commented Apr 16, 2022

Need this feature too

@elunatix

> I have achieved an okay workaround by creating a .json file inside a bucket that contains an array of documents, which is then read by a function that inserts the required documents.

Hi, can you point me to a demo file, please? Node.js if possible.

@Shadowfita (Author)

> I have achieved an okay workaround by creating a .json file inside a bucket that contains an array of documents, which is then read by a function that inserts the required documents.
>
> Hi, can you point me to a demo file, please? Node.js if possible.

No worries.

You can find it at the link below. It was thrown together pretty quickly.

It expects a JSON file to be created in a bucket with the following structure:

{ "collection": "collection_id", "data": [ /* array of document objects */ ] }

https://gist.github.com/Shadowfita/b5ccd20f65566cb9f2b40d416c5201a2
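
For context, here is a minimal sketch of that kind of function using a recent node-appwrite server SDK. It is not the gist itself; the environment variable names, IDs, and batch size are assumptions:

```js
// Minimal sketch (not the gist itself): read a JSON payload from a Storage
// bucket and insert each entry as a document. Env var names, IDs, and the
// batch size are assumptions.
const { Client, Databases, Storage, ID } = require('node-appwrite');

const client = new Client()
  .setEndpoint(process.env.APPWRITE_ENDPOINT)
  .setProject(process.env.APPWRITE_PROJECT_ID)
  .setKey(process.env.APPWRITE_API_KEY);

const storage = new Storage(client);
const databases = new Databases(client);

async function importFromBucket(bucketId, fileId, databaseId) {
  // Download the file: { "collection": "collection_id", "data": [ ... ] }
  const raw = await storage.getFileDownload(bucketId, fileId);
  const payload = JSON.parse(Buffer.from(raw).toString('utf8'));

  // Insert in small batches so thousands of documents don't hit the API at once.
  const batchSize = 50;
  for (let i = 0; i < payload.data.length; i += batchSize) {
    const batch = payload.data.slice(i, i + batchSize);
    await Promise.all(
      batch.map((doc) =>
        databases.createDocument(databaseId, payload.collection, ID.unique(), doc)
      )
    );
  }
}
```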

@pilcrowOnPaper

Would like a batch delete too.

@FuadEfendi

Using functions and a bucket is a brilliant solution!

Another potential workaround is to use a multithreaded client and upload documents in parallel; for instance, a client application developed in Kotlin.
Also, perhaps the Server SDK could allow creating a server-side extension; or, more simply, a server-side application (such as a "function", or a Kotlin-based standalone) could read a batch JSON file from the local filesystem (or from a "bucket", FTP, S3, etc.) and create the records, so we avoid excessive HTTP traffic.

Or, another solution:

  • use a "staging" instance on the local network, load the data, back up MariaDB, and restore MariaDB on the production system
  • back up the production MariaDB, analyze the SQL dump file, programmatically generate the additional 12,000 SQL statements, and restore it

The best approach would be if the Appwrite API had functions such as export (permissions/objects/buckets/collections/functions) to JSON, import, etc.; in that case we "abstract" the underlying implementation details and can do more granular export/import. For example, right now we don't have an implementation-independent backup/restore (other than executing mysqldump locally; and what about a cluster then?).

@balachandarlinks

It would be great if bulk update were also considered, as it would greatly reduce the number of requests I make in my application.

@singhbhaskar (Contributor)

Hey @stnguyen90,
Can I work on this? I have gone through the requirements and implemented a basic endpoint for bulk create on my local setup. I would be happy to contribute this.

@stnguyen90 (Contributor)

@singhbhaskar, thanks for your interest! 🙏 However, it would be best for the core team to figure out how it should work.

@rafagazani

Need this feature too

@danilo73r

I need bulk operations too, please.

@Vedsaga commented May 4, 2023

I need this also; I have to create 18K documents, which are pincodes, I should say...

@ashuvssut commented Jun 8, 2023

I want to delete multiple docs by their IDs too

@Shiba-Kar

So I am working on a scraping project. There are almost 2K records, and it takes almost 20 minutes to write them!

{"ColumnRef":"Name","index":"0","value":"orange house"}

This is the size of each record!

Is there any way to speed up the writes?

@Shadowfita (Author)

> So I am working on a scraping project. There are almost 2K records, and it takes almost 20 minutes to write them!
>
> {"ColumnRef":"Name","index":"0","value":"orange house"}
>
> This is the size of each record!
>
> Is there any way to speed up the writes?

You should run all write requests asynchronously and wrap them in Promise.all. Appwrite is built to scale and will handle that many concurrent requests with no issue, and it will reduce your wait time dramatically.
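
As a rough sketch of that suggestion (the endpoint, IDs, and `records` array are placeholders, not values from this thread):

```js
// Sketch of the Promise.all approach with node-appwrite; endpoint, IDs, and
// `records` are placeholders.
const { Client, Databases, ID } = require('node-appwrite');

const client = new Client()
  .setEndpoint('https://<YOUR_ENDPOINT>/v1')
  .setProject('<PROJECT_ID>')
  .setKey('<API_KEY>');

const databases = new Databases(client);

async function writeAll(databaseId, collectionId, records) {
  // Fire every createDocument call concurrently and wait for all of them.
  await Promise.all(
    records.map((doc) =>
      databases.createDocument(databaseId, collectionId, ID.unique(), doc)
    )
  );
}
```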

@ashuvssut

@Shadowfita Is there a guarantee that all promises will resolve if we do it with Promise.all now?

Last time I checked, about 2 months ago, I tried to do bulk document deletion with Promise.all and it seemed some of them failed.

@Vedsaga commented Aug 7, 2023

> Is there a guarantee that all promises will resolve if we do it with Promise.all now?

It actually depends.

Instead, you should delete in batches, meaning 10-50 per Promise.all rather than all at once. If you do them all at once, it will likely fail because of server limitations.
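
A minimal sketch of that batching idea for deletes (the batch size and IDs are assumptions; Promise.allSettled is used so one rejected call doesn't hide the rest):

```js
// Sketch of batched deletes; batch size and IDs are placeholders.
const { Client, Databases } = require('node-appwrite');

const client = new Client()
  .setEndpoint('https://<YOUR_ENDPOINT>/v1')
  .setProject('<PROJECT_ID>')
  .setKey('<API_KEY>');

const databases = new Databases(client);

async function deleteInBatches(databaseId, collectionId, documentIds, batchSize = 25) {
  for (let i = 0; i < documentIds.length; i += batchSize) {
    const batch = documentIds.slice(i, i + batchSize);
    // Only `batchSize` requests are in flight at once, so the server is far
    // less likely to reject individual calls.
    const results = await Promise.allSettled(
      batch.map((id) => databases.deleteDocument(databaseId, collectionId, id))
    );
    const failed = batch.filter((_, idx) => results[idx].status === 'rejected');
    if (failed.length) console.warn('Failed to delete:', failed);
  }
}
```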

@tripolskypetr

Right now I am using that lib for bulk document-from-json creation

@stnguyen90 added the enhancement (New feature or request) label and removed the feature label on Mar 20, 2024
@jquiros2 commented Apr 30, 2024

I agree a CSV importer in the dashboard would be very useful.

I also had gone down the road of CSV → JSON, then JSON → Appwrite, but throttling and processing just made it unbearable for larger workloads. This is using self-hosted Appwrite.
You need to figure out what your own _3_database_x_collection_x table is.

```sql
USE appwrite;
LOAD DATA LOCAL INFILE '/path_to_your.csv'
INTO TABLE _3_database_x_collection_x
CHARACTER SET latin1
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS
(list, the, fields, in, your_csv)
SET _uid = LEFT(UUID(), 20),
    _createdAt = DATE_FORMAT(NOW(), '%Y-%m-%d %H:%i:%s.%f'),
    _updatedAt = DATE_FORMAT(NOW(), '%Y-%m-%d %H:%i:%s.%f'),
    _permissions = '[]';
```

For _uid I had used LEFT(MD5(RAND()), 20) at first but I got too many repeated values.

For testing, it's best to copy the _3_database_x_collection_x table into another table and feed that one, then:

INSERT INTO a SELECT * FROM b;

This seems to work. YMMV and proceed at your own risk.

@TechComet

Any news?

@tripolskypetr

Guys, use https://github.com/react-declarative/appwrite-backup-tool

The restore script can upload more than 10,000 documents without loading all of them into RAM. That means it is scalable and works with data of any size.
