
Performance issue with the write operations #36

Open
angoenka opened this issue Mar 5, 2021 · 6 comments

angoenka commented Mar 5, 2021

The write operation creates a connection per bundle. For streaming pipelines, bundles can be very small, causing a flood of short-lived connections, and without connection pooling this could become a performance bottleneck.
Also, having a batch write option would significantly reduce the number of writes.


chishankar-work commented Mar 9, 2021

+1

I have a pull request I can open here shortly.

The idea is to emulate how we write in batches with the JdbcIO connector for Java.

We can call `.write_record()` without doing a `.commit()` every time, move the `.commit()` into `.finish_bundle()`, and add a little logic to commit after every N records so that no single commit grows too large.
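A minimal sketch of that batching logic, in plain Python mimicking the Beam DoFn lifecycle. The `session`, `write_record`, and `commit` names here are placeholders, not the actual connector API:

```python
BATCH_SIZE = 500  # illustrative: flush after N records to bound transaction size

class BatchingWriteFn:
    """Mimics a Beam DoFn: process() buffers writes, finish_bundle() flushes."""

    def __init__(self, session, batch_size=BATCH_SIZE):
        self._session = session      # placeholder for a DB session/connection
        self._batch_size = batch_size
        self._pending = 0

    def process(self, record):
        self._session.write_record(record)  # stage the write, no commit yet
        self._pending += 1
        if self._pending >= self._batch_size:
            self._session.commit()          # commit every N records
            self._pending = 0

    def finish_bundle(self):
        if self._pending:
            self._session.commit()          # flush whatever is left in the bundle
            self._pending = 0
```

With a batch size of 500, a bundle of 1,200 records would commit three times instead of 1,200.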

@chishankar-work

Additionally, opening a new connection is an expensive operation on the server side. This Stack Exchange post goes into why. Moving the connections to a static pool allows them to be reused, which is best practice for SQL servers.
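For illustration, a static pool boils down to something like this (a pure-stdlib sketch; in practice you'd more likely lean on SQLAlchemy's built-in pooling rather than roll your own):

```python
import queue

class ConnectionPool:
    """Hands out at most max_size connections and reuses released ones."""

    def __init__(self, connect, max_size=5):
        self._connect = connect            # factory, e.g. a DB driver's connect()
        self._idle = queue.Queue(max_size) # connections waiting to be reused
        self._created = 0
        self._max_size = max_size

    def acquire(self):
        try:
            return self._idle.get_nowait() # reuse an idle connection if any
        except queue.Empty:
            if self._created < self._max_size:
                self._created += 1
                return self._connect()     # grow the pool lazily up to max_size
            return self._idle.get()        # at capacity: block until one is released

    def release(self, conn):
        self._idle.put(conn)               # return the connection for reuse
```

The point is that `connect()` runs at most `max_size` times for the lifetime of the worker, instead of once per bundle.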

An added advantage of writing in batches after a GBK is that all transactions affecting a given record stay in one place. This drastically reduces thread contention in the SQL server, since threads (each handling a connection and a transaction) won't be waiting to access the same row. Lastly, it also lets users take advantage of server-side optimizations, since we submit everything as one transaction.
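The GroupByKey step amounts to this (illustrative names, plain Python): collect every row update under its key, so each key's writes land in a single transaction.

```python
from collections import defaultdict

def group_for_write(records):
    """records: iterable of (key, row) pairs. Returns one write batch per key,
    so all updates touching the same row go out in the same transaction."""
    batches = defaultdict(list)
    for key, row in records:
        batches[key].append(row)
    return dict(batches)
```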

@mohaseeb (Owner)

Thanks @chishankar-work; I'll have a look soon at your PR.

@manuelnucci

What's the state of this improvement? I'm worried that all the single-row inserts will hammer the PostgreSQL database that I have.

In my opinion, it's a bit tricky to do a custom UPSERT and, at the same time, write in batch mode.

@chishankar-work will your PR support this feature?
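Not speaking for the PR, but a batched custom UPSERT could be sketched like this. The table and column names are made up, and the `?` placeholder style is SQLite/DB-API; PostgreSQL drivers such as psycopg2 typically use `%s`. The `ON CONFLICT ... DO UPDATE` syntax itself is shared by PostgreSQL and SQLite:

```python
UPSERT_SQL = (
    "INSERT INTO items (id, val) VALUES (?, ?) "
    "ON CONFLICT (id) DO UPDATE SET val = excluded.val"
)

def upsert_batch(conn, rows, batch_size=500):
    """Upsert rows in chunks, committing once per chunk rather than per row."""
    cur = conn.cursor()
    for start in range(0, len(rows), batch_size):
        cur.executemany(UPSERT_SQL, rows[start:start + batch_size])
        conn.commit()  # one commit per batch keeps transactions bounded
```

So the batching and the custom statement don't really conflict: the UPSERT is just the statement handed to `executemany`.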


medzin commented Nov 3, 2022

@mohaseeb It seems that the original author dropped this feature. I'm happy to prepare a new PR.

@StevieBurnsSky

Did this enhancement ever get merged in?
