
Hibernate couch_stream after each write #510

Merged 1 commit into master on May 9, 2017

Conversation

wohali (Member) commented May 9, 2017

In COUCHDB-1946 Adam Kocoloski investigated a memory explosion resulting
from replication of databases with large attachments (npm fullfat). He
was able to stabilize memory usage to a much lower level by hibernating
couch_stream after each write. While this increases CPU utilization when
writing attachments, it should help reduce memory utilization.

This patch is the single change that effected a ~70% reduction in
memory.

No alteration to the spawn of couch_stream to change the fullsweep_after
setting has been made, in part because this can be adjusted on the erl
command line if desired (erl -env ERL_FULLSWEEP_AFTER 0).
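The pattern the patch relies on can be sketched as a minimal gen_server (a hypothetical module for illustration, not the actual couch_stream code): adding the atom hibernate as the last element of the reply tuple asks the gen_server machinery to hibernate the process after sending the reply.

```erlang
%% Minimal sketch, assuming a write-buffering gen_server; this is
%% illustrative only and not the real couch_stream module.
-module(stream_sketch).
-behaviour(gen_server).
-export([init/1, handle_call/3, handle_cast/2]).

init([]) ->
    {ok, []}.

handle_call({write, Bin}, _From, Buffer) ->
    %% The trailing 'hibernate' makes gen_server call
    %% erlang:hibernate/3 after replying: the call stack is discarded
    %% and a full-sweep garbage collection runs, releasing large
    %% binaries the process no longer references.
    {reply, ok, [Bin | Buffer], hibernate}.

handle_cast(_Msg, Buffer) ->
    {noreply, Buffer}.
```

The trade-off is the CPU cost mentioned above: after each hibernation the next message must wake the process and rebuild its stack before the write is handled.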

Testing recommendations

Replicate a database with a lot of attachments and observe memory usage with and without this patch.

JIRA issue number

COUCHDB-1946

wohali (Member, Author) commented May 9, 2017

Reminder to myself that if we agree to do this in 2.x, we might want to do it in the 1.6.x branch as well.

@@ -259,7 +259,7 @@ handle_call({write, Bin}, _From, Stream) ->
             buffer_len=0,
             md5=Md5_2,
             identity_md5=IdenMd5_2,
-            identity_len=IdenLen + BinSize}};
+            identity_len=IdenLen + BinSize}, hibernate};


I don't understand this change. Why use hibernate?

Member

@savanmorya There's a known issue with some uses of large (> 64 bytes) binaries when the process using them doesn't do much work. Here are two good writeups:

https://blog.bugsense.com/post/74179424069/erlang-binary-garbage-collection-a-lovehate
https://blog.heroku.com/logplex-down-the-rabbit-hole

The reason for hibernate is that it forces a full garbage collection, whereas erlang:garbage_collect(self()) won't always clean out all binary usage. As Fred says, it's the least ugly solution when it's available and the processes are known.
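The distinction drawn in that comment can be sketched in a few lines (an illustrative module with hypothetical names, not part of CouchDB):

```erlang
%% Illustrative contrast between the two approaches discussed above.
-module(gc_sketch).
-export([collect_in_place/0, sleep_compact/0]).

%% erlang:garbage_collect/0 runs a collection in the calling process,
%% but any refc binaries still referenced from the stack or heap
%% survive it, so memory may not actually drop.
collect_in_place() ->
    true = erlang:garbage_collect().

%% erlang:hibernate/3 discards the call stack and compacts the heap
%% with a full sweep before the process waits for its next message;
%% on wake-up it continues at the given M:F(Args).
sleep_compact() ->
    erlang:hibernate(?MODULE, collect_in_place, []).
```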

davisp (Member) left a comment

+1


@wohali wohali merged commit 7c3aef6 into master May 9, 2017
wohali added a commit that referenced this pull request May 9, 2017

+1 for 2.0.0 and 1.6.x from @davisp, see #510 for details.
@janl janl deleted the 1943-attachment-perf branch May 10, 2017 19:05
janl (Member) commented May 10, 2017

> Reminder to myself that if we agree to do this in 2.x, we might want to do it in the 1.6.x branch as well.

+1

wohali (Member, Author) commented May 10, 2017

@janl Already done! f073391
