Apply random jitter during initial _replicator shard discovery #484

nickva · 2017-04-21T20:04:33Z

This is bringing back previous code:

couchdb/src/couch_replicator_manager.erl

Lines 940 to 946 in 884cf3e

    
    DbName = ?l2b(filename:rootname(RelativeFilename, ".couch")),
    Jitter = jitter(Acc),
    spawn_link(fun() ->
        timer:sleep(Jitter),
        gen_server:cast(Server, {resume_scan, DbName})
    end),
    Acc + 1

The rationale is the following: during shard scanning, a lot of resume_scan
messages are sent back to back. This causes the replicator manager to open
change feeds for all of those shards at the same time. Delaying each
resume_scan message by a jitter proportional to the number of messages sent
so far gives the replicator manager a chance to open some change feeds,
finish processing them, and close them before newer resume_scan messages arrive.

The average random delay starts at 10 msec for the first message and rises to 1 minute for the 6000th message and higher.
Some sample values:

For 100 messages, the average wait is 1 second
For 1000 messages - 10 seconds
For 6000 and higher - 1 minute
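
The numbers above can be reproduced with a small sketch. This is not the jitter/1 from the PR (its body is not shown in this thread); the module name, the macro names and the two constants are assumptions chosen to match the stated averages, and rand:uniform/1 stands in for whatever random source the real code uses.

```erlang
-module(jitter_sketch).
-export([range/1, jitter/1]).

%% Assumed constants, picked to match the averages quoted in the description.
-define(AVG_DELAY_MSEC, 10).      % average delay grows by 10 msec per message
-define(MAX_DELAY_MSEC, 120000).  % range capped at 2 minutes, so the average caps at 1 minute

%% Upper bound, in msec, of the delay range for the N-th message (N >= 1).
range(N) when is_integer(N), N >= 1 ->
    min(2 * N * ?AVG_DELAY_MSEC, ?MAX_DELAY_MSEC).

%% Uniform random delay, in msec, between 1 and range(N) inclusive.
%% The mean is roughly range(N) / 2, i.e. about N * 10 msec until the cap.
jitter(N) ->
    rand:uniform(range(N)).
```

With these assumed constants, range(100) is 2000 msec (average about 1 second), range(1000) is 20000 msec (about 10 seconds), and range(6000) and above is 120000 msec (about 1 minute).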

Jira: COUCHDB-3389

wohali · 2017-04-30T18:17:00Z

@nickva Seeing conflicts here, does this still make sense with the scheduler merged?

nickva · 2017-05-01T14:26:12Z

@wohali you're right, this will need to be updated for the scheduling replicator.

This is bringing back previous code: https://github.com/apache/couchdb/blob/884cf3e55f77ab1a5f26dc7202ce21771062eae6/src/couch_replicator_manager.erl#L940-L946

This is to avoid a stampede during startup, when potentially a large number of shards are found and change feeds have to be opened for all of them at the same time.

The average jitter value starts at 10 msec for the first shard, goes up to 1 minute for the 6000th shard, and stays clamped at 1 minute afterwards. (Note: that's the average; the range is 1 -> 2 * average, as this is a uniform random distribution.)

Some sample values:

* 100 - 1 second
* 1000 - 10 seconds
* 6000 and higher - 1 minute

Jira: COUCHDB-3389

iilyak · 2017-05-05T19:25:41Z

src/couch/src/couch_multidb_changes.erl

    end.


+notify_fold(DbName, {Server, DbSuffix, Count}) ->
+    Jitter = jitter(Count),
+    spawn_link(fun() ->


We might want to seed random somehow.

In this case we don't see it because it runs in the same process. If it were running inside the individually spawned processes, we'd need to seed it.

Right. Thank you for pointing this out.
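
To illustrate the point about seeding, here is a minimal sketch, assuming a hypothetical helper cast_later/3 (not part of the PR). The delay is drawn by the caller, i.e. the single process running the fold, so the spawned process never touches the random number generator and only sleeps and casts.

```erlang
-module(notify_sketch).
-export([cast_later/3]).

%% Hypothetical helper: DelayMsec is computed by the caller before spawning,
%% so the short-lived process below needs no random state of its own.
cast_later(Server, Msg, DelayMsec) when is_integer(DelayMsec), DelayMsec >= 0 ->
    spawn_link(fun() ->
        timer:sleep(DelayMsec),
        gen_server:cast(Server, Msg)
    end).
```

If the random draw were moved inside the fun, each spawned process would start from its own generator state, which is where explicit seeding would become a concern.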

iilyak

+1

nickva force-pushed the couchdb-3389 branch from d8516fc to 715ff44 Compare April 21, 2017 21:22

wohali added the replication label Apr 30, 2017

nickva force-pushed the couchdb-3389 branch from 715ff44 to e0823b5 Compare May 5, 2017 07:22

nickva force-pushed the couchdb-3389 branch from e0823b5 to 50a738a Compare May 5, 2017 14:24

iilyak reviewed May 5, 2017


iilyak approved these changes May 5, 2017


nickva merged commit 4a63d22 into apache:master May 5, 2017

nickva deleted the couchdb-3389 branch May 5, 2017 19:36
