The known_applied_index is not increasing but applying_index is #277

kishorenc · 2021-04-20T03:30:17Z

We have a braft node (no clustering, just a single node) under high write load. nodeStatus.applying_index and nodeStatus.committed_index keep increasing, but nodeStatus.known_applied_index is stuck and seems to increase only after each snapshot. During a snapshot nodeStatus.known_applied_index increases, and after that it is stuck again even as the other two indices increase.

Can someone help explain when this can happen? Help needed @PFZheng @Edward-xk @ipconfigme

PFZheng · 2021-04-20T05:09:58Z

It seems to be a problem in FSMCaller::do_committed. Each time, this function pops a batch of logs to apply, but the applied index is only updated after the entire batch is processed.

PFZheng · 2021-04-20T05:11:49Z

A solution to this problem is to update the applied index in smaller batches.
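
To illustrate the idea, here is a minimal, self-contained sketch of an apply loop that publishes the applied index after every chunk rather than once per batch. It is not braft's actual FSMCaller code: `ApplyLoop`, `apply_batch`, `chunk_size`, and `apply_one` are hypothetical names chosen for this example.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical model of an apply loop; this is NOT braft's FSMCaller.
struct ApplyLoop {
    std::atomic<int64_t> known_applied_index{0};

    // Applies entries (known_applied_index, committed_index] in chunks of at
    // most chunk_size entries and publishes the applied index after each
    // chunk. apply_one stands in for the user's state machine callback.
    // Returns how many times the applied index was published.
    int apply_batch(int64_t committed_index, int64_t chunk_size,
                    const std::function<void(int64_t)>& apply_one) {
        assert(chunk_size > 0);
        int publishes = 0;
        int64_t next = known_applied_index.load() + 1;
        while (next <= committed_index) {
            const int64_t chunk_end =
                std::min(committed_index, next + chunk_size - 1);
            for (int64_t i = next; i <= chunk_end; ++i) {
                apply_one(i);
            }
            // Publish progress here instead of waiting for the whole batch.
            known_applied_index.store(chunk_end);
            ++publishes;
            next = chunk_end + 1;
        }
        return publishes;
    }
};
```

In this model, applying 1,000 entries with a `chunk_size` of 100 publishes the index ten times, so an observer never sees it lag the entry being applied by more than one chunk. Passing a `chunk_size` equal to the batch length reproduces the behaviour described above, where the index moves only once at the end.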

kishorenc · 2021-04-20T05:20:12Z

Thank you for replying. Locally I was able to reproduce this problem with the following sequence:

a) Start a node and write 1,000 records (no snapshotting is done)
b) Stop the node and modify the on_apply() function by adding a 5-second sleep
c) Restart the node

This time, as the entries from the WAL are replayed, I see that known_applied_index remains at 0 even as applying_index progresses. known_applied_index is updated only when all entries in the WAL have been processed.

> Each time, this function pops a batch of logs to apply, but the applied index is only updated after the entire batch is processed.

So it seems like braft does not update applied index until the WAL is entirely caught up? This can cause issues when a node is under heavy load after a restart, since the WAL will continue growing and so applied_index will never progress.

PFZheng · 2021-04-20T08:19:33Z

> So it seems like braft does not update applied index until the WAL is entirely caught up? This can cause issues when a node is under heavy load after a restart, since the WAL will continue growing and so applied_index will never progress.

It will increase; the delay depends on how long it takes to apply a batch of logs. The workflow of the state machine can be described as follows:

1. The state machine sees the current maximum committed index A.
2. The state machine applies the logs whose index <= A. While this is in progress the committed index keeps increasing, but the state machine will process those newer logs in the next cycle.
3. The state machine updates |known_applied_index| to A.
4. Repeat steps 1-3.

If the node is under heavy load, step 2 may take a long time, and the gap between the committed index and |known_applied_index| will be large.
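
The cycle above can be modelled with a small simulation. This is only a sketch under a simplifying assumption that is not from braft: while one entry is being applied, a fixed number of new entries (`commits_per_apply`) gets committed. `CycleStats` and `simulate` are hypothetical names for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One row per cycle of the simplified workflow.
struct CycleStats {
    int64_t batch_end;          // committed index A seen at the start of the cycle
    int64_t committed_at_end;   // committed index once the batch has been applied
    int64_t gap_before_update;  // committed - known_applied just before the update
};

// Simulates `cycles` iterations. Assumption (not braft behaviour): while one
// entry is applied, `commits_per_apply` new entries are committed.
std::vector<CycleStats> simulate(int64_t initial_committed,
                                 int64_t commits_per_apply,
                                 int cycles) {
    assert(commits_per_apply >= 0);
    std::vector<CycleStats> out;
    int64_t committed = initial_committed;
    int64_t known_applied = 0;
    for (int c = 0; c < cycles; ++c) {
        const int64_t a = committed;                 // observe committed index A
        const int64_t batch_len = a - known_applied;
        committed += batch_len * commits_per_apply;  // commits arrive while applying
        out.push_back({a, committed, committed - known_applied});
        known_applied = a;                           // index jumps to A in one step
    }
    return out;
}
```

Calling `simulate(1000, 2, 3)` in this model yields batches ending at 1000, 3000, and 7000: the applied index does advance every cycle, but each batch is longer than the last, so the visible gap keeps widening for as long as commits outpace applies.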

However, this may confuse users, so we will fix it.

PFZheng · 2021-04-20T09:14:07Z

#278 @kishorenc

kishorenc · 2021-04-20T09:27:30Z

Thank you, that looks good 👍

PFZheng mentioned this issue Apr 20, 2021

Add a new configuration |raft_fsm_caller_commit_batch|, which controls the batch size for FSMCaller to increase |known_applied_index|. #278

Merged

PFZheng linked a pull request Apr 20, 2021 that will close this issue

Edward-xk closed this as completed in #278 Apr 26, 2021
