Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

3364 fix view compactor unknown info #474

Merged
merged 4 commits into from
Apr 19, 2017

Conversation

jaydoane
Copy link
Contributor

@jaydoane jaydoane commented Apr 6, 2017

Overview

couch_index traps exits, but might get an EXIT message from a crashing mrview_compactor, while it only expects message from its own linked compact process.

This adds a new EXIT info handler to couch_index_compactor to prevent the compactor from crashing when a compacting process fails, and also cleans up the view compact file.

Testing recommendations

We have implemented a new couch_mrview_compact_test should_remove which verifies that a crash in the view compacting process doesn't also cause the index or compactor processes to crash. It also ensures couch_mrview_compactor:remove_compacted/1 is called.

COUCHDB-3364

Checklist

  • Code is written and works correctly;
  • Changes are covered by tests;
  • Documentation reflects the changes;
  • I will not forget to update rebar.config.script
    with the correct commit hash once this PR get merged.

@jaydoane
Copy link
Contributor Author

jaydoane commented Apr 6, 2017

I expect to squash these commits once approved, but want to give appropriate credit to @eiri for the majority of this work.

@jaydoane jaydoane force-pushed the 3364-fix-view-compactor-unknown_info branch 5 times, most recently from a373aae to a39bfd1 Compare April 10, 2017 22:07
Copy link
Contributor

@iilyak iilyak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 in general. However I would recommend to not use gen_server:call from outside of the module (even though this style is used in lots of places in couchdb). Tests pass locally.

 Running test function(s):
      couch_mrview_compact_tests:compaction_test_/0
======================== EUnit ========================
Compaction tests
  couch_mrview_compact_tests:52: should_swap...[0.225 s] ok
  couch_mrview_compact_tests:78: should_remove...[0.165 s] ok
  [done in 0.994 s]
=======================================================
  All 2 tests passed.

waiters = Waiters
} = State,
send_all(Waiters, Reason),
{ok, NewIdxState} = Mod:remove_compacted(OldIdxState),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a braking change. It introduces a new API call which could be missing on custom index type.
If the Mod doesn't have remove_compacted function it would crash here.
I think it is acceptable behavior though since it is currently failing anyway.

@@ -81,6 +85,14 @@ handle_cast(_Mesg, State) ->

handle_info({'EXIT', Pid, normal}, #st{pid=Pid}=State) ->
{noreply, State#st{pid=undefined}};
handle_info({'EXIT', Pid, Reason}, #st{pid = Pid} = State) ->
#st{idx = Idx, mod = Mod} = State,
{ok, IdxState} = gen_server:call(Idx, {compaction_failed, Reason}),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the following would be better. Since it would allow us to hide the internal representation.
{ok, IdxState} = couch_index:handle_failure(Pid, Reason)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't do that with gen_call for compacted though. I think the intent here is not to expose internals of couch_index compactor's flow, we want public API on triggering compaction, but not on interrupting its execution except from compaction process.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since there seems to be a disagreement, should we try to find a tiebreaker?

In any case, I'm unable to merge this PR, so a committer will need to do the honors.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can unfold my reasoning if it helps.

First, there are nothing inherently wrong with using gen_server's call. Wrapping it into convenience function is not a requirement for its usage and rides on disputable notion that messages' format got changed more often than functions' name or signature. In this particular case trading arity 2 gen_server's function for arity 2 couch_index's function gives no convenience advantages, but exporting handle_failure encourages to think about it as a public method that could be called from more than one place and this is dangerous, as it could delete compaction file under running compactor.

Second, index_compactor that knows what message to send to index server no more exposed to its internals than couch_compactor that knows what function to call, it is just one-to-one trading of tag of the message to the name of the function. Consider that those are not isolated apps, but two server processes that share a general task of managing index, an introduction of new ambiguous method is going to make a task of understanding how couch_index works more challenging for future curious reader.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the detailed explanation, Eric. Makes sense to me to leave it as you wrote it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First, there are nothing inherently wrong with using gen_server's call.

I disagree, for few reasons:

  1. I do believe that message format changes more often
  2. I can use a stable expression for grep (such as <module>:<function>) to find all places which need to be updated in the case we need to update in case we have to.
  3. If we have an API function in the module which implements gen_server behavior we can use guards to do basic argument validation. This is helpful because most of the time we want to crash the caller and not the gen_server process.

Anyway I am fine with merging it as is. Since couchdb code base uses gen_server:call everywhere.

eiri and others added 4 commits April 19, 2017 11:23
Useful for debugging and testing
- Verify couch_mrview_compactor:remove_compacted is called
- Use couch_index_compactor:get_compacting_pid/1 to simplify
- Rename Pids to be more descriptive
- Fix grammar in assertion_failed reason
@jaydoane jaydoane force-pushed the 3364-fix-view-compactor-unknown_info branch from a39bfd1 to f7767a3 Compare April 19, 2017 18:50
@iilyak iilyak merged commit c3ff408 into apache:master Apr 19, 2017
@jaydoane jaydoane deleted the 3364-fix-view-compactor-unknown_info branch April 24, 2017 18:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants