Optimise fabric:all_dbs() #5037

Merged
merged 1 commit into from
Apr 25, 2024
@nickva nickva commented Apr 24, 2024

Previous version was awfully inefficient: we only need the dbname, but we opened each document body, parsed the shards out of it, generated Q*N shards, then put them in a list and usorted them back down to a list of dbnames.

We don't use this in many places (mem3 security sync, fabric bench) but it might be nice to have an efficient version of it anyway.
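The difference between the two approaches can be sketched roughly as follows. This is only an illustration, not the code from this change: the module name `all_dbs_sketch`, the `{DbName, Q, N}` tuples standing in for shard map documents, and both function names are hypothetical.

```erlang
-module(all_dbs_sketch).
-export([old_style/1, new_style/1]).

%% Hypothetical stand-in for a shard map document: {DbName, Q, N}.
%% Real shard documents carry much more; this only models the fan-out.

%% Old approach: expand every document into Q*N shard entries,
%% then sort and de-duplicate them back down to database names.
old_style(Docs) ->
    Shards = [{DbName, Range, Copy}
              || {DbName, Q, N} <- Docs,
                 Range <- lists:seq(1, Q),
                 Copy <- lists:seq(1, N)],
    lists:usort([DbName || {DbName, _Range, _Copy} <- Shards]).

%% New approach: take only the name from each document, in document order.
new_style(Docs) ->
    [DbName || {DbName, _Q, _N} <- Docs].
```

Assuming the documents are already unique and sorted by name, both functions return the same list, but the first builds Q*N intermediate entries per database before discarding them.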

@nickva nickva requested a review from jaydoane April 24, 2024 21:04
@jaydoane jaydoane left a comment

I tested this out on a cluster with a large shards db. The current implementation takes over 12 seconds:

(dbcore@db10.bigblue.cloudant.net)7> timer:tc(fun() -> fabric:all_dbs(<<"jdoane">>) end).
{12599252,
 {ok,[<<"jdoane/cats">>,<<"jdoane/d/">>,<<"jdoane/db+plus">>,
      <<"jdoane/db_$()f">>,<<"jdoane/devops-dev-products">>,
      <<"jdoane/foo/bar">>,<<"jdoane/hogs">>,<<"jdoane/large">>,
      <<"jdoane/n0hjj4$x$a">>,<<"jdoane/replicated-cats">>,
      <<"jdoane/sec-test">>,<<"jdoane/testy_db_jydrjarchp">>,
      <<"jdoane/testy_db_target">>,
      <<"jdoane/testy_db_uxlqexkazh">>,<<"jdoane/tmp">>,
      <<"jdoane/wgb4g">>]}}

I copied a slightly modified mem3_shards:fold_dbs/3 into a remsh, and it takes only a couple of milliseconds:

(dbcore@db10.bigblue.cloudant.net)10> timer:tc(fun() -> Dbs = _fold_dbs(<<"jdoane">>, FoldFun, []), {ok, lists:reverse(Dbs)} end).
{1755,
 {ok,[<<"jdoane/cats">>,<<"jdoane/d/">>,<<"jdoane/db+plus">>,
      <<"jdoane/db_$()f">>,<<"jdoane/devops-dev-products">>,
      <<"jdoane/foo/bar">>,<<"jdoane/hogs">>,<<"jdoane/large">>,
      <<"jdoane/n0hjj4$x$a">>,<<"jdoane/replicated-cats">>,
      <<"jdoane/sec-test">>,<<"jdoane/testy_db_jydrjarchp">>,
      <<"jdoane/testy_db_target">>,
      <<"jdoane/testy_db_uxlqexkazh">>,<<"jdoane/tmp">>,
      <<"jdoane/wgb4g">>]}}

which is an improvement of more than three orders of magnitude (about 12.6 seconds versus about 1.8 milliseconds).

Fantastic work!

FoldFun = fun(#full_doc_info{id = Id}, Acc) ->
    case Id of
        <<Prefix:Len/binary, _/binary>> -> {ok, Fun(Id, Acc)};
        <<_/binary>> -> {stop, Acc}

This seems like a clever way of not having to use an end key such as <<Prefix/binary, ?HIGH_VALUE_UNICODE/binary>>
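
To illustrate the idea, here is a minimal sketch of a prefix fold that stops at the first non-matching id instead of relying on an end key. It runs over a plain sorted list rather than the real shards db, and the module name `prefix_fold_sketch` and the function `fold_prefix/4` are hypothetical, not part of mem3.

```erlang
-module(prefix_fold_sketch).
-export([fold_prefix/4]).

%% Fold Fun over every id in SortedIds that starts with Prefix.
%% SortedIds is assumed to be sorted in binary order, as ids in a
%% b-tree would be, so all matching ids form one contiguous run.
fold_prefix(Prefix, Fun, Acc0, SortedIds) when is_binary(Prefix) ->
    Len = byte_size(Prefix),
    %% Emulate a start key: skip ids that sort before Prefix.
    Tail = lists:dropwhile(fun(Id) -> Id < Prefix end, SortedIds),
    fold(Prefix, Len, Fun, Acc0, Tail).

fold(_Prefix, _Len, _Fun, Acc, []) ->
    Acc;
fold(Prefix, Len, Fun, Acc, [Id | Rest]) ->
    case Id of
        <<Prefix:Len/binary, _/binary>> ->
            %% Id starts with Prefix: apply Fun and keep going.
            fold(Prefix, Len, Fun, Fun(Id, Acc), Rest);
        _ ->
            %% First id without the prefix: every later id also sorts
            %% past the prefix range, so stop without an end key.
            Acc
    end.
```

For example, folding with `fun(Id, Acc) -> [Id | Acc] end` over `[<<"alice/x">>, <<"jdoane/cats">>, <<"jdoane/tmp">>, <<"zed/y">>]` with the prefix `<<"jdoane">>` visits only the two `jdoane/` ids and never looks at `<<"zed/y">>` beyond the single comparison that ends the fold.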

@nickva nickva merged commit 5cb8529 into main Apr 25, 2024
14 checks passed
@nickva nickva deleted the optimise-all-dbs branch April 25, 2024 01:20