Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Calculate ExternalSize Correctly for Views #608

Merged
merged 1 commit into from
Jul 15, 2017

Conversation

tonysun83
Copy link
Contributor

@tonysun83 tonysun83 commented Jun 21, 2017

Overview

Previously, we were calculating the ExternalSize for views by summing
up all the nodes in the btree. Furthermore, this was the compressed
size. Now we modify the reduce function to return an ExternalSize for
uncompressed values in the KVList.

Testing recommendations

Performance tests were run to compare External Size Info pre-fix and post-fix.

Version        Compression    DB Dize     View External Size

Pre-Fix         snappy            70 MB        26123231     
Pre-Fix         snappy            700 MB      493542711    
Pre-Fix         snappy            2.1 GB       795168531   
                                        
Post-Fix       snappy            70 MB        81775342
Post-Fix       snappy            700 MB      818752126        
Post-Fix       snappy            2.1 GB       2457701744       
                                        
Pre-Fix         deflate_6        70 MB        26109645         
Pre-Fix         deflate_6        700 MB      249677553     
Pre-Fix         deflate_6        2.1 GB       797076702        
                                        
Post-Fix       deflate_6        70 MB        81775244
Post-Fix       deflate_6        700 MB      818752412
Post-Fix       deflate_6        2.1 GB       2478167958

GitHub issue number

PR is number

Related Pull Requests

Checklist

  • Code is written and works correctly;
  • Changes are covered by tests;
  • Documentation reflects the changes;

Copy link
Member

@davisp davisp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as for the db sizes we should figure out why the numbers are different when there's a different compression scheme. And we'll also want to make the same checks for pre/post compaction equivalence.

@@ -609,7 +612,6 @@ make_header(State) ->
log_btree=LogBtree,
views=Views
} = State,

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this WS only hunk

end
end,
{ok, lists:foldl(SumFun, 0, Views)}.


sum_btree_sizes(nil, _) ->
sum_btree_sizes(null, _) ->
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This nil/null change strikes me as possibly problematic for the non KV trees.

Copy link
Contributor Author

@tonysun83 tonysun83 Jun 27, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@davisp: I changed reduced_external_size/1 to return 0 for cases when an ExternalSize is not returned. Previously, the value returned when there was no ExternalSize existed was null. So here I was accounting for the situation when it was null. Now that I changed the value 0, it seems the only possible values here are 0 or some other number. It can't be be nil or null? I think it's probably due to my lack of understanding of non KV trees.


kv_external_size(KVList, Reduction) ->
lists:foldl(fun([[Key, _], Value], Acc) ->
size(term_to_binary(Key)) + size(term_to_binary(Value)) + Acc
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use ?term_size/1 instead of size(term_to_binary(Term)) for these.

case couch_btree:full_reduce(Tree) of
{ok, {_, _, Size}} -> Size;
% return null for versions of the reduce function without Size
{ok, {_, _}} -> null
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're crossing streams a bit here. This returns null when there's no external size. But below your reduce function returns 0 when there's no external size. This means that with no write, we'll get null, but the first change will give us a number. We should pick one or the other.

@tonysun83
Copy link
Contributor Author

tonysun83 commented Jun 28, 2017

hmmm, pre compaction and post compaction is slightly off by a few hundred bytes, but the compression method does not matter. need to figure out why

@tonysun83
Copy link
Contributor Author

tonysun83 commented Jun 28, 2017

looks like {ok, Result} = couch_query_servers:reduce(Lang, [FunSrc], KVs), returns a slightly different value after compaction. That's why post-compaction, when we do ?term_size(Reduction), it returns a slightly different value.

Copy link
Member

@davisp davisp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One minor change but looks good otherwise.

@@ -1038,22 +1047,32 @@ get_user_reds(Reduction) ->
element(2, Reduction).


get_external_reds(Reduction) when tuple_size(Reduction) == 2 ->
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nit, call this get_external_size_reds. external_reds could be a bit misleading here.

Copy link
Member

@davisp davisp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Good work!

tonysun83 pushed a commit that referenced this pull request Jul 15, 2017
Previously, we were calculating the ExternalSize for views by summing
up all the nodes in the btree. Furthermore, this was the compressed
size. Now we modify the reduce function to return an ExternalSize for
uncompressed values in the KVList.
PR: #608

COUCHDB-3430
@tonysun83 tonysun83 merged commit 6d06dcf into master Jul 15, 2017
Previously, we were calculating the ExternalSize for views by summing
up all the nodes in the btree. Furthermore, this was the compressed
size. Now we modify the reduce function to return an ExternalSize for
uncompressed values in the KVList.
PR: #608

COUCHDB-3430
@tonysun83 tonysun83 deleted the 3430-external-size-views branch July 17, 2017 15:43
tonysun83 pushed a commit to cloudant/couchdb that referenced this pull request Jul 19, 2017
Previously, we were calculating the ExternalSize for views by summing
up all the nodes in the btree. Furthermore, this was the compressed
size. Now we modify the reduce function to return an ExternalSize for
uncompressed values in the KVList.
PR: apache#608

COUCHDB-3430
nickva pushed a commit to nickva/couchdb that referenced this pull request Sep 7, 2022
* Description of Standalone vs Cluster

Added a short description of Standalone and Clustermode which should make clearer what the purposes of the modes are.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants