Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Post-order traversal on a medium-size and large graph #27

Open
amatiushkin opened this issue May 31, 2022 · 0 comments
Open

Post-order traversal on a medium-size and large graph #27

amatiushkin opened this issue May 31, 2022 · 0 comments
Assignees
Labels
help wanted Extra attention is needed

Comments

@amatiushkin
Copy link
Contributor

amatiushkin commented May 31, 2022

Traverser context builder defines a context strategy (see TraverseContextBuilder.ContextStrategy), which helps to control result during traversal.

In case of post order traversal, the action to set result is invoked once traversal of a current node is complete (hence the name).
On a medium and large graphs this final termination could happen very far from start/root node, the path from start to this node could easily include thousands of intermediate nodes.

Current implementation of NORMAL_STRATEGY propagates result from current context to parent, and then from parent to grandparent, which utilize application call stack.

On a JVM with standard configuration (512K), this could generate StackOverflowError once call stack is exceeded.
In my case, it took 14-15K nodes in a path to cause the error.

There are several ways to mitigate it.

Increase JVM stack size

Use configuration to increase stack size with java -Xss1M (10M, etc). This is usually enough to enable traversal on a graph with millions nodes.

Create custom context strategy

Another way to avoid stack overflow is to handle result differently by utilizing custom context strategy.

This example is for illustration purposes, it assumes default configuration and expects custom strategy. It gives up ability to call strategy per every result, therefore it is not general enough.

        private static final ContextStrategy CUSTOM_NORMAL_STRATEGY = new ContextStrategy() {
            @Override
            public <U> void setResult(TraverseContextBuilder<?, ?, ?> outer, U result) {
                outer.result = result;
                rootContext(outer).setResult(result);
            }

            @Override
            public <U> U getResult(TraverseContextBuilder<?, ?, ?> outer) {
                return rootContext(outer).getResult();
            }

            private TraverseContextBuilder<?, ?, ?> rootContext(TraverseContextBuilder<?, ?, ?> outer) {
                TraverseContextBuilder<?, ?, ?> parent = outer.parentContext;
                while(parent.contextStrategy == CUSTOM_NORMAL_STRATEGY) { // walk till ROOT
                    parent = parent.parentContext;
                }
                return parent;
            }

            @Override
            public <U> U getVar(TraverseContextBuilder<?, ?, ?> outer, Class<? super U> key) {
                U value;
                return ((value = (U)outer.vars.get(key)) != null || outer.vars.containsKey(key))
                        ? value
                        : outer.parentContext.getVar(key);
            }

            @Override
            public <U> U setVar(TraverseContextBuilder<?, ?, ?> outer, Class<? super U> key, U newValue) {
                return outer.vars.containsKey(key)
                        ? (U)outer.vars.put(key, newValue)
                        : outer.parentContext.setVar(key, newValue);
            }

            @Override
            public String toString() {
                return "CUSTOM_NORMAL_STRATEGY";
            }
        };

Find very unit test to illustrate above in my private gist.

Question to the community

Traverser is designed as a general-purpose library and some limitations and tradeoffs needs to be made.

What is more important for default configuration: ability to traverse large graphs or functionality set? Shall it strive to have both?

PS
This is practical question for anyone, who operates with linked and connected data on a scale of wikidata (~100M "entities").

@amatiushkin amatiushkin added the help wanted Extra attention is needed label May 31, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

2 participants