Skip to content

vitalius/BinaryTree

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

23 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

What is it?

Attempt to empirically show the difference in search time between Ordered and Unordered Binary Tree.

Setup

We're going to populate Ordered and Unordered Binary Trees with nodes that use words (strings of text) for keys, see Node.java. In case of Ordered tree, String.compareTo() is used to order keys based on lexicographical order. While Unordered tree is populated by randomly picking a path (left/right) from root to the leaf level. See Ordered.java and Unordered.java

The expected result for searching Ordered Binary Tree is O(clog(n)), and O(cn) for Unordered (pre-order tree traversal). Where n is the number of nodes (unique terms) in the Binary Tree and c is some constant representing the beefines of my computer.

Procedures

  1. Build Ordered/Unordered Binary Tree with sub-set (size n) of terms from the corpus.
  2. Compute the average search time by searching for every term in the tree and then taking the average.
  3. Perform the experiment as the size of sub-set expands from 1,...,entire corpus.

Experiment 1

A short story by Ellena Ashley, The Dragon Rock is used for the first experiment (dragon-rock.txt). Dragon Rock contains 636 unique terms out of 1429 total, which means total of 636 nodes in a Binary Tree, and depth of 7 when tree is balanced.

The difference between Ordered and Unordered search is visible, however, due to much noise from Java's VM and unbalace in trees, it is hard to approximate the function that best fits the results. We need more data.

Since the number of unique terms grows approximately linearly with the total number of terms

Let's go ahead and use all of the terms.

Looks better but still too noisy for approximation. How about we copy/paste the story 9 times.

Looks like we now have enough data to start approximating O(n) equation, however the Binary Tree stops growing after all unique terms are exhausted (after about 2000 terms).

Clearly, we need a better corpus.

Experiment 2

The Adventures of Sherlock Holmes by Sir Arthur Conan Doyle (holmes.txt) contains 9580 unique words out of 120376 total!

Search in Unordered Binary Tree looks linear with respect to n while Ordered is way at the bottom accross entire graph.

We can now approximate Unordered Binary Tree search. Linear function should look something like t = c*n where we know n (number of nodes) and t (milliseconds) average time it takes to find a node. Averaging c by looking at few experiment data points we find that on this machine, c ~= 0.000016.

Similarly when we zoom-in on Ordered Binary Tree data, it begins to resemble a logarithmic curve: t = c*log(n) where c ~= 0.000055.

Wrap-up

We have empirically shown the difference in search times between Unordered (pre-order tree traversal) and Ordered Binary Trees.

About

Testing Binary-Tree look-up performance on word keys

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published