Skip to content

tousekjan/fit-ctu-vwm-rtree

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

BI-VWM R-Tree

This project was implemented as semestral work for class Searching Web and Multimedia Databases at Czech Technical University, Faculty of Information Technology.

Authors: @tousekjan, @prixladi

R-Tree

This document contains the project documentation for the semester project for the BI-VWM course. It is an implementation of R-tree and web-based graphical interface that allows to perform insert operations over the tree, query and save operations.

Project description

The goal of this work was to create a custom implementation of R-Tree. R-tree is hyerarchical data structure, based on B+-tree. It is used for indexing d-dimensional objects that are represented by their Minimum Bounding Rectangle, the MBR. [1]

Inputs

Inputs are made up of randomly generated data or data entered by the user. However, thanks to the API created, it is possible to easily import into any data into the tree, as long as it is points in a d-dimensional space.Project description

Output

The output is then the results of searches using different types of queries (Range query, Nearest neighbour query...).

Solution

The objects the tree works with are points in d-dimensional space. These points are stored in the leaves of the tree. Each node of the tree is defined by its MBR, which contains all nodes - if the node is internal, or points if it is a leaf.

Implemented methods

We have implemented the following methods for the R-Tree.

Insert

Allows you to insert a new point into the tree. We used the Insert algorithm, described in the book R-trees: theory and applications. [1] we modified it for our purposes because we're working with points. When inserting a new point into the tree, it depends on the node splitting algorithm used, if the node fills up. We have implemented three different algorithms - linear, quadratic and exponential. Comparison of these algorithms and their advantages and disadvantages are described in more detail in section Experiments.

Contains

Returns a bool value indicating whether the tree contains the specified point.

RangeQuery

For a given point and range, returns all points that are within the given range.

NNQuery

Nearest Neighbor query. Returns the point that is closest to the specified point.

KNNQuery

Returns the $k$ closest points to the given point.

Implementation

Architecture

We decided to create a user interface for the R-tree in the form of web application. This interface communicates with the API that the server works with the R-tree.

Diagram

Used technologies

We wrote the backend and API in C# using the .NET Core framework. The web application is created in JavaScript. We have decided to use these technologies because we have a good experience with them and because they provided us all the functionality we needed. As a development environment we used Visual Studio 2017 Community and Visual Studio Code.

Used libraries

Backend

Newtonsoft.Json – Json framework for .NET [3]

NETStandard.Library – meta library for .NET core [4]

API

Microsoft.AspNetCore.All – default API for ASP.NET Core applications [5]

Microsoft.NETCore.App – default API for ASP.NET Core applications [6]

Microsoft.Extensions.PlatformAbstractions – abstraction for .NET Framework, .NET Core and Mono API [7]

Swashbuckle.AspNetCore – Swagger tool to generate API documentation [8]

Frontend {#frontend .unnumbered}

React – JavaScript knihovna pro tvorbu uživatelských rozhraní [9]

Ant Design – knihovna pro grafický design webových aplikací [10]

Konva – knihovna pro kreslení ve 2d [11]

Requirements


To run the project the following needs to be installed

.NET Core SDK – .NET platforma [12]

Node.js – JavaScript runtime [13]

Node package manager – správce JavaScript balíčků [14]

Run on Windows platform

Backend

To build the backend part you need to move from the root directory project to the API folder.

cd ./server/Vwm.RTree.Api

And then run the powershell script BuildAndRun.ps1, which will build and run API project in the Release configuration. The API is then available by default on at http:https://localhost:8080/

param(
[switch] $noBuild)

if(!$noBuild)
{
    dotnet build --configuration Release
}
dotnet exec .\bin\Release\netcoreapp2.2\Vwm.RTree.Api.dll

Frontend

To build the frontend, move to the front folder and run commands npm install and npm start. The web application is running by default at http:https://localhost:3000/

cd ./front
npm install
npm start

Example output

We have created a web application for inputting inputs and displaying outputs, that allows you to use all the methods implemented on the R-tree. Below we will describe the tabs that this application contains and add screenshots with examples of inputs and outputs.

Web Application

Add Point

Allows you to add points to the R-tree. AddPoint

Queries

In the query interface, the user has two options to perform a query.

Perform Query

  • performs a query over an R-tree, returns the results and displays the duration of the query

Perform query with benchmark

  • execute the query over both the R-tree and a linear list and display both results and the duration of both queries for possible comparison

Range Query

Allows you to specify the point and range in which the results should be located. Returns and displays the searched results. RangeQuery

NN Query (Nearest Neighbors)

Allows you to enter a point and find the closest point to it. NNQuery

KNN Query (K-Nearest Neighbors)

Allows you to specify a point and find the $k$ points that are closest to it. KNNQuery

Contains

Scans the R-Tree to see if it contains the specified point.

Data

On the data tab, you can view the tree structure, including the points that the tree contains. Here we can also find the Replace button with fresh, which displays a form to create a new tree. In this form, we can specify tree parameters such as the number of dimensions, the maximum number of nodes in a node, the maximum number of points in a leaf node. We can also randomly generate a specified number of nodes.

Data Replace

With the Take a snapshot button we can save the current state of the tree to disk and with the saved snapshot and work with it on the Snapshots tab.

Visualize

On the visualize tab is a canvas that graphically displays the tree points and MBR nodes if the tree is dimension 2. This tab shows how the points are stored in the nodes and how other nodes are sequentially extracted and rectangles. In this visualization, you can observe the differences between the methods used to divide the full nodes. In the example shown in the figure Quadratic split is used.

Visualize

Snapshots

The Snapshots tab is the interface for working with saved tree snapshots. A snapshot of a tree can be loaded, deleted, or downloaded in JSON format.

Snapshot

Experiments

When creating an R-tree, we can set several parameters.

  • Number of dimensions

  • Max number of children

  • Max number of points in leaf

  • Split policy

It is possible to change these parameters and monitor the speed of creation of the tree or the speed of queries over the tree depending on these parameters. We can also compare the query performance over the tree with the speed of queries executed over the same data, but stored in a linear data structure.

Speed of inserting points

In this test, we generated 100,000 random points, which we inserted into the tree and measured the speed of this operation as a function of the parameters Max number of children and Max number of points in leaf, which are both set to the same value, given on the $x$-axis. Parameter Number of dimensions was set to 3. We ran the test ten times and averaged the results for higher precision.

In the graph we can see that while the Linear and Quadratic algorithms increasing the number of Max children still run for almost the same amount of time, Exponential algorithm slows down significantly as expected.

Insert-1

Speed of queries

In the query speed test we set Number of dimensions to 2 and Max number of children to 15. Then we randomly generated 2 000 000 points, which were inserted into the tree. On this tree, we executed 1,000 random queries. Range query - selected random point and random range, KNN query - selected a random point and searched for 100 nearest neighbors.

Each query was executed 1000 times and the times were summed for each executions. We measured data structure creation time, run time of the range query and the run time of the KNN query.

Linear search

  • Creating a data structure for a linear search is very fast, while queries are slow compared to R-tree, and the especially KNN query

R-tree - Linear split

  • it takes longer to create a tree, however we can depend on the speed of the queries. a significant improvement can be observed

R-tree - Quadratic split

  • it takes the longest time to create the tree, but on the other hand the tree achieves the best results when comparing query speed

Query-1

Discussion

The experiments turned out as expected. Inserting new points into the tree with using an exponential algorithm to split nodes proved to be very slow. We therefore expect that the algorithm will be most useful quadratic.

Both Linear split and Quadratic split algorithms achieve very good results compared to linear search. The biggest improvement is seen in execution of KNN queries, which is roughly 3 times faster with Quadratic split, than Linear split and even 25 times faster than linear search.

Possible improvements - we could also add a delete operation to the R-tree element. Related to this operation is the operation to reduce the number of tree levels, when the number of elements in a node falls below a certain threshold.

Conclusion

The goal was to create an R-Tree, and we succeeded. Thanks to the created API the R-tree can easily be used for some real data. So it should be relatively easy to use this implementation in a real project, that works with points in space. For both range queries and KNN queries the tree is several times faster than a linear search and generally both the tree and the web application return results correctly and quickly.

References

[1] MANOLOPOULOS, Yannis. R-trees: theory and applications. London: Springer, c2006. Advanced information and knowledge processing. ISBN 18-523-3977-2.

[2] Nearest Neighbour Queries [online]. [cit. 2019-05-07]. Available from: https://infolab.usc.edu/csci599/Fall2009/slides/Nearest%20Neighbor%20Queries%20Slides.pdf

[3] Json.NET - Newtonsoft [online]. [cit. 2019-05-07]. Available from: https://www.newtonsoft.com/json

[4] NETStandard.Library [online]. [cit. 2019-05-07]. Available from: https://www.nuget.org/packages/NETStandard.Library/

[5] Microsoft.AspNetCore.All [online]. [cit. 2019-05-07]. Available from: https://www.nuget.org/packages/Microsoft.AspNetCore.All/

[6] Microsoft.NETCore.App [online]. [cit. 2019-05-07]. Available from: https://www.nuget.org/packages/Microsoft.NETCore.App

[7] Microsoft.Extensions.PlatformAbstractions [online]. [cit. 2019-05-07]. Available from: https://www.nuget.org/packages/Microsoft.Extensions.PlatformAbstractions/

[8] Swashbuckle.AspNetCore [online]. [cit. 2019-05-07]. Available from: https://www.nuget.org/packages/Swashbuckle.AspNetCore/5.0.0-rc2

[9] React [online]. [cit. 2019-05-07]. Available from: https://reactjs.org/

[10] Ant Design [online]. [cit. 2019-05-07]. Available from: https://ant.design/

[11] Konva [online]. [cit. 2019-05-07]. Available from: https://konvajs.org/

[12] Download .NET [online]. [cit. 2019-05-08]. Available from: https://dotnet.microsoft.com/download

[13] Node.js [online]. [cit. 2019-05-08]. Available from: https://nodejs.org/en/

[14] Node package manager [online]. [cit. 2019-05-08]. Available from: https://www.npmjs.com/

About

R-Tree implementation

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages