GitHub - saleem-muhammad/CostFed: Cost-Based Query Optimization for SPARQL Endpoint Federation

CostFed: Cost-Based Query Optimization for SPARQL Endpoint Federation

CostFed is an index-assisted federation engine for federated SPARQL query processing over multiple SPARQL endpoints. CostFed uses statistical information collected from the endpoints to perform efficient source selection and cost-based query planning. In contrast to the state of the art, it relies on a non-linear model to estimate the selectivity of joins. As a result, it is able to generate better plans than state-of-the-art federation engines. In an experimental evaluation based on the FedBench benchmark, we show that CostFed is 3 to 121 times faster than state-of-the-art SPARQL endpoint federation engines.

Live Demo (For ISWC 2017)

The CostFed live demo comprises the following two main applications:

The endpoint manager is available here. Using the endpoint manager, you can select the endpoints to be included in the federation. It also lets you create and update CostFed's indexes.
The query formulator/executor is available here. This is the main interface, which allows executing both federated and non-federated queries.

To help users, we provide some federated queries here from FedBench and LargeRDFBench, which can be executed directly.

How to Run CostFed?

Checkout: Check out the source code and import it as a new Maven project. This will create three sub-projects, i.e., costfed, fedx, and semagrow-bench.
Create Index: Since CostFed is an index-assisted approach, the first step is to generate an index for all the endpoints at hand. Index generation and updating are implemented in costfed/src/main/java/org/aksw/simba/quetsal/util/TBSSSummariesGenerator.java. Note that for FedBench and LargeRDFBench, the index is already provided at costfed/summaries/sum-localhost.n3.
Configuration File: Set properties in /costfed/costfed.props, or run with the defaults.
Query Execution: Run costfed/src/main/java/org/aksw/simba/start/QueryEvaluation.java. Here you need to specify the URLs of the SPARQL endpoints over which the given query should be federated, and provide the configuration file, i.e., costfed.props, as an argument.
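
Before passing endpoint URLs to QueryEvaluation, it can be useful to confirm that each endpoint actually answers queries. The following is a minimal Python sketch of such a check, separate from CostFed's own Java code. It assumes the endpoints accept standard SPARQL Protocol GET requests with a `query` parameter. The helper names `build_sparql_request` and `is_reachable` are hypothetical and not part of CostFed.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_sparql_request(endpoint, query="ASK { ?s ?p ?o }"):
    """Build a SPARQL Protocol GET request for `query` against `endpoint`.

    Hypothetical helper for illustration; not part of CostFed.
    """
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def is_reachable(endpoint, timeout=10):
    """Return True if the endpoint answers the probe query with HTTP 200."""
    try:
        with urlopen(build_sparql_request(endpoint), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError and socket timeouts are OSError subclasses
        return False
```

For example, `is_reachable("http://localhost:8890/sparql")` would probe a locally started endpoint; the `http` scheme and host here are assumptions about your setup.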

Used Benchmarks

The queries used in the evaluation can be downloaded from the FedBench and LargeRDFBench homepages.

Datasets Availability

All the datasets and the corresponding Virtuoso SPARQL endpoints can be downloaded from the links given below. You can start a SPARQL endpoint with bin/start.bat (on Windows) or bin/start_virtuoso.sh (on Linux).

Dataset	Data-dump	Windows Endpoint	Linux Endpoint	Local Endpoint Url	Live Endpoint Url
ChEBI	Download	Download	Download	your.system.ip.address:8890/sparql	-
DBPedia-Subset	Download	Download	Download	your.system.ip.address:8891/sparql	https://dbpedia.org/sparql
DrugBank	Download	Download	Download	your.system.ip.address:8892/sparql	https://wifo5-04.informatik.uni-mannheim.de/drugbank/sparql
Geo Names	Download	Download	Download	your.system.ip.address:8893/sparql	https://factforge.net/sparql
Jamendo	Download	Download	Download	your.system.ip.address:8894/sparql	https://dbtune.org/jamendo/sparql/
KEGG	Download	Download	Download	your.system.ip.address:8895/sparql	https://cu.kegg.bio2rdf.org/sparql
Linked MDB	Download	Download	Download	your.system.ip.address:8896/sparql	https://www.linkedmdb.org/sparql
New York Times	Download	Download	Download	your.system.ip.address:8897/sparql	-
Semantic Web Dog Food	Download	Download	Download	your.system.ip.address:8898/sparql	https://data.semanticweb.org/sparql
Affymetrix	Download	Download	Download	your.system.ip.address:8899/sparql	https://cu.affymetrix.bio2rdf.org/sparql
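
The local endpoint URLs in the table follow a simple pattern: one port per dataset, counting up from 8890 in table order. As a small Python sketch (not part of CostFed), the list of URLs for a given host could be generated as follows. The `http` scheme is an assumption, since the table gives no scheme, and `local_endpoints` is a hypothetical helper name.

```python
# Dataset order and ports follow the table above (8890 through 8899).
DATASETS = [
    "ChEBI", "DBPedia-Subset", "DrugBank", "Geo Names", "Jamendo",
    "KEGG", "Linked MDB", "New York Times", "Semantic Web Dog Food",
    "Affymetrix",
]
BASE_PORT = 8890


def local_endpoints(host, scheme="http"):
    """Map each dataset name to its local SPARQL endpoint URL.

    Hypothetical helper; the scheme is assumed, not taken from the table.
    """
    return {
        name: f"{scheme}://{host}:{BASE_PORT + i}/sparql"
        for i, name in enumerate(DATASETS)
    }
```

Calling `local_endpoints("your.system.ip.address")` yields the ten URLs that can then be supplied as the endpoints to federate over.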

Evaluation Results and Runtime Errors

We have compared CostFed with five state-of-the-art SPARQL endpoint federation systems: FedX, SPLENDID, ANAPSID, SemaGrow, and HiBISCuS. Our complete evaluation results can be downloaded from here.

Authors

Alexander Potocki (AKSW, University of Leipzig)
Muhammad Saleem (AKSW, University of Leipzig)
Tommaso Soru (AKSW, University of Leipzig)
Olaf Hartig (Linköping University, Sweden)
Axel-Cyrille Ngonga Ngomo (AKSW, University of Leipzig)

We are especially thankful to Andreas Schwarte (fluid Operations, Germany), Olaf Görlitz (University Koblenz, Germany), and Angelos Charalambidis (Institute of Informatics and Telecommunication, Paraskevi, Greece) for all their email conversations, feedback, and explanations.
