As an increasing amount of statistical data is published as RDF, intuitive ways of satisfying information needs and getting new insights out of this type of data become increasingly important. Question answering systems provide intuitive access to data by translating natural language queries into SPARQL, which is the native query language of RDF knowledge bases. Statistical data, however, is structurally very different from other data and cannot be queried using existing approaches. Building upon a question corpus established in previous work, we created a benchmark for evaluating questions on statistical Linked Data in order to evaluate statistical question answering algorithms and to stimulate further research. Furthermore, we designed a question answering algorithm for statistical data, which covers a wide range of question types. To our knowledge, this is the first question answering approach for statistical RDF data and could open up a new research area. Apart from providing evaluation results, we discuss future challenges in this field.
- CubeQA 1.0 requires Java 11, Git and Maven 3 to be installed.
- Later versions may require higher Java versions.
- Clone the project via
git clone https://github.com/AKSW/cubeqa.git
to get the current state.
- You may check out release 1.0 for a stable version that runs on Java 11.
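Before building, it can help to confirm that the prerequisites listed above are on the PATH. The following is a minimal sketch, not part of CubeQA; the function name `missing_tools` is a hypothetical helper introduced only for illustration.

```shell
# Hypothetical helper: print every given tool that is not on the PATH.
# Prints nothing if all tools are available.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

missing=$(missing_tools java git mvn)
if [ -n "$missing" ]; then
  echo "Missing prerequisites: $missing"
else
  # Print the versions so they can be compared against Java 11 and Maven 3.
  java -version
  mvn -version
fi
```

The script only reports which tools are present; checking that the reported versions match the requirements is left to the reader.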
If you use an IDE, you also need to download and execute lombok.jar (double-click it, or run java -jar lombok.jar) and follow the installer's instructions. This is necessary because CubeQA uses Project Lombok, which removes much boilerplate from Java code.
CubeQA contains a benchmark (View Benchmark) that runs on 50 datasets of LinkedSpending (Download | Browse LinkedSpending).
The benchmark source package is org.aksw.cubeqa.benchmark.
We believe that good science should be open and reproducible. Feel free to verify our claims by running our evaluation yourself. Please contact us if you encounter issues.
- Run the evaluation main classes, e.g. for the QALD6 Task 3 training set, via
mvn compile exec:java -Dexec.mainClass="org.aksw.cubeqa.scripts.EvaluateQald6T3Train"
- You will see the results on the console and also in the file benchmark/qbench<timestamp>.csv.
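Because each run writes a new benchmark/qbench<timestamp>.csv file, it can be convenient to locate the newest one. The sketch below assumes that the most recently modified file is the latest run and that it is started from the project root; `latest_result` is a hypothetical helper, and the column layout of the CSV is not assumed.

```shell
# Hypothetical helper: print the most recently modified qbench*.csv file
# in the given directory, or nothing if there is none.
latest_result() {
  ls -t "$1"/qbench*.csv 2>/dev/null | head -n 1
}

latest=$(latest_result benchmark)
if [ -n "$latest" ]; then
  echo "Latest result file: $latest"
  # Show the first lines without assuming a particular column layout.
  head -n 5 "$latest"
else
  echo "No result file found in benchmark/"
fi
```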
The evaluation code and the JUnit tests are preconfigured to use the SPARQL endpoint https://cubeqa.aksw.org/sparql, but that endpoint is no longer active. You can install and load your own SPARQL endpoint and change the configuration to use it as described below.
- install OpenLink Virtuoso (a different triple store may work as well) on your machine and load the datasets (see below)
- download the datasets
- upload the LinkedSpending ontology into graph https://linkedspending.aksw.org/ontology/ and add that graph to the graph group https://linkedspending.aksw.org/
- upload each .nt file into graph https://linkedspending.aksw.org/<x> and add them to graph group https://linkedspending.aksw.org/
- you can automate this with the virtloadbench script adapted to your use case
- then go to the folder containing the dataset N-Triples files and execute the shell command
ls | sed "s|\\.nt||" | xargs -I @ virtloadbench @.nt https://linkedspending.aksw.org/@
- alternative virtload scripts are at https://github.com/SmartDataAnalytics/aksw-commons/tree/master/aksw-commons-scripts/virtuoso
- in https:///conductor add prefixes qb: https://purl.org/linked-data/cube#, ls: https://linkedspending.aksw.org/instance/ and lso: https://linkedspending.aksw.org/ontology/
- set the URI, such as "localhost:8890" (default) in org.aksw.cubeqa.CubeSparql.java.
- start Virtuoso
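To see what the bulk load pipeline above would do before touching the triple store, the same sed and xargs chain can be run with echo in front of virtloadbench. This is a dry-run sketch only: `print_load_commands` is a hypothetical helper, and the two file names in the demo are made up.

```shell
# Hypothetical helper: for every file in the given directory, print the
# virtloadbench call that the pipeline would execute, without running it.
print_load_commands() {
  ( cd "$1" && ls | sed "s|\\.nt||" | xargs -I @ echo virtloadbench @.nt https://linkedspending.aksw.org/@ )
}

# Demo on a throwaway directory with two made-up dataset files.
demo=$(mktemp -d)
touch "$demo/dataset-a.nt" "$demo/dataset-b.nt"
print_load_commands "$demo"
rm -r "$demo"
```

For the demo directory this prints one line per file, mapping dataset-a.nt to the graph https://linkedspending.aksw.org/dataset-a and dataset-b.nt to https://linkedspending.aksw.org/dataset-b. Since the pipeline processes every entry that ls returns, the folder should contain only the .nt files.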
CubeQA can be used as a plugin for openQA, which offers a graphical user interface.
While CubeQA is implemented in Java using Maven and should therefore theoretically run everywhere, it is under development, uses snapshots and has the status of a research prototype. I therefore cannot guarantee that it will run successfully on your machine, but I'm happy to help with your questions (it is best to open a new issue). CubeQA was part of my PhD thesis and is not my current research topic, so I can perform maintenance only very rarely. While I do plan on creating a version 2 eventually, this will just be a quick move to Java 16. If you want to know more about current research, I recommend reading "R. Cocco, M. Atzori, and C. Zaniolo. Machine learning of SPARQL templates for Question Answering over LinkedSpending. In 2019 IEEE 28th International Conference on Enabling Technologies: Infrastructure for Collaborative Enterprises (WETICE), pages 156–161, 06 2019." (IEEE page, PDF).
The source code of CubeQA is freely available under the GPLv3 license (see the LICENSE file), which requires you to publish derivative works under the same license. If this creates a licensing conflict, or if you are interested in commercial usage, please contact us.