A Scalable Online Analysis tool for Semantic Web Data based on Apache Spark

martinpz/Spark-RDF-Analyzer


Running the Spark RDF Analyzer

In this tutorial we show how to run the Spark RDF Analyzer directly from Eclipse, using a Docker container for the Tomcat web service.

Setup Prerequisites

  • Pull the Tomcat base image:
docker pull tomcat:8.0-jre8
  • Create a file /my/path/to/tomcat-users.xml, which will be injected into the Docker container so that we can access the Tomcat manager and execute scripts via Tomcat's text API.
<?xml version="1.0" encoding="UTF-8"?>
<tomcat-users xmlns="http://tomcat.apache.org/xml"
              xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
              xsi:schemaLocation="http://tomcat.apache.org/xml tomcat-users.xsd"
              version="1.0">
	<user username="manager" password="manager" roles="manager-gui" />
	<user username="demo" password="demo" roles="manager-script" />
</tomcat-users>
  • Create a folder /my/path/to/data containing all the datasets that should be available to the running RDF Analyzer.
    The directory structure on your local system should look similar to the following:
/
+-- my
|   +-- path
|   |   +-- to
|   |   |   +-- tomcat-users.xml
|   |   |   +-- data
|   |   |   |   +-- sib200
|   |   |   |   |   +-- sib200.nt
|   |   |   |   +-- sib400
|   |   |   |   |   +-- sib400.nt
|   |   |   |   +-- storage
|   |   |   |   |   +-- parquet
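The layout above can be created with a few shell commands. For demonstration the tree is rooted in the current directory; in practice substitute your actual absolute base path for /my/path/to:

```shell
# Create the host-side directory layout expected by the tutorial.
BASE="$(pwd)/my/path/to"
mkdir -p "$BASE/data/sib200" "$BASE/data/sib400" "$BASE/data/storage/parquet"
touch "$BASE/tomcat-users.xml"   # paste in the XML shown above
# Your N-Triples dumps then go into the matching dataset folder, e.g.
# cp sib200.nt "$BASE/data/sib200/"
```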
  • Run the Tomcat container by executing the following command.
    This overwrites the default Tomcat user config and mounts the local directory /my/path/to/data into the container, where it is accessible at /home/data.
docker run \
	-dit \
	--name tomcat \
	-p 8080:8080 \
	-v /my/path/to/tomcat-users.xml:/usr/local/tomcat/conf/tomcat-users.xml \
	-v /my/path/to/data:/home/data \
	tomcat:8.0-jre8 \
&& docker logs -f tomcat
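Once the container is up, you can check that the manager's text API accepts the script user before wiring up Maven (URL and credentials match the tomcat-users.xml above):

```shell
# List deployed applications via the Tomcat manager text API.
# Succeeds once the container has finished starting; the fallback
# message is printed while Tomcat is still booting.
curl -fsS -u demo:demo http://127.0.0.1:8080/manager/text/list \
  || echo "Tomcat manager not reachable yet"
```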
  • You will have to set up a local Maven profile that overrides the Maven variables during the deployment process. This way, real credentials are kept out of the public repository and excluded from the POM.
    • Open your local settings file, located at ${user.home}/.m2/settings.xml.
      If the file is not already there, simply create it!
    • Insert the following lines:
<?xml version="1.0" encoding="UTF-8"?>
<settings>
        <servers>
                <server>
                        <id>tomcat-localhost</id>
                        <username>demo</username>
                        <password>demo</password>
                </server>
        </servers>

        <profiles>
                <profile>
                        <id>tomcat-localhost</id>
                        <properties>
                                <tomcat.deploy.server>tomcat-localhost</tomcat.deploy.server>
                                <tomcat.deploy.url>http://127.0.0.1:8080/manager/text</tomcat.deploy.url>
                        </properties>
                </profile>
        </profiles>
</settings>
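A malformed settings.xml makes Maven fail with confusing errors, so a quick well-formedness check can save time. This is a minimal sketch that assumes Python 3 is available and uses the standard settings location:

```shell
# Verify that the Maven settings file parses as XML before invoking Maven.
SETTINGS="${HOME}/.m2/settings.xml"
if [ -f "$SETTINGS" ]; then
  python3 -c 'import sys, xml.dom.minidom as m; m.parse(sys.argv[1])' "$SETTINGS" \
    && echo "settings.xml is well-formed"
else
  echo "no settings.xml found at $SETTINGS"
fi
```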

Run the Spark RDF Analyzer

  • Check out the project and import it into Eclipse as a Maven project.
  • Set up a new run configuration for it.
    • Right-click the project > "Run as" > "Maven build..."
    • Set the following properties:
      • Name: RDF Analyzer (Tomcat)
      • Goals: clean tomcat7:redeploy
      • Profiles: tomcat-localhost
      • Parameter: p.type=war
  • Click "Run" and check the console output in Eclipse. The WAR file is deployed to the running Tomcat instance.
  • Go back to the console showing the Tomcat logs and wait for the deployment to complete.
  • Open http://127.0.0.1:8080/spark-rdfanalyzer2/ in your web browser and you should see the RDF Analyzer running. Unless you changed the mount path, your datasets will be available under /home/data inside the container.
    Click "Add new Graph" and you will be prompted for the path where your data files reside. Enter /home/data/sib200 to add the graph for the sib200 dataset.
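For reference, the dataset files are plain N-Triples: one triple per line, terminated by a dot. A file such as sib200.nt contains lines of this shape (the URIs below are illustrative examples, not actual SIB data):

```
<http://example.org/person1> <http://xmlns.com/foaf/0.1/name> "Alice" .
<http://example.org/person1> <http://xmlns.com/foaf/0.1/knows> <http://example.org/person2> .
```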
