knnMeetsConnectedComponents

Publications

Alessandro Lulli, Thibault Debatty, Laura Ricci, Matteo Dell’Amico, and Pietro Michiardi, Scalable k-NN based text clustering, accepted at IEEE BigData 2015

How to build

The project can be built using Maven. From the project's main directory, run: mvn package

How to run

The main class is: util.KnnMeetsConnectedComponents

It is possible to execute the job in two ways:

Submit a job to your Spark environment
Use the script run/runKnnMeetsConnectedComponents.sh

The Spark library must also be provided in the lib folder. A pre-built Spark library can be downloaded from: https://www.dropbox.com/s/xnfqs0ht4nqv5lc/spark-assembly-1.2.0-hadoop2.2.0.jar?dl=0

Configuration

The application requires a configuration file. An example configuration file is: run/config_knnMeetsCC

Dataset format

The application requires the following format:

vertexIdentifier<separator>stringValue

Here, separator can be configured using the edgelistSeparator configuration variable. An example dataset is: run/subjectSmall
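To make the format concrete, here is a minimal sketch of how one dataset line could be split into its two fields. It is not taken from the project's source: the class name DatasetLineSketch, the method parseLine, the tab separator, and the choice to split at the first occurrence of the separator are all assumptions made for illustration.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Map;

// Hypothetical helper, not part of the project: splits one dataset line
// into (vertexIdentifier, stringValue).
class DatasetLineSketch {

    // Assumption: the line is split at the FIRST occurrence of the separator,
    // so the string value may itself contain the separator.
    static Map.Entry<String, String> parseLine(String line, String separator) {
        int idx = line.indexOf(separator);
        if (idx < 0) {
            throw new IllegalArgumentException("separator not found in line: " + line);
        }
        String vertexIdentifier = line.substring(0, idx);
        String stringValue = line.substring(idx + separator.length());
        return new SimpleImmutableEntry<>(vertexIdentifier, stringValue);
    }

    public static void main(String[] args) {
        // Assumed separator: a tab. In practice it is whatever
        // edgelistSeparator is set to in the configuration file.
        Map.Entry<String, String> record =
                parseLine("42\tscalable text clustering", "\t");
        System.out.println(record.getKey() + " -> " + record.getValue());
    }
}
```

The identifier is kept as a string here because the README does not say whether vertex identifiers must be numeric.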

Contact

For issues, suggestions, or further details, please contact: [email protected] https://www.di.unipi.it/~lulli
