CS205 2017 Spring Final Project

Group Members (in alphabetical order):
Jiahua Guo
Jiachen Song
Xinyuan Wang
Jiawei Zhuang

Parallel software solution

We choose MPI + OpenMP/OpenACC/CUDA as our heterogeneous computing environment.

Data science problem

Many huge data sets are now publicly available. There are several ways to turn those large amounts of data into useful knowledge. Here we focus on exploratory data analysis, or unsupervised machine learning, which means finding structural information without prior knowledge.

Among unsupervised learning methods, k-means is one of the most commonly used algorithms. It partitions the observations into k clusters so that each observation belongs to the cluster with the nearest mean. Minimizing the k-means cost function is NP-hard in general, including for k = 2 in arbitrary dimension and for arbitrary k in the plane (d = 2). Several heuristic methods exist that converge to a local minimum, but they remain computationally intensive, especially on huge data sets. We want to implement a parallel version of a k-means heuristic on a cluster of machines to significantly reduce the computing time of the clustering process without reducing the accuracy of the clustering model.
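
As an illustration of the kind of heuristic meant here, the following is a minimal sketch of Lloyd's iteration, one common k-means heuristic; this README does not say which heuristic the project implements, so treat the choice as an assumption. It is written in plain Python for readability rather than in the MPI + OpenMP/OpenACC/CUDA stack named above. The function names (`nearest`, `partial_sums`, `kmeans`) and the `n_chunks` parameter are hypothetical. The data is split into chunks to mimic how per-worker partial sums could be combined in a reduction step.

```python
import random


def nearest(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    best, best_dist = 0, float("inf")
    for j, c in enumerate(centroids):
        dist = sum((p - q) ** 2 for p, q in zip(point, c))
        if dist < best_dist:
            best, best_dist = j, dist
    return best


def partial_sums(chunk, centroids):
    """Per-cluster coordinate sums and counts for one chunk of the data.

    In a distributed run, each worker would compute this for its own chunk.
    """
    k, d = len(centroids), len(centroids[0])
    sums = [[0.0] * d for _ in range(k)]
    counts = [0] * k
    for point in chunk:
        j = nearest(point, centroids)
        counts[j] += 1
        for a in range(d):
            sums[j][a] += point[a]
    return sums, counts


def kmeans(points, k, n_chunks=2, max_iter=100, seed=0):
    """Lloyd's heuristic; returns (centroids, labels). Finds a local minimum only."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    d = len(points[0])
    # Hypothetical static partition of the data, one chunk per worker.
    chunks = [points[i::n_chunks] for i in range(n_chunks)]
    for _ in range(max_iter):
        # Reduction step: add up the partial results from all chunks.
        sums = [[0.0] * d for _ in range(k)]
        counts = [0] * k
        for chunk in chunks:
            s, c = partial_sums(chunk, centroids)
            for j in range(k):
                counts[j] += c[j]
                for a in range(d):
                    sums[j][a] += s[j][a]
        # Update step: each centroid moves to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        new_centroids = [
            [x / counts[j] for x in sums[j]] if counts[j] else centroids[j]
            for j in range(k)
        ]
        if new_centroids == centroids:
            break  # assignments no longer change
        centroids = new_centroids
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels
```

Only the k per-cluster sums and counts need to be shared between workers in each iteration, not the observations themselves, which is why this heuristic lends itself to data parallelism. In an MPI setting, the reduction loop would plausibly map to a collective such as `MPI_Allreduce`, though that mapping is an assumption and not something stated in this README.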

Some data set options

(Preliminary plan. Might change in the future.)

Hubway system data:
https://www.thehubway.com/system-data

Airbnb data:
https://data.beta.nyc/dataset/inside-airbnb-data
