Home


ceys edited this page May 12, 2014 · 11 revisions

###Algorithm descriptions and application docs:

####Clustering

####Recommendation

###Development:

Sparse data structures

###TODO:

1. Monitoring:
Make the web UI easy to access and check.
For instance, a Ganglia dashboard can quickly reveal whether a particular workload is disk bound, network bound, or CPU bound.
OS profiling tools such as dstat, iostat, and iotop can provide fine-grained profiling on individual nodes.
JVM utilities such as jstack for providing stack traces, jmap for creating heap-dumps, jstat for reporting time-series statistics and jconsole for visually exploring various JVM properties are useful for those comfortable with JVM internals.
2. Performance:
Tune the performance of the algorithms we have developed by adjusting the configuration parameters.
https://spark.apache.org/docs/latest/configuration.html
https://spark.apache.org/docs/latest/tuning.html
3. Survey the pros and cons of DBSCAN and how to choose its parameters
https://en.wikipedia.org/wiki/DBSCAN
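4. Compare DBSCAN on Spark with a single-machine implementation:
Spark cluster configuration, computation time, and compute resources
Comparison of clustering results against the single-machine version
Comparison of performance against the single-machine version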
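
For the single-machine side of the comparison in item 4, a baseline could look roughly like the sketch below. It is only an illustration under stated assumptions, not this project's implementation: the names `dbscan`, `region_query`, and `timed` are hypothetical, distance is assumed to be Euclidean, and the neighbor search is brute force (quadratic in the number of points), so it only suits small samples. It needs Python 3.8+ for `math.dist`.

```python
import math
import time
from collections import deque

NOISE = -1
UNVISITED = None


def region_query(points, i, eps):
    """Return indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Label every point with a cluster id (0, 1, ...) or NOISE.

    Hypothetical brute-force baseline; eps and min_pts are the two
    DBSCAN parameters whose choice item 3 asks to investigate.
    """
    labels = [UNVISITED] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may still be claimed later as a border point
            continue
        labels[i] = cluster
        queue = deque(neighbors)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # core point: keep expanding
                queue.extend(j_neighbors)
        cluster += 1
    return labels


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


# Made-up sample data: two tight groups and one isolated point.
points = [(0.0, 0.0), (0.0, 0.5), (0.5, 0.0),
          (5.0, 5.0), (5.0, 5.5), (5.5, 5.0),
          (20.0, 20.0)]
labels, seconds = timed(dbscan, points, 1.0, 3)
print(labels)  # [0, 0, 0, 1, 1, 1, -1]
print(f"elapsed: {seconds:.6f} s")
```

Running the same data and the same `eps` / `min_pts` through the Spark version, and recording its wall-clock time alongside the cluster configuration, would give the result-quality and performance figures the TODO asks for.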