ceys edited this page May 12, 2014 · 11 revisions

###Algorithm descriptions and application docs:

####Clustering

DBSCAN

####Recommendation

MF-ALS

###Development:

Sparse data structures

Git notes

###TODO:

  • 1. Monitoring:

  • Make the web UI convenient to view

  • For instance, a Ganglia dashboard can quickly reveal whether a particular workload is disk bound, network bound, or CPU bound.

  • OS profiling tools such as dstat, iostat, and iotop can provide fine-grained profiling on individual nodes.

  • JVM utilities such as jstack for providing stack traces, jmap for creating heap-dumps, jstat for reporting time-series statistics and jconsole for visually exploring various JVM properties are useful for those comfortable with JVM internals.

  • 2. Performance:

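  • Tune the performance of the algorithms we have developed, using the configuration parameters described in the links below.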

  • https://spark.apache.org/docs/latest/configuration.html

  • https://spark.apache.org/docs/latest/tuning.html

  • 3. Survey the pros and cons of DBSCAN and how to choose its parameters

  • https://en.wikipedia.org/wiki/DBSCAN

  • 4. DBSCAN on Spark vs. a single machine:

  • Spark cluster configuration, computation time, and compute resources

  • Compare clustering quality against the single-machine version

  • Compare performance against the single-machine version
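
For the single-machine side of the DBSCAN comparison in item 4, a small reference implementation can serve as a correctness baseline. The sketch below is only an illustration in plain Python (3.8+), not the project's code: the names `dbscan`, `region_query`, `eps`, and `min_pts` are assumptions chosen here, and the brute-force neighbor search is O(n²), so it suits small test sets only.

```python
from math import dist  # Euclidean distance, available since Python 3.8

NOISE = -1  # label for points that belong to no cluster


def region_query(points, i, eps):
    """Return indices of all points within eps of points[i], including i."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE.

    eps and min_pts are the two standard DBSCAN parameters:
    the neighborhood radius and the minimum neighborhood size
    (counting the point itself) for a point to be a core point.
    """
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
    return labels


if __name__ == "__main__":
    sample = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
    print(dbscan(sample, eps=1.5, min_pts=3))  # [0, 0, 0, 1, 1, 1, -1]
```

Running the same small dataset through the Spark version and comparing the grouping of points (cluster ids may be numbered differently) is one way to check that the two implementations agree before measuring time and resources.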
