- Most connected user in social graph based on graph degrees.
- Degree of separation for single user based on Breadth-first search with Pregel. Input is user's id - output is list of tuple with users and degree of separation between them.
- Degree of separation between two defined users, as degree of separation for the single user, it's based on Breadth first search with Pregel. Input is two user's ids - output is degree of separation between them.
- Measuring importance/rating of users with PageRank with two strategies:
iterative
anduntil convergence (Pregel based)
. - Connected component for social graph - under the hood it delegates to Pregel.
- Triangle count - the number of triangles passing through each vertex.
- Page Rank for defined users: dynamic and iterative versions.
PageRank measures the importance of each vertex in
a social graph. Spark allows to build PageRank with two strategies: dynamically
, this implementation uses the
Pregel interface and runs PageRank until convergence
and iterative
, it runs PageRank for a fixed number of iterations.
Dynamically approach with strategy until convergence
denotes that computation will continue until
[R(t+1)-R(t)] < e
. Convergence is achieved when the error rate for any vertex in the graph falls below
a given threshold. The error rate of a vertex is defined as the difference between the “real” score of the vertex R(Vi)
and the score computed at iteration k, R^K(Vi)
Since the real score is not known apriori, this error rate is
approximated with the difference between the scores computed at two successive iterations: R(t+1)
and R(t)
.
//dynamic version
val dynamicRank = socialGraph.graph.pageRank(tol = 0.0001)
//iterative version
val iterativeRank = socialGraph.graph.staticPageRank(numIter = 20)
Launching with SBT
like a simple batch job:
>> sbt
>> runMain com.github.graphx.pregel.jobs.social.SocialGraphJoB
In graph theory, the shortest path problem is the problem of finding a path between two vertices (or nodes) in a graph such that the sum of the weights of its constituent edges is minimized.
This project solves shortest path problem
with Dijkstra's
algorithm with relying on Pregel algorithm for
propagating messages.
The next code snapshot demonstrates how easy you can implement similar algorithms:
initialGraph.pregel(Double.PositiveInfinity)(
(_, dist, newDist) => math.min(dist, newDist),
triplet => {
//Distance accumulator
if (triplet.srcAttr + triplet.attr < triplet.dstAttr) {
Iterator((triplet.dstId, triplet.srcAttr + triplet.attr))
} else {
Iterator.empty
}
},
(a, b) => math.min(a, b)
)
Connected Components algorithm labels each connected component of the graph with the ID of its lowest-numbered vertex. For example:
val component = graph.connectedComponents().vertices
Variable component
has type Graph[VertexId, ED]
and contains
tuple of vertex id and lowest vertex id in a component.
Launching with SBT
:
sbt runMain com.github.graphx.pregel.ssp.demo.ShortestPathProblemDemo
Social graph is represented in the next form: one user -> multiple friends
Datasets
has the next structure: user_id array of related user's ids
,
file is stored in resources directory - UserGraph.txt
. For example, user with id 5988
has
friends with ids 748 1722 3752
will be represented in pretty obvious form:
5988 748 1722 3752
Tested data is stored in resource directory - UserNames.tsv
- file in
format user_id -> user_name
, necessary for joining with graph of contacts.
It can be treated as a simple tuple - id from UserGraph.txt
and name of vertex of graph.
Names of user are random, any coincidence are accidental.