Graph analysis at scale with browser based visualisation

TrevorDArcyEvans/GraphML

GraphML

GraphML analyses graphs for the following measures:

ranked shortest paths

These calculations help your users understand ways to travel through (or ‘traverse’) a network.

The distance function measures how many hops apart two nodes are in a network. Shortest path highlights the route that passes through the lowest number of nodes.

Hops can also be weighted, meaning you can calculate actual distances, as well as the number of hops.

Wikipedia
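
To make the difference between hop count and weighted distance concrete, here is a minimal Python sketch based on Dijkstra's algorithm. It is an illustration only and not GraphML's own implementation; the `shortest_path` helper, the node names and the weights are assumptions made up for this example.

```python
import heapq

def shortest_path(graph, source, target, weighted=True):
    """Return (distance, path) from source to target using Dijkstra's algorithm.

    graph maps node -> {neighbour: weight}. With weighted=False every edge
    costs 1, so the returned distance is the number of hops.
    """
    queue = [(0, source, [source])]
    visited = set()
    while queue:
        dist, node, path = heapq.heappop(queue)
        if node == target:
            return dist, path
        if node in visited:
            continue
        visited.add(node)
        for neighbour, weight in graph.get(node, {}).items():
            if neighbour not in visited:
                step = weight if weighted else 1
                heapq.heappush(queue, (dist + step, neighbour, path + [neighbour]))
    return float("inf"), []

# Hypothetical undirected network: A-B-D is two heavy hops, A-C-E-D is three light hops.
graph = {
    "A": {"B": 10, "C": 1},
    "B": {"A": 10, "D": 10},
    "C": {"A": 1, "E": 1},
    "D": {"B": 10, "E": 1},
    "E": {"C": 1, "D": 1},
}

hops, hop_path = shortest_path(graph, "A", "D", weighted=False)
distance, weighted_path = shortest_path(graph, "A", "D", weighted=True)
```

In this example the fewest-hops route and the lowest-weight route differ, which is why weighting matters when hops represent real distances or costs.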

finding communities

Uses the Louvain method for finding communities in large networks, as described in [Blondel et al., 2008]. The central concept is network modularity, which measures the quality of the current community partition. The algorithm successively improves the network's modularity by trying to change the community that each node belongs to. When no change improves modularity any further, the algorithm stops and treats the current partition as the best one found.

Wikipedia
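
As a rough illustration of the modularity score that the Louvain method tries to increase, the following Python sketch computes modularity for a given partition of a small undirected, unweighted graph. It does not perform the Louvain optimisation itself and is not taken from GraphML; the `modularity` function and the sample graph are assumptions for this example.

```python
from collections import defaultdict

def modularity(edges, community):
    """Modularity Q of a partition of an undirected, unweighted graph.

    edges is a list of (u, v) pairs; community maps node -> community label.
    Q = sum over communities of (internal_edges / m) - (degree_sum / (2 * m)) ** 2
    """
    m = len(edges)
    internal = defaultdict(int)    # edges with both ends inside the community
    degree_sum = defaultdict(int)  # total degree of the nodes in the community
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)

# Hypothetical graph: two triangles joined by a single bridge edge (3, 4).
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]

two_triangles = {1: "a", 2: "a", 3: "a", 4: "b", 5: "b", 6: "b"}
one_big_group = {n: "a" for n in range(1, 7)}

q_split = modularity(edges, two_triangles)
q_single = modularity(edges, one_big_group)
```

Splitting the graph into its two triangles scores higher than lumping every node into one community, which is the kind of improvement the algorithm searches for.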

finding duplicates

Uses the Double Metaphone phonetic encoding algorithm to find potentially duplicate entities.

Wikipedia
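
The general idea of phonetic duplicate detection can be sketched as follows: encode each name to a key that approximates its sound, then group names that share a key. The Python below deliberately uses a crude stand-in encoder, not Double Metaphone, because a faithful Double Metaphone implementation is far longer. The `phonetic_key` and `find_possible_duplicates` names and the letter mappings are assumptions for illustration only.

```python
from collections import defaultdict

# Letters that often sound alike are mapped to one representative (illustrative only).
SOUND_ALIKE = str.maketrans({"C": "K", "Q": "K", "Z": "S", "V": "F", "D": "T", "B": "P", "Y": "I"})

def phonetic_key(name):
    """Crude phonetic key. This is NOT Double Metaphone, only a simplified stand-in."""
    text = "".join(ch for ch in name.upper() if ch.isalpha())
    if not text:
        return ""
    text = text.replace("PH", "F").translate(SOUND_ALIKE)
    key = text[0]
    previous = text[0]
    for ch in text[1:]:
        # Skip vowels and weak consonants, and collapse adjacent repeated letters.
        if ch != previous and ch not in "AEIOUHW":
            key += ch
        previous = ch
    return key

def find_possible_duplicates(names):
    """Group names that share a phonetic key; groups of one are not duplicates."""
    groups = defaultdict(list)
    for name in names:
        groups[phonetic_key(name)].append(name)
    return [group for group in groups.values() if len(group) > 1]

candidates = find_possible_duplicates(["Stephen", "Steven", "Smith", "Smyth", "Jones"])
```

Matches found this way are only candidates; a person would normally review them before merging entities.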

Social Network Analysis (SNA)

closeness

This is the measure that helps you find the nodes that are closest to the other nodes in a network, based on their ability to reach them.

To calculate this, the algorithm finds the shortest path between each pair of nodes, then assigns each node a score based on the sum of the lengths of its shortest paths to all other nodes.

Nodes with a high closeness value have a lower distance to all other nodes. They’d be efficient broadcasters of information.

Wikipedia
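
A minimal Python sketch of closeness on an unweighted graph is shown below, using breadth-first search to measure hop distances. It uses one common normalisation (reachable nodes divided by the sum of distances), which may differ from the one GraphML applies; the `closeness` function and the sample network are assumptions for this example.

```python
from collections import deque

def closeness(graph, node):
    """Closeness of node: (number of reachable nodes) / (sum of shortest hop distances)."""
    distances = {node: 0}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for neighbour in graph[current]:
            if neighbour not in distances:
                distances[neighbour] = distances[current] + 1
                queue.append(neighbour)
    total = sum(distances.values())
    return (len(distances) - 1) / total if total else 0.0

# Hypothetical path network: A - B - C - D - E
graph = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "E"], "E": ["D"]}
scores = {node: closeness(graph, node) for node in graph}
```

The middle node C gets the highest score because its total distance to everyone else is the smallest.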

betweenness

Nodes with a high betweenness centrality score are the ones that most frequently act as ‘bridges’ between other nodes. They form the shortest pathways of communication within the network.

Usually this would indicate important gatekeepers of information between groups.

Wikipedia
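
The 'bridge' idea can be illustrated with a short Python sketch of Brandes' algorithm for unweighted, undirected graphs. This is a generic textbook formulation rather than GraphML's code; the `betweenness` function and the sample network are assumptions for this example.

```python
from collections import deque

def betweenness(graph):
    """Betweenness centrality for an undirected, unweighted graph (Brandes' algorithm)."""
    score = {node: 0.0 for node in graph}
    for source in graph:
        order = []                                   # nodes in order of non-decreasing distance
        predecessors = {node: [] for node in graph}
        paths = {node: 0 for node in graph}          # number of shortest paths from source
        distance = {node: -1 for node in graph}
        paths[source], distance[source] = 1, 0
        queue = deque([source])
        while queue:
            current = queue.popleft()
            order.append(current)
            for neighbour in graph[current]:
                if distance[neighbour] < 0:
                    distance[neighbour] = distance[current] + 1
                    queue.append(neighbour)
                if distance[neighbour] == distance[current] + 1:
                    paths[neighbour] += paths[current]
                    predecessors[neighbour].append(current)
        # Walk back from the farthest nodes, accumulating each node's share of paths.
        dependency = {node: 0.0 for node in graph}
        for node in reversed(order):
            for pred in predecessors[node]:
                dependency[pred] += paths[pred] / paths[node] * (1 + dependency[node])
            if node != source:
                score[node] += dependency[node]
    # Each undirected pair was counted from both ends, so halve the totals.
    return {node: value / 2 for node, value in score.items()}

# Hypothetical network: two triangles joined through the bridge C - D.
graph = {
    "A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "D"],
    "D": ["C", "E", "F"], "E": ["D", "F"], "F": ["D", "E"],
}
scores = betweenness(graph)
```

Only C and D score above zero here, since every shortest path between the two triangles has to pass through them.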

degree

The degree centrality measure finds nodes with the highest number of links to other nodes in the network.

Nodes with a high degree centrality have the best connections to those around them – they might be influential, or just strategically well-placed.

Wikipedia
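
Degree centrality is the simplest of the three measures to compute. The Python sketch below counts each node's links and divides by the maximum possible number (n - 1); that normalisation is an assumption and GraphML may report raw counts instead. The `degree_centrality` function and the sample network are made up for this example.

```python
def degree_centrality(graph):
    """Map each node to its degree divided by the maximum possible degree (n - 1)."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(neighbours) / (n - 1) for node, neighbours in graph.items()}

# Hypothetical network: a hub linked to a, b and c, plus one extra link c - d.
graph = {
    "hub": ["a", "b", "c"],
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub", "d"],
    "d": ["c"],
}
scores = degree_centrality(graph)
```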

Prerequisites

  • Mandatory:

    • host:
      • Linux
      • Windows (not tested but should work)
    • target:
      • Linux (services)
      • Google Chrome web browser
    • .NET Core SDK v5.0
    • integrated development environment:
      • Visual Studio Code (Linux or Windows)
      • JetBrains Rider (Linux or Windows)
      • Visual Studio (Windows)
    • nodejs
    • git
    • Google Chrome web browser
      • primary web client
      • does not work with Apache ActiveMQ admin page
    • Firefox web browser
      • required to view Apache ActiveMQ admin page
    • database:
      • Microsoft SQL Server
      • MySQL or MariaDB
      • PostgreSQL
      • SQLite (local development only)
    • message queue:
      • Apache ActiveMQ
    • results store:
      • Redis
  • Optional

Getting Started

Building

  1. clone repo
     git clone https://github.com/TrevorDArcyEvans/GraphML.git
  2. build
     dotnet restore
     dotnet build
  3. run tests
     dotnet test
  4. run code coverage
     dotnet test /p:CollectCoverage=true /p:CoverletOutputFormat=opencover
  5. generate code coverage report
     reportgenerator -reports:**/coverage.opencover.xml -targetdir:./CodeCoverage
  6. generate documentation
     doxygen

open documentation

Back End

  1. run API
     export ASPNETCORE_ENVIRONMENT=Development
     cd GraphML.API/bin/Debug/net5.0
     ./GraphML.API
  2. open Swagger UI
  3. start Apache ActiveMQ
  4. start Redis
  5. run IdentityServer4
     export ASPNETCORE_ENVIRONMENT=Development
     cd IdentityServerAspNetIdentity/bin/Debug/net5.0
     ./IdentityServerAspNetIdentity
  6. open IdentityServer4 Login
  7. open IdentityServer4 Discovery Document
  8. run Analysis Server
     export ASPNETCORE_ENVIRONMENT=Development
     cd GraphML.Analysis.Server/bin/Debug/net5.0
     ./GraphML.Analysis.Server
  9. open Apache ActiveMQ management console
  10. start Redis Commander
     redis-commander --port 8080
  11. open Redis Commander management console

Front End/s

GraphML.UI.Web

  export ASPNETCORE_ENVIRONMENT=Development
  cd GraphML.UI.Web/bin/Debug/net5.0
  ./GraphML.UI.Web

open https://localhost:5002/

Docker
  docker-compose build
  docker-compose up

open https://localhost:5002/

Environment Variables

Backend API

| Variable | Description | Example Value |
|---|---|---|
| ASPNETCORE_ENVIRONMENT | ASP.NET Core runtime environment | Production, Development, Test |
| API_URI | API server URL; used by GraphML.API.Server to retrieve data | |
| DATASTORE_CONNECTION | | SqLite |
| DATASTORE_CONNECTION_TYPE | | SqLite |
| DATASTORE_CONNECTION_STRING | | Data Source=\|DataDirectory\|Data/GraphML.sqlite3;Foreign Keys=True; |
| LOG_CONNECTION_STRING | .NET connection string for database logging | |
| RESULT_DATASTORE | Redis URL | localhost:6379 |
| MESSAGE_QUEUE_URL | Apache ActiveMQ URL | activemq:tcp://localhost:61616 |
| MESSAGE_QUEUE_NAME | | GraphML |
| MESSAGE_QUEUE_POLL_INTERVAL_S | time in seconds between checking for new analysis jobs | 5 |
| MESSAGE_QUEUE_USE_THREADS | | False |
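
For readers wiring up their own tooling around these variables, the following Python sketch shows one way to read a few of them with typed values. It is not part of GraphML, which is a .NET application. The `QueueSettings` class and `load_queue_settings` function are hypothetical, and falling back to the example values when a variable is unset is an assumption rather than documented behaviour.

```python
import os
from dataclasses import dataclass

@dataclass
class QueueSettings:
    name: str
    poll_interval_s: int
    use_threads: bool
    result_datastore: str

def load_queue_settings(env=os.environ):
    """Read queue-related settings; defaults mirror the example values (an assumption)."""
    return QueueSettings(
        name=env.get("MESSAGE_QUEUE_NAME", "GraphML"),
        poll_interval_s=int(env.get("MESSAGE_QUEUE_POLL_INTERVAL_S", "5")),
        # Environment variables are strings, so parse the boolean explicitly.
        use_threads=env.get("MESSAGE_QUEUE_USE_THREADS", "False").strip().lower() == "true",
        result_datastore=env.get("RESULT_DATASTORE", "localhost:6379"),
    )

# A plain dict stands in for the process environment in this example.
settings = load_queue_settings({
    "MESSAGE_QUEUE_POLL_INTERVAL_S": "10",
    "MESSAGE_QUEUE_USE_THREADS": "True",
})
```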

Overview

GraphML.Overview

Architecture

Overview

GraphML.Architecture

Analysis

Components

The following components are used to analyse a graph:

Data flow

GraphML.Analysis

Data Model

Classes

GraphML.Classes

Composition

GraphML.Composition

Description
Base
Abstract entities which are ancestors for other GraphML entities.
  • Item
    • Ultimate ancestor of all GraphML objects.
    • Models something which can be persisted.
    • Every item ultimately belongs to an Organisation
  • OwnedItem
    • Something which has an immediate owner, other than an Organisation
Containers
Entities which serve as a holding place for other entities.
  • Organisation
    • Typically a company, organisation or other legal entity in which people work together.
      • police force
      • GCHQ
      • FBI
      • military
      • bank
    • Used to isolate information between different Organisations
    • Id and OrganisationId must be the same
  • RepositoryManager
    • A means to group a subset of Repository in an Organisation in some logical manner.
    • For example, repositories could be grouped at a departmental level eg 'Financial Fraud' or 'Credit Control'.
    • ItemAttributeDefinition are held at RepositoryManager level so they can be shared across Repository.
  • Repository
    • A complete collection of Node and Edge representing an area of interest.
  • Graph
    • A subset of Nodes and Edges from a Repository which have been extracted for separate analysis.
    • A Graph may be directed; in contrast to a Repository, which has no notion of direction.
  • Chart
    • A 2D pictorial representation of a subset of Nodes and Edges from a Graph.
    • Generally used to visualise analysis results.
    • Default implementation is a Diagram.
    • Layout algorithms can be applied to change the position of Nodes and Edges.
  • Timeline
    • A 2D pictorial representation of a subset of Nodes and Edges from a Graph.
    • Generally used to visualise temporal (time based) data.
    • Default implementation is a gantt chart.
Graph
  • RepositoryItem
    • Something which is in a Repository, either a Node or an Edge
  • Node
    • A vertex representing something of interest.
    • A Node may be connected to zero or more other Nodes by Edges
    • A Node may have properties associated with it via a NodeItemAttribute
  • Edge
    • A link connecting two Nodes.
    • An Edge may have one or more weights (or other properties) associated with it via an EdgeItemAttribute
    • An Edge is not directed 'per se'; this is set on the Graph

  • GraphItem
    • Something which is in a Graph, either a GraphNode or a GraphEdge
  • GraphNode
    • A Node which appears in a Graph.
    • Name may be different to that of underlying Node
  • GraphEdge
    • An Edge which appears in a Graph.
    • Name may be different to that of underlying Edge

  • ChartItem
    • Something which is in a Chart, either a ChartNode or a ChartEdge
  • ChartNode
    • A Node which appears in a Chart.
    • Name may be different to that of underlying Node
  • ChartEdge
    • An Edge which appears in a Chart.
    • Name may be different to that of underlying Edge
Attributes
ItemAttributeDefinition are held at RepositoryManager level so they can be shared across Repository.
  • ItemAttributeDefinition
    • Defines shape (name and data type) of information in an ItemAttribute
  • RepositoryItemAttributeDefinition
    • Defines shape of information in a RepositoryItemAttribute
  • GraphItemAttributeDefinition
    • Defines shape of information in a GraphItemAttribute
  • NodeItemAttributeDefinition
    • Defines shape of information in a NodeItemAttribute
  • EdgeItemAttributeDefinition
    • Defines shape of information in an EdgeItemAttribute

  • ItemAttribute
    • Additional information attached to an Item
  • RepositoryItemAttribute
    • Additional information attached to a Repository
  • GraphItemAttribute
    • Additional information attached to a Graph
  • NodeItemAttribute
    • Additional information attached to a Node
  • EdgeItemAttribute
    • Additional information attached to an Edge

  • Currently supported data types:
    • string
    • bool
    • int
    • double
    • DateTime (UTC)
    • DateInterval (UTC)
Support
  • Contact
    • A person identified by their email address.
    • The email address (Name) is used to link authentication (IdentityServer4) to Role.
  • Role
    • The function performed by a Contact in the context of GraphML.
    • There are several, predefined functions in Roles
    • A Contact may have one or more Roles
  • Roles
    • User roles within GraphML
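
The inheritance and ownership rules described above can be sketched with a few Python dataclasses. GraphML itself is written for .NET, so this is only a loose illustration of the relationships. The field names `owner_id`, `source_id` and `target_id` are hypothetical, and only a handful of the entities are shown.

```python
from dataclasses import dataclass, field
from uuid import uuid4

@dataclass
class Item:
    """Ultimate ancestor: persistable and always tied to an Organisation."""
    organisation_id: str
    name: str = ""
    id: str = field(default_factory=lambda: str(uuid4()))

@dataclass
class Organisation(Item):
    def __post_init__(self):
        # An Organisation belongs to itself, so Id and OrganisationId must match.
        self.id = self.organisation_id

@dataclass
class OwnedItem(Item):
    owner_id: str = ""   # immediate owner, e.g. the Repository holding a Node

@dataclass
class Node(OwnedItem):
    pass

@dataclass
class Edge(OwnedItem):
    # The two end points; direction only becomes meaningful once set on a Graph.
    source_id: str = ""
    target_id: str = ""

org = Organisation(organisation_id="org-1", name="Example Bank")
alice = Node(organisation_id=org.id, name="Alice", owner_id="repo-1")
bob = Node(organisation_id=org.id, name="Bob", owner_id="repo-1")
payment = Edge(organisation_id=org.id, name="payment", owner_id="repo-1",
               source_id=alice.id, target_id=bob.id)
```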

Authentication & Authorisation

Roles and Users
  • enable Development mode by setting env var:
  export ASPNETCORE_ENVIRONMENT=Development
  • authentication (who you are) is handled by IdentityServer
  • authorisation (what you can do) is handled by GraphML, based on an email claim
  • security is role based, with the following predefined roles:
| Role | Description |
|---|---|
| User | An entity using GraphML |
| UserAdmin | An entity managing a subset of data within GraphML, typically data belonging to a single organisation |
| Admin | An entity managing all data within GraphML |
  • the above roles are owned by System organisation
  • SwaggerUI is only enabled in Development mode
  • SwaggerUI authentication will redirect to a login screen in IdentityServer
  • GraphML and IdentityServer4 have some test users:
| UserName | Password | Email | Roles | Notes |
|---|---|---|---|---|
| alice | Pass123$ | [email protected] | Admin | system wide admin |
| bob | Pass123$ | [email protected] | none | known to IdentityServer4 but not GraphML |
| carol | Pass123$ | [email protected] | UserAdmin | |
| dave | Pass123$ | [email protected] | User | |
| eric | Pass123$ | [email protected] | User | |
How to add a new user
  • add user to GraphML
    • GraphML:./GraphML.Datastore.Database/Data/Import.sql
    • import into database
  • add user to IdentityServer4
    • GraphML:./IdentityServerAspNetIdentity/SeedData.cs
    • import into database
      ./IdentityServerAspNetIdentity.exe /seed

User Interface

A reference browser based GUI is provided. This is written in Blazor and uses the following components:

At this stage, printing is limited to using the web browser's native printing. Export to PDF (or other formats) is not supported by the current diagramming component (Z.Blazor.Diagrams) but may be possible with other components eg Syncfusion or Blazor.Diagrams. Obviously, replacing such a fundamental component is risky and difficult.

Icons should be 32x32 pixels in size and are resized to this for display.

There are many sources of free or low cost icons on the internet eg:

Sample data

Real world, large datasets can be obtained from:

Multi-Tenancy

At this stage, multi-tenancy isolation is implemented in GraphML.Logic:

  • GraphML.Logic.Validators
    • does the initial call even make sense
    • only allow calls on items which caller is allowed to access
  • GraphML.Logic.Filters
    • only return items relevant to the caller
    • only return items caller is allowed to see
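
The split between validators and filters can be sketched in Python as follows. This is a simplified illustration, not the code in GraphML.Logic. The `Caller` and `Item` classes, the function names, and the rule that an Admin can access every organisation's items are assumptions for this example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Item:
    id: str
    organisation_id: str

@dataclass(frozen=True)
class Caller:
    email: str
    organisation_id: str
    is_admin: bool = False

def can_access(caller, item):
    """Assumed rule: admins see everything, others only their own organisation."""
    return caller.is_admin or caller.organisation_id == item.organisation_id

def validate(caller, items):
    """Validator: reject the whole call if it touches an item the caller may not access."""
    for item in items:
        if not can_access(caller, item):
            raise PermissionError(f"{caller.email} may not access item {item.id}")

def filter_visible(caller, items):
    """Filter: quietly drop items the caller is not allowed to see."""
    return [item for item in items if can_access(caller, item)]

items = [Item("n1", "org-1"), Item("n2", "org-2"), Item("n3", "org-1")]
user = Caller(email="user@example.com", organisation_id="org-1")
admin = Caller(email="admin@example.com", organisation_id="system", is_admin=True)

visible_to_user = filter_visible(user, items)
visible_to_admin = filter_visible(admin, items)
```

A validator fails loudly before any work is done, whereas a filter trims results after the fact, so applying both gives two independent layers of isolation.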

Future work will change to a database-per-client type of isolation, which is better suited to high security environments. This will make validators and filters redundant, as all calls are guaranteed to come from the same organisation. In turn, this will make the Organisation entity redundant.

Alternatively, a dedicated deployment per organisation would achieve a similar effect at the expense of managing each deployment.

Third Party Bugs

Z.Blazor.Diagrams

MatBlazor

Misc

Port Allocations
| Service | Port | Notes |
|---|---|---|
| IdentityServerAspnetIdentity | 44387 | |
| GraphML.API | 5001 | |
| GraphML.UI.Web | 5002 | |
| Apache ActiveMQ | 61616 | |
| Apache ActiveMQ console | 8161 | |
| Redis | 6379 | |
| Redis Commander | 8080 | default port 8081 |
| Microsoft SQL Server | 1443 | |
| MariaDB | 3306 | |
| PostgreSQL | 5432 | |

Apache ActiveMQ

You can monitor ActiveMQ using the Web Console by pointing your browser at https://localhost:8161/admin .
From ActiveMQ 5.8 onwards, the web apps are secured out of the box.
The default username and password is admin/admin.

There seems to be a problem accessing the Web Console from Google Chrome, so it is recommended to use Firefox (or Microsoft Edge).

Redis

Redis on Windows

Recommended method is to use a Docker container:

  docker pull redis
  docker run -p 6379:6379 redis

Alternate method is to install and run Redis on WSL:

https://redislabs.com/blog/redis-on-windows-10/

  sudo apt install redis-server
  sudo service redis-server status
  sudo service redis-server start
  sudo service redis-server stop

Redis Commander

  npm install -g redis-commander
  redis-commander --port 8080

open Redis Commander management console

Pro tip: to reset the database, use flushdb

Markdown Viewer

This document is best viewed in Google Chrome with the Markdown Viewer extension. Remember to enable access to file URLs in the settings.

Further Work

  • update ranked shortest path to support temporal analysis
    • going forward in time eg for financial transactions or phone calls
    • support DateTimeInterval
    • should be able to transform graph such that links which go backwards in time have infinite weight
    • provide UI to select time attribute
  • really improve timeline visualisation
    • probably best to invest in Syncfusion diagramming component (!)
  • improve printing/export
    • probably best to invest in Syncfusion diagramming component (!)
  • support AMQP
  • support other datastores
  • unit tests