Name	Name	Last commit message	Last commit date
Latest commit History 39 Commits
experiments	experiments
include	include
logo	logo
py	py
src	src
CMakeLists.txt	CMakeLists.txt
LICENSE	LICENSE
Makefile	Makefile
README.md	README.md

Fred

A fast, scalable and light-weight C++ Fréchet distance library, exposed to python and focused on (k,l)-clustering of polygonal curves.

Ingredients C++ Backend

import Fred.backend as fred

Number of Threads

By default, Fred will automatically determine the number of threads to use. If you want to set an upper limit, call fred.set_maximum_number_threads(number).

Curve

signature: fred.Curve(np.ndarray), fred.Curve(np.ndarray, str name)
properties: fred.Curve.values: curves as np.ndarray, fred.Curve.name: get name of curve, fred.Curve.dimensions: dimension of curve, fred.Curve.complexity: number of points of curve

Curves

signature: fred.Curves()
methods: fred.Curves.add(curve): add curve, fred.Curves[i]: get ith curve, len(fred.Curves): number curves, fred.Curves.simplify(l): return set of simplified curves
properties: fred.Curves.m: maximum complexity of the contained curves

continous Fréchet distance

signature: fred.continuous_frechet(curve1, curve2)
returns: fred.Continuous_Frechet_Result with members value, time_bounds: running-time for upper and lower bound, number_searches: number of free space diagrams built, time_searches: running-time for free spaces

continuous Frechet distance config

approximation error: fred.set_continuous_frechet_epsilon(epsilon) with parameter epsilon, which defaults to 0.001
rounding (rounding up to 3 decimals): fred.set_continuous_frechet_rounding(round) with parameter round, which defaults to true

discrete Fréchet distance

signature: fred.discrete_frechet(curve1, curve2)
returns: fred.Discrete_Frechet_Result with members value and time

Clustering

Distance_Matrix

A fred.Distance_Matrix() can be used to speed up consecutive calls of fred.discrete_klcenter and fred.discrete_klmedian. As the name suggests, it stores the distances already computed.

discrete (k,l)-center clustering (continuous Fréchet) -- multiple calls

from Approximating (k,l)-center clustering for curves
signature: fred.discrete_klcenter_multi(k, l, curves, distances, center_domain, random_first_center) with parameters
- k: number of centers
- l: maximum complexity of the centers, only used when center_domain is default value
- distances: fred.Distance_Matrix
- center_domain: possible centers, defaults to empty fred.Curves(), in this case the input is simplified and used as center domain
- random_first_center: determines if first center is chosen uniformly at random or first curve is used as first center, optional, defaults to true
returns: fred.Clustering_Result with mebers
- value: objective value
- time: running-time
- assignment: empty if compute_assignment has not been called

discrete (k,l)-median clustering (continuous Fréchet) -- multiple calls

Algorithm 6 in Coresets for (k,l)-Clustering under the Fréchet distance + simplification
signature: fred.discrete_klmedian_multi(k, l, curves, distances, center_domain) with parameters
- k: number of centers
- l: maximum complexity of the centers, only used when center_domain is default value
- distances: fred.Distance_Matrix
- center_domain: possible centers, optional parameter, if not given the input is simplified and used as center domain
returns: fred.Clustering_Result with mebers
- value: objective value
- time: running-time
- assignment: empty if compute_assignment has not been called

discrete (k,l)-center clustering (continuous Fréchet) -- oneshot

from Approximating (k,l)-center clustering for curves
signature: fred.discrete_klcenter(k, l, curves, center_domain, random_first_center) with parameters
- k: number of centers
- l: maximum complexity of the centers, only used when center_domain is default value
- center_domain: possible centers, optional parameter, if not given the input is simplified and used as center domain
- random_first_center: determines if first center is chosen uniformly at random or first curve is used as first center, optional, defaults to true
returns: fred.Clustering_Result with mebers
- value: objective value
- time: running-time
- assignment: empty if compute_assignment has not been called

discrete (k,l)-median clustering (continuous Fréchet) -- oneshot

Algorithm 6 in Coresets for (k,l)-Clustering under the Fréchet distance + simplification
signature: fred.discrete_klmedian(k, l, curves, center_domain) with parameters
- k: number of centers
- l: maximum complexity of the centers, only used when center_domain is default value
- center_domain: possible centers, optional parameter, if not given the input is simplified and used as center domain
returns: fred.Clustering_Result with mebers
- value: objective value
- time: running-time
- assignment: empty if compute_assignment has not been called

Clustering Result

signature: fred.Clustering_Result
methods: len(fred.Clustering_Result): number of centers, fred.Clustering_Result[i]: get ith center, fred.Clustering_Result.compute_assignment(fred.Curves): assigns every curve to its nearest center
members: value: objective value, time: running-time, assignment: empty if compute_assignment was not called

Cluster Assignment

signature: fred.Cluster_Assignment
methods: len(fred.Cluster_Assignment): number of centers, fred.Cluster_Assignment.count(i): number of curves assigned to center i, fred.Cluster_Assignment.get(i,j): get index of jth curve assigned to center i

Dimension Reduction via Gaussian Random Projection

Section 2 in Random Projections and Sampling Algorithms for Clustering of High Dimensional Polygonal Curves
signature: fred.dimension_reduction(curves, epsilon, empirical_constant) with parameters epsilon: (1+epsilon) approximation parameter, empirical_constant: use constant of empirical study (faster, but less accurate)
returns: fred.Curves collection of curves

Installation

Get requirements under Ubuntu: make pre

Python3 installation into userdir: make install

If something does not work with Boost

Manual installation of Boost

mkdir $HOME/boost (This folder is hardcoded in setup.py, another location won't work.)
cd /tmp
wget https://dl.bintray.com/boostorg/release/1.73.0/source/boost_1_73_0.tar.gz
tar -xzf boost_1_73_0.tar.gz
cd boost_1_73_0
./bootstrap.sh --with-python=/usr/bin/python3
./b2 install --prefix=$HOME/boost

After that, go back to Freds folder and run make clean and then make install

Test

Just run python py/test.py.

Mini Example

import Fred.backend as fred
import Fred
import numpy as np
import pandas as pd

curve1d = fred.Curve(np.array([1., 2.])) # Curve stores a polygonal curve with 
                                         # at least two points of at least one 
                                         # and equal number of dimensions

curve2d1 = fred.Curve(np.array([[1., 0.], [2., 1.], [3., 0.]])) # any number of dimensions and points works
curve2d2 = fred.Curve(np.array([[1., -1.], [2., -2.], [3., -1.]]), "optional name, e.g. displayed in plot") 

print(curve2d1)

Fred.plot_curve(curve2d1, curve2d2)
Fred.plot_curve(curve2d2, fred.weak_minimum_error_simplification(curve2d2, 2))

print("distance is {}".format(fred.continuous_frechet(curve2d1, curve2d2).value))

print("download HUGE curves") 

import requests, zipfile, io             # you can use all libraries 
                                         # that work with numpy to read data into fred
                                         
re = requests.get("https://archive.ics.uci.edu/ml/machine-learning-databases/00447/data.zip", stream=True)
zf = zipfile.ZipFile(io.BytesIO(re.content))

ps1 = fred.Curve(pd.read_csv(zf.open('PS1.txt'), delimiter="\t", header=None).values[:50], "PS1")
ps2 = fred.Curve(pd.read_csv(zf.open('PS2.txt'), delimiter="\t", header=None).values[:50], "PS2")
ps3 = fred.Curve(pd.read_csv(zf.open('PS3.txt'), delimiter="\t", header=None).values[:50], "PS3")
ps4 = fred.Curve(pd.read_csv(zf.open('PS4.txt'), delimiter="\t", header=None).values[:50], "PS4")
ps5 = fred.Curve(pd.read_csv(zf.open('PS5.txt'), delimiter="\t", header=None).values[:50], "PS5")
ps6 = fred.Curve(pd.read_csv(zf.open('PS6.txt'), delimiter="\t", header=None).values[:50], "PS6")

curves = fred.Curves() # for clustering or if you want to apply dimension reduction
                       # you need to encapsulate your curves in a Curves object
              
curves.add(ps1)
curves.add(ps2)
curves.add(ps3)
curves.add(ps4)
curves.add(ps5)
curves.add(ps6)

Fred.plot_curve(curves)

curves = fred.dimension_reduction(curves, 0.95) # fred is pretty fast but with high dimensional data
                                                # a dimension reduction massively improves running-time
                                                # even for smaller values of epsilon
                                                
Fred.plot_curve(curves)
                                  
# Oneshot clustering - if you already know the value of k
                                  
clustering = fred.discrete_klcenter(2, 10, curves) # fast but coarse
          
clustering = fred.discrete_klmedian(2, 10, curves) # slow but better results

print("clustering cost is {}".format(clustering.value))

for i, center in enumerate(clustering):
    print("center {} is {}".format(i, center))
    
    
Fred.plot_curve(clustering)

# Multiple clustering calls - if you need to find a suitable value for k

dm = fred.Distance_Matrix() # computing the Fréchet distance is costly,
                            # therefore we buffer each distance already computed to
                            # speed up consecutive clustering calls
                            
for k in range(2, 6):
    
    clustering = fred.discrete_klcenter_multi(k, 10, curves, dm)
    print("clustering cost is {}".format(clustering.value))
            
    clustering = fred.discrete_klmedian_multi(k, 10, curves, dm)
    print("clustering cost is {}".format(clustering.value))
    
clustering.compute_assignment(curves)

for i in range(0, len(clustering)):
    for j in range(0, clustering.assignment.count(i)):
        print("{} was assigned to center {}".format(curves[clustering.assignment.get(i,j)].name, clustering[i].name))

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Fred

Ingredients C++ Backend

Number of Threads

Curve

Curves

continous Fréchet distance

continuous Frechet distance config

discrete Fréchet distance

Clustering

Distance_Matrix

discrete (k,l)-center clustering (continuous Fréchet) -- multiple calls

discrete (k,l)-median clustering (continuous Fréchet) -- multiple calls

discrete (k,l)-center clustering (continuous Fréchet) -- oneshot

discrete (k,l)-median clustering (continuous Fréchet) -- oneshot

Clustering Result

Cluster Assignment

Dimension Reduction via Gaussian Random Projection

Installation

If something does not work with Boost

Test

Mini Example

About

Releases 14

Packages

Languages

License

derohde/Fred

Folders and files

Latest commit

History

Repository files navigation

Fred

Ingredients C++ Backend

Number of Threads

Curve

Curves

continous Fréchet distance

continuous Frechet distance config

discrete Fréchet distance

Clustering

Distance_Matrix

discrete (k,l)-center clustering (continuous Fréchet) -- multiple calls

discrete (k,l)-median clustering (continuous Fréchet) -- multiple calls

discrete (k,l)-center clustering (continuous Fréchet) -- oneshot

discrete (k,l)-median clustering (continuous Fréchet) -- oneshot

Clustering Result

Cluster Assignment

Dimension Reduction via Gaussian Random Projection

Installation

If something does not work with Boost

Test

Mini Example

About

Topics

Resources

License

Stars

Watchers

Forks

Releases 14

Packages 0

Languages

Packages