Skip to content

HenningWoydt/SubModST

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

15 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Submodular Subset Maximization

This repository contains code to exactly solve the Cardinality-Constrained Submodular Monotone Subset Maximization problem.

Given a universe $\mathcal{U}$ consisting of $n$ arbitrary items. Let $f : 2^{\mathcal{U}} \to \mathbb{R}$ be a set function.
Let $\Delta(e \mid A) \coloneqq f(A \cup {e}) - f(A)$ be the marginal gain of $e$.
Set function $f$ is submodular if for every $A \subseteq B \subseteq \mathcal{U}$ and $e \in \mathcal{U} \setminus B$ it holds that $\Delta(e \mid A) \geq \Delta(e \mid B)$.
Set function $f$ is monotone (increasing) if for every $A \subseteq \mathcal{U}$ and every $e \in \mathcal{U}$ it holds that $\Delta(e \mid A) \geq 0$.

Cardinality-Constrained Submodular Monotone Subset Maximization

Given a submodular and monotone set function $f$ and an integer $k$ determine a set $S$ with $|S| = k$ and
$$S = \text{arg}\max_{S' \subseteq \mathcal{U}, |S'| = k} f(S').$$

Available Functions

Currently, six functions can be optimized.

Group Closeness Centrality

Given a graph $G=(V, E)$ determine a set $S \subseteq V$ of size $k$, such that $$f(S) \coloneqq \sum_{v \in V \setminus S} \min_{u \in S} \text{dist}(u, v)$$ is minimal. Since this originally is a minimization problem, we will use Negative Group Farness $-f(S)$ as the function to maximize.

Partial Dominating Set

Given a graph $G=(V, E)$ determine a set $S \subseteq V$ of size $k$, such that $$f(S) \coloneqq |\bigcup_{v \in S} N[v]|$$ is maximized. $N[v] = {v} \cup {u \mid {u, v} \in E}$ is the closed neighborhood of $v$.

$k$-Medoid Clustering

Given a set $X$ consisting of $n$ datapoints, each of dimensionality $d$, determine a set $S \subseteq X$ of size $k$ such that $$f(S) \coloneqq \sum_{i=1}^{n} \min_{x_j \in S} d(x_i, x_j)$$ is minimized. $$d(x, y) = \sqrt{ \sum_{i = 1}^{d} (x_i - y_i)^{2} }$$ is the euclidian distance. Like Group Farness, we will maximize $-f(S)$.

Facility Location

Given a set of $n$ locations $N$ and a set of $m$ customers $M$. By $g_{ij} \geq 0$ we denote the benefit for customer $j$ when building a facility at location $i$. Determine a set $S \subseteq N$ of size $k$, such that $$f(S) = \sum_{j \in M} \max_{i \in S} g_{ij}$$ is maximized.

Weighted Coverage

Given a collection $N = {s_1, \ldots, s_n}$ of subsets of an item set $M = {1, \ldots, m}$ and a weight function $\omega: M \to \mathbb{R}$, choose $S \subseteq N$ of size $k$ such that the summed weights are maximized, e.g. $$f(S) = \sum_{i \in \bigcup S} \omega(i). $$

Bipartite Influence

Let $N = {1, \ldots, n}$ be a set of sources and $M = {1, \ldots, m}$ a set of targets. Let $G=(N \cup M, E)$ be a bipartite graph with $E \subseteq N \times M$. Let $0 \leq p_{ij} \leq 1$ be the activation probability of edge $(j, i)$ for target $i \in M$ and source $j \in N$. If the edge does not exist in $G$, then $p_{ij} = 0$. A target $i \in M$ is activated by a set $S \subseteq N$ of source with probability $1 - \prod_{j \in S}(1 - p_{ij})$. Determine a set $S \subseteq N$ of size $k$ that maximizes $$f(S) = \sum_{i \in M} \Big(1 - \prod_{j \in S}(1 - p_{ij})\Big). $$

Installation

Prerequisites

Komogorv's Blossom V implementation is required.

Vladimir Kolmogorov. "Blossom V: A new implementation of a minimum
cost perfect matching algorithm." In Mathematical Programming
Computation (MPC), July 2009, 1(1):43-67.

Its redistribution is prohibited, so we offer a script that will automatically download and extract it.
Move into directory src/3rd_party/ and execute setup_3rd_party.sh.
This script will automatically download and extract the files to the correct folder.
At the end the folder blossom5 should contain the downloaded code.

Build

To build the binary use ./build.sh. The binary submodst will be in the folder build.

Usage

Use the following options:

  • -i [ --input-file ] Path to the file.
  • -f [ --function ] Which function to optimize.
  • -k [ --k ] The solution budget.
  • -o [ --output-file ] Path to the output file.
  • --initial-solution-file (Optional) Path to a file holding an initial solution.
  • -t [ --time-limit ] (Optional) Maximum amount of running time (preprocessing not included).
  • -n [ --nickname ] Name of the algorithm. Currently Plain, Simple, Simple+, LE-Rank, LE-Score, LE-RankOrScore, LE-RankAndScore, PWG-k^, PWG-Sqrt-n^, PWM-k^, PWM-Sqrt-n^, PWD-k^, PWD-10 are available. We recommend LE-Score as the fastest.

Data

Graph

The file format should be

e11 e12  
e21 e22  
...  
en1 en2  

with ei1 ei2 denoting edge $i$ of the graph.

  • Each $e_{ij}$ should be a positive integer $\geq 0$.
  • The graph should be fully connected.
  • The algorithm assumes that the vertex with the smallest ID is 0.
  • Lines starting with a % are comments and will be ignored.

Datapoints

The file format should be

x11 x12 ... x1d
x21 x22 ... x2d
...
xn1 xn2 ... xnd

each line denotes one point of the dataset.

  • Each $x_{ij}$ should be a double value.
  • Lines starting with a % are comments and will be ignored.

Facility Location

The file format should be

g11 g12 ... g1m
g21 g22 ... g2m
...
gn1 gn2 ... gnm
  • Each entry $g_{ij}$ should be a non-negative double value for customer $i$ and location $j$.
  • Lines starting with a % are comments and will be ignored.

Weighted Coverage

The file format should be

w1 w2 ... wm
b11 b12 ... b1m
...
bn1 bn2 ... bnm
  • Each $w_j$ should be non-negative double value, that denotes the weight of item $j$.
  • Each entry $b_{ij}$ should be either 1.0 (item $j$ included in subset $i$) or 0.0 (not included).
  • Lines starting with a % are comments and will be ignored.

Bipartite Influence

The file format should be

p11 p12 ... p1m
p21 p22 ... p2m
...
pn1 pn2 ... pnm
  • Each entry $p_{ij}$ is the edge activation from source $i$ to target $j$. They should be double values in the range $[0, 1]$.
  • Lines starting with a % are comments and will be ignored.

License

GNU GENERAL PUBLIC LICENSE Version 3, 29 June 2007

Project status

Active development.

Bugs, Questions, Comments and Ideas

If any bugs arise, quesions occur, comments want to shared, or ideas discussed, please do not hesitate to contact the current repository owner ([email protected]) or leave a GitHub Issue or Discussion. Thanks!

Reference

If you use this work an any academic work, please cite

BibTeX to be added!

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published