VisDiff: SDF-Guided Polygon Generation for Visibility Reconstruction and Recognition

Rahul Moorthy and Volkan Isler Computer Science and Engineering, University of Minnesota. {mahes092, isler}@umn.edu
(August 2024)
Abstract

The capability to learn latent representations plays a key role in the effectiveness of recent machine learning methods. An active frontier in representation learning is understanding representations for combinatorial structures which may not admit well-behaved local neighborhoods or distance functions. For example, for polygons, slightly perturbing vertex locations might lead to significant changes in their combinatorial structure (expressed as their triangulation or visibility graph) and may even lead to invalid polygons. In this paper, we investigate representations to capture the underlying combinatorial structures of polygons. Specifically, we study the open problem of Visibility Reconstruction: Given a visibility graph $G$, construct a polygon $P$ whose visibility graph is $G$. Visibility Reconstruction belongs to the Existential Theory of the Reals ($\exists\mathbb{R}$) complexity class (which lies between NP and PSPACE). Currently, reconstruction algorithms are available only for specific polygon classes. Establishing the hardness of the general problem is open.

We introduce VisDiff, a novel diffusion-based approach to reconstruct a polygon from its given visibility graph $G$. Our method first estimates the signed distance function (SDF) of $P$ from $G$. Afterwards, it extracts ordered vertex locations that have the pairwise visibility relationship given by the edges of $G$. Our main insight is that going through the SDF significantly improves learning for reconstruction. In order to train VisDiff, we make two main contributions: (1) we design novel loss components for computing the visibility in a differentiable manner and (2) we create a carefully curated dataset. We use this dataset to benchmark our method and achieve a 21% improvement in F1-Score over standard methods. We also demonstrate effective generalization to out-of-distribution polygon types and show that learning a generative model allows us to sample the set of polygons with a given visibility graph. Finally, we extend our method to the related combinatorial problem of reconstruction from a triangulation. We achieve 95% classification accuracy of triangulation edges and a 4% improvement in Chamfer distance compared to current architectures. Lastly, we provide preliminary results on the harder visibility graph recognition problem in which the input $G$ is not guaranteed to be a visibility graph.

1 Introduction

Many types of objects ranging from molecules to organs to maps can be represented geometrically. Polygons are one of the most commonly used geometric representations. They are planar objects specified as a cyclically ordered set of points. The line segments connecting these pairs of points in the given order represent the boundary of an object such as the hand shown in Figure 1-left. As one considers the hands of various people, they realize that shape parameters such as the relative length and thickness of fingers or palm sizes vary across samples. At the same time, intuitively, most hands seem to share a common structure. This intuition can be formalized by studying the underlying combinatorial structures of the corresponding polygons representing the hands. For example, one can triangulate each polygon and construct its dual. The dual, with the appropriate embedding, closely resembles a skeleton (Figure 1-middle). Graphical structures such as the triangulation dual or the visibility graphs of polygons provide insights about the underlying combinatorial structures of shapes. In this paper, we study representations that link polygons to their combinatorial structures. We study polygons which are simple (the boundary does not self intersect) and simply-connected (no holes).

Figure 1: Left: An object (a hand) represented as a polygon $P$. The polygon is given by an ordered list of vertex locations $X$. Also shown is a triangulation of $P$. Middle: The dual of the triangulation of $P$. It is represented as a graph $G$ which has a vertex for each triangle and an edge between two adjacent triangles. This drawing contains information about $G$ as well as $X$ because locations from the left figure were used for embedding the graph on the plane. Right: $G$ represented as an adjacency matrix. We seek to answer the question: How much information about $X$ can be recovered from $G$ alone? Also shown in the figure is a standard embedding of $G$ (with Isomap). Clearly, standard graph embedding algorithms are not sufficient to recover $X$ from $G$.

The main question we study is the following: Suppose we are given a graph $G$ representing the combinatorial structure of a polygon. $G$ could be the visibility graph or a triangulation of the polygon. Note that $G$ does not contain any coordinate information $X$. What can we say about the polygon, or the set of polygons, that have this structure $G$? It might be tempting to use standard metric embedding methods such as Isomap (Tenenbaum et al., 2000) to reconstruct $X$, but since $G$ does not admit a natural distance metric, such methods are doomed to fail, as shown by the example in Figure 1-right. Formally, let $X(P)$ be the vertex locations of a polygon $P$ and $G(P)$ be a graphical property of $P$. In this paper, we consider visibility graphs and triangulations. We consider the following problems in increasing difficulty:

Problem 1 (Reconstruction)

Given a valid $G$, generate a polygon $P$ such that $G(P) = G$.

Problem 2 (Characterization)

Given a valid $G$, generate all polygons $P$ such that $G(P) = G$.

Note that in these two problems, the input $G$ is assumed to be valid, i.e., there exists a polygon $P$ whose visibility graph or triangulation dual is $G$. While we primarily focus on reconstruction and characterization problems in this paper, we also provide insights into the more general recognition problem in which $G$ is arbitrary:

Problem 3 (Recognition)

Given an arbitrary graph $G$, determine whether there exists a polygon $P$ such that $G(P) = G$.

The primary combinatorial structure we study in this paper is the visibility graph. The visibility graph of $P$, denoted $Vis(P)$, is a graph which has a vertex for each vertex of $P$. There is an edge between two vertices $u$ and $v$ if and only if $u$ and $v$ are visible to each other in $P$. In other words, the line segment connecting them is completely inside $P$. The visibility graph is an important combinatorial structure because it is unique for a given polygon, and contains many other important structures such as triangulations and shortest path trees (Guibas et al., 1986).
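The definition above can be made concrete with a small, non-differentiable reference routine. The following is a minimal sketch, not the paper's implementation: the function names are hypothetical, and it assumes a simple polygon in general position (no three collinear vertices), so that a diagonal is visible exactly when it properly crosses no polygon edge and its midpoint lies inside the polygon.

```python
from itertools import combinations


def cross(o, a, b):
    """Signed area of the parallelogram spanned by (a - o) and (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def properly_cross(p, q, a, b):
    """True if segments pq and ab cross at a point interior to both."""
    d1, d2 = cross(p, q, a), cross(p, q, b)
    d3, d4 = cross(a, b, p), cross(a, b, q)
    return d1 * d2 < 0 and d3 * d4 < 0


def point_in_polygon(pt, poly):
    """Even-odd ray casting; points exactly on the boundary are not treated specially."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_hit = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x_hit > x:
                inside = not inside
    return inside


def visibility_graph(poly):
    """Binary adjacency matrix of the visibility graph (general position assumed)."""
    n = len(poly)
    vis = [[0] * n for _ in range(n)]
    for u, v in combinations(range(n), 2):
        if v - u in (1, n - 1):
            visible = True  # consecutive vertices share a polygon edge
        else:
            blocked = any(
                properly_cross(poly[u], poly[v], poly[i], poly[(i + 1) % n])
                for i in range(n)
            )
            mid = ((poly[u][0] + poly[v][0]) / 2, (poly[u][1] + poly[v][1]) / 2)
            visible = not blocked and point_in_polygon(mid, poly)
        vis[u][v] = vis[v][u] = int(visible)
    return vis


# A dart-shaped polygon with a notch at vertex 3.
dart = [(0, 0), (4, 0), (4, 4), (2, 1), (0, 4)]
dart_vis = visibility_graph(dart)
```

In the dart example, vertices 2 and 4 do not see each other even though the segment between them crosses no edge, because that segment lies outside the polygon; this is the same degenerate case that the loss design later handles separately.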

Our contributions: We present VisDiff: a generative model which takes a visibility graph $G$ and a seed for diffusion as input, and first generates a polygon $P$ represented as a signed distance function (SDF). Next, vertex locations on the zero level set are selected so that $Vis(P) = G$. Our main insight is that going through the SDF as an intermediate representation yields superior results over using established methods to predict the vertex locations directly. In order to train VisDiff, we design novel loss functions for evaluating the validity of the output polygon and comparing its visibility graph to the input in a differentiable manner. We also design a carefully curated dataset which captures a wide range of combinatorial properties of polygons. Current random polygon generation methods struggle to faithfully represent the visibility graph space. They are biased towards high concavity as the number of points increases. We address this problem by systematically rebalancing the dataset by the link diameter, which quantifies concavity.

We show that VisDiff can also be used for characterization since we can sample the set of polygons which have a given visibility graph. To show the generality of VisDiff, we apply it to the problem of reconstructing a polygon from its triangulation graph. Finally, we present preliminary results on how VisDiff can be used for recognition by turning it into a classification problem based on the difference between the input graph (which may not be a visibility graph) and the visibility graph of the output polygon. This last result suggests that VisDiff is learning a meaningful representation over the space of all polygons. Overall, our results provide evidence that recent architectures can learn representations of non-trivial combinatorial structures such as polygons. We start with an overview of related work.

2 Related Work

We summarize the related work in three main directions: Visibility graph reconstruction and recognition, representation learning for shapes and graph neural networks.

Visibility graph reconstruction and recognition: The problem of reconstructing and recognizing visibility graphs has been studied extensively in the computational geometry literature, yet it remains open (Ghosh & Goswami, 2013). Existing reconstruction and recognition algorithms cover only certain polygon classes: Ameer et al. (2022) solved the recognition and reconstruction problems for pseudo polygons. Silva (2020) showed that visibility graphs of convex fans are equivalent to visibility graphs of terrain polygons with the addition of a universal vertex. Everett & Corneil (1990) proposed an algorithm to solve the recognition problem for spiral polygons. Boomari & Zarei (2016) proposed reconstruction and recognition algorithms for anchor polygons. Colley et al. (1997) proposed a linear-time algorithm to recognize visibility graphs of tower polygons. Dehghani & Morady (2009) solved the reconstruction problem for embedded planar graphs. On the hardness side, the visibility graph recognition and reconstruction problems are known to belong to PSPACE (Everett, 1990), and more specifically to the Existential Theory of the Reals class (Boomari et al., 2018). The exact hardness of the problem is still open. In this work, we explore it from the representation learning perspective to understand whether generative models can learn the underlying manifold of the space of polygons and their visibility graphs in a generalizable fashion.

Representation Learning: 3D shape completion (Chou et al., 2023) (Chen et al., 2024) (Cheng et al., 2023) (Shim et al., 2023) is a closely related application. In 3D shape completion, the input contains partial geometric information for example as a point cloud. In our case, the input is only a combinatorial description such as the visibility graph. There might be many shapes consistent with the input graph and extracting them without any geometric information as part of the input is challenging. Another body of work related to our problem is mesh generation (Gupta et al., 2023) (Wang et al., 2020). Two recent results in this domain are MeshGPT (Siddiqui et al., 2024) and PolyDiff (Alliegro et al., 2023). Both of these approaches generate high-quality 3D triangular meshes by learning to output a set of triangles from a fixed set of triangles. PolyDiff discretizes the 3D space into bins and MeshGPT works over a predefined set of triangles. In our work, we seek to learn the space of all polygons and their visibility graphs.

Graph Neural Networks (GNNs): GNNs are one of the standard representations for graphs. The current literature on GNNs primarily focuses on graphs with features associated with a well-defined metric space. In the literature, the closest to our work is generating graph embeddings for a given distance matrix. Li et al. (2024) showed that a GNN given all-pairwise Euclidean distance information, known as Vanilla DisGNN, fails to differentiate between symmetric graph structures. To address the limitation of Vanilla DisGNN, they propose $k$-DisGNN. $k$-DisGNN captures information not just from immediate neighbors but from a $k$-hop neighborhood around each node. The ability to utilize the $k$-hop neighborhood results in richer geometric representations for differentiating between symmetric structures efficiently. Cui & Wei (2023) proposed MetricGNN, which is capable of generating a graph embedding from a given embedding distance matrix. Shi et al. (2021) proposed ConfGF, which uses a GNN for determining molecular conformation given the inter-atomic distances and bond characteristics. All the above work assumes the presence of an underlying metric space, which is absent in visibility graph reconstruction. We develop VisDiff to learn embeddings in this challenging combinatorial domain.

3 VisDiff Architecture

VisDiff consists of three main modules: Graph Encoding, SDF Representation Learning, and Vertex Prediction. The following sections focus on the details of each module.

Figure 2: VisDiff architecture: There are three main blocks, namely Denoising U-Net, Vertex Prediction and graph encoder shown by E. G represents a polygon structural graph. Z represents the encoding of the graph. During Training: the model is supervised using both the ground truth SDF and polygon. During Testing: only the visibility graph is provided as input.

Graph Encoding: The visibility graph is represented as a binary adjacency matrix. To condition other components of VisDiff on this input, we train a U-Net (Ronneberger et al., 2015) autoencoder with Binary Cross Entropy (BCE) loss to reduce the dimensionality of the $25 \times 25$ input matrix to 512. We pretrain the autoencoder separately and freeze its encoder when encoding the visibility graph $G$ for the other modules.

SDF Diffusion: Diffusion models have shown the ability to efficiently learn the space of all images. Motivated by this success, we represent polygons with their signed distance functions which in turn can be represented as images (each pixel stores the distance to the nearest point on the polygon boundary). We can now learn the space of polygons as a diffusion process using a Denoising Diffusion Implicit Model (DDIM) (Song et al., 2020). DDIM primarily involves two steps: forward diffusion and the reverse diffusion processes.
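The SDF image described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the paper's data pipeline: the function names, the grid resolution, the $[-1,1]^2$ extent, and the sign convention (negative inside the polygon) are all choices made here for the example.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the closed segment ab."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    denom = dx * dx + dy * dy
    if denom == 0:
        t = 0.0
    else:
        t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / denom
        t = max(0.0, min(1.0, t))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def point_in_polygon(pt, poly):
    """Even-odd ray casting test."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            if x1 + (y - y1) * (x2 - x1) / (y2 - y1) > x:
                inside = not inside
    return inside


def polygon_sdf(poly, resolution=33):
    """Sample the signed distance to the polygon boundary on a square grid.

    Rows index y and columns index x, both spanning [-1, 1].
    Assumed convention: negative inside, positive outside.
    """
    n = len(poly)
    grid = []
    for r in range(resolution):
        y = -1.0 + 2.0 * r / (resolution - 1)
        row = []
        for c in range(resolution):
            x = -1.0 + 2.0 * c / (resolution - 1)
            d = min(
                point_segment_distance((x, y), poly[i], poly[(i + 1) % n])
                for i in range(n)
            )
            row.append(-d if point_in_polygon((x, y), poly) else d)
        grid.append(row)
    return grid


square = [(-0.5, -0.5), (0.5, -0.5), (0.5, 0.5), (-0.5, 0.5)]
sdf = polygon_sdf(square, resolution=33)
```

With this convention the polygon boundary is the zero level set of the grid, which is what the vertex prediction stage later relies on.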

Forward Diffusion process involves adding noise to the SDF representation in a scheduled manner. Let the SDF sample from the valid polygon distribution be denoted by $x_0$. Given the standard deviation of the noise level $\sigma_t > 0$ at timestep $t$ of the diffusion process, the noise addition process is defined by $x_t = x_0 + \sigma_t \epsilon$ where $\epsilon \sim \mathcal{N}(0, I)$ is a sample from the Gaussian distribution. In this way, noise is continuously injected into the SDF, eventually transforming it into a pure Gaussian sample at the end of the forward noising process. VisDiff uses a linear log scheduler (Permenter & Yuan, 2023) to control the noise level throughout the forward noising process.
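The noising step $x_t = x_0 + \sigma_t \epsilon$ can be illustrated with a short sketch. It assumes that the "linear log" scheduler means noise levels spaced evenly in log space; the endpoint values `sigma_min` and `sigma_max` and the function names are placeholders chosen for this example, not values taken from the paper.

```python
import math
import random


def log_linear_sigmas(num_steps, sigma_min=0.01, sigma_max=10.0):
    """Noise levels evenly spaced in log space, increasing from sigma_min to sigma_max.

    The endpoint values are illustrative placeholders.
    """
    if num_steps == 1:
        return [sigma_max]
    lo, hi = math.log(sigma_min), math.log(sigma_max)
    return [math.exp(lo + (hi - lo) * i / (num_steps - 1)) for i in range(num_steps)]


def add_noise(x0, sigma, rng):
    """Return (x_t, eps) with x_t = x_0 + sigma * eps and eps ~ N(0, I) per pixel."""
    eps = [[rng.gauss(0.0, 1.0) for _ in row] for row in x0]
    xt = [[v + sigma * e for v, e in zip(row, erow)] for row, erow in zip(x0, eps)]
    return xt, eps


rng = random.Random(0)
x0 = [[0.0, 1.0], [2.0, 3.0]]  # stand-in for an SDF image
sigmas = log_linear_sigmas(5)
xt, eps = add_noise(x0, sigmas[2], rng)
```

Returning `eps` alongside `x_t` is convenient because the denoising network is trained to predict exactly this added noise.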

Reverse Diffusion involves recovering the original SDF from the final Gaussian sample generated during the forward diffusion process. In this step, we start with Gaussian noise and predict the noise added to the sample given $\sigma_t$. The reverse diffusion is parameterized through a neural network that learns to predict the added noise given the input noise sample and $\sigma_t$.

Specifically, we train a U-Net (Ronneberger et al., 2015) encoder-decoder architecture to predict the noise added to the original SDF sample. Additionally, we condition the U-Net CNN blocks on encoded visibility using multiple Spatial Transformer Cross Attention (Ngo et al., 2023) blocks. The cross-attention blocks directly incorporate visibility information into the U-Net spatial features during the learning process. The key and value components of the cross-attention block are the spatial CNN features, while the query is the encoded visibility embedding. Figure 2 shows the architecture of the SDF Diffusion block. The model is trained using the mean-squared error (MSE) loss $L_{MSE}$ between the predicted noise and the actual noise added to the sample. Given the visibility graph $G$, the trained model is then used to sample polygon SDFs.

Sampling of the SDF is performed using a DDIM sampler. The sampling process draws a sample from a Gaussian distribution $\mathcal{N}(0, I)$, denoted by $x_t$, along with a schedule of decreasing noise levels proportional to the number of steps in the sampling process. Each diffusion step is given by Equation 1.

$$x_{t-1} = x_t + (\sigma_{t-1} - \sigma_t)\,\epsilon_\theta(x_t, \sigma_t, G) \tag{1}$$

where $\epsilon_\theta(x_t, \sigma_t, G)$ represents the noise predicted by the U-Net encoder-decoder architecture given the visibility graph $G$, the noise sample from the previous step $x_t$ and the standard deviation of the noise level $\sigma_t$. This process reconstructs the SDF of the polygon, ensuring it adheres to the visibility constraints defined by $G$.
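The update in Equation 1 can be traced with a toy sampler. The sketch below is illustrative only: `eps_model` stands in for the trained U-Net, and the oracle predictor, the noise schedule, and the flat-list representation of the SDF are assumptions made so that the example is self-contained and checkable.

```python
import random


def ddim_sample(x_T, sigmas, eps_model, graph):
    """Apply x_{t-1} = x_t + (sigma_{t-1} - sigma_t) * eps(x_t, sigma_t, G).

    sigmas is a decreasing list of noise levels; eps_model is any callable
    (x, sigma, graph) -> predicted noise with the same length as x.
    """
    x = list(x_T)
    for sigma_t, sigma_prev in zip(sigmas[:-1], sigmas[1:]):
        eps = eps_model(x, sigma_t, graph)
        x = [xi + (sigma_prev - sigma_t) * ei for xi, ei in zip(x, eps)]
    return x


# Toy check with an oracle noise predictor that knows the clean signal.
target = [0.25, -0.5, 0.0, 1.0]  # stand-in for a flattened SDF


def oracle_eps(x, sigma, graph):
    """Exact noise for x = target + sigma * eps (hypothetical, for testing only)."""
    return [(xi - ti) / sigma for xi, ti in zip(x, target)]


rng = random.Random(0)
sigmas = [10.0, 3.0, 1.0, 0.3, 0.1, 0.0]  # illustrative decreasing schedule
x_T = [ti + sigmas[0] * rng.gauss(0.0, 1.0) for ti in target]
recovered = ddim_sample(x_T, sigmas, oracle_eps, graph=None)
```

With a perfect noise predictor each step moves the sample from noise level $\sigma_t$ to $\sigma_{t-1}$ exactly, so the final iterate equals the clean signal up to floating-point error. A learned predictor only approximates this, which is why the number of steps and the schedule matter in practice.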

Vertex Prediction: The generated SDF of the polygon is then used to determine the final vertex locations whose visibility relationship corresponds to the visibility graph $G$. The process of picking vertex locations over the zero level set is challenging as the corners of the polygons are not well-defined in the SDF image. Furthermore, as the number of vertex locations increases, a small change in the placement of points on the SDF will significantly alter the visibility of the entire polygon.

We formulate the polygon vertex extraction as a separate estimation problem of determining vertex locations given the SDF and the visibility graph $G$. Specifically, we train a CNN encoder to encode the SDF into an embedding space. The embedding process is also conditioned on the encoding of the visibility graph $G$ using Spatial Transformer Cross Attention (Ronneberger et al., 2015) layers, which helps relate vertex generation to the visibility constraints. The keys and values for the spatial transformer are the spatial CNN features, similar to the diffusion block, while the query is the encoded visibility embedding. The generated SDF embedding is then passed through multiple MLP layers to predict the ordered vertex locations of the polygon. See Figure 2.

We experimented with predicting vertex locations simultaneously with the SDF. Comparisons presented in Appendix (Section A, Table 3) show that training the vertex prediction model independently from the SDF generation model is significantly more accurate than joint training and prediction. Hence, we train the vertex prediction model separately with the ground truth SDF.

4 Loss Functions

The model is trained using the following loss function

$$Loss = \lambda_1 L_{MSE} + \lambda_2 L_{validity} + \lambda_3 L_{visibility} + \lambda_4 L_{SDF} \tag{2}$$

where $\lambda_i$ is a scaling factor and $\lambda_1 = \lambda_4 = 1.0$ while $\lambda_2 = \lambda_3 = 0.1$. $\lambda_2$ and $\lambda_3$ were chosen as 0.1 because the scale of $L_{validity}$ and $L_{visibility}$ is 10 times bigger than the other components. Each loss component has a unique role in learning the visibility property efficiently, as described below.

(a) B, D Non-visible
(b) A, D Non-visible
(c) D, F Visible
Figure 3: Visibility losses: To check whether two vertices $u$ and $v$ are visible to each other, we consider the intersection of the line segment $|uv|$ with the edges of the polygon and handle a degenerate case separately: In Figure 3(a), the segment $BD$ does not intersect any polygon edge but it lies completely outside the polygon. The loss component $L_{out}$ addresses this case. For the remaining cases, the loss $L_{int}$ calculates $Int(X,Y) = 1/(1 + d(X,Y))$ where $X$ and $Y$ are two line segments and $d$ is the distance of the intersection point to the closest point on $X$. In Figure 3(b), when $X = AD$ and $Y = BC$, the value of $d(AD,BC) = 0$ because the intersection point is on $AD$. However, in Figure 3(c), the value of $d(FD,BC) > 0$ as $FD$ and $BC$ are non-intersecting. $L_{int}$ calculates the $Int$ function with all polygon edges separately during the visibility calculation.

$L_{MSE}$: The MSE loss penalizes deviation from ground truth vertex locations.

$$L_{MSE} = (\hat{X} - X^{*})^{2} \tag{3}$$

where $\hat{X}$ denotes the locations of the predicted vertices and $X^{*}$ denotes the ground truth vertex locations. The $L_{MSE}$ loss is especially helpful for initial learning of the polygon structure.

$L_{visibility}$: The loss component $L_{visibility}$ measures how close the visibility graph $\hat{G}$ of the output polygon is to the input $G$, which can be computed using binary cross entropy.

$$L_{visibility} = L_{BCE}(\hat{G}, G) \tag{4}$$

where $\hat{G}$ represents the predicted visibility graph and $L_{BCE}$ refers to binary cross entropy.

However, since VisDiff outputs only vertex locations, the main challenge in computing this loss is computing the visibility graph in a differentiable manner. We present a differentiable method to estimate $\hat{G}$. An edge is considered non-visible if it intersects any other polygon edge or is fully outside the polygon. We estimate $\hat{G}$ using two terms, $L_{out}$ and $L_{int}$, to account for both conditions of non-visibility. See Figure 3. $L_{out}$ determines non-visibility due to being fully outside the polygon while $L_{int}$ determines non-visibility due to intersection. Specifically, $L_{out}$ samples dense points on the line and extracts the SDF values of points outside the polygon. $L_{int}$ calculates the distance to the intersection point between the visibility edge and each polygon edge. Equation 5 shows the resultant $\hat{G}$ for determining visibility for a single edge $i$ given $L_{int}$ and $L_{out}$.

$$\hat{G}_i = 1 - \max(L_{intmax}, L_{outmax}) \tag{5}$$

where $L_{intmax}$ denotes $\max(L_{int})$ and $L_{outmax}$ denotes $\max(L_{out})$. We subtract the maximum from one because non-visible edges are represented as 0 in the visibility matrix. $\max$ is a non-differentiable operation. We design a soft maximum to have a differentiable estimation of the maximum operation. Equation 6 shows a differentiable estimation of the maximum operation given two numbers $A$ and $B$.

$$softmaximum(A, B) = softmax(A, B) * (A + B) \tag{6}$$

The differentiable estimation of the maximum operation is used to determine each edge in $\hat{G}$.
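A scalar sketch of Equations 5 and 6 may help. It rests on one interpretive assumption that should be read as such: Equation 6 is taken to mean the softmax-weighted sum of the two inputs, i.e., the dot product of $softmax(A, B)$ with the vector $(A, B)$. The function names are hypothetical, and in training the same arithmetic would be written with an automatic differentiation framework rather than plain floats.

```python
import math


def soft_maximum(a, b):
    """Softmax-weighted sum of a and b, a smooth surrogate for max(a, b).

    Assumed reading of Equation 6: softmax([a, b]) dotted with [a, b].
    """
    m = max(a, b)  # shift by the larger value for numerical stability
    wa, wb = math.exp(a - m), math.exp(b - m)
    return (wa * a + wb * b) / (wa + wb)


def soft_visibility(l_int_max, l_out_max):
    """Equation 5 with the hard max replaced by the soft maximum."""
    return 1.0 - soft_maximum(l_int_max, l_out_max)


blocked_score = soft_visibility(1.0, 0.0)  # one term signals non-visibility
clear_score = soft_visibility(0.0, 0.0)    # neither term signals non-visibility
```

Under this reading the surrogate always lies between the mean and the true maximum of its inputs. For inputs in $[0, 1]$ it can sit well below the maximum: for $(1, 0)$ it evaluates to about 0.73, so the resulting $\hat{G}_i$ is a soft score rather than a hard 0 or 1, which is compatible with the binary cross entropy in Equation 4.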

$L_{validity}$: We introduce $L_{validity}$ to penalize polygon edge crossings. $L_{validity}$ uses the $Int$ function from $L_{visibility}$ to identify invalid configurations. Equation 7 shows the validity loss.

$$L_{validity} = \frac{1}{(m+1)^{2}} \sum_{i=0}^{m} \sum_{j=0}^{m} Int(P_i, P_j), \quad i \neq j \tag{7}$$

where $m$ denotes the number of edges of polygon $P$ and $i \neq j$ restricts the sum to edges that are neither adjacent nor the same. The function $Int()$ is illustrated in Figure 3.
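The validity loss can be sketched as follows, with two assumptions that go beyond the text. First, the "intersection point" is taken to be the intersection of the two supporting lines, and $d$ is measured as the larger of its distances to the two segments, so that $d = 0$ exactly when the segments themselves meet; the figure caption defines $d$ relative to $X$ only, so this symmetrization is a choice made here. Second, parallel edges are given a score of 0. The function names are hypothetical.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the closed segment ab."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    denom = dx * dx + dy * dy
    if denom == 0:
        t = 0.0
    else:
        t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / denom
        t = max(0.0, min(1.0, t))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def line_intersection(a, b, c, d):
    """Intersection of the infinite lines through ab and cd, or None if parallel."""
    rx, ry = b[0] - a[0], b[1] - a[1]
    sx, sy = d[0] - c[0], d[1] - c[1]
    denom = rx * sy - ry * sx
    if abs(denom) < 1e-12:
        return None
    t = ((c[0] - a[0]) * sy - (c[1] - a[1]) * sx) / denom
    return (a[0] + t * rx, a[1] + t * ry)


def int_score(seg_x, seg_y):
    """Int(X, Y) = 1 / (1 + d), with the symmetrized d described in the lead-in."""
    p = line_intersection(seg_x[0], seg_x[1], seg_y[0], seg_y[1])
    if p is None:
        return 0.0  # assumption: parallel segments are treated as infinitely far apart
    d = max(point_segment_distance(p, *seg_x), point_segment_distance(p, *seg_y))
    return 1.0 / (1.0 + d)


def validity_loss(poly):
    """Average Int over ordered pairs of distinct, non-adjacent polygon edges."""
    n = len(poly)
    edges = [(poly[i], poly[(i + 1) % n]) for i in range(n)]
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i == j or (i - j) % n in (1, n - 1):
                continue  # skip identical and adjacent edges
            total += int_score(edges[i], edges[j])
    return total / (n * n)  # square of the number of summed indices, as in Equation 7


square_loss = validity_loss([(0, 0), (1, 0), (1, 1), (0, 1)])
bowtie_loss = validity_loss([(0, 0), (1, 1), (1, 0), (0, 1)])  # self-intersecting
```

Because $Int$ decays smoothly with $d$, edge pairs that nearly cross still contribute a small penalty, which gives the optimizer a gradient that pushes them apart before an actual crossing occurs.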

$L_{SDF}$: The final loss component ensures that the vertices lie on the polygon boundary, i.e., the zero level set.

$$L_{SDF}=\sum_{i=0}^{n}|S(V_{i})| \tag{8}$$

where $V_{i}$ represents the $i$-th vertex location of polygon $P$, $S(V_{i})$ represents its SDF value, and $n$ represents the number of vertices of polygon $P$.
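Equation 8 can be sketched as follows, assuming the SDF is available as a callable that maps an array of 2D points to signed distances. The analytic circle SDF is a hypothetical stand-in used only to keep the example self-contained; a grid-based SDF would instead need to be interpolated at the vertex locations.

```python
import numpy as np

def circle_sdf(points, radius=1.0):
    """Toy SDF of a circle centered at the origin (illustrative stand-in)."""
    pts = np.asarray(points, dtype=float)
    return np.linalg.norm(pts, axis=-1) - radius

def sdf_loss(sdf, vertices):
    """Sum of absolute SDF values at the vertex locations (Equation 8)."""
    return float(np.sum(np.abs(sdf(vertices))))
```

The loss vanishes exactly when every vertex lies on the zero level set and grows with the distance of each vertex from the boundary, regardless of the sign convention of the SDF.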

Ablation studies in the Appendix (Section A, Table 3) show that adding these losses improves how well the model upholds the visibility graph $G$ compared to training with only $L_{MSE}$.

5 Dataset Generation

The problems of Visibility Characterization and Visibility Reconstruction require the dataset distribution to contain multiple polygons $P$ corresponding to the same visibility graph $G$. Additionally, the dataset should represent a high diversity of visibility graphs. We address these requirements by uniformly sampling polygons based on the graph properties described below and by generating multiple augmentations of each polygon.

The dataset generation process involves sampling 60,000 polygons with 25 vertex locations. The vertex locations are drawn from a uniform distribution within $[-1,1]^{2}$. We use the 2-opt move algorithm (Auer & Held, 1996) to generate polygons from the drawn locations. We observed that the dataset generated by the 2-opt move algorithm was non-uniform with respect to the link diameter of the visibility graph. Link diameter quantifies the maximum number of edges on the shortest path between any two graph nodes. A higher diameter indicates greater concavity in the polygon. Hence, to obtain a balanced distribution, we resample the large dataset based on the link diameter of the visibility graph. The resampling process results in a subset of 18,500 polygons. In the appendix (Section B, Figure 9(b)) we present additional statistics showing that our dataset is uniformly distributed in terms of link diameter.
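The link diameter described above can be computed with a breadth-first search from every node. The sketch below assumes the visibility graph is given as a symmetric 0/1 adjacency matrix; the function name is hypothetical.

```python
from collections import deque

def link_diameter(adj):
    """Longest shortest-path length (in edges) over all node pairs."""
    n = len(adj)
    diameter = 0
    for source in range(n):
        dist = [-1] * n  # -1 marks nodes not yet reached
        dist[source] = 0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in range(n):
                if adj[u][v] and dist[v] < 0:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if min(dist) < 0:
            raise ValueError("graph is disconnected")
        diameter = max(diameter, max(dist))
    return diameter
```

For a convex polygon every pair of vertices is mutually visible, so the visibility graph is complete and the diameter is 1; more concave polygons produce longer shortest paths.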

We further augment each polygon to generate 20 samples by applying shear transformation and vertex perturbation while preserving the visibility graph $G$. The augmentations introduce the property of multiple polygons sharing the same visibility graph $G$. The augmentation and resampling are critical for learning the representative space of the Visibility Characterization and Visibility Reconstruction problems. The final dataset consists of 370,000 polygons and their respective visibility graphs. The total dataset, including all polygons, consists of 400,000 polygons and will be made publicly available.
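The shear augmentation can be illustrated with a short sketch. An invertible affine map sends line segments to line segments and the polygon interior to the interior of the image polygon, so it leaves the visibility graph unchanged. The horizontal shear direction and the parameter name `k` are assumptions for illustration; the shear parameters used to build the dataset are not specified here.

```python
import numpy as np

def shear_polygon(vertices, k):
    """Apply the horizontal shear (x, y) -> (x + k * y, y) to each vertex."""
    v = np.asarray(vertices, dtype=float)
    shear = np.array([[1.0, k],
                      [0.0, 1.0]])
    # Row vectors are transformed by the transpose of the shear matrix.
    return v @ shear.T
```

Vertex perturbation, by contrast, is not guaranteed to preserve visibility, so a perturbed sample would need its visibility graph recomputed and compared with $G$ before being accepted.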

5.1 Test Set Generation

We generate two datasets for evaluation: in-distribution and out-of-distribution. In-distribution samples are generated by setting aside 100 unique polygons per link diameter from the large dataset. These are not included in the training data.

Figure 4: Polygon types: a) Star: Single kernel point (red) from which all vertex locations are visible, b) Terrain: X-monotone polygons where orthogonal lines from the X axis intersect the polygon boundary at most twice, c) Convex Fan: Single convex vertex (red) which appears in every triangle of the polygon triangulation, d) Anchor: Polygons with two reflex links and a convex link connecting both of them, e) Spiral: Polygons with long link diameter.

The out-of-distribution samples are generated based on specific polygon types: star, spiral, anchor, convex fan, and terrain. Figure 4 details the properties of the polygon types. Spiral and anchor share similar characteristics with our dataset, while terrain, convex fan, and star differ significantly in terms of their density, i.e., the total percentage of edges in the graph. In the appendix (Section B), Figure 10(b) shows how the visibility graph density of terrain, convex fan, and star differs from that of the training set.

6 Results

We evaluate VisDiff against baselines on the problem of Visibility Reconstruction. We also show the ability of VisDiff to provide evidence for the Visibility Characterization problem. We then provide preliminary results on Visibility Recognition. Lastly, we showcase the generalization of VisDiff to other graph structural properties such as the triangulation dual.

6.1 Evaluation Metrics

To evaluate our algorithm, we compute the visibility graphs of the output polygons and formulate the evaluation of the visibility graph as a classification problem. We report the accuracy, precision, recall, and F1-Score between the generated and the ground-truth visibility graphs. Specifically, each edge of the visibility graph is classified as either a visible or non-visible edge. Each visibility graph is evaluated individually, and the average over the dataset is reported as a collective quantitative metric. Since the ratio of visible and non-visible edges can be vastly different across polygons, we use the F-1 score to evaluate model performance.
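The per-graph metrics described above can be sketched as follows. The sketch assumes both visibility graphs are given as symmetric 0/1 adjacency matrices over the same vertex ordering and counts each unordered vertex pair once; whether polygon boundary edges are included in the evaluation is not stated here, so including them is an assumption.

```python
import numpy as np

def edge_metrics(pred_adj, true_adj):
    """Accuracy, precision, recall, and F1 over unordered vertex pairs."""
    pred = np.asarray(pred_adj, dtype=bool)
    true = np.asarray(true_adj, dtype=bool)
    upper = np.triu_indices(pred.shape[0], k=1)  # each pair counted once
    p, t = pred[upper], true[upper]
    tp = int(np.sum(p & t))
    fp = int(np.sum(p & ~t))
    fn = int(np.sum(~p & t))
    tn = int(np.sum(~p & ~t))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / p.size,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

Dataset-level numbers would then be obtained by averaging these per-graph values over all test samples, as described in the text.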

6.2 Qualitative and Quantitative Evaluation

We compare VisDiff against baselines that generate a vertex representation of a polygon from the visibility graph. In particular, we compare against various state-of-the-art architectures: Transformer-Decoder [Seq] (Vaswani, 2017), Graph Neural Network [Gnn] (Veličković et al., 2017), DDIM [VD] (Song et al., 2020), Encoder-Decoder [E.D], and a direct optimization approach based on Nelder-Mead [NM] (Gao & Han, 2012). Nelder-Mead optimizes the configuration of vertex locations directly, using the difference between the predicted and actual visibility graphs as the objective to minimize. Implementation details for all baselines will be available in the publicly released code.

6.3 Visibility Reconstruction

Table 1 shows the quantitative evaluation on the in-distribution dataset. VisDiff performs significantly better than architectures utilizing vertex representation on all metrics except precision. Nelder-Mead optimization based on predicted and actual visibility graphs achieves the highest precision, but it also has the lowest recall. Specifically, Nelder-Mead optimization missed an average of 60% of the visible edges across all samples in the test dataset. Figure 5 also shows that Nelder-Mead optimization and the other baselines fail to generate polygons that are both valid and consistent with the visibility graph, while VisDiff learns to generate polygons close to the ground truth visibility.

| Method | Acc ↑ | Prec ↑ | Rec ↑ | F1 ↑ | DAcc ↑ | DRec ↑ | DF1 ↑ | CDist ↓ |
|---|---|---|---|---|---|---|---|---|
| (a) E.D | 0.75 | 0.76 | 0.54 | 0.62 | 0.95 | 0.69 | 0.81 | 0.95 |
| (b) Seq | 0.68 | 0.58 | 0.65 | 0.61 | 0.96 | 0.75 | 0.85 | 0.96 |
| (c) Gnn | 0.73 | 0.90 | 0.43 | 0.57 | 0.95 | 0.70 | 0.82 | 1.03 |
| (d) VD | 0.77 | 0.80 | 0.58 | 0.66 | 0.93 | 0.55 | 0.71 | 0.96 |
| (e) NM | 0.70 | 0.93 | 0.34 | 0.49 | 0.98 | 0.88 | 0.94 | 1.10 |
| (f) Ours | 0.85 | 0.83 | 0.77 | 0.80 | 0.99 | 0.95 | 0.97 | 0.91 |

Table 1: Baseline comparison: (a) Encoder-Decoder, (b) Sequence Prediction, (c) GNN, (d) Vertex Diffusion, (e) Nelder-Mead Optimization, (f) VisDiff. Acc: Accuracy, Prec: Precision, Rec: Recall, F1: F1-Score, DAcc: triangulation-dual accuracy, DRec: triangulation-dual recall, DF1: triangulation-dual F1-Score, CDist: Chamfer distance between point sets in triangulation dual.
Figure 5: Visibility reconstruction qualitative results: The top row shows the polygons generated by different methods. The second row shows the corresponding visibility graphs of the polygons, where green represents a visible edge and red represents a non-visible edge. The F1-Score of each visibility graph compared to the GT is given in parentheses. The polygon results correspond to the following methods: a) Ground Truth, b) VisDiff (0.81), c) Sequence Prediction (0.60), d) GNN (0.58), e) Vertex diffusion (0.57), f) Encoder-Decoder (0.57), g) Optimization (0.48).

We further evaluate how well VisDiff generalizes to different polygon types. Table 2 shows its quantitative results on the out-of-distribution dataset. VisDiff generalizes well to polygons that differ from the training distribution, specifically terrain, star, and convex fan polygons, whose visibility graph density differs from that of our training distribution.

| Polygon type | Accuracy ↑ | Precision ↑ | Recall ↑ | F1-Score ↑ |
|---|---|---|---|---|
| Spiral | 0.875 | 0.842 | 0.808 | 0.823 |
| Terrain | 0.866 | 0.815 | 0.645 | 0.712 |
| Convex Fan | 0.769 | 0.775 | 0.772 | 0.771 |
| Anchor | 0.89 | 0.935 | 0.935 | 0.935 |
| Star | 0.772 | 0.751 | 0.797 | 0.77 |

Table 2: Specific polygon types: VisDiff shows generalization to star, terrain and anchor polygon types which are out of distribution samples to our dataset.

6.4 Visibility Characterization

We showcase the ability of VisDiff to present evidence for the Visibility Characterization problem. We generate multiple polygons for the same visibility graph $G$ by drawing different samples from a Gaussian distribution for diffusion initialization. Figure 6 shows how VisDiff generates polygons that differ by perturbation and shear transformation yet have visibility similar to the ground truth visibility graph $G$. The ability to sample multiple polygons with the same visibility was also utilized in the Visibility Reconstruction experiments above. In particular, we sample 50 polygons for a single visibility graph $G$ and select the polygon that best matches $G$.

Figure 6: Visibility Characterization: The top row shows multiple polygons generated by VisDiff for the same visibility graph $G$. The second row shows the visibility graph corresponding to each polygon, where green represents a visible edge and red represents a non-visible edge. The F1-Score compared to the ground truth (GT) visibility graph is given per panel: (a) GT, (b) 0.81, (c) 0.76, (d) 0.75, (e) 0.76.
Figure 8: Visibility Recognition: a) Non-Valid Sample 1: Red represents hole, b) VisDiff Prediction Sample 1: VisDiff learns to put points in such a way to best maintain the visibility and the visibility graph is detected as a valid visibility graph, c) Non-Valid Sample 2: Red represents hole, d) VisDiff Prediction Sample 2: VisDiff failed to generate a valid polygon and the input is therefore classified as a non-valid visibility graph, e) Visibility Recognition Quantitative Results (accuracy (%) vs. F-1 threshold): VisDiff classifies 80% of the samples correctly when the F-1 threshold is selected as 0.73.

6.5 Visibility Recognition

We present preliminary results on the Visibility Recognition problem. We generate a set of 50 valid and non-valid visibility graphs for the Visibility Recognition problem. We use polygons with holes as samples of non-valid visibility graphs. A polygon with a hole has an outer boundary and also an inner boundary, which makes it non-simple. We determine its visibility graph in the same way as for a simple polygon. An edge through the hole is a non-visible edge since the hole is considered outside the polygon.

We utilize the model’s ability to sample multiple polygons and sample a set of polygons $S$ for each visibility graph. If any polygon in $S$ is valid and has an F1-Score over a certain threshold $X$, the input is classified as a valid visibility graph. Figure 8(e) shows the performance of our model on the Visibility Recognition problem using different thresholds on the F1-Score. Figures 16(a) to 16(d) show qualitative results on the polygon generation for non-valid visibility graphs. VisDiff is able to correctly classify 80% of the samples from the set of valid and non-valid visibility graphs when the F1 threshold is selected to be close to the mean performance on the Visibility Reconstruction problem. A classification performance of 80% shows that VisDiff is able to represent the underlying valid visibility graph space efficiently. Appendix C.4 shows more qualitative results on Visibility Recognition.
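The decision rule above can be written as a short sketch. It assumes each sampled polygon has already been reduced to a pair `(is_valid, f1)`, where `is_valid` records whether the polygon is simple and `f1` is its F1-Score against the input graph; these names and the data layout are hypothetical.

```python
def is_valid_visibility_graph(samples, threshold):
    """True if any sampled polygon is valid and exceeds the F1 threshold.

    samples: iterable of (is_valid, f1) pairs, one per sampled polygon.
    """
    return any(valid and f1 > threshold for valid, f1 in samples)

def recognition_accuracy(all_samples, labels, threshold):
    """Fraction of graphs whose predicted validity matches the label."""
    predictions = [is_valid_visibility_graph(s, threshold) for s in all_samples]
    correct = sum(p == bool(y) for p, y in zip(predictions, labels))
    return correct / len(labels)
```

Sweeping `threshold` over a range of values and recording `recognition_accuracy` at each step would produce an accuracy-versus-threshold curve of the kind reported in Figure 8(e).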

6.6 Triangulation

In this section, we change the input from the complete visibility graph to the triangulation dual to showcase the versatility of VisDiff. Note that a polygon may have many different triangulations. Each triangulation contains $n-2$ triangles, where $n$ is the number of vertices (De Berg, 2000). We use the Constrained Delaunay Triangulation (Rognant et al., 1999) to triangulate the polygons in our dataset.

We evaluate the model on the classification metrics of the dual and the Chamfer distance (Borgefors, 1988). The classification metrics are calculated by checking whether the dual edges exist in the visibility graph of the generated polygon. If the model predicts a convex polygon given the dual of a concave polygon, it would have 100% dual accuracy, which is misleading. Hence, the Chamfer distance between the points is also evaluated, as the dual is unique to the spatial locations of the points. The Chamfer distance is calculated with polygons rotated to have the first edge aligned with the x-axis to account for rotation variations. Table 1 shows the quantitative results of VisDiff and the baselines. VisDiff performs much better than all the other models in maintaining the triangulation dual while also achieving the minimum Chamfer distance. We present additional qualitative results in the Appendix (Section C.3, Figure 15).
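The rotation-normalized Chamfer distance can be sketched as below. Two details are assumptions, since only the rotation step is described above: the polygon is also translated so that its first vertex sits at the origin, and the Chamfer distance is taken as the sum of the two mean nearest-neighbor distances.

```python
import numpy as np

def align_first_edge(vertices):
    """Translate vertex 0 to the origin and rotate edge 0 onto the +x axis."""
    v = np.asarray(vertices, dtype=float)
    v = v - v[0]
    theta = np.arctan2(v[1, 1], v[1, 0])  # angle of the first edge
    c, s = np.cos(-theta), np.sin(-theta)
    rotation = np.array([[c, -s],
                         [s, c]])
    return v @ rotation.T

def chamfer_distance(a, b):
    """Symmetric Chamfer distance between two 2D point sets."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Pairwise Euclidean distances, shape (len(a), len(b)).
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return float(d.min(axis=1).mean() + d.min(axis=0).mean())
```

With this normalization, a polygon and a rigidly rotated copy of it yield a distance of approximately zero, so the metric reflects shape differences rather than pose.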

7 Conclusion

In this paper, we studied the problems of Visibility Reconstruction and Visibility Characterization for simple polygons. We presented VisDiff, a diffusion-based approach which first predicts the Signed Distance Function (SDF) associated with a polygonal boundary conditioned on the input visibility graph $G$. The SDF is then used to generate vertex locations of a polygon $P$ whose visibility graph is $G$. Our method showed an improvement of 21% on F1-Score compared to baseline approaches on the Visibility Reconstruction problem. We then showed the capability of VisDiff to sample multiple polygons for a single visibility graph $G$ as a realization of the Visibility Characterization problem. We also presented preliminary results of 80% accuracy on the Visibility Recognition problem. VisDiff has been shown to generalize to accept triangulations as input, where it maintains 95% of the dual edges and achieves a 4% improvement on Chamfer distance compared to baselines.

At a high level, our results show that modern neural representations are capable of encoding the space of all polygons in such a way that the distances on the learned manifold are faithful to the combinatorial properties of polygons. The presented VisDiff architecture represents the SDF as a grid, which creates a bottleneck in terms of computation time and space. In future work, we will investigate encoding the SDF using more efficient representations such as those of Park et al. (2019) and Mitchell et al. (2020).

References

  • Alliegro et al. (2023) Antonio Alliegro, Yawar Siddiqui, Tatiana Tommasi, and Matthias Nießner. Polydiff: Generating 3d polygonal meshes with diffusion models. arXiv preprint arXiv:2312.11417, 2023.
  • Ameer et al. (2022) Safwa Ameer, Matt Gibson-Lopez, Erik Krohn, and Qing Wang. On the visibility graphs of pseudo-polygons: recognition and reconstruction. In 18th Scandinavian Symposium and Workshops on Algorithm Theory (SWAT 2022). Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2022.
  • Auer & Held (1996) Thomas Auer and Martin Held. Heuristics for the generation of random polygons. In CCCG, pp.  38–43, 1996.
  • Boomari & Zarei (2016) Hossein Boomari and Alireza Zarei. Visibility graphs of anchor polygons. In Topics in Theoretical Computer Science: The First IFIP WG 1.8 International Conference, TTCS 2015, Tehran, Iran, August 26-28, 2015, Revised Selected Papers 1, pp.  72–89. Springer, 2016.
  • Boomari et al. (2018) Hossein Boomari, Mojtaba Ostovari, and Alireza Zarei. Recognizing visibility graphs of polygons with holes and internal-external visibility graphs of polygons. arXiv preprint arXiv:1804.05105, 2018.
  • Borgefors (1988) Gunilla Borgefors. Hierarchical chamfer matching: A parametric edge matching algorithm. IEEE Transactions on pattern analysis and machine intelligence, 10(6):849–865, 1988.
  • Chen et al. (2024) Jiacheng Chen, Ruizhi Deng, and Yasutaka Furukawa. Polydiffuse: Polygonal shape reconstruction via guided set diffusion models. Advances in Neural Information Processing Systems, 36, 2024.
  • Cheng et al. (2023) Yen-Chi Cheng, Hsin-Ying Lee, Sergey Tulyakov, Alexander G Schwing, and Liang-Yan Gui. Sdfusion: Multimodal 3d shape completion, reconstruction, and generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.  4456–4465, 2023.
  • Chou et al. (2023) Gene Chou, Yuval Bahat, and Felix Heide. Diffusion-sdf: Conditional generative modeling of signed distance functions. In Proceedings of the IEEE/CVF international conference on computer vision, pp.  2262–2272, 2023.
  • Colley et al. (1997) Paul Colley, Anna Lubiw, and Jeremy Spinrad. Visibility graphs of towers. Computational Geometry, 7(3):161–172, 1997.
  • Cui & Wei (2023) Guanyu Cui and Zhewei Wei. Mgnn: Graph neural networks inspired by distance geometry problem. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp.  335–347, 2023.
  • De Berg (2000) Mark De Berg. Computational geometry: algorithms and applications. Springer Science & Business Media, 2000.
  • Dehghani & Morady (2009) Gholamreza Dehghani and Hossein Morady. An algorithm for visibility graph recognition on planar graphs. In 2009 International Conference on Future Computer and Communication, pp.  518–521. IEEE, 2009.
  • Everett (1990) Hazel Everett. Visibility graph recognition. University of Toronto, 1990.
  • Everett & Corneil (1990) Hazel Everett and Derek G. Corneil. Recognizing visibility graphs of spiral polygons. Journal of Algorithms, 11(1):1–26, 1990.
  • Gao & Han (2012) Fuchang Gao and Lixing Han. Implementing the nelder-mead simplex algorithm with adaptive parameters. Computational Optimization and Applications, 51(1):259–277, 2012.
  • Ghosh & Goswami (2013) Subir K Ghosh and Partha P Goswami. Unsolved problems in visibility graphs of points, segments, and polygons. ACM Computing Surveys (CSUR), 46(2):1–29, 2013.
  • Guibas et al. (1986) Leonidas Guibas, John Hershberger, Daniel Leven, Micha Sharir, and Robert Tarjan. Linear time algorithms for visibility and shortest path problems inside simple polygons. In Proceedings of the second annual symposium on computational geometry, pp.  1–13, 1986.
  • Gupta et al. (2023) Anchit Gupta, Wenhan Xiong, Yixin Nie, Ian Jones, and Barlas Oğuz. 3dgen: Triplane latent diffusion for textured mesh generation. arXiv preprint arXiv:2303.05371, 2023.
  • Li et al. (2024) Zian Li, Xiyuan Wang, Yinan Huang, and Muhan Zhang. Is distance matrix enough for geometric deep learning? Advances in Neural Information Processing Systems, 36, 2024.
  • Mitchell et al. (2020) Eric Mitchell, Selim Engin, Volkan Isler, and Daniel D Lee. Higher-order function networks for learning composable 3d object representations. In International Conference on Learning Representations, 2020.
  • Ngo et al. (2023) Khoa Anh Ngo, Kyuhong Shim, and Byonghyo Shim. Spatial cross-attention for transformer-based image captioning. In ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.  1–5. IEEE, 2023.
  • Park et al. (2019) Jeong Joon Park, Peter Florence, Julian Straub, Richard Newcombe, and Steven Lovegrove. Deepsdf: Learning continuous signed distance functions for shape representation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp.  165–174, 2019.
  • Permenter & Yuan (2023) Frank Permenter and Chenyang Yuan. Interpreting and improving diffusion models from an optimization perspective. arXiv preprint arXiv:2306.04848, 2023.
  • Rognant et al. (1999) L Rognant, Jean-Marc Chassery, S Goze, and JG Planes. The delaunay constrained triangulation: the delaunay stable algorithms. In 1999 IEEE International Conference on Information Visualization (Cat. No. PR00210), pp.  147–152. IEEE, 1999.
  • Ronneberger et al. (2015) Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-net: Convolutional networks for biomedical image segmentation. In Medical image computing and computer-assisted intervention–MICCAI 2015: 18th international conference, Munich, Germany, October 5-9, 2015, proceedings, part III 18, pp.  234–241. Springer, 2015.
  • Shi et al. (2021) Chence Shi, Shitong Luo, Minkai Xu, and Jian Tang. Learning gradient fields for molecular conformation generation. In International conference on machine learning, pp.  9558–9568. PMLR, 2021.
  • Shim et al. (2023) Jaehyeok Shim, Changwoo Kang, and Kyungdon Joo. Diffusion-based signed distance fields for 3d shape generation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp.  20887–20897, 2023.
  • Siddiqui et al. (2024) Yawar Siddiqui, Antonio Alliegro, Alexey Artemov, Tatiana Tommasi, Daniele Sirigatti, Vladislav Rosov, Angela Dai, and Matthias Nießner. Meshgpt: Generating triangle meshes with decoder-only transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.  19615–19625, 2024.
  • Silva (2020) André C Silva. On visibility graphs of convex fans and terrains. arXiv preprint arXiv:2001.06436, 2020.
  • Song et al. (2020) Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models. arXiv preprint arXiv:2010.02502, 2020.
  • Tenenbaum et al. (2000) Joshua B Tenenbaum, Vin de Silva, and John C Langford. A global geometric framework for nonlinear dimensionality reduction. science, 290(5500):2319–2323, 2000.
  • Vaswani (2017) A Vaswani. Attention is all you need. Advances in Neural Information Processing Systems, 2017.
  • Veličković et al. (2017) Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio. Graph attention networks. arXiv preprint arXiv:1710.10903, 2017.
  • Wang et al. (2020) Nanyang Wang, Yinda Zhang, Zhuwen Li, Yanwei Fu, Hang Yu, Wei Liu, Xiangyang Xue, and Yu-Gang Jiang. Pixel2mesh: 3d mesh model generation via image guided deformation. IEEE transactions on pattern analysis and machine intelligence, 43(10):3600–3613, 2020.

Appendix

Appendix A Ablation Studies

The ablation studies performed for VisDiff cover two main directions: loss functions and architecture choices. Table 3 shows the results achieved for each choice. The best results are achieved by estimating the SDF and vertex locations separately. Furthermore, adding the visibility loss improves the F1-Score by about 10% in relative terms (0.73 to 0.80 with separate estimation) compared to using only the MSE loss.

| Variant | Accuracy ↑ | Precision ↑ | Recall ↑ | F1 ↑ |
|---|---|---|---|---|
| a) JE | 0.77 | 0.79 | 0.58 | 0.66 |
| b) JEL | 0.78 | 0.80 | 0.60 | 0.68 |
| c) VE | 0.83 | 0.74 | 0.72 | 0.73 |
| d) VEL | 0.85 | 0.83 | 0.77 | 0.80 |

Table 3: Ablation Studies: (a) Joint estimation of SDF and vertex locations without visibility loss, (b) Joint estimation with visibility loss, (c) Separate estimation of vertex locations and SDF without visibility loss, (d) Separate estimation of vertex locations and SDF with visibility loss (VisDiff). The results are on the Visibility Reconstruction problem.

Appendix B Dataset Statistics

In this section, we present statistics about our dataset. Figure 9 shows the distribution of the train and in-distribution test set statistics. It shows that our dataset is uniform in diameter of the visibility graph. Figure 10 compares the training dataset with the out-of-distribution testing dataset. It shows that star, convex-fan, and terrain classes have densities different from our train distribution, where density refers to the percentage of edges in the visibility graph.

Figure 9: Train vs. in-distribution test set analysis: (a) Density comparison, train vs. test: The density is inversely proportional to the diameter. Uniform sampling of diameter results in bimodal density. (b) Diameter comparison, train vs. test: Training and testing sets are uniform in terms of the link diameter of the visibility graph.
Figure 10: Out-of-distribution test set analysis: (a) Diameter comparison, train vs. test. (b) Density comparison, train vs. test: The densities of the anchor and spiral are close to the mean of the bimodal training distribution, making them similar to our training set. The densities of the star, convex fan, and terrain differ significantly from the training distribution.

Appendix C Qualitative Results

In this section, we provide additional qualitative results on Visibility Reconstruction, Visibility Characterization, Visibility Recognition, and the Triangulation dual problem (Section 6.6).

C.1 Visibility Reconstruction

We provide additional qualitative results for the Visibility Reconstruction problem. Figures 11 and 12 compare the polygons generated by VisDiff with those generated by the baselines. The F1-Score shows that the visibility graphs of the polygons generated by VisDiff are much closer to the visibility graph of the ground truth polygon.

Figure 11: Visibility reconstruction qualitative results: The top row shows the polygons generated by different methods. The second row shows the corresponding visibility graphs of the polygons, where green represents a visible edge and red represents a non-visible edge. The F1-Score is given in parentheses. The polygon results correspond to the following methods: a) Ground Truth, b) VisDiff (0.73), c) Sequence Prediction (0.65), d) GNN (0.55), e) Vertex diffusion (0.54), f) Encoder-Decoder (0.54), g) Optimization (0.49).
Figure 12: Visibility reconstruction qualitative results: The top row shows the polygons generated by different methods. The second row shows the corresponding visibility graphs of the polygons, where green represents a visible edge and red represents a non-visible edge. The F1-Score is given in parentheses. The polygon results correspond to the following methods: a) Ground Truth, b) VisDiff (0.80), c) Sequence Prediction (0.71), d) GNN (0.62), e) Vertex diffusion (0.54), f) Encoder-Decoder (0.56), g) Optimization (0.50).

C.2 Visibility Characterization

We provide further qualitative results on the problem of Visibility Characterization, where we seek to generate the set of all polygons associated with the same visibility graph. Figures 13 and 14 show the ability of VisDiff to sample multiple polygons given the same visibility graph.

Figure 13: Visibility Characterization: The top row shows multiple polygons generated by VisDiff for the same visibility graph $G$. The second row shows the visibility graph corresponding to each polygon, where green represents a visible edge and red represents a non-visible edge. The F1-Score is given per panel: (a) GT, (b) 0.77, (c) 0.76, (d) 0.76.
Figure 14: Visibility Characterization: The top row shows multiple polygons generated by VisDiff for the same visibility graph $G$. The second row shows the visibility graph corresponding to each polygon, where green represents a visible edge and red represents a non-visible edge. The F1-Score is given per panel: (a) GT, (b) 0.76, (c) 0.75, (d) 0.80.

C.3 Triangulation Duals

We provide qualitative results for the problem of generating polygons from the triangulation duals. Figure 15 shows the performance of VisDiff compared to other baselines. VisDiff maintains 98% of the triangulation edges.

Figure 15: Triangulation Dual Qualitative Results: The top row shows the polygons generated by different methods. The second row shows the corresponding triangulation dual graphs of the polygons, where green represents a dual edge and red represents the absence of a dual edge. The F1-Score of each triangulation dual graph compared to the GT is given in parentheses. The polygon results correspond to the following methods: a) Ground Truth, b) VisDiff (0.98), c) Sequence Prediction (0.76), d) GNN (0.92), e) Vertex diffusion (0.91), f) Encoder-Decoder (0.81), g) Optimization (0.69).

C.4 Visibility Recognition

We provide additional qualitative results to showcase failure and success cases of VisDiff on the Visibility Recognition problem. Figure 16 shows the output of VisDiff when the input is not a valid polygon (we generate visibility graphs of polygons with holes as invalid input samples). It shows that VisDiff can be used to identify non-valid visibility graphs in most scenarios by turning it into a classifier based on the validity of the output.

Figure 16: Visibility Recognition: The top row shows the ground truth non-valid polygon with the hole (red), while the bottom row shows the polygons drawn by VisDiff. a) Non-Valid Sample 1: VisDiff predicts it as a non-valid polygon as it is not able to generate any valid polygon, b) Non-Valid Sample 2: VisDiff generates a valid polygon where it learns to put points in a $V$ shape to account for a hole. It misclassified a non-valid visibility graph as a valid visibility graph. c) Non-Valid Sample 3: VisDiff predicts it as a non-valid polygon as it is not able to generate any valid polygon, d) Non-Valid Sample 4: VisDiff predicts it as a non-valid polygon as it is not able to generate any valid polygon.