Solutions to Elliptic and Parabolic Problems via Finite Difference Based Unsupervised Small Linear Convolutional Neural Networks

Adrian Celaya Keegan Kirk David Fuentes Beatrice Riviere
Abstract

In recent years, there has been a growing interest in leveraging deep learning and neural networks to address scientific problems, particularly in solving partial differential equations (PDEs). However, many neural network-based methods like PINNs rely on auto differentiation and sampling collocation points, leading to a lack of interpretability and lower accuracy than traditional numerical methods. As a result, we propose a fully unsupervised approach, requiring no training data, to estimate finite difference solutions for PDEs directly via small linear convolutional neural networks. Our proposed approach uses substantially fewer parameters than similar finite difference-based approaches while also demonstrating comparable accuracy to the true solution for several selected elliptic and parabolic problems compared to the finite difference method.

keywords:
Convolutional neural networks , unsupervised learning , partial differential equations , finite difference
journal: Computers and Mathematics with Applications
\affiliation

[rice]organization=Rice University, city=Houston, postcode=77005, state=TX, country=USA

\affiliation

[mda]organization=The University of Texas MD Anderson Cancer Center, city=Houston, postcode=77003, state=TX, country=USA

1 Introduction

Partial differential equations (PDEs) of elliptic and parabolic type are ubiquitous in the mathematical modeling of many physical phenomena and thus see wide application to many real-world problems. Classical numerical methods (i.e., finite difference and element methods) introduce a computational mesh over which we define representations of differential operators with matrices. This results in large linear systems whose efficient solution presents several computational challenges. Physics informed neural networks (PINNs) have recently gained popularity in solving PDEs [1, 2, 3, 4, 5, 6]. This approach uses the universal approximation property of deep neural networks to train a surrogate model (i.e., the neural network) of the solution to a given PDE.

A vital aspect of PINNs is using auto differentiation to compute a residual-based loss function for a set of sampled collocation points [7]. While PINNs represent a mesh-free solution and have shown promise in multiple fields like biology [8, 9], meteorology [10], and optimal control [11, 12], this reliance on auto differentiation and sampling results in a lack of interpretability and lower accuracy than traditional numerical methods [13]. Choosing collocation points involves sampling random points from a uniform distribution on the computational domain, and it is unclear how many points are required to achieve acceptable results for a given PDE [13, 7, 14]. Additionally, while networks can express very complex functions, determining the appropriate architecture and number of parameters to solve a specific PDE can be challenging [15, 16, 17]. If a specific PDE requires a neural network with many parameters, it will increase the memory and time costs required for training. Finally, the generalization capability of PINNs to points outside their training domains is unclear, and a topic of ongoing research [18].

Refer to caption
Figure 1: Properties of classical PINNs, numerics informed neural networks (NINNs), and numerical PDEs.

While traditional numerical methods (such as finite difference and finite element methods) are computationally expensive, they are highly structured and can guide the design of neural network-based methods so that they output explainable solutions and are more computationally efficient. If we constrain a neural network-based approach to mirror a numerical method, it is reasonable to expect solutions from the neural network to be close to those of a numerical method with known error bounds. Additionally, the structure of numerical methods makes it possible to get away with using simpler neural networks that converge to small loss values. Examples of such neural network-based methods include the Deep Ritz and Deep Galerkin methods [19, 20, 21]. Both of these methods incorporate aspects of finite element methods, allowing for faster convergence during training by utilizing the structure and properties of the numerical method that inspired them.

This work proposes a finite difference-based approach to approximating solutions to elliptic and parabolic PDEs that utilizes small, linear convolutional neural networks (CNNs). Figure 1 displays the properties of our proposed method which we call a “numerics informed neural network” (NINN), classical PINNs, and numerical PDEs. Our choice of architecture, inspired by the PocketNet paradigm [22], uses CNNs that closely mirror geometric multigrid methods to almost exactly recover finite difference solutions with fewer parameters than other CNN-based methods. To the best of our knowledge, we are also the first to propose using linear CNNs, which do not use non-linear activation functions, to learn finite difference solutions.

Finite difference methods have inspired the construction of various neural networks [13, 23], but these methods require training data. The recent work [24] solves elliptic PDEs with constant diffusion, and it shares the following similarities with our proposed numerical method: no training data is needed, and the loss function is derived from the five-point stencil of the finite difference approximation. However, our loss function is different as it incorporates the Dirichlet boundary condition in a weighted fashion, whereas [24] directly enforces the boundary condition outside of the loss function (more details in Section 4). We also define the method for elliptic PDEs with non-constant diffusion coefficients and extend it to time-dependent problems. Another key feature of our method is using small neural networks, which makes it computationally efficient.

An outline of the paper is as follows. Section 2 introduces the model problem, the network architecture and the loss function. Section 3 describes the algorithms for both steady-state and time-dependent problems. Results and discussion are presented in Section 4. Conclusions follow.

2 Material and Methods

We propose a fully unsupervised method for estimating finite differences solutions to partial differential equations via convolutional neural networks. In contrast to existing deep learning-based methods, our approach is fully unsupervised. In other words, our approach does not require training data and estimates the solution to a given PDE directly via the optimization process, also called the training process.

2.1 Elliptic Problems

Let Ω2Ωsuperscript2\Omega\subset\mathbb{R}^{2}roman_Ω ⊂ blackboard_R start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT be an open set. Given a function f:Ω:𝑓Ωf:\Omega\rightarrow\mathbb{R}italic_f : roman_Ω → blackboard_R and diffusion coefficients κ𝜅\kappaitalic_κ, we consider the 2D Poisson problem with solution u:Ω¯:𝑢¯Ωu:\overline{\Omega}\rightarrow\mathbb{R}italic_u : over¯ start_ARG roman_Ω end_ARG → blackboard_R such that

(κu)=f in Ω,u=g on Ω.formulae-sequence𝜅𝑢𝑓 in Ω𝑢𝑔 on Ω\displaystyle\begin{split}-\nabla\cdot(\kappa\nabla u)&=f\text{ in }\Omega,\\ u&=g\text{ on }\partial\Omega.\end{split}start_ROW start_CELL - ∇ ⋅ ( italic_κ ∇ italic_u ) end_CELL start_CELL = italic_f in roman_Ω , end_CELL end_ROW start_ROW start_CELL italic_u end_CELL start_CELL = italic_g on ∂ roman_Ω . end_CELL end_ROW (1)

We begin with the case when κ𝜅\kappaitalic_κ is constant and assume that κ=1𝜅1\kappa=1italic_κ = 1. For readibility, we assume that ΩΩ\Omegaroman_Ω is the square domain (0,L)2superscript0𝐿2(0,L)^{2}( 0 , italic_L ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT, that is partitioned into a uniform grid of N×N𝑁𝑁N\times Nitalic_N × italic_N squares with size h=L/N𝐿𝑁h=L/Nitalic_h = italic_L / italic_N. The methodology presented in the paper can be easily extended to rectangular domains. We first recall the standard finite difference method based on the five-point stencil, applied to (1). Let 𝒱0superscript𝒱0\mathcal{V}^{0}caligraphic_V start_POSTSUPERSCRIPT 0 end_POSTSUPERSCRIPT (resp. 𝒱superscript𝒱\mathcal{V}^{\partial}caligraphic_V start_POSTSUPERSCRIPT ∂ end_POSTSUPERSCRIPT) be the set of indices (i,j)𝑖𝑗(i,j)( italic_i , italic_j ) such that the point (xi,yj)=(ih,jh)subscript𝑥𝑖subscript𝑦𝑗𝑖𝑗(x_{i},y_{j})=(ih,jh)( italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_y start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ) = ( italic_i italic_h , italic_j italic_h ) belongs to the interior (resp. boundary) of ΩΩ\Omegaroman_Ω.

4ui,jui+1,jui1,jui,j+1ui,j1h2=f(xi,yj),(i,j)𝒱0,formulae-sequence4subscript𝑢𝑖𝑗subscript𝑢𝑖1𝑗subscript𝑢𝑖1𝑗subscript𝑢𝑖𝑗1subscript𝑢𝑖𝑗1superscript2𝑓subscript𝑥𝑖subscript𝑦𝑗for-all𝑖𝑗superscript𝒱0\displaystyle\frac{4u_{i,j}-u_{i+1,j}-u_{i-1,j}-u_{i,j+1}-u_{i,j-1}}{h^{2}}=f(% x_{i},y_{j}),\quad\forall(i,j)\in\mathcal{V}^{0},divide start_ARG 4 italic_u start_POSTSUBSCRIPT italic_i , italic_j end_POSTSUBSCRIPT - italic_u start_POSTSUBSCRIPT italic_i + 1 , italic_j end_POSTSUBSCRIPT - italic_u start_POSTSUBSCRIPT italic_i - 1 , italic_j end_POSTSUBSCRIPT - italic_u start_POSTSUBSCRIPT italic_i , italic_j + 1 end_POSTSUBSCRIPT - italic_u start_POSTSUBSCRIPT italic_i , italic_j - 1 end_POSTSUBSCRIPT end_ARG start_ARG italic_h start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG = italic_f ( italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_y start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ) , ∀ ( italic_i , italic_j ) ∈ caligraphic_V start_POSTSUPERSCRIPT 0 end_POSTSUPERSCRIPT , (2)
ui,j=g(xi,yj),(i,i)𝒱.formulae-sequencesubscript𝑢𝑖𝑗𝑔subscript𝑥𝑖subscript𝑦𝑗for-all𝑖𝑖superscript𝒱\displaystyle u_{i,j}=g(x_{i},y_{j}),\quad\forall(i,i)\in\mathcal{V}^{\partial}.italic_u start_POSTSUBSCRIPT italic_i , italic_j end_POSTSUBSCRIPT = italic_g ( italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_y start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ) , ∀ ( italic_i , italic_i ) ∈ caligraphic_V start_POSTSUPERSCRIPT ∂ end_POSTSUPERSCRIPT . (3)

The finite difference solution uhsubscript𝑢u_{h}italic_u start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT is a vector with entries ui,jsubscript𝑢𝑖𝑗u_{i,j}italic_u start_POSTSUBSCRIPT italic_i , italic_j end_POSTSUBSCRIPT satisfying (2)-(3) and it is known that for smooth enough solutions u𝑢uitalic_u and small enough hhitalic_h, the value ui,jsubscript𝑢𝑖𝑗u_{i,j}italic_u start_POSTSUBSCRIPT italic_i , italic_j end_POSTSUBSCRIPT is a good approximation of u(xi,yj)𝑢subscript𝑥𝑖subscript𝑦𝑗u(x_{i},y_{j})italic_u ( italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_y start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ). One can rewrite (2) in terms of a discrete convolutional operator \star and convolutional kernel KΔsubscript𝐾ΔK_{\Delta}italic_K start_POSTSUBSCRIPT roman_Δ end_POSTSUBSCRIPT defined as:

KΔ=1h2[010141010].subscript𝐾Δ1superscript2matrix010141010\displaystyle K_{\Delta}=\frac{1}{h^{2}}\begin{bmatrix}0&-1&0\\ -1&4&-1\\ 0&-1&0\end{bmatrix}.italic_K start_POSTSUBSCRIPT roman_Δ end_POSTSUBSCRIPT = divide start_ARG 1 end_ARG start_ARG italic_h start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG [ start_ARG start_ROW start_CELL 0 end_CELL start_CELL - 1 end_CELL start_CELL 0 end_CELL end_ROW start_ROW start_CELL - 1 end_CELL start_CELL 4 end_CELL start_CELL - 1 end_CELL end_ROW start_ROW start_CELL 0 end_CELL start_CELL - 1 end_CELL start_CELL 0 end_CELL end_ROW end_ARG ] . (4)

We use the shorthand notation fi,j=f(xi,yj)subscript𝑓𝑖𝑗𝑓subscript𝑥𝑖subscript𝑦𝑗f_{i,j}=f(x_{i},y_{j})italic_f start_POSTSUBSCRIPT italic_i , italic_j end_POSTSUBSCRIPT = italic_f ( italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_y start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ). The equivalent form for (2) is

(KΔuh)i,j=fi,j,(i,j)𝒱0.formulae-sequencesubscriptsubscript𝐾Δsubscript𝑢𝑖𝑗subscript𝑓𝑖𝑗for-all𝑖𝑗superscript𝒱0\displaystyle(K_{\Delta}\star u_{h})_{i,j}=f_{i,j},\quad\forall(i,j)\in% \mathcal{V}^{0}.( italic_K start_POSTSUBSCRIPT roman_Δ end_POSTSUBSCRIPT ⋆ italic_u start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT ) start_POSTSUBSCRIPT italic_i , italic_j end_POSTSUBSCRIPT = italic_f start_POSTSUBSCRIPT italic_i , italic_j end_POSTSUBSCRIPT , ∀ ( italic_i , italic_j ) ∈ caligraphic_V start_POSTSUPERSCRIPT 0 end_POSTSUPERSCRIPT . (5)

For a given kernel K3×3𝐾superscript33K\in\mathbb{R}^{3\times 3}italic_K ∈ blackboard_R start_POSTSUPERSCRIPT 3 × 3 end_POSTSUPERSCRIPT, the convolution operator \star is defined by

(Kuh)i,j=p=11q=11Kp,qui+p,j+q.subscript𝐾subscript𝑢𝑖𝑗superscriptsubscript𝑝11superscriptsubscript𝑞11subscript𝐾𝑝𝑞subscript𝑢𝑖𝑝𝑗𝑞\displaystyle(K\star u_{h})_{i,j}=\sum_{p=-1}^{1}\sum_{q=-1}^{1}K_{p,q}\,u_{i+% p,j+q}.( italic_K ⋆ italic_u start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT ) start_POSTSUBSCRIPT italic_i , italic_j end_POSTSUBSCRIPT = ∑ start_POSTSUBSCRIPT italic_p = - 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT ∑ start_POSTSUBSCRIPT italic_q = - 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT italic_K start_POSTSUBSCRIPT italic_p , italic_q end_POSTSUBSCRIPT italic_u start_POSTSUBSCRIPT italic_i + italic_p , italic_j + italic_q end_POSTSUBSCRIPT . (6)

With this convolution-based discretization, we can now reformulate the finite difference approximation of (1) as a convex optimization problem given by {dmath} arg min_u_h ∑_(i,j) ∈V^0 ((K_Δ ⋆u_h)_i,j -f_i,j)^2 + ∑_(i,j) ∈V^∂ (( u_h )_i,j - g_i,j)^2. We obtain an approximate solution u^^𝑢\hat{u}over^ start_ARG italic_u end_ARG to (2.1) by training a neural network 𝒩θ:N×NN×N:subscript𝒩𝜃superscript𝑁𝑁superscript𝑁𝑁\mathcal{N}_{\theta}:\mathbb{R}^{N\times N}\rightarrow\mathbb{R}^{N\times N}caligraphic_N start_POSTSUBSCRIPT italic_θ end_POSTSUBSCRIPT : blackboard_R start_POSTSUPERSCRIPT italic_N × italic_N end_POSTSUPERSCRIPT → blackboard_R start_POSTSUPERSCRIPT italic_N × italic_N end_POSTSUPERSCRIPT with trainable parameters 𝜽M𝜽superscript𝑀\boldsymbol{\theta}\in\mathbb{R}^{M}bold_italic_θ ∈ blackboard_R start_POSTSUPERSCRIPT italic_M end_POSTSUPERSCRIPT. We propose to train 𝒩θsubscript𝒩𝜃\mathcal{N}_{\theta}caligraphic_N start_POSTSUBSCRIPT italic_θ end_POSTSUBSCRIPT using the unsupervised loss function α:N×N:subscript𝛼superscript𝑁𝑁\mathcal{L}_{\alpha}:\mathbb{R}^{N\times N}\rightarrow\mathbb{R}caligraphic_L start_POSTSUBSCRIPT italic_α end_POSTSUBSCRIPT : blackboard_R start_POSTSUPERSCRIPT italic_N × italic_N end_POSTSUPERSCRIPT → blackboard_R given by

α(u^)=α(i,j)𝒱0((KΔu^)i,jfi,j)2+(1α)(i,j)𝒱(u^i,jgi,j)2,subscript𝛼^𝑢𝛼subscript𝑖𝑗superscript𝒱0superscriptsubscriptsubscript𝐾Δ^𝑢𝑖𝑗subscript𝑓𝑖𝑗21𝛼subscript𝑖𝑗superscript𝒱superscriptsubscript^𝑢𝑖𝑗subscript𝑔𝑖𝑗2\displaystyle\mathcal{L}_{\alpha}(\hat{u})=\alpha\sum_{(i,j)\in\mathcal{V}^{0}% }\left(\left(K_{\Delta}\star\hat{u}\right)_{i,j}-f_{i,j}\right)^{2}+(1-\alpha)% \sum_{(i,j)\in\mathcal{V}^{\partial}}\left(\hat{u}_{i,j}-g_{i,j}\right)^{2},caligraphic_L start_POSTSUBSCRIPT italic_α end_POSTSUBSCRIPT ( over^ start_ARG italic_u end_ARG ) = italic_α ∑ start_POSTSUBSCRIPT ( italic_i , italic_j ) ∈ caligraphic_V start_POSTSUPERSCRIPT 0 end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( ( italic_K start_POSTSUBSCRIPT roman_Δ end_POSTSUBSCRIPT ⋆ over^ start_ARG italic_u end_ARG ) start_POSTSUBSCRIPT italic_i , italic_j end_POSTSUBSCRIPT - italic_f start_POSTSUBSCRIPT italic_i , italic_j end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + ( 1 - italic_α ) ∑ start_POSTSUBSCRIPT ( italic_i , italic_j ) ∈ caligraphic_V start_POSTSUPERSCRIPT ∂ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( over^ start_ARG italic_u end_ARG start_POSTSUBSCRIPT italic_i , italic_j end_POSTSUBSCRIPT - italic_g start_POSTSUBSCRIPT italic_i , italic_j end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT , (7)

where α=h2/4𝛼superscript24\alpha=h^{2}/4italic_α = italic_h start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT / 4 is a weighting term. The idea behind this loss function is to directly estimate the finite difference approximation to (1) without using any training data. Specifically, we use f𝑓fitalic_f as the input to the CNN and we perform \mathcal{M}caligraphic_M updates of the parameters in the neural network 𝒩𝜽subscript𝒩𝜽\mathcal{N}_{\boldsymbol{\theta}}caligraphic_N start_POSTSUBSCRIPT bold_italic_θ end_POSTSUBSCRIPT, using the loss function defined above. Algorithm 1 in Section 3 outlines this training procedure in more detail.

In the case where κ𝜅\kappaitalic_κ is piecewise constant, κ=(κi,j)𝜅subscript𝜅𝑖𝑗\kappa=(\kappa_{i,j})italic_κ = ( italic_κ start_POSTSUBSCRIPT italic_i , italic_j end_POSTSUBSCRIPT ), we use the following finite difference discretization

1h2(κi+12,j(ui+1,jui,j)κi12,j(ui,jui1,j))1h2(κi,j+12(ui,j+1ui,j)κi,j12(ui,jui,j1))=fi,j,1superscript2subscript𝜅𝑖12𝑗subscript𝑢𝑖1𝑗subscript𝑢𝑖𝑗subscript𝜅𝑖12𝑗subscript𝑢𝑖𝑗subscript𝑢𝑖1𝑗1superscript2subscript𝜅𝑖𝑗12subscript𝑢𝑖𝑗1subscript𝑢𝑖𝑗subscript𝜅𝑖𝑗12subscript𝑢𝑖𝑗subscript𝑢𝑖𝑗1subscript𝑓𝑖𝑗-\frac{1}{h^{2}}\left(\kappa_{i+\frac{1}{2},j}(u_{i+1,j}-u_{i,j})-\kappa_{i-% \frac{1}{2},j}(u_{i,j}-u_{i-1,j})\right)\\ -\frac{1}{h^{2}}\left(\kappa_{i,j+\frac{1}{2}}(u_{i,j+1}-u_{i,j})-\kappa_{i,j-% \frac{1}{2}}(u_{i,j}-u_{i,j-1})\right)=f_{i,j},start_ROW start_CELL - divide start_ARG 1 end_ARG start_ARG italic_h start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG ( italic_κ start_POSTSUBSCRIPT italic_i + divide start_ARG 1 end_ARG start_ARG 2 end_ARG , italic_j end_POSTSUBSCRIPT ( italic_u start_POSTSUBSCRIPT italic_i + 1 , italic_j end_POSTSUBSCRIPT - italic_u start_POSTSUBSCRIPT italic_i , italic_j end_POSTSUBSCRIPT ) - italic_κ start_POSTSUBSCRIPT italic_i - divide start_ARG 1 end_ARG start_ARG 2 end_ARG , italic_j end_POSTSUBSCRIPT ( italic_u start_POSTSUBSCRIPT italic_i , italic_j end_POSTSUBSCRIPT - italic_u start_POSTSUBSCRIPT italic_i - 1 , italic_j end_POSTSUBSCRIPT ) ) end_CELL end_ROW start_ROW start_CELL - divide start_ARG 1 end_ARG start_ARG italic_h start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG ( italic_κ start_POSTSUBSCRIPT italic_i , italic_j + divide start_ARG 1 end_ARG start_ARG 2 end_ARG end_POSTSUBSCRIPT ( italic_u start_POSTSUBSCRIPT italic_i , italic_j + 1 end_POSTSUBSCRIPT - italic_u start_POSTSUBSCRIPT italic_i , italic_j end_POSTSUBSCRIPT ) - italic_κ start_POSTSUBSCRIPT italic_i , italic_j - divide start_ARG 1 end_ARG start_ARG 2 end_ARG end_POSTSUBSCRIPT ( italic_u start_POSTSUBSCRIPT italic_i , italic_j end_POSTSUBSCRIPT - italic_u start_POSTSUBSCRIPT italic_i , italic_j - 1 end_POSTSUBSCRIPT ) ) = italic_f start_POSTSUBSCRIPT italic_i , italic_j end_POSTSUBSCRIPT , end_CELL end_ROW (8)

where the interface values κi+12,jsubscript𝜅𝑖12𝑗\kappa_{i+\frac{1}{2},j}italic_κ start_POSTSUBSCRIPT italic_i + divide start_ARG 1 end_ARG start_ARG 2 end_ARG , italic_j end_POSTSUBSCRIPT, κi12,jsubscript𝜅𝑖12𝑗\kappa_{i-\frac{1}{2},j}italic_κ start_POSTSUBSCRIPT italic_i - divide start_ARG 1 end_ARG start_ARG 2 end_ARG , italic_j end_POSTSUBSCRIPT, κi,j+12subscript𝜅𝑖𝑗12\kappa_{i,j+\frac{1}{2}}italic_κ start_POSTSUBSCRIPT italic_i , italic_j + divide start_ARG 1 end_ARG start_ARG 2 end_ARG end_POSTSUBSCRIPT, and κi,j12subscript𝜅𝑖𝑗12\kappa_{i,j-\frac{1}{2}}italic_κ start_POSTSUBSCRIPT italic_i , italic_j - divide start_ARG 1 end_ARG start_ARG 2 end_ARG end_POSTSUBSCRIPT are the harmonic averages of κi,jsubscript𝜅𝑖𝑗\kappa_{i,j}italic_κ start_POSTSUBSCRIPT italic_i , italic_j end_POSTSUBSCRIPT given by

κi+12,j=21κi+1,j+1κi,jκi12,j=21κi,j+1κi1,jκi,j+12=21κi,j+1+1κi,jκi,j12=21κi,j+1κi,j1.formulae-sequenceformulae-sequencesubscript𝜅𝑖12𝑗21subscript𝜅𝑖1𝑗1subscript𝜅𝑖𝑗subscript𝜅𝑖12𝑗21subscript𝜅𝑖𝑗1subscript𝜅𝑖1𝑗subscript𝜅𝑖𝑗1221subscript𝜅𝑖𝑗11subscript𝜅𝑖𝑗subscript𝜅𝑖𝑗1221subscript𝜅𝑖𝑗1subscript𝜅𝑖𝑗1\displaystyle\begin{split}\kappa_{i+\frac{1}{2},j}&=\frac{2}{\frac{1}{\kappa_{% i+1,j}}+\frac{1}{\kappa_{i,j}}}\hskip 51.21495pt\kappa_{i-\frac{1}{2},j}=\frac% {2}{\frac{1}{\kappa_{i,j}}+\frac{1}{\kappa_{i-1,j}}}\\ \kappa_{i,j+\frac{1}{2}}&=\frac{2}{\frac{1}{\kappa_{i,j+1}}+\frac{1}{\kappa_{i% ,j}}}\hskip 51.21495pt\kappa_{i,j-\frac{1}{2}}=\frac{2}{\frac{1}{\kappa_{i,j}}% +\frac{1}{\kappa_{i,j-1}}}.\end{split}start_ROW start_CELL italic_κ start_POSTSUBSCRIPT italic_i + divide start_ARG 1 end_ARG start_ARG 2 end_ARG , italic_j end_POSTSUBSCRIPT end_CELL start_CELL = divide start_ARG 2 end_ARG start_ARG divide start_ARG 1 end_ARG start_ARG italic_κ start_POSTSUBSCRIPT italic_i + 1 , italic_j end_POSTSUBSCRIPT end_ARG + divide start_ARG 1 end_ARG start_ARG italic_κ start_POSTSUBSCRIPT italic_i , italic_j end_POSTSUBSCRIPT end_ARG end_ARG italic_κ start_POSTSUBSCRIPT italic_i - divide start_ARG 1 end_ARG start_ARG 2 end_ARG , italic_j end_POSTSUBSCRIPT = divide start_ARG 2 end_ARG start_ARG divide start_ARG 1 end_ARG start_ARG italic_κ start_POSTSUBSCRIPT italic_i , italic_j end_POSTSUBSCRIPT end_ARG + divide start_ARG 1 end_ARG start_ARG italic_κ start_POSTSUBSCRIPT italic_i - 1 , italic_j end_POSTSUBSCRIPT end_ARG end_ARG end_CELL end_ROW start_ROW start_CELL italic_κ start_POSTSUBSCRIPT italic_i , italic_j + divide start_ARG 1 end_ARG start_ARG 2 end_ARG end_POSTSUBSCRIPT end_CELL start_CELL = divide start_ARG 2 end_ARG start_ARG divide start_ARG 1 end_ARG start_ARG italic_κ start_POSTSUBSCRIPT italic_i , italic_j + 1 end_POSTSUBSCRIPT end_ARG + divide start_ARG 1 end_ARG start_ARG italic_κ start_POSTSUBSCRIPT italic_i , italic_j end_POSTSUBSCRIPT end_ARG end_ARG italic_κ start_POSTSUBSCRIPT italic_i , italic_j - divide start_ARG 1 end_ARG start_ARG 2 end_ARG end_POSTSUBSCRIPT = divide start_ARG 2 end_ARG start_ARG divide start_ARG 1 end_ARG start_ARG italic_κ start_POSTSUBSCRIPT italic_i , italic_j end_POSTSUBSCRIPT end_ARG + divide start_ARG 1 end_ARG start_ARG italic_κ start_POSTSUBSCRIPT italic_i , italic_j - 1 end_POSTSUBSCRIPT end_ARG end_ARG . end_CELL end_ROW (9)

The interface values necessitate the imposition of a half-grid (also called dual grid) onto our original grid. As a result, we modify our neural network prediction via a dilation operator D:N×N(2N1)×(2N1):𝐷superscript𝑁𝑁superscript2𝑁12𝑁1D:\mathbb{R}^{N\times N}\rightarrow\mathbb{R}^{(2N-1)\times(2N-1)}italic_D : blackboard_R start_POSTSUPERSCRIPT italic_N × italic_N end_POSTSUPERSCRIPT → blackboard_R start_POSTSUPERSCRIPT ( 2 italic_N - 1 ) × ( 2 italic_N - 1 ) end_POSTSUPERSCRIPT such that

D(u^)i,j={u^1+i12,1+j12, if i and j are odd0,otherwise.𝐷subscript^𝑢𝑖𝑗casessubscript^𝑢1𝑖121𝑗12 if 𝑖 and 𝑗 are odd0otherwise.\displaystyle D(\hat{u})_{i,j}=\begin{cases}\hat{u}_{1+\frac{i-1}{2},1+\frac{j% -1}{2}},&\text{ if }i\text{ and }j\text{ are odd}\\ 0,&\text{otherwise.}\end{cases}italic_D ( over^ start_ARG italic_u end_ARG ) start_POSTSUBSCRIPT italic_i , italic_j end_POSTSUBSCRIPT = { start_ROW start_CELL over^ start_ARG italic_u end_ARG start_POSTSUBSCRIPT 1 + divide start_ARG italic_i - 1 end_ARG start_ARG 2 end_ARG , 1 + divide start_ARG italic_j - 1 end_ARG start_ARG 2 end_ARG end_POSTSUBSCRIPT , end_CELL start_CELL if italic_i and italic_j are odd end_CELL end_ROW start_ROW start_CELL 0 , end_CELL start_CELL otherwise. end_CELL end_ROW (10)

Additionally, we map κ𝜅\kappaitalic_κ from the original grid to the dual grid by applying the following steps:
Step 1: apply the dilation operator to obtain κ(2N1)×(2N1)superscript𝜅superscript2𝑁12𝑁1\kappa^{\ast}\in\mathbb{R}^{(2N-1)\times(2N-1)}italic_κ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT ( 2 italic_N - 1 ) × ( 2 italic_N - 1 ) end_POSTSUPERSCRIPT:

κ=D(κ).superscript𝜅𝐷𝜅\displaystyle\kappa^{\ast}=D(\kappa).italic_κ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT = italic_D ( italic_κ ) . (11)

Step 2: update the values of κsuperscript𝜅\kappa^{\ast}italic_κ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT on the half-grid:

κi,j={2(1κi1,j+1κi+1,j)1, if i odd and j even2(1κi,j1+1κi,j+1)1,if i even and j oddsubscriptsuperscript𝜅𝑖𝑗cases2superscript1subscriptsuperscript𝜅𝑖1𝑗1subscriptsuperscript𝜅𝑖1𝑗1 if 𝑖 odd and 𝑗 even2superscript1subscriptsuperscript𝜅𝑖𝑗11subscriptsuperscript𝜅𝑖𝑗11if 𝑖 even and 𝑗 odd\displaystyle\kappa^{\ast}_{i,j}=\begin{cases}2\left(\frac{1}{\kappa^{\ast}_{i% -1,j}}+\frac{1}{\kappa^{\ast}_{i+1,j}}\right)^{-1},&\text{ if }i\text{ odd and% }j\text{ even}\\ 2\left(\frac{1}{\kappa^{\ast}_{i,j-1}}+\frac{1}{\kappa^{\ast}_{i,j+1}}\right)^% {-1},&\text{if }i\text{ even and }j\text{ odd}\end{cases}italic_κ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i , italic_j end_POSTSUBSCRIPT = { start_ROW start_CELL 2 ( divide start_ARG 1 end_ARG start_ARG italic_κ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i - 1 , italic_j end_POSTSUBSCRIPT end_ARG + divide start_ARG 1 end_ARG start_ARG italic_κ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i + 1 , italic_j end_POSTSUBSCRIPT end_ARG ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT , end_CELL start_CELL if italic_i odd and italic_j even end_CELL end_ROW start_ROW start_CELL 2 ( divide start_ARG 1 end_ARG start_ARG italic_κ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i , italic_j - 1 end_POSTSUBSCRIPT end_ARG + divide start_ARG 1 end_ARG start_ARG italic_κ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i , italic_j + 1 end_POSTSUBSCRIPT end_ARG ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT , end_CELL start_CELL if italic_i even and italic_j odd end_CELL end_ROW (12)

Step 3: set to zero the values of κsuperscript𝜅\kappa^{\ast}italic_κ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT that would correspond to the values on the original grid:

κ~=(𝟏(2N1)×(2N1)D(𝟏N×N))κ.~𝜅direct-productsubscript12𝑁12𝑁1𝐷subscript1𝑁𝑁superscript𝜅\displaystyle\tilde{\kappa}=\left(\mathbf{1}_{(2N-1)\times(2N-1)}-D(\mathbf{1}% _{N\times N})\right)\odot\kappa^{\ast}.over~ start_ARG italic_κ end_ARG = ( bold_1 start_POSTSUBSCRIPT ( 2 italic_N - 1 ) × ( 2 italic_N - 1 ) end_POSTSUBSCRIPT - italic_D ( bold_1 start_POSTSUBSCRIPT italic_N × italic_N end_POSTSUBSCRIPT ) ) ⊙ italic_κ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT . (13)

In the equation above, 𝟏N×Nsubscript1𝑁𝑁\mathbf{1}_{N\times N}bold_1 start_POSTSUBSCRIPT italic_N × italic_N end_POSTSUBSCRIPT is an N×N𝑁𝑁N\times Nitalic_N × italic_N matrix with all entries equal to one, and direct-product\odot is the Hadamard or pointwise product.

We now introduce the following convolution kernels

T=[0010000000001000000000000]P=[0000000100000000000000000]T=TTP=PTT=[0000000000001000000000100]P=[0000000000000000010000000]T=TTP=PT,formulae-sequencesubscript𝑇matrix0010000000001000000000000subscript𝑃matrix0000000100000000000000000subscript𝑇superscriptsubscript𝑇𝑇subscript𝑃superscriptsubscript𝑃𝑇subscript𝑇matrix0000000000001000000000100subscript𝑃matrix0000000000000000010000000subscript𝑇superscriptsubscript𝑇𝑇subscript𝑃superscriptsubscript𝑃𝑇\displaystyle\begin{split}T_{\uparrow}&=\begin{bmatrix}0&0&1&0&0\\ 0&0&0&0&0\\ 0&0&-1&0&0\\ 0&0&0&0&0\\ 0&0&0&0&0\end{bmatrix}\hskip 8.53581ptP_{\uparrow}=\begin{bmatrix}0&0&0&0&0\\ 0&0&1&0&0\\ 0&0&0&0&0\\ 0&0&0&0&0\\ 0&0&0&0&0\end{bmatrix}\\ T_{\leftarrow}&=-T_{\uparrow}^{T}\hskip 76.68016ptP_{\leftarrow}=P_{\uparrow}^% {T}\\ T_{\downarrow}&=\begin{bmatrix}0&0&0&0&0\\ 0&0&0&0&0\\ 0&0&1&0&0\\ 0&0&0&0&0\\ 0&0&-1&0&0\end{bmatrix}\hskip 8.53581ptP_{\downarrow}=\begin{bmatrix}0&0&0&0&0% \\ 0&0&0&0&0\\ 0&0&0&0&0\\ 0&0&1&0&0\\ 0&0&0&0&0\end{bmatrix}\\ T_{\rightarrow}&=-T_{\downarrow}^{T}\hskip 76.68016ptP_{\rightarrow}=P_{% \downarrow}^{T},\\ \end{split}start_ROW start_CELL italic_T start_POSTSUBSCRIPT ↑ end_POSTSUBSCRIPT end_CELL start_CELL = [ start_ARG start_ROW start_CELL 0 end_CELL start_CELL 0 end_CELL start_CELL 1 end_CELL start_CELL 0 end_CELL start_CELL 0 end_CELL end_ROW start_ROW start_CELL 0 end_CELL start_CELL 0 end_CELL start_CELL 0 end_CELL start_CELL 0 end_CELL start_CELL 0 end_CELL end_ROW start_ROW start_CELL 0 end_CELL start_CELL 0 end_CELL start_CELL - 1 end_CELL start_CELL 0 end_CELL start_CELL 0 end_CELL end_ROW start_ROW start_CELL 0 end_CELL start_CELL 0 end_CELL start_CELL 0 end_CELL start_CELL 0 end_CELL start_CELL 0 end_CELL end_ROW start_ROW start_CELL 0 end_CELL start_CELL 0 end_CELL start_CELL 0 end_CELL start_CELL 0 end_CELL start_CELL 0 end_CELL end_ROW end_ARG ] italic_P start_POSTSUBSCRIPT ↑ end_POSTSUBSCRIPT = [ start_ARG start_ROW start_CELL 0 end_CELL start_CELL 0 end_CELL start_CELL 0 end_CELL start_CELL 0 end_CELL start_CELL 0 end_CELL end_ROW start_ROW start_CELL 0 end_CELL start_CELL 0 end_CELL start_CELL 1 end_CELL start_CELL 0 end_CELL start_CELL 0 end_CELL end_ROW start_ROW start_CELL 0 end_CELL start_CELL 0 end_CELL start_CELL 0 end_CELL start_CELL 0 end_CELL start_CELL 0 end_CELL end_ROW start_ROW start_CELL 0 end_CELL start_CELL 0 end_CELL start_CELL 0 end_CELL start_CELL 0 end_CELL start_CELL 0 end_CELL end_ROW start_ROW start_CELL 0 end_CELL start_CELL 0 end_CELL start_CELL 0 end_CELL start_CELL 0 end_CELL start_CELL 0 end_CELL end_ROW end_ARG ] end_CELL end_ROW start_ROW start_CELL italic_T start_POSTSUBSCRIPT ← end_POSTSUBSCRIPT end_CELL start_CELL = - italic_T start_POSTSUBSCRIPT ↑ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT italic_P start_POSTSUBSCRIPT ← end_POSTSUBSCRIPT = italic_P start_POSTSUBSCRIPT ↑ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT end_CELL end_ROW start_ROW start_CELL italic_T start_POSTSUBSCRIPT ↓ end_POSTSUBSCRIPT end_CELL start_CELL = [ start_ARG start_ROW start_CELL 0 end_CELL start_CELL 0 end_CELL start_CELL 0 end_CELL start_CELL 0 end_CELL start_CELL 0 end_CELL end_ROW start_ROW start_CELL 0 end_CELL start_CELL 0 end_CELL start_CELL 0 end_CELL start_CELL 0 end_CELL start_CELL 0 end_CELL end_ROW start_ROW start_CELL 0 end_CELL start_CELL 0 end_CELL start_CELL 1 end_CELL start_CELL 0 end_CELL start_CELL 0 end_CELL end_ROW start_ROW start_CELL 0 end_CELL start_CELL 0 end_CELL start_CELL 0 end_CELL start_CELL 0 end_CELL start_CELL 0 end_CELL end_ROW start_ROW start_CELL 0 end_CELL start_CELL 0 end_CELL start_CELL - 1 end_CELL start_CELL 0 end_CELL start_CELL 0 end_CELL end_ROW end_ARG ] italic_P start_POSTSUBSCRIPT ↓ end_POSTSUBSCRIPT = [ start_ARG start_ROW start_CELL 0 end_CELL start_CELL 0 end_CELL start_CELL 0 end_CELL start_CELL 0 end_CELL start_CELL 0 end_CELL end_ROW start_ROW start_CELL 0 end_CELL start_CELL 0 end_CELL start_CELL 0 end_CELL start_CELL 0 end_CELL start_CELL 0 end_CELL end_ROW start_ROW start_CELL 0 end_CELL start_CELL 0 end_CELL start_CELL 0 end_CELL start_CELL 0 end_CELL start_CELL 0 end_CELL end_ROW start_ROW start_CELL 0 end_CELL start_CELL 0 end_CELL start_CELL 1 end_CELL start_CELL 0 end_CELL start_CELL 0 end_CELL end_ROW start_ROW start_CELL 0 end_CELL start_CELL 0 end_CELL start_CELL 0 end_CELL start_CELL 0 end_CELL start_CELL 0 end_CELL end_ROW end_ARG ] end_CELL end_ROW start_ROW start_CELL italic_T start_POSTSUBSCRIPT → end_POSTSUBSCRIPT end_CELL start_CELL = - italic_T start_POSTSUBSCRIPT ↓ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT italic_P start_POSTSUBSCRIPT → end_POSTSUBSCRIPT = italic_P start_POSTSUBSCRIPT ↓ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT , end_CELL end_ROW (14)

which will allow us to rewrite (8) as follows:

1h2((TD(u))(Pκ~)(TD(u))(Pκ~)+(TD(u))(Pκ~)(TD(u))(Pκ~))i,j=fi,j,i,j𝒱0,formulae-sequence1superscript2subscriptdirect-productsubscript𝑇𝐷𝑢subscript𝑃~𝜅direct-productsubscript𝑇𝐷𝑢subscript𝑃~𝜅direct-productsubscript𝑇𝐷𝑢subscript𝑃~𝜅direct-productsubscript𝑇𝐷𝑢subscript𝑃~𝜅𝑖𝑗subscript𝑓𝑖𝑗for-all𝑖𝑗superscript𝒱0-\frac{1}{h^{2}}\left(\left(T_{\uparrow}\star D(u)\right)\odot\left(P_{% \uparrow}\star\tilde{\kappa}\right)-\left(T_{\downarrow}\star D(u)\right)\odot% \left(P_{\downarrow}\star\tilde{\kappa}\right)\right.\\ \left.+\left(T_{\rightarrow}\star D(u)\right)\odot\left(P_{\rightarrow}\star% \tilde{\kappa}\right)-\left(T_{\leftarrow}\star D(u)\right)\odot\left(P_{% \leftarrow}\star\tilde{\kappa}\right)\right)_{i,j}=f_{i,j},\forall i,j\in% \mathcal{V}^{0},\quadstart_ROW start_CELL - divide start_ARG 1 end_ARG start_ARG italic_h start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG ( ( italic_T start_POSTSUBSCRIPT ↑ end_POSTSUBSCRIPT ⋆ italic_D ( italic_u ) ) ⊙ ( italic_P start_POSTSUBSCRIPT ↑ end_POSTSUBSCRIPT ⋆ over~ start_ARG italic_κ end_ARG ) - ( italic_T start_POSTSUBSCRIPT ↓ end_POSTSUBSCRIPT ⋆ italic_D ( italic_u ) ) ⊙ ( italic_P start_POSTSUBSCRIPT ↓ end_POSTSUBSCRIPT ⋆ over~ start_ARG italic_κ end_ARG ) end_CELL end_ROW start_ROW start_CELL + ( italic_T start_POSTSUBSCRIPT → end_POSTSUBSCRIPT ⋆ italic_D ( italic_u ) ) ⊙ ( italic_P start_POSTSUBSCRIPT → end_POSTSUBSCRIPT ⋆ over~ start_ARG italic_κ end_ARG ) - ( italic_T start_POSTSUBSCRIPT ← end_POSTSUBSCRIPT ⋆ italic_D ( italic_u ) ) ⊙ ( italic_P start_POSTSUBSCRIPT ← end_POSTSUBSCRIPT ⋆ over~ start_ARG italic_κ end_ARG ) ) start_POSTSUBSCRIPT italic_i , italic_j end_POSTSUBSCRIPT = italic_f start_POSTSUBSCRIPT italic_i , italic_j end_POSTSUBSCRIPT , ∀ italic_i , italic_j ∈ caligraphic_V start_POSTSUPERSCRIPT 0 end_POSTSUPERSCRIPT , end_CELL end_ROW (15)

which is then used in the first term of the loss function (7) to train our neural network. The same training process outlined in Algorithm 1 applies here. Note that in (15), the convolution operator \star uses a stride of two because of the dual grid and the use of larger convolution kernels.

2.2 Parabolic Problems

We generalize the method described in the previous section to time-dependent problems. We propose a modified loss function that incorporates the backward Euler method to address the time derivative. Consider the following parabolic problem on a square domain ΩΩ\Omegaroman_Ω and for the time interval [t0,T]subscript𝑡0𝑇[t_{0},T][ italic_t start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , italic_T ].

utΔu=finΩ×(t0,T],u=gonΩ×(t0,T],u(,t0)=u0inΩ×{t0}.formulae-sequence𝑢𝑡Δ𝑢𝑓inΩsubscript𝑡0𝑇formulae-sequence𝑢𝑔onΩsubscript𝑡0𝑇𝑢subscript𝑡0superscript𝑢0inΩsubscript𝑡0\displaystyle\begin{split}\frac{\partial u}{\partial t}-\Delta u&=f\quad\mbox{% in}\quad\Omega\times(t_{0},T],\\ u&=g\quad\mbox{on}\quad\partial\Omega\times(t_{0},T],\\ u(\cdot,t_{0})&=u^{0}\quad\mbox{in}\quad\Omega\times\{t_{0}\}.\end{split}start_ROW start_CELL divide start_ARG ∂ italic_u end_ARG start_ARG ∂ italic_t end_ARG - roman_Δ italic_u end_CELL start_CELL = italic_f in roman_Ω × ( italic_t start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , italic_T ] , end_CELL end_ROW start_ROW start_CELL italic_u end_CELL start_CELL = italic_g on ∂ roman_Ω × ( italic_t start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , italic_T ] , end_CELL end_ROW start_ROW start_CELL italic_u ( ⋅ , italic_t start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ) end_CELL start_CELL = italic_u start_POSTSUPERSCRIPT 0 end_POSTSUPERSCRIPT in roman_Ω × { italic_t start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT } . end_CELL end_ROW (16)

Let τ>0𝜏0\tau>0italic_τ > 0 be the time step size. At each time step, tn=nτsuperscript𝑡𝑛𝑛𝜏t^{n}=n\tauitalic_t start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT = italic_n italic_τ, the finite difference solution denoted by uhnsuperscriptsubscript𝑢𝑛u_{h}^{n}italic_u start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT solves the following optimization problem

argminuhnsubscriptargminsuperscriptsubscript𝑢𝑛\displaystyle\operatorname*{arg\,min}_{u_{h}^{n}}start_OPERATOR roman_arg roman_min end_OPERATOR start_POSTSUBSCRIPT italic_u start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT end_POSTSUBSCRIPT (i,j)𝒱0(ui,jnui,jn1τ(fi,jn(KΔuhn)i,j))2subscript𝑖𝑗superscript𝒱0superscriptsubscriptsuperscript𝑢𝑛𝑖𝑗subscriptsuperscript𝑢𝑛1𝑖𝑗𝜏superscriptsubscript𝑓𝑖𝑗𝑛subscriptsubscript𝐾Δsuperscriptsubscript𝑢𝑛𝑖𝑗2\displaystyle\sum_{(i,j)\in\mathcal{V}^{0}}\left(u^{n}_{i,j}-u^{n-1}_{i,j}-% \tau\left(f_{i,j}^{n}-\left(K_{\Delta}\star u_{h}^{n}\right)_{i,j}\right)% \right)^{2}∑ start_POSTSUBSCRIPT ( italic_i , italic_j ) ∈ caligraphic_V start_POSTSUPERSCRIPT 0 end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_u start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i , italic_j end_POSTSUBSCRIPT - italic_u start_POSTSUPERSCRIPT italic_n - 1 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i , italic_j end_POSTSUBSCRIPT - italic_τ ( italic_f start_POSTSUBSCRIPT italic_i , italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT - ( italic_K start_POSTSUBSCRIPT roman_Δ end_POSTSUBSCRIPT ⋆ italic_u start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT ) start_POSTSUBSCRIPT italic_i , italic_j end_POSTSUBSCRIPT ) ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT
+(i,j)𝒱(ui,jngi,jn)2.subscript𝑖𝑗superscript𝒱superscriptsubscriptsuperscript𝑢𝑛𝑖𝑗superscriptsubscript𝑔𝑖𝑗𝑛2\displaystyle+\sum_{(i,j)\in\mathcal{V}^{\partial}}\left(u^{n}_{i,j}-g_{i,j}^{% n}\right)^{2}.+ ∑ start_POSTSUBSCRIPT ( italic_i , italic_j ) ∈ caligraphic_V start_POSTSUPERSCRIPT ∂ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_u start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i , italic_j end_POSTSUBSCRIPT - italic_g start_POSTSUBSCRIPT italic_i , italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT . (17)

Above we use the short-hand notation fi,jn=f(xi,yj,tn)superscriptsubscript𝑓𝑖𝑗𝑛𝑓subscript𝑥𝑖subscript𝑦𝑗superscript𝑡𝑛f_{i,j}^{n}=f(x_{i},y_{j},t^{n})italic_f start_POSTSUBSCRIPT italic_i , italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT = italic_f ( italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_y start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT , italic_t start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT ) and gi,jn=g(xi,yj,tn)superscriptsubscript𝑔𝑖𝑗𝑛𝑔subscript𝑥𝑖subscript𝑦𝑗superscript𝑡𝑛g_{i,j}^{n}=g(x_{i},y_{j},t^{n})italic_g start_POSTSUBSCRIPT italic_i , italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT = italic_g ( italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_y start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT , italic_t start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT ). Here, the unknowns ui,jnsuperscriptsubscript𝑢𝑖𝑗𝑛u_{i,j}^{n}italic_u start_POSTSUBSCRIPT italic_i , italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT are the approximations of the solution u𝑢uitalic_u evaluated at the point (xi,yj)subscript𝑥𝑖subscript𝑦𝑗(x_{i},y_{j})( italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_y start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ) and time tnsuperscript𝑡𝑛t^{n}italic_t start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT. We then use the following loss function to train our CNN at each time step

α,τ(u^n,u^n1,tn)=subscript𝛼𝜏superscript^𝑢𝑛superscript^𝑢𝑛1superscript𝑡𝑛absent\displaystyle\mathcal{L}_{\alpha,\tau}(\hat{u}^{n},\hat{u}^{n-1},t^{n})=caligraphic_L start_POSTSUBSCRIPT italic_α , italic_τ end_POSTSUBSCRIPT ( over^ start_ARG italic_u end_ARG start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT , over^ start_ARG italic_u end_ARG start_POSTSUPERSCRIPT italic_n - 1 end_POSTSUPERSCRIPT , italic_t start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT ) = α(i,j)𝒱0(u^i,jnu^i,jn1τ(fi,jn(KΔu^n)i,j))2𝛼subscript𝑖𝑗superscript𝒱0superscriptsubscriptsuperscript^𝑢𝑛𝑖𝑗subscriptsuperscript^𝑢𝑛1𝑖𝑗𝜏superscriptsubscript𝑓𝑖𝑗𝑛subscriptsubscript𝐾Δsuperscript^𝑢𝑛𝑖𝑗2\displaystyle\alpha\sum_{(i,j)\in\mathcal{V}^{0}}\left(\hat{u}^{n}_{i,j}-\hat{% u}^{n-1}_{i,j}-\tau\left(f_{i,j}^{n}-\left(K_{\Delta}\star\hat{u}^{n}\right)_{% i,j}\right)\right)^{2}italic_α ∑ start_POSTSUBSCRIPT ( italic_i , italic_j ) ∈ caligraphic_V start_POSTSUPERSCRIPT 0 end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( over^ start_ARG italic_u end_ARG start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i , italic_j end_POSTSUBSCRIPT - over^ start_ARG italic_u end_ARG start_POSTSUPERSCRIPT italic_n - 1 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i , italic_j end_POSTSUBSCRIPT - italic_τ ( italic_f start_POSTSUBSCRIPT italic_i , italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT - ( italic_K start_POSTSUBSCRIPT roman_Δ end_POSTSUBSCRIPT ⋆ over^ start_ARG italic_u end_ARG start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT ) start_POSTSUBSCRIPT italic_i , italic_j end_POSTSUBSCRIPT ) ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT
+(1α)(i,j)𝒱(u^i,jngi,jn)2.1𝛼subscript𝑖𝑗superscript𝒱superscriptsubscriptsuperscript^𝑢𝑛𝑖𝑗superscriptsubscript𝑔𝑖𝑗𝑛2\displaystyle+(1-\alpha)\sum_{(i,j)\in\mathcal{V}^{\partial}}\left(\hat{u}^{n}% _{i,j}-g_{i,j}^{n}\right)^{2}.+ ( 1 - italic_α ) ∑ start_POSTSUBSCRIPT ( italic_i , italic_j ) ∈ caligraphic_V start_POSTSUPERSCRIPT ∂ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( over^ start_ARG italic_u end_ARG start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i , italic_j end_POSTSUBSCRIPT - italic_g start_POSTSUBSCRIPT italic_i , italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT . (18)

Again, we will choose the weight α=h2/4𝛼superscript24\alpha=h^{2}/4italic_α = italic_h start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT / 4. The key difference between (7) and (2.2) is that we minimize at each time step. The input to the CNN at each time step is the previous solution u^n1superscript^𝑢𝑛1\hat{u}^{n-1}over^ start_ARG italic_u end_ARG start_POSTSUPERSCRIPT italic_n - 1 end_POSTSUPERSCRIPT and the output is the solution at the time step tnsuperscript𝑡𝑛t^{n}italic_t start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT. Algorithm 2 in Section 3 outlines this training procedure in more detail.

2.3 Network Architecture

We select as our architecture the widely used U-Net [25]. The U-Net was initially designed for image segmentation tasks but is successful in several scientific machine learning tasks [26, 27, 28]. The architecture consists of convolution block operators (blocks), a downsampling (or encoding) path, and an upsampling (or decoding) path. A channel-wise concatenation operation links each layer in the downsampling and upsampling paths. Each block consists of two 2D convolution layers. Unlike most implementations of the U-Net architecture [25, 29, 30], we omit normalization (i.e., batch or instance normalization) and do not apply a non-linear activation function like ReLU and use the identity function instead (see Section 4). Additionally, we use the PocketNet approach proposed in [22] and leave the number of feature maps (channels) at each resolution within our architecture constant, namely 32323232. Figure 2 sketches the U-Net architecture for different network depths. We define the depth, d𝑑ditalic_d, of each network to be the number of downsampling operations present in the network. The size of the feature tensor is halved from depth d𝑑ditalic_d to depth d+1𝑑1d+1italic_d + 1.

Refer to caption
Figure 2: Sketch of U-Net architecture for different network depths: 0d40𝑑40\leq d\leq 40 ≤ italic_d ≤ 4.

2.4 Data

2.4.1 Elliptic Problem Data

We test our proposed approach for elliptic problems on four different test cases. Unless otherwise specified, the domain is the unit square.

The Bubble Function

In the first test case, (1) is solved with data chosen such that the exact solution is the bubble function

u(x,y)=x(x1)y(y1).𝑢𝑥𝑦𝑥𝑥1𝑦𝑦1\displaystyle u(x,y)=x(x-1)y(y-1).italic_u ( italic_x , italic_y ) = italic_x ( italic_x - 1 ) italic_y ( italic_y - 1 ) . (19)

This function is symmetric, and we apply homogeneous zero Dirichlet boundary conditions. Note that for this function, one can show that the five-point stencil scheme given by (2) applied to the bubble function gives exactly ΔuΔ𝑢-\Delta u- roman_Δ italic_u evaluated at the grid point (xi,yj)subscript𝑥𝑖subscript𝑦𝑗(x_{i},y_{j})( italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_y start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ), which implies that the finite difference approximation is exact.

The “Peak” Function

In the second case, we want to test the effectiveness of our strategy when the solution has high gradients and is non-symmetric. Hence, we select data such that the exact solution is the “peak” function

u(x,y)=0.0005(x(x1)y(y1))2e10x2+10y.𝑢𝑥𝑦0.0005superscript𝑥𝑥1𝑦𝑦12superscript𝑒10superscript𝑥210𝑦\displaystyle u(x,y)=0.0005\left(x(x-1)y(y-1)\right)^{2}e^{10x^{2}+10y}.italic_u ( italic_x , italic_y ) = 0.0005 ( italic_x ( italic_x - 1 ) italic_y ( italic_y - 1 ) ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_e start_POSTSUPERSCRIPT 10 italic_x start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + 10 italic_y end_POSTSUPERSCRIPT . (20)
The Exponential-Trigonometric Function

We test our approach on a function with non-homogeneous Dirichlet boundary conditions in the third case. We select data such that the exact solution is the exponential-trigonometric function given by

u(x,y)=ex2y2sin(3πx)sin(3πy)+x.𝑢𝑥𝑦superscript𝑒superscript𝑥2superscript𝑦23𝜋𝑥3𝜋𝑦𝑥\displaystyle u(x,y)=e^{-x^{2}-y^{2}}\sin(3\pi x)\sin(3\pi y)+x.italic_u ( italic_x , italic_y ) = italic_e start_POSTSUPERSCRIPT - italic_x start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT - italic_y start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_POSTSUPERSCRIPT roman_sin ( 3 italic_π italic_x ) roman_sin ( 3 italic_π italic_y ) + italic_x . (21)
Non-Constant and Discontinuous Diffusion

To test how our approach fairs on problems with low regularity, we consider an example with a non-constant, discontinuous diffusion coefficient taken from [31]. The problem is posed on the square domain Ω=(1,1)2Ωsuperscript112\Omega=(-1,1)^{2}roman_Ω = ( - 1 , 1 ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT which we divide into four subdomains ΩisubscriptΩ𝑖\Omega_{i}roman_Ω start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT corresponding to the quadrants of the Cartesian plane. On each subdomain, κ𝜅\kappaitalic_κ is constant and takes the values κ1=κ3=5subscript𝜅1subscript𝜅35\kappa_{1}=\kappa_{3}=5italic_κ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT = italic_κ start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT = 5 and κ2=κ4=1subscript𝜅2subscript𝜅41\kappa_{2}=\kappa_{4}=1italic_κ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT = italic_κ start_POSTSUBSCRIPT 4 end_POSTSUBSCRIPT = 1. The exact solution takes the form

rβ(aisin(βθ)+bicos(βθ)),superscript𝑟𝛽subscript𝑎𝑖𝛽𝜃subscript𝑏𝑖𝛽𝜃\displaystyle r^{\beta}(a_{i}\sin(\beta\theta)+b_{i}\cos(\beta\theta)),italic_r start_POSTSUPERSCRIPT italic_β end_POSTSUPERSCRIPT ( italic_a start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT roman_sin ( italic_β italic_θ ) + italic_b start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT roman_cos ( italic_β italic_θ ) ) , (22)

where (r,θ)𝑟𝜃(r,\theta)( italic_r , italic_θ ) are the polar coordinates of a given point in ΩΩ\Omegaroman_Ω, ai,bisubscript𝑎𝑖subscript𝑏𝑖a_{i},b_{i}italic_a start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_b start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT are constants that depend on the subdomains (see [31] for exact values). This discontinuity in the diffusion coefficient introduces a singularity in the solution at the origin, namely the function belongs to the Sobolev space H1+β(Ω)superscript𝐻1𝛽ΩH^{1+\beta}(\Omega)italic_H start_POSTSUPERSCRIPT 1 + italic_β end_POSTSUPERSCRIPT ( roman_Ω ), with β0.5354𝛽0.5354\beta\approx 0.5354italic_β ≈ 0.5354.

2.4.2 Parabolic Problem Data

We test our proposed approach for parabolic problems on three different test cases.

The Trigonometric Function

In the first two cases, (16) is solved with data such that the exact solution is given by the trigonometric function

u(x,y,t)=cos(t)sin(nπx)sin(nπy).𝑢𝑥𝑦𝑡𝑡𝑛𝜋𝑥𝑛𝜋𝑦\displaystyle u(x,y,t)=\cos(t)\sin(n\pi x)\sin(n\pi y).italic_u ( italic_x , italic_y , italic_t ) = roman_cos ( italic_t ) roman_sin ( italic_n italic_π italic_x ) roman_sin ( italic_n italic_π italic_y ) . (23)

In the first case, we select n=1𝑛1n=1italic_n = 1, which gives a symmetric function with a single peak. In the second case, we set n=4𝑛4n=4italic_n = 4, which results in 16 peaks and troughs in our domain.

The Gaussian Function

In the third case, we test our approach on a Gaussian function centered on the point (12,12)1212\left(\frac{1}{2},\frac{1}{2}\right)( divide start_ARG 1 end_ARG start_ARG 2 end_ARG , divide start_ARG 1 end_ARG start_ARG 2 end_ARG )

u(x,y,t)=cos(t)e50((2x1)2+(2y1)2).𝑢𝑥𝑦𝑡𝑡superscript𝑒50superscript2𝑥12superscript2𝑦12\displaystyle u(x,y,t)=\cos(t)e^{-50\left((2x-1)^{2}+(2y-1)^{2}\right)}.italic_u ( italic_x , italic_y , italic_t ) = roman_cos ( italic_t ) italic_e start_POSTSUPERSCRIPT - 50 ( ( 2 italic_x - 1 ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + ( 2 italic_y - 1 ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) end_POSTSUPERSCRIPT . (24)

2.5 Training and Testing Protocols

We use the Adam optimizer [32] and 5×\times×5 convolutional kernels in each layer. The initial learning rate is 0.001 for the steady-state elliptic problems and 0.0001 for parabolic problems. During training, we use L2 regularization with a penalty of 107superscript10710^{-7}10 start_POSTSUPERSCRIPT - 7 end_POSTSUPERSCRIPT and the norm of the network’s gradient is clipped so that it is no greater than 102superscript10210^{-2}10 start_POSTSUPERSCRIPT - 2 end_POSTSUPERSCRIPT. Our models are implemented in Python using TensorFlow (v2.12.0) and trained on an NVIDIA Quadro RTX 6000 GPU [33]. All network weights are initialized using the default TensorFlow initializers. All other hyperparameters are left at their default values. The code for our network architecture is available at https://github.com/aecelaya/pde-nets.

To assess the accuracy of our predicted solutions, we use the following norms of the error between the exact solution u𝑢uitalic_u and its approximate u^^𝑢\hat{u}over^ start_ARG italic_u end_ARG for steady-state problems:

uu^2,hsubscriptnorm𝑢^𝑢2\displaystyle||u-\hat{u}||_{2,h}| | italic_u - over^ start_ARG italic_u end_ARG | | start_POSTSUBSCRIPT 2 , italic_h end_POSTSUBSCRIPT =h(i,j)𝒱0𝒱(u(xi,yj)u^i,j)2absentsubscript𝑖𝑗superscript𝒱0superscript𝒱superscript𝑢subscript𝑥𝑖subscript𝑦𝑗subscript^𝑢𝑖𝑗2\displaystyle=h\sqrt{\sum_{(i,j)\in\mathcal{V}^{0}\cup\mathcal{V}^{\partial}}(% u(x_{i},y_{j})-\hat{u}_{i,j})^{2}}= italic_h square-root start_ARG ∑ start_POSTSUBSCRIPT ( italic_i , italic_j ) ∈ caligraphic_V start_POSTSUPERSCRIPT 0 end_POSTSUPERSCRIPT ∪ caligraphic_V start_POSTSUPERSCRIPT ∂ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_u ( italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_y start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ) - over^ start_ARG italic_u end_ARG start_POSTSUBSCRIPT italic_i , italic_j end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG (25)
uu^subscriptnorm𝑢^𝑢\displaystyle||u-\hat{u}||_{\infty}| | italic_u - over^ start_ARG italic_u end_ARG | | start_POSTSUBSCRIPT ∞ end_POSTSUBSCRIPT =max(i,j)𝒱0𝒱|u(xi,yj)u^i,j|.absentsubscript𝑖𝑗superscript𝒱0superscript𝒱𝑢subscript𝑥𝑖subscript𝑦𝑗subscript^𝑢𝑖𝑗\displaystyle=\max_{(i,j)\in\mathcal{V}^{0}\cup\mathcal{V}^{\partial}}|u(x_{i}% ,y_{j})-\hat{u}_{i,j}|.= roman_max start_POSTSUBSCRIPT ( italic_i , italic_j ) ∈ caligraphic_V start_POSTSUPERSCRIPT 0 end_POSTSUPERSCRIPT ∪ caligraphic_V start_POSTSUPERSCRIPT ∂ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT | italic_u ( italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_y start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ) - over^ start_ARG italic_u end_ARG start_POSTSUBSCRIPT italic_i , italic_j end_POSTSUBSCRIPT | . (26)

For time-dependent problems, similar errors are computed at the final time T𝑇Titalic_T.

3 Calculation

We use Algorithm 1 to approximate a solution u^^𝑢\hat{u}over^ start_ARG italic_u end_ARG to (1). In this algorithm, we start by initializing the neural network 𝒩θsubscript𝒩𝜃\mathcal{N}_{\theta}caligraphic_N start_POSTSUBSCRIPT italic_θ end_POSTSUBSCRIPT with trainable parameters 𝜽𝜽\boldsymbol{\theta}bold_italic_θ. In this case, we use the U-Net architecture described in Section 2.3. We set the maximum number of optimization steps \mathcal{M}caligraphic_M as the stopping criterion for the algorithm. We then begin the minimization (or training) process by generating a prediction from our neural network with the source term f𝑓fitalic_f as the input. Given the current prediction, we compute the loss value and update the network weights via backpropagation. If the loss value from the current prediction is less than the previous best value, we update our best loss and save the current prediction. Algorithm 1 fully outlines this procedure.

Algorithm 1 Unsupervised CNN Training For (1)

Input Right-hand side f𝑓fitalic_f and boundary condition g𝑔gitalic_g
      Output Approximate solution to (2.1), u^subscript^𝑢\hat{u}_{*}over^ start_ARG italic_u end_ARG start_POSTSUBSCRIPT ∗ end_POSTSUBSCRIPT

1:Randomly initialize the network parameters 𝜽𝜽\boldsymbol{\theta}bold_italic_θ
2:Set maximum iterations \mathcal{M}caligraphic_M
3:k0𝑘0k\leftarrow 0italic_k ← 0 \triangleright Iteration counter
4:+subscript\ell_{*}\leftarrow+\inftyroman_ℓ start_POSTSUBSCRIPT ∗ end_POSTSUBSCRIPT ← + ∞ \triangleright Store smallest loss value
5:while k<𝑘k<\mathcal{M}italic_k < caligraphic_M do
6:     u^𝒩θ(f)^𝑢subscript𝒩𝜃𝑓\hat{u}\leftarrow\mathcal{N}_{\theta}(f)over^ start_ARG italic_u end_ARG ← caligraphic_N start_POSTSUBSCRIPT italic_θ end_POSTSUBSCRIPT ( italic_f ) \triangleright Get network prediction
7:     α(u^)subscript𝛼^𝑢\ell\leftarrow\mathcal{L}_{\alpha}(\hat{u})roman_ℓ ← caligraphic_L start_POSTSUBSCRIPT italic_α end_POSTSUBSCRIPT ( over^ start_ARG italic_u end_ARG ) \triangleright Compute loss
8:     Update 𝜽𝜽\boldsymbol{\theta}bold_italic_θ using backpropagation on \ellroman_ℓ
9:     if <subscript\ell<\ell_{*}roman_ℓ < roman_ℓ start_POSTSUBSCRIPT ∗ end_POSTSUBSCRIPT then
10:         u^u^subscript^𝑢^𝑢\hat{u}_{*}\leftarrow\hat{u}over^ start_ARG italic_u end_ARG start_POSTSUBSCRIPT ∗ end_POSTSUBSCRIPT ← over^ start_ARG italic_u end_ARG \triangleright Save best prediction
11:         subscript\ell_{*}\leftarrow\ellroman_ℓ start_POSTSUBSCRIPT ∗ end_POSTSUBSCRIPT ← roman_ℓ \triangleright Update smallest loss value      
12:     kk+1𝑘𝑘1k\leftarrow k+1italic_k ← italic_k + 1

We use Algorithm 2 to approximate a solution u^NTsuperscript^𝑢subscript𝑁𝑇\hat{u}^{N_{T}}over^ start_ARG italic_u end_ARG start_POSTSUPERSCRIPT italic_N start_POSTSUBSCRIPT italic_T end_POSTSUBSCRIPT end_POSTSUPERSCRIPT to (16), where NTsubscript𝑁𝑇N_{T}italic_N start_POSTSUBSCRIPT italic_T end_POSTSUBSCRIPT is an integer such that T=NTτ𝑇subscript𝑁𝑇𝜏T=N_{T}\tauitalic_T = italic_N start_POSTSUBSCRIPT italic_T end_POSTSUBSCRIPT italic_τ. Like with Algorithm 1, we randomly initialize a U-Net and set a maximum number of optimization steps as a stopping criterion. However, in this case, the maximum number of steps applies to each time step. We then set the initial condition as the previous solution. We apply nearly the same steps for each time step as with Algorithm 1. Namely, we produce a candidate prediction from our neural network with the previous solution as the input. We compute the loss value and update the network weights via backpropogation on the loss. If the loss value is lower than the previously observed lowest loss value, then we update our lowest loss value and save the current prediction. At the end of this process, we set the previous solution to the best current solution. This process continues for every time step. Algorithm 2 fully outlines this procedure. Note that in this algorithm, we do not reinitialize our neural network weights at each time step. Instead, we use the previous network configuration as the initial state for the next time step.

Algorithm 2 Unsupervised CNN Training For (16)

Input Number of time steps NTsubscript𝑁𝑇N_{T}italic_N start_POSTSUBSCRIPT italic_T end_POSTSUBSCRIPT, final time T𝑇Titalic_T, initial condition u0superscript𝑢0u^{0}italic_u start_POSTSUPERSCRIPT 0 end_POSTSUPERSCRIPT,
      right hand side f𝑓fitalic_f, and boundary condition g𝑔gitalic_g
      Output Approximate solution to (2.2) at final time u^NTsuperscript^𝑢subscript𝑁𝑇\hat{u}^{N_{T}}over^ start_ARG italic_u end_ARG start_POSTSUPERSCRIPT italic_N start_POSTSUBSCRIPT italic_T end_POSTSUBSCRIPT end_POSTSUPERSCRIPT

1:Randomly initialize the network parameters 𝜽𝜽\boldsymbol{\theta}bold_italic_θ
2:Set maximum iterations \mathcal{M}caligraphic_M
3:τT/NT𝜏𝑇subscript𝑁𝑇\tau\leftarrow T/N_{T}italic_τ ← italic_T / italic_N start_POSTSUBSCRIPT italic_T end_POSTSUBSCRIPT \triangleright Initialize time step size
4:n1𝑛1n\leftarrow 1italic_n ← 1 \triangleright Time step counter
5:while nNT𝑛subscript𝑁𝑇n\leq N_{T}italic_n ≤ italic_N start_POSTSUBSCRIPT italic_T end_POSTSUBSCRIPT do
6:     k0𝑘0k\leftarrow 0italic_k ← 0 \triangleright Iteration counter
7:     +subscript\ell_{*}\leftarrow+\inftyroman_ℓ start_POSTSUBSCRIPT ∗ end_POSTSUBSCRIPT ← + ∞ \triangleright Store smallest loss value
8:     while k<𝑘k<\mathcal{M}italic_k < caligraphic_M do
9:         w𝒩θ(u^n1)𝑤subscript𝒩𝜃superscript^𝑢𝑛1w\leftarrow\mathcal{N}_{\theta}(\hat{u}^{n-1})italic_w ← caligraphic_N start_POSTSUBSCRIPT italic_θ end_POSTSUBSCRIPT ( over^ start_ARG italic_u end_ARG start_POSTSUPERSCRIPT italic_n - 1 end_POSTSUPERSCRIPT ) \triangleright Get network prediction
10:         α,τ(w,u^n1,nτ)subscript𝛼𝜏𝑤superscript^𝑢𝑛1𝑛𝜏\ell\leftarrow\mathcal{L}_{\alpha,\tau}(w,\hat{u}^{n-1},n\tau)roman_ℓ ← caligraphic_L start_POSTSUBSCRIPT italic_α , italic_τ end_POSTSUBSCRIPT ( italic_w , over^ start_ARG italic_u end_ARG start_POSTSUPERSCRIPT italic_n - 1 end_POSTSUPERSCRIPT , italic_n italic_τ ) \triangleright Compute loss
11:         Update 𝜽𝜽\boldsymbol{\theta}bold_italic_θ using backpropagation on \ellroman_ℓ
12:         if <subscript\ell<\ell_{*}roman_ℓ < roman_ℓ start_POSTSUBSCRIPT ∗ end_POSTSUBSCRIPT then
13:              u^nwsuperscript^𝑢𝑛𝑤\hat{u}^{n}\leftarrow wover^ start_ARG italic_u end_ARG start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT ← italic_w \triangleright Save best prediction
14:              subscript\ell_{*}\leftarrow\ellroman_ℓ start_POSTSUBSCRIPT ∗ end_POSTSUBSCRIPT ← roman_ℓ \triangleright Update smallest loss value          
15:         kk+1𝑘𝑘1k\leftarrow k+1italic_k ← italic_k + 1      
16:     nn+1𝑛𝑛1n\leftarrow n+1italic_n ← italic_n + 1

4 Results and Discussion

Using the methods described in Section 2 and Algorithm 1, we approximate the finite difference solution to the elliptic problems defined in Section 2.4.1 with constant diffusion coefficients (i.e., κ=1𝜅1\kappa=1italic_κ = 1). Table 1 shows the accuracy of our unsupervised predictions for a varying grid size and number of optimization steps. We fix the depth of our U-Net architecture to three in this case. For reference, Table 3 shows the accuracy of the finite difference method for varying grid sizes. For the peak and exponential-trigonometric functions, our approach almost exactly recovers the finite difference solution. We do not, however, see this for the bubble function example. In that case, the solution of the finite difference method is exact. Under our selected settings (i.e., depth and optimization steps), our method stops at an error of approximately 10-6 in the ||||2,h||\cdot||_{2,h}| | ⋅ | | start_POSTSUBSCRIPT 2 , italic_h end_POSTSUBSCRIPT norm for the bubble function example. The cause of this discrepancy between our method applied to the bubble function and the finite difference solution is due mostly to the fact that single-precision is used to train the neural networks. Values of the order 106superscript10610^{-6}10 start_POSTSUPERSCRIPT - 6 end_POSTSUPERSCRIPT mean that we are very close to machine epsilon (107superscript10710^{-7}10 start_POSTSUPERSCRIPT - 7 end_POSTSUPERSCRIPT). Another factor is the use of stochastic, gradient-based optimizers. While the Adam optimizer is not stochastic in this case since we are not training with mini-batches of data, it is still true that we are using non-optimal step sizes. Therefore, we hypothesize that the method is getting stuck at errors 𝒪(106)𝒪superscript106\mathcal{O}(10^{-6})caligraphic_O ( 10 start_POSTSUPERSCRIPT - 6 end_POSTSUPERSCRIPT ) instead of reaching machine epsilon values 𝒪(107)𝒪superscript107\mathcal{O}(10^{-7})caligraphic_O ( 10 start_POSTSUPERSCRIPT - 7 end_POSTSUPERSCRIPT ) in single-precision.

Figure 3 shows the true solution, predicted solution, and absolute difference for the bubble, peak, and exponential-trigonometric cases on a 128×\times×128 grid. This figure shows that for every case, our unsupervised algorithm produces visually indistinguishable solutions from the true solution in each case. To give more insight into our method’s performance, we show the loss function for one of the test cases (the bubble function) in Figure 4. Here, we observe that our loss function reaches values that are machine epsilon for single precision (on the order of 107superscript10710^{-7}10 start_POSTSUPERSCRIPT - 7 end_POSTSUPERSCRIPT). Similar loss function values are observed for the other cases, indicating that we have reached convergence.

We also want to study the network depth’s effect on our predictions’ accuracy. Table 2 shows the accuracy of our unsupervised neural network predictions for varying grid sizes and U-Net depths. We set the number of optimization steps to 4,000. Here, we see that the depth of the U-Net architectures does not appear to have a significant effect on the accuracy of our predictions. The only notable exceptions are for the bubble and exponential-trigonometric functions on a 128×\times×128 grid with a depth equal to two. This discrepancy may be caused by the network not capturing sufficiently rich features on the finer grid. The fact that we see a decrease in the errors between our predictions and the true solution to those comparable to the finite difference method as we increase the depth to three or greater supports this hypothesis. Note that for coarser grids (i.e., fewer grid points), we do not test higher network depths since the size of the features (the output of each layer) at the coarser grids in such networks would be smaller than the size of the convolutional kernels in the network.

The results of Algorithm 1 for the non-constant diffusion problem defined in Section 2.4.1 are shown in Table 4. Here, we see more significant errors than with the constant diffusion problems. Figure 5 shows the true solution, predicted solution, and absolute difference for the non-constant diffusion problem on a 128×\times×128 grid. This figure shows that the errors are mostly concentrated around the discontinuities in κ𝜅\kappaitalic_κ. We do not compare this to the finite difference method because the exact solution exhibits a singularity at the origin. Indeed, the gradient of the exact solution blows up at the point (0,0)00(0,0)( 0 , 0 ).

Because the finite difference solution solves the convex minimization problem (2.1), the finite difference approximation’s accuracy limits our approach’s accuracy. This explains the relatively poor performance in the case of discontinuous diffusion. It is well known that, due to the singularity at the origin, the exact solution to the discontinuous diffusion problem belongs to the Sobolev space H1+β(ΩH^{1+\beta}(\Omegaitalic_H start_POSTSUPERSCRIPT 1 + italic_β end_POSTSUPERSCRIPT ( roman_Ω), where β0.5354𝛽0.5354\beta\approx 0.5354italic_β ≈ 0.5354 [31]. Hence, the finite difference method, which requires more regularity, performs poorly.

Problem \mathcal{M}caligraphic_M N=32𝑁32N=32italic_N = 32 N=64𝑁64N=64italic_N = 64 N=128𝑁128N=128italic_N = 128
uu^2,hsubscriptnorm𝑢^𝑢2||u-\hat{u}||_{2,h}| | italic_u - over^ start_ARG italic_u end_ARG | | start_POSTSUBSCRIPT 2 , italic_h end_POSTSUBSCRIPT uu^subscriptnorm𝑢^𝑢||u-\hat{u}||_{\infty}| | italic_u - over^ start_ARG italic_u end_ARG | | start_POSTSUBSCRIPT ∞ end_POSTSUBSCRIPT uu^2,hsubscriptnorm𝑢^𝑢2||u-\hat{u}||_{2,h}| | italic_u - over^ start_ARG italic_u end_ARG | | start_POSTSUBSCRIPT 2 , italic_h end_POSTSUBSCRIPT uu^subscriptnorm𝑢^𝑢||u-\hat{u}||_{\infty}| | italic_u - over^ start_ARG italic_u end_ARG | | start_POSTSUBSCRIPT ∞ end_POSTSUBSCRIPT uu^2,hsubscriptnorm𝑢^𝑢2||u-\hat{u}||_{2,h}| | italic_u - over^ start_ARG italic_u end_ARG | | start_POSTSUBSCRIPT 2 , italic_h end_POSTSUBSCRIPT uu^subscriptnorm𝑢^𝑢||u-\hat{u}||_{\infty}| | italic_u - over^ start_ARG italic_u end_ARG | | start_POSTSUBSCRIPT ∞ end_POSTSUBSCRIPT
Bubble 500 4.7521e-05 5.2953e-04 4.8521e-05 8.2744e-04 4.2407e-04 1.6607e-03
1,000 2.5330e-05 2.4341e-04 2.1550e-05 3.4875e-04 1.3244e-04 5.3431e-04
2,000 1.4457e-05 1.4313e-04 9.1955e-06 1.4581e-04 4.0090e-05 3.0874e-04
4,000 3.2266e-06 3.0962e-05 3.1159e-06 5.4674e-05 7.6852e-06 8.5074e-05
8,000 2.2661e-06 2.9351e-05 3.7016e-06 4.9231e-05 3.7105e-06 4.4149e-05
Peak 500 2.0066e-02 1.2398e-01 1.0361e-02 1.2776e-01 3.2611e-02 2.8655e-01
1,000 2.0603e-02 1.1937e-01 5.3352e-03 3.2162e-02 5.0469e-03 2.6498e-02
2,000 2.0594e-02 1.1934e-01 5.5620e-03 3.2176e-02 1.3510e-03 8.0407e-03
4,000 2.1275e-02 1.1932e-01 5.6542e-03 3.2192e-02 1.3851e-03 8.0365e-03
8,000 2.0616e-02 1.1933e-01 5.5843e-03 3.2196e-02 1.4042e-03 8.0850e-03
Exp-Trig 500 2.8224e-03 1.8145e-02 1.0084e-03 1.5877e-02 2.6316e-03 3.0236e-02
1,000 2.4582e-03 7.8004e-03 6.3506e-04 4.8903e-03 4.0585e-04 1.6215e-02
2,000 2.3809e-03 7.8190e-03 5.9138e-04 1.8863e-03 1.8116e-04 3.6817e-03
4,000 2.4604e-03 7.8169e-03 5.9671e-04 2.4966e-03 1.5110e-04 1.3024e-03
8,000 2.3742e-03 7.8151e-03 5.8227e-04 1.8845e-03 1.4478e-04 5.7032e-04
Table 1: Accuracy of unsupervised predictions for a varying grid size N𝑁Nitalic_N, number of optimization steps \mathcal{M}caligraphic_M, and the depth of the U-Net set to three.
Problem d𝑑ditalic_d # Params. N=32𝑁32N=32italic_N = 32 N=64𝑁64N=64italic_N = 64 N=128𝑁128N=128italic_N = 128
uu^2,hsubscriptnorm𝑢^𝑢2||u-\hat{u}||_{2,h}| | italic_u - over^ start_ARG italic_u end_ARG | | start_POSTSUBSCRIPT 2 , italic_h end_POSTSUBSCRIPT uu^subscriptnorm𝑢^𝑢||u-\hat{u}||_{\infty}| | italic_u - over^ start_ARG italic_u end_ARG | | start_POSTSUBSCRIPT ∞ end_POSTSUBSCRIPT uu^2,hsubscriptnorm𝑢^𝑢2||u-\hat{u}||_{2,h}| | italic_u - over^ start_ARG italic_u end_ARG | | start_POSTSUBSCRIPT 2 , italic_h end_POSTSUBSCRIPT uu^subscriptnorm𝑢^𝑢||u-\hat{u}||_{\infty}| | italic_u - over^ start_ARG italic_u end_ARG | | start_POSTSUBSCRIPT ∞ end_POSTSUBSCRIPT uu^2,hsubscriptnorm𝑢^𝑢2||u-\hat{u}||_{2,h}| | italic_u - over^ start_ARG italic_u end_ARG | | start_POSTSUBSCRIPT 2 , italic_h end_POSTSUBSCRIPT uu^subscriptnorm𝑢^𝑢||u-\hat{u}||_{\infty}| | italic_u - over^ start_ARG italic_u end_ARG | | start_POSTSUBSCRIPT ∞ end_POSTSUBSCRIPT
Bubble 2 283,713 3.2705e-06 3.8412e-05 2.8414e-06 4.1518e-05 5.4252e-04 1.9439e-03
3 412,225 3.2266e-06 3.0962e-05 3.1159e-06 5.4674e-05 7.6852e-06 8.5074e-05
4 541,889 - - 2.6481e-06 4.1081e-05 3.2062e-06 6.0881e-05
5 669,249 - - - - 2.8731e-06 9.5620e-05
Peak 2 283,713 2.1274e-02 1.1932e-01 5.6116e-03 3.2174e-02 1.4157e-03 7.9525e-03
3 412,225 2.1275e-02 1.1932e-01 5.6542e-03 3.2192e-02 1.3851e-03 8.0365e-03
4 541,889 - - 5.6673e-03 3.2192e-02 1.4087e-03 8.0848e-03
5 669,249 - - - - 1.4042e-03 8.0918e-03
Exp-Trig 2 283,713 2.4689e-03 7.8191e-03 5.9526e-04 1.8814e-03 2.8053e-02 7.5356e-02
3 412,225 2.4604e-03 7.8169e-03 5.9671e-04 2.4966e-03 1.5110e-04 1.3024e-03
4 541,889 - - 5.9509e-04 1.8855e-03 1.5166e-04 1.8679e-03
5 669,249 - - - - 1.4944e-04 1.2124e-03
Table 2: Accuracy of unsupervised predictions for varying grid sizes N𝑁Nitalic_N and U-Net depths d𝑑ditalic_d. We set the number of optimization steps to 4,000.
Problem N=32𝑁32N=32italic_N = 32 N=64𝑁64N=64italic_N = 64 N=128𝑁128N=128italic_N = 128
uuh2,hsubscriptnorm𝑢subscript𝑢2||u-u_{h}||_{2,h}| | italic_u - italic_u start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT | | start_POSTSUBSCRIPT 2 , italic_h end_POSTSUBSCRIPT uuhsubscriptnorm𝑢subscript𝑢||u-u_{h}||_{\infty}| | italic_u - italic_u start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT | | start_POSTSUBSCRIPT ∞ end_POSTSUBSCRIPT uuh2,hsubscriptnorm𝑢subscript𝑢2||u-u_{h}||_{2,h}| | italic_u - italic_u start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT | | start_POSTSUBSCRIPT 2 , italic_h end_POSTSUBSCRIPT uuhsubscriptnorm𝑢subscript𝑢||u-u_{h}||_{\infty}| | italic_u - italic_u start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT | | start_POSTSUBSCRIPT ∞ end_POSTSUBSCRIPT uuh2,hsubscriptnorm𝑢subscript𝑢2||u-u_{h}||_{2,h}| | italic_u - italic_u start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT | | start_POSTSUBSCRIPT 2 , italic_h end_POSTSUBSCRIPT uuhsubscriptnorm𝑢subscript𝑢||u-u_{h}||_{\infty}| | italic_u - italic_u start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT | | start_POSTSUBSCRIPT ∞ end_POSTSUBSCRIPT
Bubble 1.3582e-16 2.8449e-16 2.4524e-16 6.8695e-16 8.1134e-16 1.9776e-15
Peak 1.7597e-02 1.1151e-01 5.3441e-03 3.1282e-02 1.3988e-03 7.9696e-03
Exp-Trig 2.2986e-03 7.3266e-03 5.7262e-04 1.8251e-03 1.4303e-04 4.5588e-04
Table 3: Finite difference errors for selected elliptic problems for comparison.
Refer to caption
Figure 3: (Top) Bubble function. (Middle) “Peak” function. (Bottom) Exponential trigonometric function. From left to right, contour plots of true solution, predicted solution, and absolute difference. All predictions and solutions are on a 128×\times×128 grid. Note that μ=106𝜇superscript106\mu=10^{-6}italic_μ = 10 start_POSTSUPERSCRIPT - 6 end_POSTSUPERSCRIPT in the colorbar for the bubble function difference.
Refer to caption
Figure 4: Loss values at each optimization step for the bubble function. Here, we use a network depth of three, a grid size of 128×\times×128, and 2,000 optimization steps.
Problem \mathcal{M}caligraphic_M N=32𝑁32N=32italic_N = 32 N=64𝑁64N=64italic_N = 64 N=128𝑁128N=128italic_N = 128
uu^2,hsubscriptnorm𝑢^𝑢2||u-\hat{u}||_{2,h}| | italic_u - over^ start_ARG italic_u end_ARG | | start_POSTSUBSCRIPT 2 , italic_h end_POSTSUBSCRIPT uu^subscriptnorm𝑢^𝑢||u-\hat{u}||_{\infty}| | italic_u - over^ start_ARG italic_u end_ARG | | start_POSTSUBSCRIPT ∞ end_POSTSUBSCRIPT uu^2,hsubscriptnorm𝑢^𝑢2||u-\hat{u}||_{2,h}| | italic_u - over^ start_ARG italic_u end_ARG | | start_POSTSUBSCRIPT 2 , italic_h end_POSTSUBSCRIPT uu^subscriptnorm𝑢^𝑢||u-\hat{u}||_{\infty}| | italic_u - over^ start_ARG italic_u end_ARG | | start_POSTSUBSCRIPT ∞ end_POSTSUBSCRIPT uu^2,hsubscriptnorm𝑢^𝑢2||u-\hat{u}||_{2,h}| | italic_u - over^ start_ARG italic_u end_ARG | | start_POSTSUBSCRIPT 2 , italic_h end_POSTSUBSCRIPT uu^subscriptnorm𝑢^𝑢||u-\hat{u}||_{\infty}| | italic_u - over^ start_ARG italic_u end_ARG | | start_POSTSUBSCRIPT ∞ end_POSTSUBSCRIPT
Non. Const. Diff. 500 4.6597e-02 1.8380e-01 8.6890e-02 3.7069e-01 5.5651e-01 2.1105e+00
1,000 3.9795e-02 1.6035e-01 4.9640e-02 1.3376e-01 2.0766e-01 6.4809e-01
2,000 3.9718e-02 1.5799e-01 3.1458e-02 1.2455e-01 8.7473e-02 2.8797e-01
4,000 3.9713e-02 1.5794e-01 3.1224e-02 1.2258e-01 4.9868e-02 1.1281e-01
8,000 3.9710e-02 1.5800e-01 3.1045e-02 1.2221e-01 3.1002e-02 9.3761e-02
Table 4: Accuracy of unsupervised predictions on the non-constant diffusion problem for a varying grid sizes N𝑁Nitalic_N, number of optimization steps \mathcal{M}caligraphic_M, and the depth of the U-Net set to three.
Problem d𝑑ditalic_d # Params. N=32𝑁32N=32italic_N = 32 N=64𝑁64N=64italic_N = 64 N=128𝑁128N=128italic_N = 128
uu^2,hsubscriptnorm𝑢^𝑢2||u-\hat{u}||_{2,h}| | italic_u - over^ start_ARG italic_u end_ARG | | start_POSTSUBSCRIPT 2 , italic_h end_POSTSUBSCRIPT uu^subscriptnorm𝑢^𝑢||u-\hat{u}||_{\infty}| | italic_u - over^ start_ARG italic_u end_ARG | | start_POSTSUBSCRIPT ∞ end_POSTSUBSCRIPT uu^2,hsubscriptnorm𝑢^𝑢2||u-\hat{u}||_{2,h}| | italic_u - over^ start_ARG italic_u end_ARG | | start_POSTSUBSCRIPT 2 , italic_h end_POSTSUBSCRIPT uu^subscriptnorm𝑢^𝑢||u-\hat{u}||_{\infty}| | italic_u - over^ start_ARG italic_u end_ARG | | start_POSTSUBSCRIPT ∞ end_POSTSUBSCRIPT uu^2,hsubscriptnorm𝑢^𝑢2||u-\hat{u}||_{2,h}| | italic_u - over^ start_ARG italic_u end_ARG | | start_POSTSUBSCRIPT 2 , italic_h end_POSTSUBSCRIPT uu^subscriptnorm𝑢^𝑢||u-\hat{u}||_{\infty}| | italic_u - over^ start_ARG italic_u end_ARG | | start_POSTSUBSCRIPT ∞ end_POSTSUBSCRIPT
Non. Const. Diff. 2 283,713 3.9709e-02 1.5798e-01 3.1294e-02 1.2265e-01 3.3859e-02 1.4377e-01
3 412,225 3.9713e-02 1.5794e-01 3.1224e-02 1.2258e-01 4.9868e-02 1.1281e-01
4 541,889 - - 3.1110e-02 1.2282e-01 3.8937e-02 1.0907e-01
5 669,249 - - - - 3.6566e-02 1.0511e-01
Table 5: Accuracy of unsupervised predictions on non-constant diffusion problem for varying grid sizes N𝑁Nitalic_N and U-Net depths d𝑑ditalic_d. We set the number of optimization steps to 4,000.
Refer to caption
Figure 5: From left to right, contour plots of true solution, predicted solution, and absolute difference for the non-constant diffusion problem.

Using the methods described in Section 2 and Algorithm 2, we approximate the finite difference solution to the parabolic problems defined in Section 2.4.2. The time step is τ=0.1𝜏0.1\tau=0.1italic_τ = 0.1. Except for the first time step, we set the number of optimization steps per time step to 250. Because we start with randomly initialized weights in the first time step, we set the number of optimization steps to 1,000 in the first time step. Table 6 shows the accuracy of our unsupervised predictions for varying grid sizes. For reference, Table 7 shows the accuracy of the finite difference method with backward Euler for varying grid sizes. Here, we see that Algorithm 2 achieves comparable accuracy to the finite difference method with backward Euler. However, we do not see the same convergence as with the finite difference approach for the trigonometric functions. The small number of optimization steps can explain this lack of convergence and, again, the weakness of first-order optimizers like Adam. Figures 6 through 8 show the true solution, predicted solution, and absolute difference for the first ten time steps of each problem. These figures show that our unsupervised approach produces visually accurate solutions. Finally, Table 8 displays the errors for several choices of activation functions at time t=0.5𝑡0.5t=0.5italic_t = 0.5. The errors are similar for ReLU, Tanh, Swish [15], and identity activation functions. These results indicate that linear activation functions (i.e., the identity) are sufficient for learning solutions to time-dependent problems.

Problem t𝑡titalic_t N=32𝑁32N=32italic_N = 32 N=64𝑁64N=64italic_N = 64 N=128𝑁128N=128italic_N = 128
uu^2,hsubscriptnorm𝑢^𝑢2||u-\hat{u}||_{2,h}| | italic_u - over^ start_ARG italic_u end_ARG | | start_POSTSUBSCRIPT 2 , italic_h end_POSTSUBSCRIPT uu^subscriptnorm𝑢^𝑢||u-\hat{u}||_{\infty}| | italic_u - over^ start_ARG italic_u end_ARG | | start_POSTSUBSCRIPT ∞ end_POSTSUBSCRIPT uu^2,hsubscriptnorm𝑢^𝑢2||u-\hat{u}||_{2,h}| | italic_u - over^ start_ARG italic_u end_ARG | | start_POSTSUBSCRIPT 2 , italic_h end_POSTSUBSCRIPT uu^subscriptnorm𝑢^𝑢||u-\hat{u}||_{\infty}| | italic_u - over^ start_ARG italic_u end_ARG | | start_POSTSUBSCRIPT ∞ end_POSTSUBSCRIPT uu^2,hsubscriptnorm𝑢^𝑢2||u-\hat{u}||_{2,h}| | italic_u - over^ start_ARG italic_u end_ARG | | start_POSTSUBSCRIPT 2 , italic_h end_POSTSUBSCRIPT uu^subscriptnorm𝑢^𝑢||u-\hat{u}||_{\infty}| | italic_u - over^ start_ARG italic_u end_ARG | | start_POSTSUBSCRIPT ∞ end_POSTSUBSCRIPT
Trig. (n=1𝑛1n=1italic_n = 1) 0.5 7.6825e-04 1.5388e-03 1.0661e-03 2.1703e-03 1.1562e-03 2.4109e-03
1.0 5.2059e-04 1.0459e-03 7.0987e-04 1.4189e-03 7.7541e-04 1.5364e-03
2.5 6.1710e-04 1.2354e-03 8.6825e-04 1.7483e-03 9.4440e-04 1.9382e-03
5.0 1.6629e-04 3.3075e-04 2.5692e-04 5.2649e-04 3.0442e-04 6.2141e-04
Trig. (n=4𝑛4n=4italic_n = 4) 0.5 6.0011e-03 1.1977e-02 1.3953e-03 2.9254e-03 3.2371e-04 9.0033e-04
1.0 3.6984e-03 7.3841e-03 8.5877e-04 1.8117e-03 1.9062e-04 5.3841e-04
2.5 5.4592e-03 1.0897e-02 1.3062e-03 3.1186e-03 8.4891e-04 2.8031e-03
5.0 1.9352e-03 3.8744e-03 4.8639e-04 1.3043e-03 9.8322e-04 2.5854e-03
Gaussian 0.5 3.0932e-03 3.8826e-02 7.1997e-04 1.0341e-02 1.8902e-04 2.5298e-03
1.0 1.9069e-03 2.3921e-02 4.4389e-04 6.3892e-03 1.3421e-04 1.4701e-03
2.5 2.8197e-03 3.5385e-02 6.5362e-04 9.5078e-03 2.7103e-04 2.3168e-03
5.0 1.0004e-03 1.2579e-02 2.8684e-04 2.6686e-03 3.1119e-04 9.0614e-04
Table 6: Accuracy of unsupervised predictions for varying grid sizes and time steps for parabolic problems. The depth of the network is set to three, the number of optimization iterations at each time step to 250, and the time step to 0.1.
Problem t𝑡titalic_t N=32𝑁32N=32italic_N = 32 N=64𝑁64N=64italic_N = 64 N=128𝑁128N=128italic_N = 128
uu^2,hsubscriptnorm𝑢^𝑢2||u-\hat{u}||_{2,h}| | italic_u - over^ start_ARG italic_u end_ARG | | start_POSTSUBSCRIPT 2 , italic_h end_POSTSUBSCRIPT uu^subscriptnorm𝑢^𝑢||u-\hat{u}||_{\infty}| | italic_u - over^ start_ARG italic_u end_ARG | | start_POSTSUBSCRIPT ∞ end_POSTSUBSCRIPT uu^2,hsubscriptnorm𝑢^𝑢2||u-\hat{u}||_{2,h}| | italic_u - over^ start_ARG italic_u end_ARG | | start_POSTSUBSCRIPT 2 , italic_h end_POSTSUBSCRIPT uu^subscriptnorm𝑢^𝑢||u-\hat{u}||_{\infty}| | italic_u - over^ start_ARG italic_u end_ARG | | start_POSTSUBSCRIPT ∞ end_POSTSUBSCRIPT uu^2,hsubscriptnorm𝑢^𝑢2||u-\hat{u}||_{2,h}| | italic_u - over^ start_ARG italic_u end_ARG | | start_POSTSUBSCRIPT 2 , italic_h end_POSTSUBSCRIPT uu^subscriptnorm𝑢^𝑢||u-\hat{u}||_{\infty}| | italic_u - over^ start_ARG italic_u end_ARG | | start_POSTSUBSCRIPT ∞ end_POSTSUBSCRIPT
Trig. (n=1𝑛1n=1italic_n = 1) 0.5 7.6693e-04 1.5299e-03 1.0560e-03 2.1106e-03 1.1255e-03 2.2507e-03
1.0 5.2033e-04 1.0380e-03 7.0797e-04 1.4151e-03 7.5314e-04 1.5060e-03
2.5 6.1623e-04 1.2293e-03 8.6455e-04 1.7280e-03 9.2434e-04 1.8484e-03
5.0 1.5534e-04 3.0988e-04 2.3116e-04 4.6204e-04 2.4942e-04 4.9877e-04
Trig. (n=4𝑛4n=4italic_n = 4) 0.5 5.9961e-03 1.1961e-02 1.3890e-03 2.7763e-03 2.8791e-04 5.7573e-04
1.0 3.7021e-03 7.3852e-03 8.5647e-04 1.7119e-03 1.7638e-04 3.5271e-04
2.5 5.4540e-03 1.0880e-02 1.2656e-03 2.5296e-03 2.6449e-04 5.2889e-04
5.0 2.8215e-03 5.6286e-03 6.5597e-04 1.3111e-03 1.3834e-04 2.7664e-04
Gaussian 0.5 3.0933e-03 3.8814e-02 7.1809e-04 1.0412e-02 1.8424e-04 2.4368e-03
1.0 1.9068e-03 2.3920e-02 4.4286e-04 6.4085e-03 1.1490e-04 1.4920e-03
2.5 2.8194e-03 3.5387e-02 6.5422e-04 9.5075e-03 1.6586e-04 2.2389e-03
5.0 1.4619e-03 1.8355e-02 3.3909e-04 4.9403e-03 8.4882e-05 1.1716e-03
Table 7: Finite differences with backward Euler errors on selected parabolic problems for comparison. Here, the time step is 0.1.
Problem Activation N=32𝑁32N=32italic_N = 32 N=64𝑁64N=64italic_N = 64 N=128𝑁128N=128italic_N = 128
uu^2,hsubscriptnorm𝑢^𝑢2||u-\hat{u}||_{2,h}| | italic_u - over^ start_ARG italic_u end_ARG | | start_POSTSUBSCRIPT 2 , italic_h end_POSTSUBSCRIPT uu^subscriptnorm𝑢^𝑢||u-\hat{u}||_{\infty}| | italic_u - over^ start_ARG italic_u end_ARG | | start_POSTSUBSCRIPT ∞ end_POSTSUBSCRIPT uu^2,hsubscriptnorm𝑢^𝑢2||u-\hat{u}||_{2,h}| | italic_u - over^ start_ARG italic_u end_ARG | | start_POSTSUBSCRIPT 2 , italic_h end_POSTSUBSCRIPT uu^subscriptnorm𝑢^𝑢||u-\hat{u}||_{\infty}| | italic_u - over^ start_ARG italic_u end_ARG | | start_POSTSUBSCRIPT ∞ end_POSTSUBSCRIPT uu^2,hsubscriptnorm𝑢^𝑢2||u-\hat{u}||_{2,h}| | italic_u - over^ start_ARG italic_u end_ARG | | start_POSTSUBSCRIPT 2 , italic_h end_POSTSUBSCRIPT uu^subscriptnorm𝑢^𝑢||u-\hat{u}||_{\infty}| | italic_u - over^ start_ARG italic_u end_ARG | | start_POSTSUBSCRIPT ∞ end_POSTSUBSCRIPT
Trig. (n=1𝑛1n=1italic_n = 1) ReLU 7.6775e-04 1.5420e-03 1.0746e-03 2.1598e-03 1.2400e-03 2.5946e-03
Tanh 7.6743e-04 1.5391e-03 1.0590e-03 2.1269e-03 1.1478e-03 2.3299e-03
Swish 7.6905e-04 1.5538e-03 1.0623e-03 2.0923e-03 1.1637e-03 2.4307e-03
Identity 7.6825e-04 1.5388e-03 1.0661e-03 2.1703e-03 1.1562e-03 2.4109e-03
Trig. (n=4𝑛4n=4italic_n = 4) ReLU 5.9967e-03 1.1968e-02 1.5607e-03 4.1957e-03 9.2973e-04 2.7800e-03
Tanh 5.9970e-03 1.1968e-02 1.3906e-03 2.8266e-03 3.1624e-04 8.8251e-04
Swish 6.0061e-03 1.1990e-02 1.6013e-03 3.9862e-03 4.7380e-04 1.4040e-03
Identity 6.0011e-03 1.1977e-02 1.3953e-03 2.9254e-03 3.2371e-04 9.0033e-04
Gaussian ReLU 3.0932e-03 3.8828e-02 7.3665e-04 1.0873e-02 3.8312e-04 3.3270e-03
Tanh 3.0930e-03 3.8808e-02 7.1797e-04 1.0419e-02 1.9917e-04 2.4221e-03
Swish 3.0933e-03 3.8830e-02 7.4640e-04 1.0099e-02 6.0038e-04 4.4619e-03
Identity 3.0932e-03 3.8826e-02 7.1997e-04 1.0341e-02 1.8902e-04 2.5298e-03
Table 8: Accuracy of unsupervised predictions for varying grid sizes and activation functions at time 0.5. The depth of the network is set to three, the number of optimization steps to 250, and the time step is 0.1.

Algorithm 1 may benefit from transfer learning (i.e., reusing weights from previous problems) with a supervised counterpart that is trained on a large labeled dataset. We randomly initialize the neural network weights in Algorithms 1 and 2 (first time step only). This random initialization may result in predictions with high errors at the beginning of training. Without optimal step sizes in our optimizer, these poor solutions may result in slow convergence towards an optimal solution. Using transfer learning with a pre-trained network that is trained in a supervised setting (i.e., with labeled training data), the initial predictions from our algorithms may be already close to optimal, resulting in faster convergence. One could view the use of these pre-trained networks as a sort of preconditioner for our neural network-based approach. Indeed, we see the benefits of transfer learning with Algorithm 2 after the first time step. Instead of reinitializing the weights for the subsequent time steps, we resuse the weights from the previous solution. That allows us to use a small number of optimization steps (i.e., =250250\mathcal{M}=250caligraphic_M = 250) at each time for parabolic problems.

It is important to note that our methods for estimating the finite difference solutions do not explicitly construct the finite difference matrix. Instead, we use our neural networks to map the source term f𝑓fitalic_f to the approximate finite difference solution. In this sense, we are learning the inverse mapping of the finite difference matrix. From a numerical point of view, this provides the advantage of not having to construct or store a finite difference matrix. Instead, we implement the required stencil(s) for our problem in the loss function and apply them appropriately. Only having to implement the stencils and not construct a finite difference matrix is also an advantage from an implementation point-of-view.

The justification for the lack of non-linear activation functions in our network is that the relationship between the source term f𝑓fitalic_f and the finite differences solution uhsubscript𝑢u_{h}italic_u start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT is given by a system of linear equations of the form Auh=f𝐴subscript𝑢𝑓Au_{h}=fitalic_A italic_u start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT = italic_f. In other words, we know a priori that our goal is to learn a linear relationship between f𝑓fitalic_f and our estimated solution. In many machine learning applications, the exact nature of the input and output relationship is unknown and assumed to be highly non-linear. Hence, including non-linear activation functions results in a network that learns a non-linear relationship between inputs and outputs. Using the identity function as an activation puts us outside the scope of the universal approximation theorem [34, 35, 36]. However, in our case, we do not need our neural network to be a universal approximator. The role of the network is to learn a linear relationship for a single example. Inputs other than f𝑓fitalic_f in Algorithms 1 and 2, like random noise or constants, would break our assumption that the relationship between the network input and output is linear. Hence, a much larger (i.e., more parameters) non-linear network would be needed to learn such a relationship.

We utilize the PocketNet approach proposed by [22] in our proposed algorithms. This approach takes advantage of the similarity between the U-Net architecture and geometric multigrid methods to drastically reduce the number of parameters, while maintaining the same accuracy as conventional CNNs for medical imaging and scientific machine learning tasks [37, 38, 39]. Additionally, we replace transposed convolution with bilinear upsampling. We find that these changes save time and memory and yield the same accuracy that we see using conventional CNNs (i.e., doubling the number of channels at every depth). These results indicate that smaller neural networks (in terms of parameters) can achieve high accuracy for scientific machine learning tasks.

We see in Tables 2 and 5 that the depth of the U-Net architecture does not generally have a significant effect on our results. This indicates that, regardless of depth, the U-Net architecture is sufficiently expressive to learn a mapping from the right hand side f𝑓fitalic_f to an approximation of the finite difference solution uhsubscript𝑢u_{h}italic_u start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT. However, non-U-shaped architectures like the HRNet may also produce similar or improved results [40]. Additionally, the use of residual or dense connections within the convolutions of our architecture may also be beneficial [41, 42]. Such block designs have been shown to speed up convergence to lower loss values for neural networks [43]. Finally, modifying our existing architecture by adding deep supervision could also speed up convergence to lower loss values in fewer iterations [44, 45].

The use of the weighting parameter α𝛼\alphaitalic_α in (7) and (2.2) is necessary to enforce the given boundary conditions. We arrived at our proposed values of α𝛼\alphaitalic_α via a grid search over a range of possible values. However, finding optimal values of weighting parameters like α𝛼\alphaitalic_α is an open question [16]. Strongly enforcing the Dirichlet boundary conditions by modifying the network output to the prescribed values on the boundary is another approach that has been shown to be effective from an experimental and theoretical perspective [46, 24, 47]. However, both approaches of weakly and strongly enforcing boundary conditions in neural network-based methods are popular and have their own strengths and weaknesses [16]. In our case, strongly enforcing boundary conditions does not have a significant impact on the accuracy of our predicted solutions. Indeed, even with our weakly enforced approach, the boundary is not a significant source of error.

Refer to caption
Figure 6: True solution (top), predicted solution (middle) and absolute difference (bottom) for the first ten time steps of the trigonometric problem presented in Section 2.4 with n=1𝑛1n=1italic_n = 1.
Refer to caption
Figure 7: True solution (top), predicted solution (middle) and absolute difference (bottom) for the first ten time steps of the trigonometric problem presented in Section 2.4 with n=4𝑛4n=4italic_n = 4.
Refer to caption
Figure 8: True solution (top), predicted solution (middle) and absolute difference (bottom) for the first ten time steps of the Gaussian problem presented in Section 2.4.

5 Conclusions

The results presented above show the effectiveness of our proposed unsupervised approaches for estimating the finite difference solution to elliptic and parabolic problems. Unlike classical PINNs, our approach is influenced by numerical PDEs (i.e., the finite difference method), resulting in more explainable solutions. Additionally, unlike other CNN-based approaches, we define the method for elliptic PDEs with non-constant diffusion coefficients and extend it to time-dependent problems. Finally, we use small linear CNNs, making our method computationally efficient.

Our approach could benefit finite difference solvers by producing better initial guesses and/or acting as a preconditioner. With a few iterations, Algorithm 1 can produce good initial guesses for iterative solvers, thereby reducing the number of iterations required to solve the linear system resulting from (2) or (8). This same idea can also apply to time-dependent problems, but with initial guesses being produced at each time step. Additionally, because we employ identity for activation functions, our neural networks are linear. Hence, it may be possible to represent a pretrained architecture as a matrix. This resulting matrix could then be used as a preconditioner for finite difference solvers. This use of preconditioners and further testing on other kinds of PDEs, like convection-diffusion, coupled, and nonlinear problems, will be the object of future work.

6 Acknowledgements

The Department of Defense supports Adrian Celaya through the National Defense Science & Engineering Graduate Fellowship Program. Keegan Kirk is supported by the Natural Sciences and Engineering Research Council of Canada through the Postdoctoral Fellowship Program (PDF-568008). David Fuentes is partially supported by R21CA249373. Beatrice Riviere is partially supported by NSF-DMS2111459. This research was partially supported by the Tumor Measurement Initiative through the MD Anderson Strategic Research Initiative Development (STRIDE), NSF-2111147, and NSF-2111459.

References

  • [1] K. Hornik, M. Stinchcombe, and H. White, “Multilayer feedforward networks are universal approximators,” Neural Networks, vol. 2, pp. 359–366, 1989.
  • [2] K.-I. Funahashi, “On the approximate realization of continuous mappings by neural networks,” Neural Networks, vol. 2, pp. 183–192, 1989.
  • [3] M. Dissanayake and N. Phan-Thien, “Neural-network-based approximations for solving partial differential equations,” communications in Numerical Methods in Engineering, vol. 10, no. 3, pp. 195–201, 1994.
  • [4] I. E. Lagaris, A. Likas, and D. I. Fotiadis, “Artificial neural networks for solving ordinary and partial differential equations,” IEEE transactions on neural networks, vol. 9, no. 5, pp. 987–1000, 1998.
  • [5] M. Raissi, P. Perdikaris, and G. E. Karniadakis, “Physics informed deep learning (part i): Data-driven solutions of nonlinear partial differential equations,” arXiv preprint arXiv:1711.10561, 2017.
  • [6] M. Raissi, P. Perdikaris, and G. E. Karniadakis, “Physics informed deep learning (part i): Data-driven solutions of nonlinear partial differential equations,” arXiv preprint arXiv:1711.10561, 2017.
  • [7] J. Hou, Y. Li, and S. Ying, “Enhancing pinns for solving pdes via adaptive collocation point movement and adaptive loss weighting,” Nonlinear Dynamics, vol. 111, no. 16, pp. 15233–15261, 2023.
  • [8] X. Zhang, B. Mao, Y. Che, J. Kang, M. Luo, A. Qiao, Y. Liu, H. Anzai, M. Ohta, Y. Guo, et al., “Physics-informed neural networks (pinns) for 4d hemodynamics prediction: An investigation of optimal framework based on vascular morphology,” Computers in Biology and Medicine, vol. 164, p. 107287, 2023.
  • [9] M. Sarabian, H. Babaee, and K. Laksari, “Physics-informed neural networks for brain hemodynamic predictions using medical imaging,” IEEE transactions on medical imaging, vol. 41, no. 9, pp. 2285–2303, 2022.
  • [10] D. Li, K. Deng, D. Zhang, Y. Liu, H. Leng, F. Yin, K. Ren, and J. Song, “Lpt-qpn: A lightweight physics-informed transformer for quantitative precipitation nowcasting,” IEEE Transactions on Geoscience and Remote Sensing, 2023.
  • [11] S. Mowlavi and S. Nabi, “Optimal control of pdes using physics-informed neural networks,” Journal of Computational Physics, vol. 473, p. 111731, 2023.
  • [12] F. Fotiadis and K. G. Vamvoudakis, “A physics-informed neural networks framework to solve the infinite-horizon optimal control problem,” in 2023 62nd IEEE Conference on Decision and Control (CDC), pp. 6014–6019, IEEE, 2023.
  • [13] P.-H. Chiu, J. C. Wong, C. Ooi, M. H. Dao, and Y.-S. Ong, “Can-pinn: A fast physics-informed neural network based on coupled-automatic–numerical differentiation method,” Computer Methods in Applied Mechanics and Engineering, vol. 395, p. 114909, 2022.
  • [14] K. Tang, X. Wan, and C. Yang, “Das-pinns: A deep adaptive sampling method for solving high-dimensional partial differential equations,” Journal of Computational Physics, vol. 476, p. 111868, 2023.
  • [15] P. Ramachandran, B. Zoph, and Q. V. Le, “Searching for activation functions,” arXiv preprint arXiv:1710.05941, 2017.
  • [16] S. Cuomo, V. S. Di Cola, F. Giampaolo, G. Rozza, M. Raissi, and F. Piccialli, “Scientific machine learning through physics–informed neural networks: Where we are and what’s next,” Journal of Scientific Computing, vol. 92, no. 3, p. 88, 2022.
  • [17] G. E. Karniadakis, I. G. Kevrekidis, L. Lu, P. Perdikaris, S. Wang, and L. Yang, “Physics-informed machine learning,” Nature Reviews Physics, vol. 3, no. 6, pp. 422–440, 2021.
  • [18] A. Bonfanti, R. Santana, M. Ellero, and B. Gholami, “On the generalization of pinns outside the training domain and the hyperparameters influencing it,” arXiv preprint arXiv:2302.07557, 2023.
  • [19] K. Sirignano, J.and Spiliopoulos, “DGM: A deep learning algorithm for solving partial differential equations,” Journal of Computational Physics, vol. 375, pp. 1339–1364, 2018.
  • [20] E. Wang and B. Yu, “The Deep Ritz method: A deep learning-based numerical algorithm for solving variational problems,” Communications in Mathematical Sciences, vol. 6, pp. 1–12, 2018.
  • [21] J. Müller and M. Zeinhofer, “Error estimate for the Deep Ritz method with boundary penalty,” Machine Learning Research, vol. 145, pp. 1–20, 2022.
  • [22] A. Celaya, J. A. Actor, R. Muthusivarajan, E. Gates, C. Chung, D. Schellingerhout, B. Riviere, and D. Fuentes, “Pocketnet: A smaller neural network for medical image analysis,” IEEE Transactions on Medical Imaging, vol. 42, no. 4, pp. 1172–1184, 2022.
  • [23] K. L. Lim, R. Dutta, and M. Rotaru, “Phyics informed neural network using finite difference method,” 2022 IEEE International Conference on Systems, Man, and Cybernetics (SMC), 2022.
  • [24] X. Zhao, Z. Gong, Y. Zhang, W. Yao, and X. Chen, “Physics-informed convolutional neural networks for temperature field prediction of heat source without labeled data,” Engineering Applications of Artificial Intelligence, vol. 117, p. 105516, 2023.
  • [25] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, October 5-9, 2015, Proceedings, Part III 18, pp. 234–241, Springer, 2015.
  • [26] A. Celaya, B. Denel, Y. Sun, M. Araya-Polo, and A. Price, “Inversion of time-lapse surface gravity data for detection of 3-D CO2 plumes via deep learning,” IEEE Transactions on Geoscience and Remote Sensing, vol. 61, pp. 1–11, 2023.
  • [27] X. Yang, X. Chen, and M. M. Smith, “Deep learning inversion of gravity data for detection of CO2 plumes in overlying aquifers,” Journal of Applied Geophysics, vol. 196, p. 104507, 2022.
  • [28] Y. Sun, B. Denel, N. Daril, L. Evano, P. Williamson, and M. Araya-Polo, “Deep learning joint inversion of seismic and electromagnetic data for salt reconstruction,” SEG Technical Program Expanded Abstracts 2020, pp. 550–554, 2020.
  • [29] Ö. Çiçek, A. Abdulkadir, S. S. Lienkamp, T. Brox, and O. Ronneberger, “3D U-Net: learning dense volumetric segmentation from sparse annotation,” in Medical Image Computing and Computer-Assisted Intervention–MICCAI 2016: 19th International Conference, Athens, Greece, October 17-21, 2016, Proceedings, Part II 19, pp. 424–432, Springer, 2016.
  • [30] F. Isensee, P. F. Jaeger, S. A. Kohl, J. Petersen, and K. H. Maier-Hein, “nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation,” Nature Methods 2020 18:2, vol. 18, pp. 203–211, 12 2020.
  • [31] B. Riviere and M. F. Wheeler, “A posteriori error estimates for a discontinuous Galerkin method applied to elliptic problems. log number: R74,” Computers & Mathematics with Applications, vol. 46, no. 1, pp. 141–163, 2003.
  • [32] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
  • [33] F. Chollet et al., “Keras.” https://keras.io, 2015.
  • [34] K. Hornik, M. Stinchcombe, and H. White, “Multilayer feedforward networks are universal approximators,” Neural networks, vol. 2, no. 5, pp. 359–366, 1989.
  • [35] G. Cybenko, “Approximation by superpositions of a sigmoidal function,” Mathematics of control, signals and systems, vol. 2, no. 4, pp. 303–314, 1989.
  • [36] K.-I. Funahashi, “On the approximate realization of continuous mappings by neural networks,” Neural networks, vol. 2, no. 3, pp. 183–192, 1989.
  • [37] H. Zhang, M. Liu, Y. Qi, Y. Ning, S. Hu, L. Nie, and W. Zhang, “Efficient brain tumor segmentation with lightweight separable spatial convolutional network,” ACM Transactions on Multimedia Computing, Communications and Applications, 2024.
  • [38] A. Celaya, D. Fuentes, and B. Riviere, “Fmg-net and w-net: Multigrid inspired deep learning architectures for medical imaging segmentation,” in Neural Information Processing Systems Conference: LatinX in AI (LXAI) Research Workshop 2023, LXAI, 2023.
  • [39] A. Celaya, B. Denel, Y. Sun, and M. Araya-Polo, “Inversion of time-lapse surface gravity data for monitoring of 3d co2 plumes via physics informed neural networks,” in Proceedings of the 2024 SIAM Conference on Parallel Processing for Scientific Computing (PP), pp. 1–12, SIAM, 2024.
  • [40] K. Sun, B. Xiao, D. Liu, and J. Wang, “Deep high-resolution representation learning for human pose estimation,” in CVPR, 2019.
  • [41] K. He, X. Zhang, S. Ren, and J. Sun, “Identity mappings in deep residual networks,” in European conference on computer vision, pp. 630–645, Springer, 2016.
  • [42] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, “Densely connected convolutional networks,” in Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 4700–4708, 2017.
  • [43] H. Li, Z. Xu, G. Taylor, C. Studer, and T. Goldstein, “Visualizing the loss landscape of neural nets,” Advances in neural information processing systems, vol. 31, 2018.
  • [44] C.-Y. Lee, S. Xie, P. Gallagher, Z. Zhang, and Z. Tu, “Deeply-supervised nets,” in Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics (G. Lebanon and S. V. N. Vishwanathan, eds.), vol. 38 of Proceedings of Machine Learning Research, (San Diego, California, USA), pp. 562–570, PMLR, 09–12 May 2015.
  • [45] R. Li, X. Wang, G. Huang, W. Yang, K. Zhang, X. Gu, S. N. Tran, S. Garg, J. Alty, and Q. Bai, “A comprehensive review on deep supervision: Theories and applications,” arXiv preprint arXiv:2207.02376, 2022.
  • [46] S. Berrone, C. Canuto, M. Pintore, and N. Sukumar, “Enforcing dirichlet boundary conditions in physics-informed neural networks and variational physics-informed neural networks,” Heliyon, vol. 9, no. 8, 2023.
  • [47] M. Zeinhofer, R. Masri, and K.-A. Mardal, “A unified framework for the error analysis of physics-informed neural networks,” arXiv preprint arXiv:2311.00529, 2023.