Julia parallel course EGU21

vEGU2021: SC4.6 Solving differential equations in parallel with Julia | Thu, 29 Apr, 16:00–17:00 (CEST)

👉 Organisation notes:

💡 The short course material (available on this repository) was just updated - fetch the latest versions!
👉 Make sure to go through Getting started before the start of the course.
Further interests in solving PDEs with Julia on GPUs❓
- Sign-up (free) for a hands-on workshop at JuliaCon2021
- Check out this online geo-HPC tutorial

This short course covers trendy areas in modern geocomputing with broad geoscientific applications. The physical processes governing natural systems' evolution are often mathematically described as systems of differential equations. A performant numerical implementation of the solving algorithm leveraging modern hardware is key and permits to tackle problems that were technically not possible a decade ago.

Content

Objectives
About this repository
Getting started (not part of the live course)
👉 Short course material
Extras (not part of the live course)
Further reading

Objectives

The goal of this short course is to offer an interactive overview on how to solve systems of (partial) differential equations in parallel on GPUs using the Julia language. Julia combines high-level language simplicity and low-level language performance. The resulting codes and applications are fast, short and readable [1, 2, 3].

We will design and implement a numerical algorithm that predicts ice flow dynamics over mountainous topography (Greenland) using GPU computing (e.g. Fig. below). We will discretise the shallow ice approximation (SIA) equations in our ice flow solver to assess Greenland's ice cap evolution as function of a climate scenario.

The online course consists of 2 parts:

Part 1 - You will learn about the Julia language and iterative PDE solvers.
Part 2 - You will implement a GPU parallel PDE solver to predict ice flow dynamics on real topography.

By the end of this short course, you will:

Have an iterative GPU PDE solver that predicts ice-flow;
Have a Julia code that achieves similar performance than legacy codes (C, CUDA, MPI);
Know how the Julia language solves the "two-language problem";
Be able to leverage the computing power of modern GPU accelerated servers and supercomputers.

💡 Disclaimer

The solvers presented in this short course, based on the pseudo-transient method, enable to solve PDEs iteratively and are well-suited for parallel execution (on GPUs). It is not the purpose of this course to provide an extensive overview of various solution techniques.

The performance assessment is done using the time / iteration metric which reflects the ability of the algorithm to efficiently exploit the memory bandwidth of the (parallel) hardware. Further performance considerations regarding the metric can be found here.

⤴️ back to content

About this repository

The course repository lists following folders and items:

the data folder contains various low resolution Greenland input data (bedrock topography, surface elevation, ice thickness, masks, ...) downscaled from BedMachine Greenland v3 - note the filenames include grid resolution information (nx, ny);
the docs folder contains documentation linked in the README;
the various output folder will contain the codes output, mainly figures in png format;
the scripts folder contains the scripts this course is about 🎉
the extras folder contains supporting course material (not discussed live during the course);
the Project.toml file is a Julia project file, tracking the used packages and enabling a reproducible environment.

👉 This repository is an interactive and dynamic source of information related to the short course.

Check out the Discussion tab if you have general comments, ideas to share or for Q&A.

File an Issue if you encounter any technical problems with the distributed codes.

Interact in a open-minded, respectful and inclusive manner.

⤴️ back to content

Getting started

⚠️ Due to the time limitation, the short course will not cover the Getting started steps. These are meant to provide directions to the participant willing to actively try out the examples during the short course. It is warmly recommended to perform the Getting started steps before the beginning of the workshop.

Please follow the detailed steps in: docs/getting-started.md

The provided directions will get you started with:

Installing Julia v1.6
Running the scripts from the course repository.

👉 Note: This course was developed and tested on Julia v1.6. It should work with any Julia version ≥v1.6. The install configuration were tested on a MacBook Pro running macOS 10.15.7, a Linux GPU server running Ubuntu 20.04 LTS and a Linux GPU server running CentOS 8.

Short course material

This section lists the material discussed within this 60 min. short course:

Part 1 - Julia and iterative solvers
Part 2 - solving ice flow PDEs on GPUs

💡 In this course we will implement a 2D nonlinear diffusion equation on GPUs in Julia using the finite-difference method and an iterative solving approach. The goal is to resolve the shallow ice approximation (SIA) and predict ice flow over Greenland.

⤴️ back to content

Part 1 - Julia and iterative solvers

Why Julia

by Mauro Werder

Julia is a modern, general-purpose programming language unifying interactive, high productivity features (like Python, Matlab, etc.) with high performance (like C, Fortran, etc.). This removes the need to have a separate prototype and production languages (the two-language problem).

The main reason to use Julia for scientific computing is:

high performance & productivity, ditto
a good package manager (see, e.g., the Project.toml of this course) making reproducible science easier
a rapidly expanding number of packages, with many at the forefront of research (e.g. GPU-packages, differential equations, machine learning, automatic differentiation)
a friendly community with a large number of scientific users

The main reason to use Julia for GPU computing is that it indeed solves the two-language problem in this domain: it works well from prototyping an idea with a simple serial code to massively parallel, multi-node GPU production code.

A short introduction to Julia will be given using the first numerical example of this course (next section). A very short summary of features covered:

third-party packages can be installed with the package manager (see Package installation)
use a package with using XYZ
run the code in a file with include("abc.jl")
index into an array with [ ] and starts at 1
vectorized function application do with the dot-notation, e.g. sin.(x) ./ y for vectors x and y
macros do funky stuff with your code (aka code-transformations) and call them with @, e.g. @time sin(1) prints the time it takes to evaluate sin(1)
for raster-data handling we use GeoData.jl (but other packages also exist)

For more info see https://docs.julialang.org.

⤴️ back to course material

Diffusion equation

Let's start with a 1D linear diffusion example to implement both an explicit and iterative implicit PDE solver:

dH/dt = ∇.(D ∇H)

The diffusion of a quantity H over time t can be described as (1a) a diffusive flux, (1b) a flux balance and (1c) an update rule:

qH    = -D*dH/dx  (1a)
dHdt  =  -dqH/dx  (1b)
dH/dt = dHdt      (1c)

The diffusion_1D_expl.jl code implements an iterative and explicit solution of eq. (1) for an initial Gaussian profile:

H0 = exp(-(x-lx/2.0)^2)

But now, you may ask: can we use an implicit algorithm to side-step the CFL-condition, control the (physically motivated) time steps dt and keep it "matrix-free" ?

⤴️ back to course material

Iterative solvers

by Ludovic Räss

The diffusion_1D_impl.jl code implements an iterative, implicit solution of eq. (1). How ? We include the physical time derivative dH/dt=(H-Hold)/dt in the previous rate of change dHdt to define the residual ResH

ResH = -(H-Hold)/dt -dqH/dx

and iterate until the values of ResH (the residual of the eq. (1)) drop below a defined tolerance level tol.

It works, but the "naive" Picard iteration count seems to be pretty high (niter>7000). A efficient way to circumvent this is to add "damping" (damp) to the rate-of-change dHdt, analogous to add friction enabling faster convergence [4]

dHdt = ResH + damp*dHdt

The diffusion_1D_damp.jl code implements a damped iterative implicit solution of eq. (1). The iteration count drops to niter~700. This pseudo-transient approach enables fast as the iteration count scales close to O(N) and not O(N^2).

Performance considerations

Performance evaluation is a complex topic as different metrics would lead to different conclusions. Ultimately, efficient algorithms should minimise the time to solution. For iterative algorithms this means:

Ensure fast iterations (minimise the time per iteration).
Keep the iteration count as low as possible and, in particular, that iteration count scales aroung O(n) with the numerical resolution n.

We will here report (1) for various implementations on various computer architectures.

⤴️ back to course material

Part 2 - solving ice flow PDEs on GPUs

SIA equation applied to the Greenland Ice Sheet

Let's move from the simple 1D linear diffusion example to the shallow ice approximation (SIA) equation, a 2D nonlinear diffusion equation for ice thickness H:

dH/dt = ∇.(D ∇S) + M

with surface elevation S the sum of ice thickness H and bed elevation B
S = H + B,
surface mass-balance M, and non-linear diffusion coefficient
D = a Hⁿ⁺² √(∇S.∇S)ⁿ⁻¹
where n is Glen's n (take =3) and a is the ice flow factor (1.5e-24 Pa⁻ⁿ s⁻¹).

Writing the equation in pseudo-code:

qHx   = -D*dS/dx                  (2a)
qHy   = -D*dS/dy                  (2b)
dHdt  = -(dqHx/dx + dqHy/dy) + M  (2c)
dH/dt = dHdt                      (2d)

with the nonlinear diffusion coefficient D with the power-law exponent npow (aka Glen's exponent):

D = a*H^(npow+2)*sqrt((dS/dx)^2 + (dS/dy)^2)^(npow-1)

The topography data, bedrock elevation and ice thickness, is from the BedMachine Greenland v3 dataset (Morlighem et al., (2017)) and is loaded via the GeoData.jl package (see helpers.jl).

We implement climate forcing using a simple mass balance (accumulation and ablation) M formulation:

M  = min(grad_b*(S - z_ELA), b_max)

as function of the surface elevation S and capped by the maximal accumulation rate b_max. The mass balance gradient grad_b is defined as

grad_b = (1.3517 - 0.014158*LAT)/100.0*0.91

where LAT is the latitude (taken from [5]). The equilibrium line altitude z_ELA (where accumulation = ablation) is latitude dependent, ranging from 1300m (South) to 1000m (North) as suggested by [5].

⤴️ back to course material

SIA implementation

The iceflow.jl code implements the 2D SIA equations using the iterative implicit damped formulation as in diffusion_1D_damp.jl using the pseudo-transient approach. The calculation of the SIA PDEs resumes in these 13 lines of Julia code:

# mass balance
M     .= min.(grad_b.*(S .- z_ELA), b_max)
# compute diffusivity
dSdx  .= diff(S, dims=1)/dx
dSdy  .= diff(S, dims=2)/dy
gradS .= sqrt.(av_ya(dSdx).^2 .+ av_xa(dSdy).^2)
D     .= a*av(H).^(npow+2) .* gradS.^(npow-1)
# compute flux
qHx   .= .-av_ya(D).*diff(S[:,2:end-1], dims=1)/dx
qHy   .= .-av_xa(D).*diff(S[2:end-1,:], dims=2)/dy
# update ice thickness
dtau  .= dtausc*min.(10.0, cfl./(epsi .+ av(D)))
ResH  .= .-(diff(qHx, dims=1)/dx .+ diff(qHy, dims=2)/dy) .+ inn(M)
dHdt  .= dHdt.*damp .+ ResH
H[2:end-1,2:end-1] .= max.(0.0, inn(H) .+ dtau.*dHdt)
# apply mask
H[Mask.==0] .= 0.0
# update surface
S     .= B .+ H

💡 Note that the here discussed SIA codes do not implement any flux limiter scheme to circumvent known accuracy and stability issues. Check out [6, 7] for further references (the iceflow_bench.jl script implements the benchmark [6] that reflects this issue).

The model output is the ice surface elevation, the ice thickness, the ice velocity magnitude and the mass balance:

This implementation of the SIA equations solves the steady-state (i.e. the physical time derivative being removed as dt->∞). The last part of this course (Greenland's ice cap evolution) will show how to achieve an (implicit) ice flow predictions over a specific time span dt by including the physical time derivative in the ResH term.

CPU Performance

The figure below depicts the time 1000 iterations (or pseudo-time steps) take running the iceflow.jl code on a Intel Xeon Gold 6244 (1 thread - plain Julia):

⤴️ back to course material

GPU SIA implementation

So now we have a cool iterative and implicit SIA solver in less than 100 lines of code 🎉. Good enough for low resolution calculations. What if we need higher resolution and faster time to solution ? Parallel and GPU computing makes it possible. Let's start from the iceflow.jl code and port it to GPU (with some intermediate steps).

The main idea of GPU parallelisation is to calculate each grid point concurently by a different GPU thread (instaed of the more serial CPU execution) as depicted hereafter:

Extract the flux qHx, qHy from the physics calculations in iceflow.jl:

# [...] skipped lines
# compute flux
qHx   .= .-av_ya(D).*diff(S[:,2:end-1], dims=1)/dx
qHy   .= .-av_xa(D).*diff(S[2:end-1,:], dims=2)/dy
# [...] skipped lines

Let's modify these lines into a compute_flux function we will then be able to turn into a GPU kernel.

# [...] skipped lines
function compute_flux!(qHx, qHy, S, D, dx, dy, nx, ny)
    Threads.@threads for iy=1:ny
        for ix=1:nx
            if (ix<=nx-1 && 2<=iy<=ny-1)  qHx[ix,iy] = -0.5*(D[ix,iy]+D[ix+1,iy])*(S[ix+1,iy]-S[ix,iy])/dx  end
            if (2<=ix<=nx-1 && iy<=ny-1)  qHy[ix,iy] = -0.5*(D[ix,iy]+D[ix,iy+1])*(S[ix,iy+1]-S[ix,iy])/dy  end
        end
    end
    return
end
# [...] skipped lines
compute_flux!(qHx, qHy, S, D, dx, dy, nx, ny)
# [...] skipped lines

💡 Julia enables multi-threading capabilities by simply adding Threads.@threads to the outermost loop (here over iy).

The last step is to replace the (multi-threaded) loops by a vectorised GPU index

ix = (blockIdx().x-1) * blockDim().x + threadIdx().x
iy = (blockIdx().y-1) * blockDim().y + threadIdx().y

specific to GPU execution. Each ix and iy are executed concurrently on a different GPU thread:

using CUDA
# [...] skipped lines
compute_flux!(qHx, qHy, S, D, dx, dy, nx, ny)
    ix = (blockIdx().x-1) * blockDim().x + threadIdx().x
    iy = (blockIdx().y-1) * blockDim().y + threadIdx().y
    if (ix<=nx-1 && 2<=iy<=ny-1)  qHx[ix,iy] = -0.5*(D[ix,iy]+D[ix+1,iy])*(S[ix+1,iy]-S[ix,iy])/dx  end
    if (2<=ix<=nx-1 && iy<=ny-1)  qHy[ix,iy] = -0.5*(D[ix,iy]+D[ix,iy+1])*(S[ix,iy+1]-S[ix,iy])/dy  end
    return
end
# [...] skipped lines
@cuda blocks=cublocks threads=cuthreads compute_flux!(qHx, qHy, S, D, dx, dy, nx, ny)
synchronize()
# [...] skipped lines

💡 We use @cuda blocks=cublocks threads=cuthreads to launch the GPU function on the appropriate number of threads, i.e. "parallel workers". The numerical grid resolution nx and ny must now be chosen accordingly to the number of parallel workers.

The here described porting procedure steps are exemplified for the 1D diffusion equation (Porting the diffusion equation to GPUs) available in the extra material:

XPU SIA implementation

Wouldn't it be great to be able to combine the multi-thread CPU and GPU implementations into a single "XPU" code to be able to run on various hardware with only changing a USE_GPU switch ? Using ParallelStencil.jl enables this, as many more cool features. The iceflow_xpu.jl script uses ParallelStencil.jl for an XPU implementation on various backends:

const USE_GPU = false
using ParallelStencil
using ParallelStencil.FiniteDifferences2D
@static if USE_GPU
    @init_parallel_stencil(CUDA, Float64, 2)
else
    @init_parallel_stencil(Threads, Float64, 2)
end
# [...] skipped lines
@parallel function compute_flux!(qHx, qHy, D, S, dx, dy)
    @all(qHx)  = -@av_ya(D)*@d_xi(S)/dx
    @all(qHy)  = -@av_xa(D)*@d_yi(S)/dy
    return
end
# [...] skipped lines
@parallel compute_flux!(qHx, qHy, D, S, dx, dy)
# [...] skipped lines

💡 Various macros @(...) permit to deal with the low-level technicalities and the USE_GPU flag enables to switch between CPU and GPU backend.

The resulting code is short and readable and solves the "two-language problem"; development and production code implementations are regrouped into a single code.

GPU - CPU Performance

The figure below depicts the time 1000 iterations (or pseudo-time steps) take running the iceflow_xpu.jl code on:

an Intel Xeon Gold 6244 (4 threads - cores)
an Nvidia RTX 2070 GPU

⤴️ back to course material

Greenland's ice cap evolution

We can finally use our XPU ice flow solver to simulate the evolution of Greenland's ice cap for a specific climate scenario. The steps are following:

Compute the steady-state ice thickness for current climate forcing (mass balance function as described in SIA equation section)
Apply a time-dependent linear increase of the ELA (the equilibrium line where accumulation = ablation) of 1m/yr over the next 2500yrs. We here assume that an increase in annual mean temperature of 0.6°C is equivalent to a 100m shift of the ELA, thus 1m/yr represents 0.3°C per 50yrs. The iceflow_xpu_evo.jl code implements this climate evolution model and produces the following predictions for the Greenland ice cap evolution:

Note that our climate scenario may reflect some extreme warming over the next 2500yrs.

⤴️ back to course material

Extras

⚠️ Due to the time limitation, the short course will not cover the extras material. This information complements the course's main content and may provide intermediate and/or additional development steps.

Extra material comprises:

Porting the diffusion equation to GPUs
Simple inversion
Performance metric
Multi-XPU implementation

⤴️ back to content

Porting the diffusion equation to GPUs

This extra example complements the GPU SIA implementation and the XPU SIA implementation.

Parallel GPU computing

Let's start from the diffusion_1D_damp.jl code, a cool iterative and implicit solver in about 30 lines of code 🎉, and port it to GPU (with some intermediate steps).

Extract the physics calculations from diffusion_1D_damp.jl, i.e. the time loop:

# [...] skipped lines
qH         .= -D*diff(H)/dx              # flux
ResH       .= -(H[2:end-1] - Hold[2:end-1])/dt - diff(qH)/dx # residual of the PDE
dHdtau     .= ResH + damp*dHdtau         # damped rate of change
H[2:end-1] .= H[2:end-1] + dtau*dHdtau   # update rule, sets the BC as H[1]=H[end]=0
# [...] skipped lines

Split the calculations into separate functions (or kernels) and call those functions within the time loop. The diffusion_1D_damp_fun.jl implements those modifications:

function compute_flux!(qH, H, D, dx, nx)
    Threads.@threads for ix=1:nx
        if (ix<=nx-1)  qH[ix] = -D*(H[ix+1]-H[ix])/dx  end
    end
    return
end

function compute_rate!(ResH, dHdt, H, Hold, qH, dt, damp, dx, nx)
    Threads.@threads for ix=1:nx
        if (2<=ix<=nx-1)  ResH[ix-1] = -(H[ix] - Hold[ix])/dt -(qH[ix]-qH[ix-1])/dx  end
        if (2<=ix<=nx-1)  dHdt[ix-1] = ResH[ix-1] + damp*dHdt[ix-1]  end
    end
    return
end

function compute_update!(H, dHdt, dtau, nx)
    Threads.@threads for ix=1:nx
        if (2<=ix<=nx-1)  H[ix] = H[ix] + dtau*dHdt[ix-1]  end
    end
    return
end
# [...] skipped lines
compute_flux!(qH, H, D, dx, nx)
compute_rate!(ResH, dHdt, H, Hold, qH, dt, damp, dx, nx)
compute_update!(H, dHdt, dtau, nx)
# [...] skipped lines

💡 Julia enables multi-threading capabilities by simply adding Threads.@threads to the outermost loop (here over ix).

The last step is to replace the (multi-threaded) loop by a vectorised index ix = (blockIdx().x-1) * blockDim().x + threadIdx().x specific to GPU execution. Each ix is executed concurrently on a different GPU thread. The diffusion_1D_damp_gpu.jl implements those modifications to run on GPUs:

using CUDA
# [...] skipped lines
function compute_flux!(qH, H, D, dx, nx)
    ix = (blockIdx().x-1) * blockDim().x + threadIdx().x
    if (ix<=nx-1)  qH[ix] = -D*(H[ix+1]-H[ix])/dx  end
    return
end

function compute_rate!(ResH, dHdt, H, Hold, qH, dt, damp, dx, nx)
    ix = (blockIdx().x-1) * blockDim().x + threadIdx().x
    if (2<=ix<=nx-1)  ResH[ix-1] = -(H[ix] - Hold[ix])/dt -(qH[ix]-qH[ix-1])/dx  end
    if (2<=ix<=nx-1)  dHdt[ix-1] = ResH[ix-1] + damp*dHdt[ix-1]  end
    return
end

function compute_update!(H, dHdt, dtau, nx)
    ix = (blockIdx().x-1) * blockDim().x + threadIdx().x
    if (2<=ix<=nx-1)  H[ix] = H[ix] + dtau*dHdt[ix-1]  end
    return
end
# [...] skipped lines
@cuda blocks=cublocks threads=cuthreads compute_flux!(qH, H, D, dx, nx)
synchronize()
@cuda blocks=cublocks threads=cuthreads compute_rate!(ResH, dHdt, H, Hold, qH, dt, damp, dx, nx)
synchronize()
@cuda blocks=cublocks threads=cuthreads compute_update!(H, dHdt, dtau, nx)
synchronize()
# [...] skipped lines

💡 We use @cuda blocks=cublocks threads=cuthreads to launch the GPU function on the appropriate number of threads, i.e. "parallel workers". The numerical grid resolution nx must now be chosen accordingly to the number of workers nx=cublocks*cuthreads.

XPU computing

Wouldn't it be great to be able to combine the multi-thread CPU and GPU implementations into a single "XPU" code to be able to run on various hardware with only changing a USE_GPU switch ? Using ParallelStencil.jl enables this, as well more other cool features. The diffusion_1D_damp_xpu.jl uses ParallelStencil.jl for an XPU implementation on various backends:

const USE_GPU = false
using ParallelStencil
using ParallelStencil.FiniteDifferences1D
@static if USE_GPU
    @init_parallel_stencil(CUDA, Float64, 1)
else
    @init_parallel_stencil(Threads, Float64, 1)
end
# [...] skipped lines
@parallel function compute_flux!(qH, H, D, dx)
    @all(qH) = -D*@d(H)/dx
    return
end

@parallel function compute_rate!(ResH, dHdt, H, Hold, qH, dt, damp, dx)
    @all(ResH) = -(@inn(H) - @inn(Hold))/dt -@d(qH)/dx
    @all(dHdt) = @all(ResH) + damp*@all(dHdt)
    return
end

@parallel function compute_update!(H, dHdt, dtau)
    @inn(H) = @inn(H) + dtau*@all(dHdt)
    return
end
# [...] skipped lines
@parallel compute_flux!(qH, H, D, dx)
@parallel compute_rate!(ResH, dHdt, H, Hold, qH, dt, damp, dx)
@parallel compute_update!(H, dHdt, dtau)
# [...] skipped lines

Various macros @(...) permit to deal with the low-level technicalities and the USE_GPU flag enables to switch between CPU and GPU backend.

⤴️ back to extras

Simple inversion

Using the inversion approach proposed by [7], our ice flow solver iceflow.jl can be embedded into an inversion framework to retrieve spatially variable maximum accumulation rate b_max in order to constrain ice thickness distribution over Greenland. The following animation depicts the evolution of the inversion procedure as function of the 30 inversion steps and was produced using the iceflow_inverse.jl code. Gam represents the misfit between the observed Hice and the calculated H ice thickness, B_max represents the spatially variable maximal accumulation.

The ice thickness obtained from the inversion procedure can be further compared to the BedMachine Greenland v3 ice thickness data:

💡 Note that the inversion procedure serves here as proof of concept, as higher resolution and finer tuning may be needed to further improve the misfit minimisation.

⤴️ back to extras

Performance metric

Majority of stencil based codes as in this course are memory bounded, meaning the limiting factor in performance is the rate at which memory is transferred from and back between the memory and the arithmetic units. The maximal rate at which the memory transfers occur is the memory copy rate, in the order of 50 GB/s for CPUs and about 1 TB/s for modern GPUs. The effective memory throughput metric (T_eff) measures how good an iterative stencil-based algorithm performs in terms of memory throughput, to be compared to the memory copy rate.

Check out the performance metric section from the ParallelStencil.jl module and this JuliaCon2020 presentation [1].

⤴️ back to extras

Multi-XPU implementation

Check out this material to figure out how combining ImplicitGlobalGrid.jl to ParallelStencil.jl enables efficient distributed memory parallelisation on multiple XPUs.

⤴️ back to extras

Name		Name	Last commit message	Last commit date
Latest commit History 220 Commits
.github/workflows		.github/workflows
data		data
docs		docs
extras		extras
scripts		scripts
src		src
test		test
.gitignore		.gitignore
LICENSE		LICENSE
Project.toml		Project.toml
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Julia parallel course EGU21

Content

Objectives

About this repository

Getting started

Short course material

Part 1 - Julia and iterative solvers

Why Julia

Diffusion equation

Iterative solvers

Performance considerations

Part 2 - solving ice flow PDEs on GPUs

SIA equation applied to the Greenland Ice Sheet

SIA implementation

CPU Performance

GPU SIA implementation

XPU SIA implementation

GPU - CPU Performance

Greenland's ice cap evolution

Extras

Porting the diffusion equation to GPUs

Parallel GPU computing

XPU computing

Simple inversion

Performance metric

Multi-XPU implementation

Further reading

About

Releases

Contributors 2

Languages

License

luraess/julia-parallel-course-EGU21

Folders and files

Latest commit

History

Repository files navigation

Julia parallel course EGU21

Content

Objectives

About this repository

Getting started

Short course material

Part 1 - Julia and iterative solvers

Why Julia

Diffusion equation

Iterative solvers

Performance considerations

Part 2 - solving ice flow PDEs on GPUs

SIA equation applied to the Greenland Ice Sheet

SIA implementation

CPU Performance

GPU SIA implementation

XPU SIA implementation

GPU - CPU Performance

Greenland's ice cap evolution

Extras

Porting the diffusion equation to GPUs

Parallel GPU computing

XPU computing

Simple inversion

Performance metric

Multi-XPU implementation

Further reading

About

Resources

License

Stars

Watchers

Forks

Releases

Contributors 2

Languages