Illinois ECE 498AL: Programming Massively Parallel Processors
Abstract
Spring 2009
Virtually all semiconductor market domains, including PCs, game consoles, mobile handsets, servers, supercomputers, and networks, are converging to concurrent platforms. There are two important reasons for this trend. First, these concurrent processors can potentially offer more effective use of chip space and power than traditional monolithic microprocessors for many demanding applications. Second, an increasing number of applications that traditionally used Application Specific Integrated Circuits (ASICs) are now implemented with concurrent processors in order to improve functionality and reduce engineering cost. The real challenge is to develop applications software that effectively uses these concurrent processors to achieve efficiency and performance goals.
The aim of this course is to provide students with knowledge and hands-on experience in developing applications software for processors with massively parallel computing resources. In general, we refer to a processor as massively parallel if it has the ability to complete more than 64 arithmetic operations per clock cycle. Today NVIDIA processors already exhibit this capability. Processors from Intel, AMD, and IBM will begin to qualify as massively parallel in the next several years. Effectively programming these processors will require in-depth knowledge about parallel programming principles, as well as the parallelism models, communication models, and resource limitations of these processors. The target audiences of the course are students who want to develop exciting applications for these processors, as well as those who want to develop programming tools and future implementations for these processors.
We will be using NVIDIA processors and the CUDA programming tools in the lab section of the course. Many have reported success in performing non-graphics parallel computation as well as traditional graphics rendering computation on these processors. You will go through structured programming assignments before being turned loose on the final project. Each programming assignment will involve successively more sophisticated programming skills. The final project will be of your own design, with the requirement that the project must involve a demanding application such as mathematics- or physics-intensive simulation or other data-intensive computation, followed by some form of visualization and display of results.
This is a course in programming massively parallel processors for general computation. We are fortunate to have the support and presence of David Kirk, the Chief Scientist of NVIDIA and one of the main driving forces behind the new NVIDIA CUDA technology. Building on architecture knowledge from ECE 411, and general C programming knowledge, we will expose you to the tools and techniques you will need to attack a real-world application for the final project. The final projects will be supported by some real application groups at UIUC and around the country, such as biomedical imaging and physical simulation.
Programming Massively Parallel Processors
Topics:
- Introduction
- GPU Computing and CUDA Programming Model Intro
- CUDA Example and CUDA Threads
- CUDA Threads Part 2 and API Details
- CUDA Memory
- CUDA Memory Example
- GPU as Part of the PC Architecture
- CUDA Threading Hardware
- CUDA Memory Hardware
- Control Flow in CUDA
- Floating Point Performance, Precision, and Accuracy
- Parallel Programming Basics
- Parallel Algorithm Basics
Credits
These lectures were breezed by Carl Pearson and Daniel Borup and then reviewed, edited, and uploaded by Omar Sobh.
References
Timothy G. Mattson, Beverly A. Sanders, Berna L. Massingill, Patterns for Parallel Programming, Addison Wesley
Lecture Number/Topic | Online Lecture | Video | Lecture Notes | Supplemental Material | Suggested Exercises |
---|---|---|---|---|---|
Illinois ECE 498AL: Programming Massively Parallel Processors, Lecture 1: Introduction | View Flash | View | Notes (pdf) | | |
Topics: Introduction, Grading, Outline; Lab Equipment; UIUC/NCSA QP Cluster; UIUC/NCSA AP Cluster; ECE498AL Development History; Why Program... | | | | | |
Illinois ECE 498AL: Programming Massively Parallel Processors, Lecture 2: The CUDA Programming Model | View Flash | View | Notes (pdf) | | |
Topics: What is GPGPU?; CUDA; An Example of Physical Reality Behind CUDA; Parallel Computing on a GPU; CUDA - C With No Shader Limitations; CUDA Devices and... | | | | | |
Illinois ECE 498AL: Programming Massively Parallel Processors, Lecture 3: CUDA Threads, Tools, Simple Examples | View Flash | View | Notes (pdf) | | |
Topics: A Running Example of Matrix Multiplication; Memory Layout of a Matrix in C; Compiling a CUDA Program; Device Emulation Mode Pitfalls; Floating... | | | | | |
Illinois ECE 498AL: Programming Massively Parallel Processors, Lecture 4: CUDA Threads - Part 2 | View Flash | View | Notes (pdf) | | |
Topics: CUDA Thread Block; Transparent Scalability; G80 CUDA Mode, A Review; Executing Thread Blocks; Thread Scheduling; Block Granularity Considerations; More Details... | | | | | |
Illinois ECE 498AL: Programming Massively Parallel Processors, Lecture 5: CUDA Memories | View Flash | | Notes (pdf) | Lecture5-CUDA-Memories.mp3 | |
Topics: G80 Implementation of CUDA Memories; CUDA Variable Type Qualifiers; Where to Declare Variables; Variable Type Restrictions; A Common Programming Strategy; GPU Atomic... | | | | | |
Illinois ECE 498AL: Programming Massively Parallel Processors, Lecture 6: CUDA Memories - Part 2 | View Flash | | Notes (pdf) | Lecture6-CUDA-Memories-Part2.mp3 | |
Topics: Tiled Multiply; Breaking Md and Nd into Tiles; Tiled Matrix Multiplication Kernel; CUDA Code - Kernel Execution Configuration; First Order Size Considerations... | | | | | |
Illinois ECE 498AL: Programming Massively Parallel Processors, Lecture 7: GPU as part of the PC Architecture | View Flash | | Notes (pdf) | Lecture7-GPU-in-PC | |
Topics: Typical Structure of a CUDA Program; Bandwidth: Gravity of Modern Computer Systems; (Original) PCI Bus Specification; PCI as Memory Mapped I/O; ... | | | | | |
Illinois ECE 498AL: Programming Massively Parallel Processors, Lecture 8: Threading Hardware in G80 | View Flash | | Notes (pdf) | Lecture8-Threading Hardware in G80 | |
Topics: Single Program Multiple Data (SPMD); Grids and Blocks; CUDA Thread Block: Review; GeForce-8 Series Hardware Overview; CUDA Processor Terminology; Stream... | | | | | |
Illinois ECE 498AL: Programming Massively Parallel Processors, Lecture 9: Memory Hardware in G80 | View Flash | | Notes (pdf) | Lecture9-Memory Hardware in G80 | |
Topics: CUDA Device Memory Space; Parallel Memory Sharing; SM Memory Architecture; SM Register File; Programmer View of Register File; Matrix Multiplication... | | | | | |
Illinois ECE 498AL: Programming Massively Parallel Processors, Lecture 10: Control Flow | View Flash | View | Notes (pdf) | | |
Topics: Terminology Review; How Thread Blocks are Partitioned; Control Flow Instructions; Parallel Reduction; A Vector Reduction Example; A Simple Implementation; Vector... | | | | | |
Illinois ECE 498AL: Programming Massively Parallel Processors, Lecture 11: Floating Point Considerations | View Flash | View | Notes (pdf) | | |
Topics: GPU Floating Point Features; Normalized Representation; Exponent Representation; Representable Numbers; Flush to Zero; Denormalization; Runtime Math... | | | | | |
Illinois ECE 498AL: Programming Massively Parallel Processors, Lecture 12: Structuring Parallel Algorithms | View Flash | View | Notes (pdf) | | |
Topics: Key Parallel Programming Steps; Algorithms; Choosing Algorithm Structure; Mapping a Divide and Conquer Algorithm; Tiled Algorithms; Increased work... | | | | | |
Illinois ECE 498AL: Programming Massively Parallel Processors, Lecture 13: Reductions and their Implementation | View Flash | View | Notes (pdf) | | |
Topics: Parallel Reductions; Parallel Prefix Sum; Relevance of Scan; Application of Scan; Scan on the CPU; First Attempt Parallel Scan Algorithm; Work... | | | | | |
Illinois ECE 498AL: Programming Massively Parallel Processors, Lecture 14: Application Case Study - Quantitative MRI Reconstruction | View Flash | | | | |
Topics: Reconstructing MR Images; An Exciting Revolution: Sodium Map of the Brain; Least Squares Reconstruction; Q vs. FhD; Algorithms to Accelerate; From... | | | | | |
Illinois ECE 498AL: Programming Massively Parallel Processors, Lecture 15: Kernel and Algorithm Patterns for CUDA | View Flash | | | | |
Topics: Reductions and Memory Patterns; Reduction Patterns in CUDA; Mapping Data into CUDA's Memories; Input/Output Convolution; Generic Algorithm... | | | | | |