Rohde et al., 2020 - Google Patents

Improving HLS generated accelerators through relaxed memory access scheduling

Document ID: 17061295322857244907
Author: Rohde J; Müller K; Hochberger C
Publication year: 2020
Publication venue: 2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

Snippet

High-level synthesis (HLS) can be used to generate hardware accelerators for compute-intensive parts of software (so-called kernels). For meaningful acceleration, such kernels should be able to access memory autonomously. Unfortunately, such memory accesses can constitute …

Classifications

- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06F—ELECTRICAL DIGITAL DATA PROCESSING
      - G06F9/00—Arrangements for programme control, e.g. control unit
        - G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
          - G06F9/30—Arrangements for executing machine-instructions, e.g. instruction decode
            - G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
              - G06F9/3885—Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units
                - G06F9/3889—Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units controlled by multiple instructions, e.g. MIMD, decoupled access or execute
              - G06F9/3824—Operand accessing
                - G06F9/383—Operand prefetching
              - G06F9/3836—Instruction issuing, e.g. dynamic instruction scheduling, out of order instruction execution
            - G06F9/30003—Arrangements for executing specific machine instructions
              - G06F9/30007—Arrangements for executing specific machine instructions to perform operations on data operands
              - G06F9/3004—Arrangements for executing specific machine instructions to perform operations on memory
                - G06F9/30043—LOAD or STORE instructions; Clear instruction
            - G06F9/32—Address formation of the next instruction, e.g. incrementing the instruction counter, jump
              - G06F9/322—Address formation of the next instruction, e.g. incrementing the instruction counter, jump for non-sequential address
            - G06F9/34—Addressing or accessing the instruction operand or the result; Formation of operand address; Addressing modes
          - G06F9/46—Multiprogramming arrangements
            - G06F9/48—Programme initiating; Programme switching, e.g. by interrupt
              - G06F9/4806—Task transfer initiation or dispatching
                - G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
                  - G06F9/4881—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
      - G06F8/00—Arrangements for software engineering
        - G06F8/40—Transformations of program code
          - G06F8/41—Compilation
            - G06F8/44—Encoding
              - G06F8/443—Optimisation
              - G06F8/445—Exploiting fine grain parallelism, i.e. parallelism at instruction level
      - G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
        - G06F17/50—Computer-aided design
          - G06F17/5045—Circuit design
      - G06F11/00—Error detection; Error correction; Monitoring
        - G06F11/36—Preventing errors by testing or debugging software
          - G06F11/362—Software debugging
      - G06F12/00—Accessing, addressing or allocating within memory systems or architectures
        - G06F12/02—Addressing or allocation; Relocation
      - G06F15/00—Digital computers in general; Data processing equipment in general
        - G06F15/76—Architectures of general purpose stored programme computers
          - G06F15/78—Architectures of general purpose stored programme computers comprising a single central processing unit
      - G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled

Similar Documents

Publication	Publication Date	Title
Schoeberl et al.	2018	Patmos: A time-predictable microprocessor
US9639371B2 (en)	2017-05-02	Solution to divergent branches in a SIMD core using hardware pointers
US8161266B2 (en)	2012-04-17	Replicating opcode to other lanes and modifying argument register to others in vector portion for parallel operation
US8448150B2 (en)	2013-05-21	System and method for translating high-level programming language code into hardware description language code
Czajkowski et al.	2012	OpenCL for FPGAs: Prototyping a compiler
US7100157B2 (en)	2006-08-29	Methods and apparatus to avoid dynamic micro-architectural penalties in an in-order processor
EP2951682B1 (en)	2018-08-22	Hardware and software solutions to divergent branches in a parallel pipeline
US9841957B2 (en)	2017-12-12	Apparatus and method for handling registers in pipeline processing
Wang et al.	2017	CGPredict: Embedded GPU performance estimation from single-threaded applications
US20060015855A1 (en)	2006-01-19	Systems and methods for replacing NOP instructions in a first program with instructions of a second program
JP5146451B2 (en)	2013-02-20	Method and apparatus for synchronizing processors of a hardware emulation system
GB2394085A (en)	2004-04-14	Generating code for a configurable microprocessor
Rohde et al.	2020	Improving HLS generated accelerators through relaxed memory access scheduling
DeVries	1997	A vectorizing SUIF compiler: implementation and performance.
Skliarova et al.	2009	Recursion in reconfigurable computing: A survey of implementation approaches
JP7383390B2 (en)	2023-11-20	Information processing unit, information processing device, information processing method and program
CN115004150A (en)	2022-09-02	Method and apparatus for predicting and scheduling duplicate instructions in software pipelining loops
Zhao et al.	2007	Dependence-based code generation for a CELL processor
KR102631214B1 (en)	2024-01-31	Method and system for efficient data forwarding for accelerating large language model inference
Zarch et al.	2023	A Code Transformation to Improve the Efficiency of OpenCL Code on FPGA through Pipes
Prakash et al.	2013	Modelling communication overhead for accessing local memories in hardware accelerators
Lis	2000	Superscalar processors via automatic microarchitecture transformations
US11740906B2 (en)	2023-08-29	Methods and systems for nested stream prefetching for general purpose central processing units
Grudnitsky et al.	2016	Efficient partial online synthesis of special instructions for reconfigurable processors
Lin et al.	2007	Utilizing custom registers in application-specific instruction set processors for register spills elimination