
mrgabich/adCARM-casestudy

 
 


adCARM-casestudy

This is a fork of adCARM, targeting performance modeling for RISC-V based systems.

This tool performs the micro-benchmarking necessary to construct the Cache-Aware Roofline Model (CARM) for floating-point operations on Intel CPUs. It supports different instruction set extensions (AVX512, AVX, SSE and Scalar), different data precisions (double- and single-precision), and different floating-point instructions (fused multiply-add, addition, multiplication and division). The micro-benchmarks can be run with any number of threads. As output, the tool provides a visualization of CARM, as well as the measurements obtained for the different memory levels and the selected FP instruction.
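As background, a roofline model bounds attainable performance by the smaller of the peak FP throughput and the product of memory bandwidth and arithmetic intensity, with one bandwidth roof per memory level in the cache-aware variant. The snippet below is a minimal sketch of that bound only; it is not code from this tool, and the function name and all numbers are illustrative assumptions rather than measurements.

```python
def attainable_gflops(arithmetic_intensity, peak_gflops, bandwidth_gbs):
    """Roofline bound: min(peak compute, bandwidth * arithmetic intensity).

    arithmetic_intensity is in FLOP/byte, bandwidth_gbs in GB/s.
    """
    return min(peak_gflops, bandwidth_gbs * arithmetic_intensity)


# Hypothetical values (not measured): one bandwidth roof per memory level.
peak = 100.0  # GFLOP/s
bandwidths = {"L1": 400.0, "L2": 200.0, "L3": 100.0, "DRAM": 20.0}  # GB/s

for level, bw in bandwidths.items():
    # Bound for a kernel with an arithmetic intensity of 0.25 FLOP/byte.
    print(level, attainable_gflops(0.25, peak, bw))
```

The values the tool measures for each memory level and for the selected FP instruction play the roles of `bandwidths` and `peak` in this sketch.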

Requirements

  • gcc (>= 4.9 for AVX512 tests and only tested with gcc 9.3)
  • python (only tested with python 3.8.8)
    • matplotlib (only tested with 3.3.4)

How to use

The first step is to create a configuration file for the system under test in the config folder. This configuration file must include five fields:

  • identifier of the system
  • nominal frequency of the CPU (in GHz)
  • L1 size per core (in KiB)
  • L2 size per core (in KiB)
  • Total L3 size (in KiB)

After creating the configuration file, the tool can be executed as:

python run.py --test <test> --inst <fp_inst> --num_ops <num_ops> --isa <isa> --precision <data_precision> --ld_st_ratio <ld_st_ratio> --dram_bytes <dram_bytes> --threads <num_threads> [--only_ld] [--interleaved] <path_config_file>

where

  • --test is the test to be performed (roofline, fp, mem);
  • --inst <fp_inst> is the floating point instruction (fma, add, mul, div);
  • --num_ops <num_ops> is the number of FP operations used for the FP benchmark;
  • --isa is the instruction set extension (avx512, avx, sse, scalar);
  • --precision <data_precision> is the precision of the data (dp, sp);
  • --ld_st_ratio <ld_st_ratio> is the number of loads per store involved in the memory benchmarks;
  • --dram_bytes <dram_bytes> is the size of the array used for the DRAM benchmark in KiB;
  • --threads <num_threads> is the number of threads used for the test;
  • [--only_ld] indicates that the memory benchmarks will just contain loads (<ld_st_ratio> is ignored);
  • [--interleaved] indicates that the cores belong to interleaved NUMA domains (e.g. core 0 -> node 0, core 1 -> node 1, core 2 -> node 0, etc). Used for thread binding;
  • <path_config_file> is the path to the configuration file of the system.
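The interleaved numbering mentioned for --interleaved can be pictured with the small sketch below. It only illustrates the core-to-node pattern from the example above, contrasted with a contiguous layout; it does not show how run.py binds threads, and the function names and the 2-node, 4-core machine are hypothetical.

```python
def interleaved_node(core, num_nodes):
    """NUMA node of a core when consecutive cores alternate between nodes."""
    return core % num_nodes


def blocked_node(core, cores_per_node):
    """NUMA node of a core when each node owns a contiguous range of cores."""
    return core // cores_per_node


# Hypothetical machine: 2 NUMA nodes, 4 cores in total.
print([interleaved_node(c, 2) for c in range(4)])  # interleaved layout
print([blocked_node(c, 2) for c in range(4)])      # contiguous layout
```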

A simple run can be executed with the command

python run.py <path_config_file>

which by default runs the micro-benchmarks necessary to obtain CARM data, for AVX512 instructions and double-precision variables. The FP instruction used is the FMA (32768 operations) and the memory benchmarks contain 2 loads per store, with the DRAM test using an array of size 512 MiB.
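The defaults described above can also be spelled out explicitly, keeping in mind that --dram_bytes is given in KiB. The sketch below assembles such a command line from the documented flags. The helper names and the config path are hypothetical, mapping the default CARM run to --test roofline is an assumption, and --threads is left out unless requested because its default value is not stated here.

```python
import shlex


def mib_to_kib(mib):
    """--dram_bytes is expressed in KiB, so convert from MiB."""
    return mib * 1024


def build_command(config_path, threads=None):
    """Return an argv list that passes the documented defaults explicitly."""
    argv = [
        "python", "run.py",
        "--test", "roofline",  # assumption: the CARM run is the 'roofline' test
        "--inst", "fma",
        "--num_ops", "32768",
        "--isa", "avx512",
        "--precision", "dp",
        "--ld_st_ratio", "2",
        "--dram_bytes", str(mib_to_kib(512)),  # 512 MiB expressed in KiB
    ]
    if threads is not None:
        argv += ["--threads", str(threads)]
    argv.append(config_path)
    return argv


# 'config/my_system' is a hypothetical configuration file path.
print(shlex.join(build_command("config/my_system")))
```

Returning a list rather than a single string keeps the arguments ready for `subprocess.run` without shell quoting concerns.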

For additional information regarding the input arguments, run the command:

python run.py -h

Fixed issues

Fixed the AVX and AVX 256 difference. Fixed FMA SSE operations.

In papers and reports, please refer to this tool as follows

A. Ilic, F. Pratas and L. Sousa, "Cache-aware Roofline model: Upgrading the loft," in IEEE Computer Architecture Letters, vol. 13, no. 1, pp. 21-24, 21 Jan.-June 2014, doi: 10.1109/L-CA.2013.6.

Diogo Marques, Aleksandar Ilic, Zakhar A. Matveev, and Leonel Sousa. "Application-driven cache-aware roofline model." Future Generation Computer Systems 107 (2020): 257-273.
