Write an application that implements matrix multiplication.
Write a plain C version with no optimizations to serve as a baseline.
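
A baseline could look like the following minimal sketch. It assumes square n x n matrices of double stored in row-major order. The function name matmul_naive and the parameter layout are illustrative choices, not part of the exercise.

```c
#include <assert.h>
#include <stddef.h>

/* Baseline sketch: C = A * B for square n x n matrices.
 * Assumptions (not given by the exercise): element type double,
 * row-major storage, and C does not overlap A or B. */
void matmul_naive(size_t n, const double *A, const double *B, double *C)
{
    assert(A && B && C);
    for (size_t i = 0; i < n; i++) {          /* row of A and C */
        for (size_t j = 0; j < n; j++) {      /* column of B and C */
            double sum = 0.0;
            for (size_t k = 0; k < n; k++) {  /* dot product of row i and column j */
                sum += A[i * n + k] * B[k * n + j];
            }
            C[i * n + j] = sum;
        }
    }
}
```

To use it as a reference point, time repeated calls on a fixed matrix size and keep the compiler flags identical across all versions being compared.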
Apply the concepts learnt in the previous exercise to optimize the code (loop unrolling, SIMD, etc.) and compare its performance against the baseline.
Analyze the sources of any stalls and find a solution for each.
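
As one possible direction, the sketch below combines two of these ideas. With row-major storage, a triple loop ordered i, j, k reads B with a stride of n elements in its innermost loop, which is a likely source of cache-miss stalls. Reordering the loops to i, k, j makes the innermost accesses contiguous, and the inner loop is then unrolled by four. The function name matmul_ikj_unrolled, the element type double, and the unroll factor are assumptions made for illustration. Whether this helps on a given machine has to be measured.

```c
#include <assert.h>
#include <stddef.h>

/* Optimization sketch: C = A * B for square n x n row-major matrices.
 * Loop order i-k-j walks B and C along rows (unit stride) in the
 * innermost loop, and that loop is manually unrolled by 4.
 * Assumption: C does not overlap A or B. */
void matmul_ikj_unrolled(size_t n, const double *A, const double *B, double *C)
{
    assert(A && B && C);

    /* The accumulation below adds into C, so clear it first. */
    for (size_t idx = 0; idx < n * n; idx++) {
        C[idx] = 0.0;
    }

    for (size_t i = 0; i < n; i++) {
        double *c_row = &C[i * n];
        for (size_t k = 0; k < n; k++) {
            const double a = A[i * n + k];    /* reused across the whole row */
            const double *b_row = &B[k * n];
            size_t j = 0;

            /* Unrolled body: four independent updates per iteration. */
            for (; j + 4 <= n; j += 4) {
                c_row[j]     += a * b_row[j];
                c_row[j + 1] += a * b_row[j + 1];
                c_row[j + 2] += a * b_row[j + 2];
                c_row[j + 3] += a * b_row[j + 3];
            }

            /* Remainder loop for n not divisible by 4. */
            for (; j < n; j++) {
                c_row[j] += a * b_row[j];
            }
        }
    }
}
```

The four updates in the unrolled body touch adjacent elements, so this form is also a natural starting point for a SIMD version, either through compiler auto-vectorization or hand-written intrinsics. Before timing it, check its results against the unoptimized version.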