Update README.md
rohanverma94 authored Apr 30, 2018
1 parent 85d865a commit 214aeeb
Showing 1 changed file with 3 additions and 3 deletions.
6 changes: 3 additions & 3 deletions README.md
@@ -27,9 +27,9 @@ Now look at the very simple code:
}
```

-If you compile this code under** arm-gcc-none-eabi cross-compiler**, the instruction pipeline would take care of machine code produced.
+If you compile this code under **arm-gcc-none-eabi cross-compiler**, the instruction pipeline would take care of machine code produced.
As far as I know, gcc is a smart-ass that will automatically unroll loops where it sees fit, but because the target is an embedded device, the gcc cross-compiler does not have the liberty to be as aggressive as it normally would.
-So optimized version ( loop unrolling) on a** pipelined CPU( without SIMD):**
+So optimized version ( loop unrolling) on a **pipelined CPU( without SIMD):**
```
//Optimization - without SIMD
for(uint32_t i = 0 ; i < 1024 ; i = i+4){
@@ -43,7 +43,7 @@ So optimized version ( loop unrolling) on a** pipelined CPU( without SIMD):**
```
The question about the above code is: **why is the loop unrolled by a factor of 4?**
The answer is that the **CPU loves the power of duo, or powers of 2.**
-The SIMD version of the above code uses the instruction __UADD16 (https://arm-software.github.io/CMSIS_5/Core/html/group__intrinsic__SIMD__gr.html#gaa1160f0cf76d6aa292fbad54a1aa6b74) , which performs two 16-bit unsigned integer additions in parallel. Remember that SIMD unit in CPU can’t perform 32-bit unsigned addition, because this is how the SIMD unit in ARM Cortex M4 is designed.
+The SIMD version of the above code uses the instruction __UADD16 (https://arm-software.github.io/CMSIS_5/Core/html/group__intrinsic__SIMD__gr.html#ga9e2cc5117e79578a08b25f1e89022966) , which performs two 16-bit unsigned integer additions in parallel. Remember that SIMD unit in CPU can’t perform 32-bit unsigned addition, because this is how the SIMD unit in ARM Cortex M4 is designed.
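To make the "two 16-bit additions in parallel" behaviour concrete, here is a minimal portable C sketch of what the instruction computes. The names `uadd16_model` and `pack16` are hypothetical helpers for illustration, not part of CMSIS; on a real Cortex-M4 you would call the intrinsic itself, which additionally updates the APSR.GE flags (not modeled here).

```c
#include <stdint.h>

/* Hypothetical helper: pack two 16-bit values into one 32-bit word,
 * with `hi` in bits [31:16] and `lo` in bits [15:0]. */
static uint32_t pack16(uint16_t hi, uint16_t lo)
{
    return ((uint32_t)hi << 16) | (uint32_t)lo;
}

/* Hypothetical portable model of the value computed by __UADD16:
 * two independent 16-bit unsigned additions inside one 32-bit word.
 * Each lane wraps modulo 2^16, and no carry crosses from the low
 * lane into the high lane. */
static uint32_t uadd16_model(uint32_t a, uint32_t b)
{
    /* Low lane: the carry out of bit 15 is masked away. */
    uint32_t lo = (a + b) & 0xFFFFu;

    /* High lane: add the upper halfwords on their own, then mask. */
    uint32_t hi = ((a >> 16) + (b >> 16)) & 0xFFFFu;

    return (hi << 16) | lo;
}
```

For example, `uadd16_model(0x0001FFFF, 0x00000001)` yields `0x00010000`: the low lane wraps to zero without carrying into the high lane, whereas a plain 32-bit addition would give `0x00020000`.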
**The SIMD code equivalent on Cortex M4 CPU:**

```
