Compiler

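Compilers and compiler principles

Mastering LLVM step by step

Fixed-point optimization: multiplying performance; llvm-mca code analysis; clock cycles; pipelines

A simple compiler written in Python

spa (Static Program Analysis): books on static program analysis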

Peking University: Software Analysis course

Fuzzing tools based on LLVM; emulated testing on QEMU

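Some thoughts on optimizing deep learning operators with TVM's automatic code generation

A source-code vulnerability detection plugin based on the Clang Static Analyzer

Compiler books: Optimizing Compilers for Modern Architectures; Advanced Compiler Design and Implementation

Oracle Solaris Studio 12.4 Information Library (Simplified Chinese) from Oracle: C/C++ User's Guide, Numerical Computation Guide, Code Analyzer, Performance Analyzer, Thread Analyzer

Harbin Institute of Technology: Compiler Principles

Compiler principles: implementing a lexical analyzer

Compiler technology, Xidian University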

LLVM_proj

TVM_proj

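GCC in five minutes series: an introduction to basic gcc usage

Reference implementation of the Ark Compiler runtime, from the Intelligent Software Research Center, Institute of Software, Chinese Academy of Sciences

PL/0 language compiler system, home page

Classical Chinese programming language (文言文編程語言)

pycparser: a C code parser that produces an AST

Ark Compiler open-source code

Shedskin engine: Python to C/C++

oclint: static code checking

CodeChecker: static code checking

LLVM course materials from CMU

A collection of examples of how to use the clang libraries

Code quality analysis: cyclomatic complexity

Compiler optimization algorithms

A first look at LLVM

Building your own compiler with LLVM and Clang: sample code, highly recommended

LLVM Daily Talk

Blog: llvm-clang

Online compiler showing assembly output

UFMG's DCC888 course: introduction to LLVM, program analysis and optimization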

Compilers

Lexical analysis; top-down parsing; symbol table; stack-based virtual machine; code generation; implementation of arrays and objects.
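The components listed above can be sketched end to end in a few dozen lines. The toy language here (assignments of integer expressions, terminated by `;`), the instruction names (`push`, `load`, `store`, `add`, `sub`, `mul`) and every function and class name are assumptions made for illustration; they do not come from any project in this repository. Arrays and objects are left out.

```python
import re

TOKEN_RE = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|(.))")

def tokenize(src):
    """Lexical analysis: turn source text into (kind, value) tokens."""
    tokens = []
    for number, name, op in TOKEN_RE.findall(src):
        if number:
            tokens.append(("num", int(number)))
        elif name:
            tokens.append(("name", name))
        elif op.strip():
            tokens.append(("op", op))
    tokens.append(("eof", None))
    return tokens

class Compiler:
    """Top-down (recursive-descent) parser that emits stack-machine code as it parses."""
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0
        self.symbols = {}   # symbol table: variable name -> slot index
        self.code = []      # generated instructions

    def peek(self):
        return self.tokens[self.pos]

    def advance(self):
        self.pos += 1
        return self.tokens[self.pos - 1]

    def expect(self, value):
        tok = self.advance()
        if tok != ("op", value):
            raise SyntaxError(f"expected {value!r}, got {tok}")

    def program(self):
        while self.peek()[0] != "eof":
            self.statement()
        return self.code

    def statement(self):      # statement := name '=' expr ';'
        kind, name = self.advance()
        if kind != "name":
            raise SyntaxError(f"expected a variable name, got {name!r}")
        self.expect("=")
        self.expr()
        self.expect(";")
        slot = self.symbols.setdefault(name, len(self.symbols))
        self.code.append(("store", slot))

    def expr(self):           # expr := term (('+' | '-') term)*
        self.term()
        while self.peek() in (("op", "+"), ("op", "-")):
            op = self.advance()[1]
            self.term()
            self.code.append(("add",) if op == "+" else ("sub",))

    def term(self):           # term := factor ('*' factor)*
        self.factor()
        while self.peek() == ("op", "*"):
            self.advance()
            self.factor()
            self.code.append(("mul",))

    def factor(self):         # factor := num | name | '(' expr ')'
        kind, value = self.advance()
        if kind == "num":
            self.code.append(("push", value))
        elif kind == "name":
            if value not in self.symbols:
                raise NameError(f"undefined variable {value}")
            self.code.append(("load", self.symbols[value]))
        elif (kind, value) == ("op", "("):
            self.expr()
            self.expect(")")
        else:
            raise SyntaxError(f"unexpected token {value!r}")

def run(code, num_slots):
    """Stack-based virtual machine: operands live on a stack, variables in slots."""
    stack, slots = [], [0] * num_slots
    for instr in code:
        op = instr[0]
        if op == "push":
            stack.append(instr[1])
        elif op == "load":
            stack.append(slots[instr[1]])
        elif op == "store":
            slots[instr[1]] = stack.pop()
        else:
            right, left = stack.pop(), stack.pop()
            stack.append({"add": left + right, "sub": left - right, "mul": left * right}[op])
    return slots

def compile_and_run(src):
    compiler = Compiler(tokenize(src))
    slots = run(compiler.program(), len(compiler.symbols))
    return {name: slots[i] for name, i in compiler.symbols.items()}
```

Calling `compile_and_run("x = 2 + 3 * 4; y = (x - 4) * 2;")` runs all four stages in order and returns the final value of each variable by name.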

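Preprocessing phase: The preprocessor (cpp) modifies the original C program according to directives that begin with the # character. For example, the #include <stdio.h> directive on the first line of hello.c tells the preprocessor to read the contents of the system header file stdio.h and insert them directly into the program text. The result is another C program, typically with the .i file extension.

Compilation phase: The compiler (cc1) translates the text file hello.i into the text file hello.s, which contains an assembly-language program. Each statement in an assembly-language program describes exactly one low-level machine-language instruction in a standard text form. Assembly language is useful because it gives different compilers for different high-level languages a common output language. For example, C compilers and Fortran compilers both generate output files in the same assembly language.

Assembly phase: Next…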

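Instrumentation

Instrumentation means injecting extra code into a program to collect run-time information. It comes in two kinds:

(1) Source Code Instrumentation (SCI): the extra code is injected into the program's source code.

(2) Binary Instrumentation: the extra code is injected into the binary executable.

● Static Binary Instrumentation (SBI): extra code and data are inserted before the program executes, producing a permanently modified executable.

● Dynamic Binary Instrumentation (DBI): extra code and data are inserted on the fly while the program runs, with no permanent change to the executable.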
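As a minimal sketch of source code instrumentation, the snippet below rewrites a program's syntax tree with Python's standard `ast` module so that every function counts how often it is entered. The sample program in `SOURCE`, the `_hits` counter and the helper names are hypothetical; real SCI tools for C or C++ work on those languages' sources, but the idea of injecting probes before compilation is the same.

```python
import ast

# Hypothetical program to instrument.
SOURCE = """
def square(x):
    return x * x

def sum_squares(n):
    return sum(square(i) for i in range(n))
"""

class EntryCounter(ast.NodeTransformer):
    """Insert a probe that bumps _hits[<function name>] at the top of every function."""
    def visit_FunctionDef(self, node):
        self.generic_visit(node)  # also instrument nested functions
        probe = ast.parse(
            f"_hits[{node.name!r}] = _hits.get({node.name!r}, 0) + 1"
        ).body[0]
        node.body.insert(0, probe)
        return node

def instrument_and_run(source, entry, *args):
    """Instrument the source, compile it, call `entry`, and return (result, hit counts)."""
    tree = EntryCounter().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    hits = {}
    namespace = {"_hits": hits}  # the injected probes write into this dict
    exec(compile(tree, "<instrumented>", "exec"), namespace)
    return namespace[entry](*args), hits
```

`instrument_and_run(SOURCE, "sum_squares", 4)` returns the original result together with a per-function call count, which is the run-time information the probes were injected to collect.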

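What can DBI do?

(1) Access the process's memory

(2) Override functionality while the application is running

(3) Call functions from imported classes

(4) Find object instances on the heap and use them

(5) Hook, trace, and intercept functions, and so on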
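The hooking and interception in item (5) can be illustrated by analogy in pure Python. This is not binary instrumentation: it only swaps a method on a live class at run time and restores it afterwards, which mirrors the DBI property that nothing is changed permanently. The `hook` helper and the `Calculator` class are hypothetical names invented for this sketch.

```python
import functools
from contextlib import contextmanager

@contextmanager
def hook(owner, name, on_call):
    """Temporarily replace owner.<name> with a wrapper that reports every call."""
    original = getattr(owner, name)

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        result = original(*args, **kwargs)   # run the intercepted function
        on_call(name, args, result)          # then report the call
        return result

    setattr(owner, name, wrapper)
    try:
        yield
    finally:
        setattr(owner, name, original)       # remove the hook again

class Calculator:
    def add(self, a, b):
        return a + b

def traced_demo():
    trace = []
    calc = Calculator()

    def record(name, args, result):
        trace.append((name, args[1:], result))   # drop `self` from the arguments

    with hook(Calculator, "add", record):
        calc.add(1, 2)
        calc.add(3, 4)
    calc.add(5, 6)   # the hook is gone, so this call is not traced
    return trace
```

Only the two calls made while the hook is active appear in the trace returned by `traced_demo()`.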

GCC

Compile options

Common options

  • Common commands
  • Compile options
    • -c Compile or assemble the source files, but do not link.
    • -S Stop after the stage of compilation proper; do not assemble.
    • -o file Place the output in file. This applies to whatever sort of output is being produced, whether it be an executable file, an object file, an assembler file or preprocessed C code.
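  • Optimization options
    • -O == -O1
    • -O2 is already a fairly aggressive optimization level
    • -O3 is more aggressive still
    • -Os differs somewhat from -O2, mainly in that it limits the size of the generated output (it is the default optimization option on Android)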
    • -ffast-math
      • -ffast-math does a lot more than just break strict IEEE compliance. First of all, of course, it does break strict IEEE compliance, allowing e.g. the reordering of instructions to something which is mathematically the same (ideally) but not exactly the same in floating point. Second, it disables setting errno after single-instruction math functions, which means avoiding a write to a thread-local variable (this can make a 100% difference for those functions on some architectures). Third, it makes the assumption that all math is finite, which means that no checks for NaN (or zero) are made in place where they would have detrimental effects. It is simply assumed that this isn't going to happen. Fourth, it enables reciprocal approximations for division and reciprocal square root. Further, it disables signed zero (code assumes signed zero does not exist, even if the target supports it) and rounding math, which enables among other things constant folding at compile-time. Last, it generates code that assumes that no hardware interrupts can happen due to signalling/trapping math (that is, if these cannot be disabled on the target architecture and consequently do happen, they will not be handled).
      • It includes -fno-math-errno, -funsafe-math-optimizations, -ffinite-math-only, -fno-rounding-math, -fno-signaling-nans, -fcx-limited-range and -fexcess-precision=fast
      • Reference: What does gcc's ffast-math actually do?
  • Special attributes (attribute)
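The -ffast-math notes above say that reordering gives results that are mathematically the same but not exactly the same in floating point, and that NaN and signed zero are assumed away. Python floats are IEEE 754 doubles, so the underlying effects can be observed without a C compiler. The function names below are made up for this sketch, and it does not show what GCC actually emits for any particular program.

```python
import math

def reassociation_demo():
    """Floating-point addition is not associative, so reordering changes the result."""
    a, b, c = 0.1, 0.2, 0.3
    strict = (a + b) + c       # evaluation order as written in the source
    reordered = a + (b + c)    # an order that a reassociating optimizer might pick
    return strict, reordered

def is_nan_by_self_compare(x):
    """Under IEEE 754, NaN is the only value that is not equal to itself.
    Code compiled with -ffinite-math-only may have such a check optimized away."""
    return x != x

def signed_zero_demo():
    """-0.0 compares equal to 0.0 but still carries its sign."""
    return (-0.0 == 0.0, math.copysign(1.0, -0.0))
```

The two sums from `reassociation_demo()` differ only in the last bits, which is exactly the kind of discrepancy that strict IEEE compliance rules out and -ffast-math permits.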

Special options

  • -fomit-frame-pointer
    • The frame pointer (fp) records the history stack of outstanding calls. Most smaller functions don't need a frame pointer.
    • Larger functions MAY need one. This option makes one extra register available for general-purpose use. In Thumb it is R7.
    • See: Trying to understand gcc option -fomit-frame-pointer
  • -fvisibility (affects linking)
    • Hides symbols in a library
      • -fvisibility=default|internal|hidden|protected
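    • You can hide all symbols by default and expose only some of them:
      • Exposed symbols: mark them with __attribute__ ((visibility ("default"))) and pass -fvisibility=hidden to the compiler
    • Alternatively, expose all symbols by default and hide only some of them:
      • Hidden symbols: mark them with __attribute__ ((visibility ("hidden")))
    • Reference: how-to-hide-the-exported-symbols-name-within-a-shared-library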
  • -fpic
    • Generate position independent code
    • All objects in a shared library should be built consistently: either all with -fpic or all without it (see: GCC -fPIC option)
    • Difference between -fpic and -fPIC
      • Use -fPIC or -fpic to generate position independent code. Whether to use -fPIC or -fpic to generate position independent code is target-dependent. The -fPIC choice always works, but may produce larger code than -fpic (a mnemonic to remember this is that PIC is in a larger case, so it may produce larger amounts of code). Using the -fpic option usually generates smaller and faster code, but will have platform-dependent limitations, such as the number of globally visible symbols or the size of the code. The linker will tell you whether it fits when you create the shared library. When in doubt, I choose -fPIC, because it always works.
      • -fpic Generate position-independent code (PIC) suitable for use in a shared library, if supported for the target machine. Such code accesses all constant addresses through a global offset table (GOT). The dynamic loader resolves the GOT entries when the program starts (the dynamic loader is not part of GCC; it is part of the operating system). If the GOT size for the linked executable exceeds a machine-specific maximum size, you get an error message from the linker indicating that -fpic does not work; in that case, recompile with -fPIC instead. (These maximums are 8k on the SPARC, 28k on AArch64 and 32k on the m68k and RS/6000. The x86 has no such limit.) Position-independent code requires special support, and therefore works only on certain machines. For the x86, GCC supports PIC for System V but not for the Sun 386i. Code generated for the IBM RS/6000 is always position-independent.
  • -rpath
    • Link option; takes effect at run time
    • Add a directory to the runtime library search path. This is used when linking an ELF executable with shared objects. All -rpath arguments are concatenated and passed to the runtime linker, which uses them to locate shared objects at runtime.
  • -L
    • Link option; takes effect when linking at build time
    • --library-path=searchdir
    • Add path searchdir to the list of paths that ld will search for archive libraries and ld control scripts.
    • So, -L tells ld where to look for libraries to link against when linking. You use this (for example) when you're building against libraries in your build tree, which will be put in the normal system library paths by make install. --rpath, on the other hand, stores that path inside the executable, so that the runtime dynamic linker can find the libraries. You use this when your libraries are outside the system library search path.
    • Reference: What's the difference between -rpath and -L?
  • -Wl
    • The -Wl,xxx option for gcc passes a comma-separated list of tokens as a space-separated list of arguments to the linker. So gcc -Wl,aaa,bbb,ccc eventually becomes a linker call ld aaa bbb ccc
    • Reference: I don't understand -Wl,-rpath -Wl,
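The -Wl rule described above (a comma-separated token list becomes space-separated linker arguments) is simple enough to model directly. The function below is a hypothetical illustration, not part of GCC; the real driver also passes many other arguments to the linker, and `/opt/lib` in the usage note is just an example path.

```python
def expand_wl(gcc_args):
    """Return the linker arguments that the -Wl,... options in gcc_args forward."""
    linker_args = []
    for arg in gcc_args:
        if arg.startswith("-Wl,"):
            # Everything after "-Wl," is split on commas and handed to the linker.
            linker_args.extend(arg[len("-Wl,"):].split(","))
    return linker_args
```

With this model, `expand_wl(["-Wl,aaa,bbb,ccc"])` yields the three separate arguments `aaa`, `bbb` and `ccc`, and the two spellings `-Wl,-rpath,/opt/lib` and `-Wl,-rpath -Wl,/opt/lib` forward the same pair of linker arguments.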

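GDB debugging

  • Inspecting state
    • Show the call stack: bt
    • Show the current register state: info registers
    • Print a variable
      • p your_variable
  • Controlling execution
    • Step into: step or s
    • Step over one statement: n
    • Step one machine instruction: si or stepi
    • Continue running: c
    • Set a breakpoint: b xxx.c:line_num, for example b main.c:97
    • Run: r
  • Debug scripts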
    • gdb ./gdbtest -command=gdbtest.sh

transpiler

  • all python to cpp transpiler projects
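  • Analysis of CPython-based approaches
    • Hard to verify: the generated code is built on PyObject and the family of functions derived from it, which is hard to read, so its correctness is hard to verify
    • Poor compatibility: C code written in the PyObject style is hard to integrate with other compiler features, including hotspot analysis, capturing or replacing hot functions, and model-based classification of code blocks.
    • Poor performance: every object is wrapped as a PyObject, even for simple sequence operations, so the performance of the generated C code is a concern
  • shedskin
    • Framework deployment analysis
      • Runtime environment: the libgc and libpcre3 libraries must be added to the embedded platform (distributing binaries)
      • libgc: license: FSF (the composition is fairly complex and needs deeper analysis); dependencies: none
      • libpcre3: license: BSD, can be used commercially as-is; dependencies: none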

TVM

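  • Graph representation
    • NNVM
      • NNVM is to deep learning what LLVM is to compilers: a relatively high-level intermediate representation module for neural networks, usually called the computation graph. A front-end framework only needs to express its computation in the NNVM intermediate representation; NNVM then applies optimizations to the graph that are independent of the specific hardware and framework, including memory allocation, data type and shape inference, and operator fusion.
    • Relay
      • Relay resolves the conflict between static graphs and dynamic graphs. It is a domain-specific language for differentiable programming (programming with automatic differentiation).
    • Graph optimization passes
      • OpFusion: operator fusion
      • FoldConstant: constant folding
      • CombineParallelConv2D: combine parallel convolution operations
      • FoldScaleAxis: fold the scale axis
      • AlterOpLayout: alter operator layout
      • CanonicalizeOps: canonicalize operators
      • EliminateCommonSubexpr: eliminate common subexpressions
  • Operator optimization
    • Characteristics of TVM's low-level intermediate representation
      • Halide & HalideIR
      • Auto-Tuning
      • loopy loop-transformation tool; polyhedral model analysis
      • Python as the host language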
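Two of the graph optimizations named above, FoldConstant and EliminateCommonSubexpr, can be sketched on a toy expression graph. This is not the TVM or Relay API: the tuple encoding (`("const", v)`, `("var", name)`, `(op, lhs, rhs)`) and all function names are assumptions made up for this illustration, and only `add` and `mul` are supported.

```python
import operator

OPS = {"add": operator.add, "mul": operator.mul}

# (x * (2 + 3)) + (x * 5), written as nested tuples.
EXPR = ("add",
        ("mul", ("var", "x"), ("add", ("const", 2), ("const", 3))),
        ("mul", ("var", "x"), ("const", 5)))

def fold_constants(expr):
    """Constant folding: replace operator nodes whose inputs are all constants."""
    if expr[0] in ("const", "var"):
        return expr
    op, lhs, rhs = expr[0], fold_constants(expr[1]), fold_constants(expr[2])
    if lhs[0] == "const" and rhs[0] == "const":
        return ("const", OPS[op](lhs[1], rhs[1]))
    return (op, lhs, rhs)

def eliminate_common_subexpr(expr):
    """Flatten a tree into a list of unique nodes so repeated subtrees appear once.
    Operator nodes refer to their inputs by index into the returned list."""
    nodes, index_of = [], {}

    def visit(e):
        key = e if e[0] in ("const", "var") else (e[0], visit(e[1]), visit(e[2]))
        if key not in index_of:
            index_of[key] = len(nodes)
            nodes.append(key)
        return index_of[key]

    root = visit(expr)
    return nodes, root

def evaluate(expr, env):
    """Reference interpreter used to check that a rewrite preserves the value."""
    if expr[0] == "const":
        return expr[1]
    if expr[0] == "var":
        return env[expr[1]]
    return OPS[expr[0]](evaluate(expr[1], env), evaluate(expr[2], env))
```

Folding `EXPR` first turns `2 + 3` into the constant 5, which makes the two multiplications identical; `eliminate_common_subexpr` then keeps a single `mul` node that the final `add` references twice.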