JavaScript Abstract Syntax Tree (AST) #23

Open
yacan8 opened this issue May 13, 2020 · 0 comments

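Preface

Babel is currently the most popular JavaScript compiler. Its parser, babel-parser, was originally forked from the Acorn project. Acorn is very fast and easy to use, and it has a plugin-based architecture designed for non-standard features (as well as future standard features). This article mainly covers the AST nodes produced by Esprima. Esprima is a separate parser that predates Acorn rather than being built on it; both produce ASTs in the ESTree format.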

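Parser

A JavaScript parser converts JS source code into an abstract syntax tree (AST). The process has two stages: lexical analysis and syntactic analysis.

Commonly used JavaScript parsers include the ones mentioned above: babel-parser, Acorn, and Esprima.

Lexical Analysis

Lexical analysis converts code in string form into a stream of tokens. You can think of the tokens as a flat array of syntax fragments.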

n * n;

For example, lexical analysis of n * n above produces the following:

[
  { type: { ... }, value: "n", start: 0, end: 1, loc: { ... } },
  { type: { ... }, value: "*", start: 2, end: 3, loc: { ... } },
  { type: { ... }, value: "n", start: 4, end: 5, loc: { ... } },
]

Each type has a set of properties that describe the token:

{
  type: {
    label: 'name',
    keyword: undefined,
    beforeExpr: false,
    startsExpr: true,
    rightAssociative: false,
    isLoop: false,
    isAssign: false,
    prefix: false,
    postfix: false,
    binop: null,
    updateContext: null
  },
  ...
}

Like AST nodes, tokens also have start, end, and loc properties.
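
To make the idea of a token stream concrete, here is a minimal sketch of a tokenizer. It is not how babel-parser or Esprima work internally: the tokenize function and its simplified type.label values are assumptions made for illustration, and it only recognizes identifiers, integers, and single-character punctuators.

```javascript
// Hypothetical helper for illustration only; not the babel-parser tokenizer.
function tokenize(source) {
  const tokens = [];
  // Identifiers, integer literals, or any single non-space, non-word character.
  const pattern = /[A-Za-z_$][\w$]*|\d+|[^\s\w]/g;
  for (const match of source.matchAll(pattern)) {
    const value = match[0];
    let label = value; // punctuators use their own text as the label
    if (/^[A-Za-z_$]/.test(value)) label = 'name';
    else if (/^\d/.test(value)) label = 'num';
    tokens.push({
      type: { label },
      value,
      start: match.index,
      end: match.index + value.length
    });
  }
  return tokens;
}

const tokens = tokenize('n * n;');
console.log(tokens.map((t) => `${t.type.label}:${t.value}@${t.start}-${t.end}`));
```

Unlike the listing above, this sketch also emits a token for the trailing semicolon.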

Syntactic Analysis

Syntactic analysis takes the result of lexical analysis, the tokens, and converts it into an AST.

function square(n) {
  return n * n;
}

The code above produces the following AST structure:

{
  type: "FunctionDeclaration",
  id: {
    type: "Identifier",
    name: "square"
  },
  params: [{
    type: "Identifier",
    name: "n"
  }],
  body: {
    type: "BlockStatement",
    body: [{
      type: "ReturnStatement",
      argument: {
        type: "BinaryExpression",
        operator: "*",
        left: {
          type: "Identifier",
          name: "n"
        },
        right: {
          type: "Identifier",
          name: "n"
        }
      }
    }]
  }
}
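
Because every node carries a string type field, a tree like this can be traversed generically. The sketch below is one possible way to do it; the walk helper is a hypothetical name, not part of any parser's API. It visits the nodes of the tree above in depth-first order and records their types.

```javascript
// The AST for `function square(n) { return n * n; }`, as listed above.
const ast = {
  type: 'FunctionDeclaration',
  id: { type: 'Identifier', name: 'square' },
  params: [{ type: 'Identifier', name: 'n' }],
  body: {
    type: 'BlockStatement',
    body: [{
      type: 'ReturnStatement',
      argument: {
        type: 'BinaryExpression',
        operator: '*',
        left: { type: 'Identifier', name: 'n' },
        right: { type: 'Identifier', name: 'n' }
      }
    }]
  }
};

// Hypothetical helper: call `visit` on every object that has a string `type`.
function walk(value, visit) {
  if (value === null || typeof value !== 'object') return;
  if (Array.isArray(value)) {
    value.forEach((item) => walk(item, visit));
    return;
  }
  if (typeof value.type === 'string') visit(value);
  for (const key of Object.keys(value)) walk(value[key], visit);
}

const types = [];
walk(ast, (node) => types.push(node.type));
console.log(types.join(' > '));
```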

The sections below explain each type of AST node. Further entry points for generating ASTs:

An Example with a Visualization Tool

Consider the following code:

var a = 42;
var b = 5;
function addA(d) {
    return a + d;
}
var c = addA(2) + b;

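After the first step, lexical analysis, the result looks like this:

[Figure: lexical analysis]

Syntactic analysis then produces the abstract syntax tree, shown below:

[Figure: syntactic analysis]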

Base

Node

All node types implement the following interface:

interface Node {
  type: string;
  range?: [number, number];
  loc?: SourceLocation;
}

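The type field is a string representing the AST variant type. The range field, when present, holds the start and end character offsets of the node in the source. The loc field holds the node's source location information. If the parser produced no location information for the node, the field is null; otherwise it is an object with a start position (the position of the first character of the parsed source region) and an end position.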

interface SourceLocation {
    start: Position;
    end: Position;
    source?: string | null;
}

Each Position object consists of a line number (1-indexed) and a column number (0-indexed):

interface Position {
    line: uint32 >= 1;
    column: uint32 >= 0;
}
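
The relationship between a character offset (as used in range) and a Position can be sketched as follows. The positionAt helper is hypothetical, and it assumes that lines are separated by '\n' only, whereas real parsers recognize other line terminators as well.

```javascript
// Hypothetical helper: convert a 0-based character offset into a Position.
// Assumption: lines are separated by '\n' only.
function positionAt(source, offset) {
  let line = 1;      // lines are 1-indexed
  let lineStart = 0; // offset of the first character of the current line
  for (let i = 0; i < offset && i < source.length; i++) {
    if (source[i] === '\n') {
      line++;
      lineStart = i + 1;
    }
  }
  return { line, column: offset - lineStart }; // columns are 0-indexed
}

const source = 'var a = 42;\nvar b = 5;';
console.log(positionAt(source, 0));  // { line: 1, column: 0 }
console.log(positionAt(source, 16)); // { line: 2, column: 4 }
```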

Programs

interface Program <: Node {
    type: "Program";
    sourceType: 'script' | 'module';
    body: StatementListItem[] | ModuleItem[];
}

Represents a complete source tree.

Scripts and Modules

A source tree comes from one of two sources: a script or a module.

When it is a script, body is StatementListItem[].
When it is a module, body is ModuleItem[].

The StatementListItem and ModuleItem types are as follows.

type StatementListItem = Declaration | Statement;
type ModuleItem = ImportDeclaration | ExportDeclaration | StatementListItem;

ImportDeclaration

The import syntax, which imports a module.

interface ImportDeclaration {
    type: 'ImportDeclaration';
    specifiers: ImportSpecifier[];
    source: Literal;
}

The ImportSpecifier type is as follows:

interface ImportSpecifier {
    type: 'ImportSpecifier' | 'ImportDefaultSpecifier' | 'ImportNamespaceSpecifier';
    local: Identifier;
    imported?: Identifier;
}

ImportSpecifier syntax:

import { foo } from './foo';

ImportDefaultSpecifier syntax:

import foo from './foo';

ImportNamespaceSpecifier syntax:

import * as foo from './foo';
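
As a rough illustration of how the three specifier kinds map back to source text, the sketch below prints an ImportDeclaration node as an import statement. The printImport function is a hypothetical helper, and the node is written by hand rather than produced by a parser.

```javascript
// Hypothetical helper: print an ImportDeclaration node back to source text.
function printImport(node) {
  const parts = []; // default and namespace bindings
  const named = []; // bindings that go inside { ... }
  for (const spec of node.specifiers) {
    if (spec.type === 'ImportDefaultSpecifier') {
      parts.push(spec.local.name);
    } else if (spec.type === 'ImportNamespaceSpecifier') {
      parts.push('* as ' + spec.local.name);
    } else {
      // ImportSpecifier: `imported` is the exported name, `local` the local one.
      named.push(spec.imported.name === spec.local.name
        ? spec.local.name
        : spec.imported.name + ' as ' + spec.local.name);
    }
  }
  if (named.length > 0) parts.push('{ ' + named.join(', ') + ' }');
  if (parts.length === 0) return 'import ' + node.source.raw + ';';
  return 'import ' + parts.join(', ') + ' from ' + node.source.raw + ';';
}

// Hand-written node for: import foo, { bar as baz } from './foo';
const node = {
  type: 'ImportDeclaration',
  specifiers: [
    { type: 'ImportDefaultSpecifier', local: { type: 'Identifier', name: 'foo' } },
    {
      type: 'ImportSpecifier',
      local: { type: 'Identifier', name: 'baz' },
      imported: { type: 'Identifier', name: 'bar' }
    }
  ],
  source: { type: 'Literal', value: './foo', raw: "'./foo'" }
};
console.log(printImport(node)); // import foo, { bar as baz } from './foo';
```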

ExportDeclaration

The export types are as follows:

type ExportDeclaration = ExportAllDeclaration | ExportDefaultDeclaration | ExportNamedDeclaration;

ExportAllDeclaration re-exports everything from the specified module:

interface ExportAllDeclaration {
    type: 'ExportAllDeclaration';
    source: Literal;
}

Syntax:

export * from './foo';

ExportDefaultDeclaration exports the module's default export:

interface ExportDefaultDeclaration {
    type: 'ExportDefaultDeclaration';
    declaration: Identifier | BindingPattern | ClassDeclaration | Expression | FunctionDeclaration;
}

Syntax:

export default 'foo';

ExportNamedDeclaration exports named bindings:

interface ExportNamedDeclaration {
    type: 'ExportNamedDeclaration';
    declaration: ClassDeclaration | FunctionDeclaration | VariableDeclaration;
    specifiers: ExportSpecifier[];
    source: Literal;
}

Syntax:

export const foo = 'foo';

Declarations and Statements

Declarations have the following type:

type Declaration = VariableDeclaration | FunctionDeclaration | ClassDeclaration;

Statements have the following type:

type Statement = BlockStatement | BreakStatement | ContinueStatement |
    DebuggerStatement | DoWhileStatement | EmptyStatement |
    ExpressionStatement | ForStatement | ForInStatement |
    ForOfStatement | FunctionDeclaration | IfStatement |
    LabeledStatement | ReturnStatement | SwitchStatement |
    ThrowStatement | TryStatement | VariableDeclaration |
    WhileStatement | WithStatement;

VariableDeclaration

A variable declaration. The kind property indicates which kind of declaration it is, since ES6 introduced const and let.

interface VariableDeclaration <: Declaration {
    type: "VariableDeclaration";
    declarations: [ VariableDeclarator ];
    kind: "var" | "let" | "const";
}
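
For orientation, here is a hand-written node for a small declaration. The VariableDeclarator shape used below (id and init) follows the ESTree format and is not spelled out in the text above; declaredNames is a hypothetical helper that only handles plain identifiers.

```javascript
// Hand-written node for `var a = 42, b;`.
// Assumption: VariableDeclarator has `id` and `init`, as in ESTree.
const declaration = {
  type: 'VariableDeclaration',
  kind: 'var',
  declarations: [
    {
      type: 'VariableDeclarator',
      id: { type: 'Identifier', name: 'a' },
      init: { type: 'Literal', value: 42, raw: '42' }
    },
    {
      type: 'VariableDeclarator',
      id: { type: 'Identifier', name: 'b' },
      init: null // no initializer
    }
  ]
};

// Hypothetical helper: list the declared names (plain identifiers only).
function declaredNames(node) {
  return node.declarations
    .filter((declarator) => declarator.id.type === 'Identifier')
    .map((declarator) => declarator.id.name);
}

console.log(declaration.kind, declaredNames(declaration));
```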

FunctionDeclaration

A function declaration (not a function expression).

interface FunctionDeclaration {
    type: 'FunctionDeclaration';
    id: Identifier | null;
    params: FunctionParameter[];
    body: BlockStatement;
    generator: boolean;
    async: boolean;
    expression: false;
}

For example:

function foo() {}

function *bar() { yield "44"; }

async function noop() { await new Promise(function(resolve, reject) { resolve('55'); }) }

ClassDeclaration

A class declaration (not a class expression).

interface ClassDeclaration {
    type: 'ClassDeclaration';
    id: Identifier | null;
    superClass: Identifier | null;
    body: ClassBody;
}

ClassBody is declared as follows:

interface ClassBody {
    type: 'ClassBody';
    body: MethodDefinition[];
}

MethodDefinition represents a method definition:

interface MethodDefinition {
    type: 'MethodDefinition';
    key: Expression | null;
    computed: boolean;
    value: FunctionExpression | null;
    kind: 'method' | 'constructor';
    static: boolean;
}

For example:

class foo {
    constructor() {}
    method() {}
};

ContinueStatement

The continue statement.

interface ContinueStatement {
    type: 'ContinueStatement';
    label: Identifier | null;
}

For example:

for (var i = 0; i < 10; i++) {
    if (i === 0) {
        continue;
    }
}

DebuggerStatement

The debugger statement.

interface DebuggerStatement {
    type: 'DebuggerStatement';
}

For example:

while(true) {
    debugger;
}

DoWhileStatement

The do-while statement.

interface DoWhileStatement {
    type: 'DoWhileStatement';
    body: Statement;
    test: Expression;
}

test is the while condition.

For example:

var i = 0;
do {
    i++;
} while (i < 2);

EmptyStatement

An empty statement.

interface EmptyStatement {
    type: 'EmptyStatement';
}

For example:

if(true);

var a = [];
for(i = 0; i < a.length; a[i++] = 0);

ExpressionStatement

An expression statement, that is, a statement consisting of a single expression.

interface ExpressionStatement {
    type: 'ExpressionStatement';
    expression: Expression;
    directive?: string;
}

When the expression statement represents a directive (such as "use strict"), the directive property contains the directive string.

For example:

(function(){});

ForStatement

The for statement.

interface ForStatement {
    type: 'ForStatement';
    init: Expression | VariableDeclaration | null;
    test: Expression | null;
    update: Expression | null;
    body: Statement;
}

ForInStatement

The for...in statement.

interface ForInStatement {
    type: 'ForInStatement';
    left: Expression;
    right: Expression;
    body: Statement;
    each: false;
}

ForOfStatement

The for...of statement.

interface ForOfStatement {
    type: 'ForOfStatement';
    left: Expression;
    right: Expression;
    body: Statement;
}

IfStatement

The if statement.

interface IfStatement {
    type: 'IfStatement';
    test: Expression;
    consequent: Statement;
    alternate?: Statement;
}

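consequent is the body executed when the if condition holds; alternate is the else branch. An else if is represented as an IfStatement nested in alternate.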

LabeledStatement

A labeled statement, mostly used to target continue and break precisely in nested loops.

interface LabeledStatement {
    type: 'LabeledStatement';
    label: Identifier;
    body: Statement;
}

For example:

var num = 0;
outPoint:
for (var i = 0; i < 10; i++) {
    for (var j = 0; j < 10; j++) {
        if (i == 5 && j == 5) {
            break outPoint;
        }
        num++;
    }
}

ReturnStatement

The return statement.

interface ReturnStatement {
    type: 'ReturnStatement';
    argument: Expression | null;
}

SwitchStatement

The switch statement.

interface SwitchStatement {
    type: 'SwitchStatement';
    discriminant: Expression;
    cases: SwitchCase[];
}

discriminant is the expression being switched on.

The SwitchCase type is as follows (test is null for the default clause):

interface SwitchCase {
    type: 'SwitchCase';
    test: Expression | null;
    consequent: Statement[];
}

ThrowStatement

The throw statement.

interface ThrowStatement {
    type: 'ThrowStatement';
    argument: Expression;
}

TryStatement

The try...catch statement.

interface TryStatement {
    type: 'TryStatement';
    block: BlockStatement;
    handler: CatchClause | null;
    finalizer: BlockStatement | null;
}

handler is the catch clause and finalizer is the finally block.

The CatchClause type is as follows:

interface CatchClause {
    type: 'CatchClause';
    param: Identifier | BindingPattern;
    body: BlockStatement;
}

For example:

try {
    foo();
} catch (e) {
    console.error(e);
} finally {
    bar();
}

WhileStatement

The while statement.

interface WhileStatement {
    type: 'WhileStatement';
    test: Expression;
    body: Statement;
}

test is the condition expression.

WithStatement

The with statement (it sets the scope object for a block).

interface WithStatement {
    type: 'WithStatement';
    object: Expression;
    body: Statement;
}

For example:

var a = { name: '' }; // the property must already exist for with to resolve it

with(a) {
    name = 'xiao.ming';
}

console.log(a); // {name: 'xiao.ming'}

Expressions and Patterns

The available Expression types are:

type Expression = ThisExpression | Identifier | Literal |
    ArrayExpression | ObjectExpression | FunctionExpression | ArrowFunctionExpression | ClassExpression |
    TaggedTemplateExpression | MemberExpression | Super | MetaProperty |
    NewExpression | CallExpression | UpdateExpression | AwaitExpression | UnaryExpression |
    BinaryExpression | LogicalExpression | ConditionalExpression |
    YieldExpression | AssignmentExpression | SequenceExpression;

There are two kinds of patterns, array patterns and object patterns:

type BindingPattern = ArrayPattern | ObjectPattern;

ThisExpression

The this expression.

interface ThisExpression {
    type: 'ThisExpression';
}

Identifier

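An identifier is a name we define ourselves when writing JS, such as a variable name, function name, or property name. The corresponding interface looks like this: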

interface Identifier {
    type: 'Identifier';
    name: string;
}

Literal

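A literal. This does not mean [] or {}, but a literal whose own semantics denote a value, such as 1, "hello", or true, as well as regular expressions (represented by an extended node with a regex field), such as /\d?/.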

interface Literal {
    type: 'Literal';
    value: boolean | number | string | RegExp | null;
    raw: string;
    regex?: { pattern: string, flags: string };
}

For example:

var a = 1;
var b = 'b';
var c = false;
var d = /\d/;

ArrayExpression

An array expression.

interface ArrayExpression {
    type: 'ArrayExpression';
    elements: ArrayExpressionElement[];
}

Example:

[1, 2, 3, 4];

ArrayExpressionElement

An element of an array expression. The type is:

type ArrayExpressionElement = Expression | SpreadElement;

Expression covers all expressions; SpreadElement is the spread syntax.

SpreadElement

The spread operator.

interface SpreadElement {
    type: 'SpreadElement';
    argument: Expression;
}

For example:

var a = [3, 4];
var b = [1, 2, ...a];

var c = {foo: 1};
var d = {bar: 2, ...c};

ObjectExpression

An object expression.

interface ObjectExpression {
    type: 'ObjectExpression';
    properties: Property[];
}

Property describes a property of the object.

Its type is as follows:

interface Property {
    type: 'Property';
    key: Expression;
    computed: boolean;
    value: Expression | null;
    kind: 'get' | 'set' | 'init';
    method: boolean;
    shorthand: boolean;
}

kind indicates whether this is a plain initialization or a getter/setter.

For example:

var obj = {
    foo: 'foo',
    bar: function() {},
    noop() {}, // method is true
    ['computed']: 'computed'  // computed is true
}
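
To show how key, computed, kind, and value fit together, the sketch below converts an ObjectExpression whose values are all Literal nodes into a plain object. literalObjectToValue is a hypothetical helper and deliberately rejects anything more complex; the input node is written by hand.

```javascript
// Hypothetical helper: turn an ObjectExpression whose values are all
// Literal nodes into a plain object. Anything else throws.
function literalObjectToValue(node) {
  const result = {};
  for (const prop of node.properties) {
    if (prop.type !== 'Property' || prop.kind !== 'init' ||
        !prop.value || prop.value.type !== 'Literal') {
      throw new Error('unsupported property');
    }
    // A non-computed key is an Identifier (foo) or a Literal ('foo');
    // a computed key is only handled here when it is itself a Literal.
    let name;
    if (!prop.computed && prop.key.type === 'Identifier') name = prop.key.name;
    else if (prop.key.type === 'Literal') name = String(prop.key.value);
    else throw new Error('unsupported key');
    result[name] = prop.value.value;
  }
  return result;
}

// Hand-written node for: { foo: 'foo', ['computed']: 1 }
const objectNode = {
  type: 'ObjectExpression',
  properties: [
    {
      type: 'Property', computed: false, kind: 'init', method: false, shorthand: false,
      key: { type: 'Identifier', name: 'foo' },
      value: { type: 'Literal', value: 'foo', raw: "'foo'" }
    },
    {
      type: 'Property', computed: true, kind: 'init', method: false, shorthand: false,
      key: { type: 'Literal', value: 'computed', raw: "'computed'" },
      value: { type: 'Literal', value: 1, raw: '1' }
    }
  ]
};
console.log(literalObjectToValue(objectNode));
```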

FunctionExpression

A function expression.

interface FunctionExpression {
    type: 'FunctionExpression';
    id: Identifier | null;
    params: FunctionParameter[];
    body: BlockStatement;
    generator: boolean;
    async: boolean;
    expression: boolean;
}

For example:

var foo = function () {}

ArrowFunctionExpression

An arrow function expression.

interface ArrowFunctionExpression {
    type: 'ArrowFunctionExpression';
    id: Identifier | null;
    params: FunctionParameter[];
    body: BlockStatement | Expression;
    generator: boolean;
    async: boolean;
    expression: boolean; // true when the body is an expression rather than a block
}

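generator indicates whether it is a generator function, async indicates whether it is an async function, and params holds the parameter definitions.

The FunctionParameter type is as follows:

type FunctionParameter = AssignmentPattern | Identifier | BindingPattern;

Example: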

var foo = () => {};

ClassExpression

A class expression.

interface ClassExpression {
    type: 'ClassExpression';
    id: Identifier | null;
    superClass: Identifier | null;
    body: ClassBody;
}

For example:

var foo = class {
    constructor() {}
    method() {}
};

TaggedTemplateExpression

A tagged template expression.

interface TaggedTemplateExpression {
    type: 'TaggedTemplateExpression';
    readonly tag: Expression;
    readonly quasi: TemplateLiteral;
}

The TemplateLiteral type is as follows:

interface TemplateLiteral {
    type: 'TemplateLiteral';
    quasis: TemplateElement[];
    expressions: Expression[];
}

The TemplateElement type is as follows:

interface TemplateElement {
    type: 'TemplateElement';
    value: { cooked: string; raw: string };
    tail: boolean;
}

For example:

var foo = function(a){ console.log(a); }
foo`test`;
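
In a TemplateLiteral, quasis and expressions interleave: every quasi except the last (the one with tail set to true) is followed by one expression. The sketch below rebuilds the template text from a hand-written node; printTemplate and its printExpression callback are hypothetical names used only for illustration.

```javascript
// Hypothetical helper: rebuild template text from a TemplateLiteral node.
// `printExpression` is supplied by the caller to render each expression.
function printTemplate(node, printExpression) {
  let out = '`';
  node.quasis.forEach((quasi, i) => {
    out += quasi.value.raw;
    // Every quasi except the tail is followed by the expression at the same index.
    if (!quasi.tail) out += '${' + printExpression(node.expressions[i]) + '}';
  });
  return out + '`';
}

// Hand-written node for the template: hello ${name}!
const template = {
  type: 'TemplateLiteral',
  quasis: [
    { type: 'TemplateElement', value: { cooked: 'hello ', raw: 'hello ' }, tail: false },
    { type: 'TemplateElement', value: { cooked: '!', raw: '!' }, tail: true }
  ],
  expressions: [{ type: 'Identifier', name: 'name' }]
};
console.log(printTemplate(template, (expr) => expr.name));
```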

MemberExpression

A member (property access) expression.

interface MemberExpression {
    type: 'MemberExpression';
    computed: boolean;
    object: Expression;
    property: Expression;
}

For example:

const foo = {bar: 'bar'};
foo.bar;
foo['bar']; // computed is true
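
The effect of the computed flag can be seen by printing a MemberExpression back to source text. The following is a minimal sketch with a hypothetical printMember helper; it only supports identifiers, this, literals, and nested member expressions.

```javascript
// Hypothetical helper: print a (possibly nested) MemberExpression.
function printMember(node) {
  switch (node.type) {
    case 'Identifier':
      return node.name;
    case 'ThisExpression':
      return 'this';
    case 'Literal':
      return node.raw;
    case 'MemberExpression':
      // computed selects bracket notation, otherwise dot notation is used.
      return node.computed
        ? printMember(node.object) + '[' + printMember(node.property) + ']'
        : printMember(node.object) + '.' + printMember(node.property);
    default:
      throw new Error('unsupported node type: ' + node.type);
  }
}

// Hand-written node for: foo.bar['baz']
const member = {
  type: 'MemberExpression',
  computed: true,
  object: {
    type: 'MemberExpression',
    computed: false,
    object: { type: 'Identifier', name: 'foo' },
    property: { type: 'Identifier', name: 'bar' }
  },
  property: { type: 'Literal', value: 'baz', raw: "'baz'" }
};
console.log(printMember(member)); // foo.bar['baz']
```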

Super

The super keyword, referring to the parent class.

interface Super {
    type: 'Super';
}

For example:

class foo {};
class bar extends foo {
    constructor() {
        super();
    }
}

MetaProperty

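A meta property: an expression of the form meta.property in which meta is a keyword rather than an object, such as new.target or import.meta.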

interface MetaProperty {
    type: 'MetaProperty';
    meta: Identifier;
    property: Identifier;
}

For example:

new.target  // defined inside a function or constructor invoked with new

import.meta

CallExpression

A function call expression.

interface CallExpression {
    type: 'CallExpression';
    callee: Expression | Import;
    arguments: ArgumentListElement[];
}

The Import type represents the import keyword when it is the callee of a dynamic import() call.

interface Import {
    type: 'Import'
}

The ArgumentListElement type:

type ArgumentListElement = Expression | SpreadElement;

For example:

var foo = function (){};
foo();

NewExpression

A new expression.

interface NewExpression {
    type: 'NewExpression';
    callee: Expression;
    arguments: ArgumentListElement[];
}

UpdateExpression

An update operator expression, such as ++ or --.

interface UpdateExpression {
  type: "UpdateExpression";
  operator: '++' | '--';
  argument: Expression;
  prefix: boolean;
}

For example:

var i = 0;
i++;
++i; // prefix is true

AwaitExpression

An await expression, used together with async.

interface AwaitExpression {
    type: 'AwaitExpression';
    argument: Expression;
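}

For example:

async function foo() {
    var bar = function() {
        // Return the promise so that await has something to wait for.
        return new Promise(function(resolve, reject) {
            setTimeout(function() {
                resolve('foo');
            }, 1000);
        });
    };
    return await bar();
}

foo().then(console.log); // logs 'foo' after about one second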

UnaryExpression

A unary operator expression.

interface UnaryExpression {
  type: "UnaryExpression";
  operator: UnaryOperator;
  prefix: boolean;
  argument: Expression;
}

The UnaryOperator enum:

enum UnaryOperator {
  "-" | "+" | "!" | "~" | "typeof" | "void" | "delete"
}

BinaryExpression

A binary operator expression.

interface BinaryExpression {
    type: 'BinaryExpression';
    operator: BinaryOperator;
    left: Expression;
    right: Expression;
}

The BinaryOperator enum:

enum BinaryOperator {
  "==" | "!=" | "===" | "!=="
     | "<" | "<=" | ">" | ">="
     | "<<" | ">>" | ">>>"
     | "+" | "-" | "*" | "/" | "%"
     | "**" | "|" | "^" | "&" | "in"
     | "instanceof"
}

LogicalExpression

A logical operator expression.

interface LogicalExpression {
    type: 'LogicalExpression';
    operator: '||' | '&&';
    left: Expression;
    right: Expression;
}

For example:

var a = '-';
var b = a || '-';

if (a && b) {}

ConditionalExpression

The conditional (ternary) operator.

interface ConditionalExpression {
    type: 'ConditionalExpression';
    test: Expression;
    consequent: Expression;
    alternate: Expression;
}

For example:

var a = true;
var b = a ? 'consequent' : 'alternate';
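
The unary, binary, logical, and conditional nodes above nest recursively, so a small recursive function is enough to evaluate constant expressions. This is only a sketch covering a handful of operators; evaluate, lit, and bin are hypothetical helpers, and the tree is built by hand.

```javascript
// Hypothetical constant evaluator for a small subset of expression nodes.
function evaluate(node) {
  switch (node.type) {
    case 'Literal':
      return node.value;
    case 'UnaryExpression': {
      const value = evaluate(node.argument);
      if (node.operator === '-') return -value;
      if (node.operator === '+') return +value;
      if (node.operator === '!') return !value;
      break;
    }
    case 'BinaryExpression': {
      const left = evaluate(node.left);
      const right = evaluate(node.right);
      if (node.operator === '+') return left + right;
      if (node.operator === '-') return left - right;
      if (node.operator === '*') return left * right;
      if (node.operator === '/') return left / right;
      if (node.operator === '>') return left > right;
      if (node.operator === '<') return left < right;
      if (node.operator === '===') return left === right;
      break;
    }
    case 'LogicalExpression':
      // The right operand is only evaluated when needed (short-circuit).
      if (node.operator === '&&') return evaluate(node.left) && evaluate(node.right);
      if (node.operator === '||') return evaluate(node.left) || evaluate(node.right);
      break;
    case 'ConditionalExpression':
      return evaluate(node.test) ? evaluate(node.consequent) : evaluate(node.alternate);
  }
  throw new Error('unsupported node: ' + node.type + ' ' + (node.operator || ''));
}

// Small builders for hand-written nodes.
const lit = (value) => ({ type: 'Literal', value, raw: JSON.stringify(value) });
const bin = (operator, left, right) => ({ type: 'BinaryExpression', operator, left, right });

// (1 + 2) * 3 > 8 ? 'big' : 'small'
const expr = {
  type: 'ConditionalExpression',
  test: bin('>', bin('*', bin('+', lit(1), lit(2)), lit(3)), lit(8)),
  consequent: lit('big'),
  alternate: lit('small')
};
console.log(evaluate(expr)); // big
```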

YieldExpression

A yield expression.

interface YieldExpression {
    type: 'YieldExpression';
    argument: Expression | null;
    delegate: boolean;
}

For example:

function* gen(x) {
  var y = yield x + 2;
  return y;
}

AssignmentExpression

An assignment expression.

interface AssignmentExpression {
    type: 'AssignmentExpression';
    operator: '=' | '*=' | '**=' | '/=' | '%=' | '+=' | '-=' |
        '<<=' | '>>=' | '>>>=' | '&=' | '^=' | '|=';
    left: Expression;
    right: Expression;
}

The operator property is an assignment operator; left and right are the expressions on either side of it.

SequenceExpression

A sequence expression (using the comma operator).

interface SequenceExpression {
    type: 'SequenceExpression';
    expressions: Expression[];
}

For example:

var a, b;
a = 1, b = 2

ArrayPattern

An array destructuring pattern.

interface ArrayPattern {
    type: 'ArrayPattern';
    elements: ArrayPatternElement[];
}

Example:

const [a, b] = [1,3];

elements holds the elements of the array pattern.

ArrayPatternElement is as follows:

type ArrayPatternElement = AssignmentPattern | Identifier | BindingPattern | RestElement | null;

AssignmentPattern

A default-value pattern, used for default values in array destructuring, object destructuring, and function parameters.

interface AssignmentPattern {
    type: 'AssignmentPattern';
    left: Identifier | BindingPattern;
    right: Expression;
}

Example:

const [a, b = 4] = [1,3];

RestElement

A rest element pattern; its syntax is similar to the spread operator.

interface RestElement {
    type: 'RestElement';
    argument: Identifier | BindingPattern;
}

Example:

const [a, b, ...c] = [1, 2, 3, 4];

ObjectPattern

An object destructuring pattern.

interface ObjectPattern {
    type: 'ObjectPattern';
    properties: Property[];
}

Example:

const object = {a: 1, b: 2};
const { a, b } = object;
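
The pattern nodes above (ArrayPattern, ObjectPattern, AssignmentPattern, RestElement) can nest inside one another. One way to see how they compose is to collect the names a pattern binds. boundNames below is a hypothetical helper, and the node is written by hand.

```javascript
// Hypothetical helper: collect the variable names bound by a pattern.
function boundNames(pattern, names = []) {
  if (pattern === null) return names; // hole in an array pattern, e.g. [, b]
  switch (pattern.type) {
    case 'Identifier':
      names.push(pattern.name);
      break;
    case 'ArrayPattern':
      pattern.elements.forEach((element) => boundNames(element, names));
      break;
    case 'ObjectPattern':
      // For a Property the binding is in `value`; a rest element is passed as-is.
      pattern.properties.forEach((prop) =>
        boundNames(prop.type === 'RestElement' ? prop : prop.value, names));
      break;
    case 'AssignmentPattern':
      boundNames(pattern.left, names); // `right` is only the default value
      break;
    case 'RestElement':
      boundNames(pattern.argument, names);
      break;
    default:
      throw new Error('unsupported pattern: ' + pattern.type);
  }
  return names;
}

// Hand-written pattern node for: [a, { b = 4 }, ...c]
const pattern = {
  type: 'ArrayPattern',
  elements: [
    { type: 'Identifier', name: 'a' },
    {
      type: 'ObjectPattern',
      properties: [{
        type: 'Property', computed: false, kind: 'init', method: false, shorthand: true,
        key: { type: 'Identifier', name: 'b' },
        value: {
          type: 'AssignmentPattern',
          left: { type: 'Identifier', name: 'b' },
          right: { type: 'Literal', value: 4, raw: '4' }
        }
      }]
    },
    { type: 'RestElement', argument: { type: 'Identifier', name: 'c' } }
  ]
};
console.log(boundNames(pattern)); // [ 'a', 'b', 'c' ]
```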

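Conclusion

The uses of an AST fall roughly into a few categories:

  1. IDE features, such as code style checking (eslint and the like), code formatting, syntax highlighting, error reporting, and so on

  2. Code obfuscation and minification

  3. Code transformation tools, such as webpack and rollup, conversion between code conventions, and compiling ts, jsx, and the like to plain js

Understanding the AST ultimately helps us understand the tools we use, and of course it also helps us understand JavaScript better and get closer to it.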
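
As a very small taste of what a transformation tool does with an AST, the sketch below renames identifiers in place. renameIdentifier is a hypothetical helper; it ignores scoping and property keys, which a real minifier or compiler would have to handle.

```javascript
// Hypothetical transform: rename every Identifier called `from` to `to`.
// It mutates the tree in place and ignores scoping rules entirely.
function renameIdentifier(node, from, to) {
  if (node === null || typeof node !== 'object') return;
  if (Array.isArray(node)) {
    for (const item of node) renameIdentifier(item, from, to);
    return;
  }
  if (node.type === 'Identifier' && node.name === from) node.name = to;
  for (const key of Object.keys(node)) renameIdentifier(node[key], from, to);
}

// Hand-written node for: n * n
const tree = {
  type: 'BinaryExpression',
  operator: '*',
  left: { type: 'Identifier', name: 'n' },
  right: { type: 'Identifier', name: 'n' }
};
renameIdentifier(tree, 'n', 'x');
console.log(tree.left.name, tree.operator, tree.right.name); // x * x
```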
