GNU C Compiler Internals/GEM Framework 3 4

Hooks

The GEM framework is designed to facilitate the development of compiler extensions. Its idea is similar to that of Linux Security Modules (LSM), a project that defines hooks throughout the Linux kernel so that a security policy can be enforced.

GEM defines a number of hooks throughout GCC's source code and is implemented as a patch to GCC. With GEM, a compiler extension is developed as a stand-alone program. It is compiled into a dynamically linked module, which is specified as a command-line argument when GCC is invoked. GCC loads the module and calls its initialization function. The module then registers its hooks, which are callback functions that GCC invokes.
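
The registration step can be pictured with a small stand-alone model. The sketch below is not GEM's actual API: the struct hook_table, the functions module_init and host_finish_function, and the callback count_function are hypothetical names chosen for illustration, and the dynamic loading step is left out. It only shows a host that lets a module's initialization function fill in a table of callbacks and then calls whatever was registered.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hook table: one function pointer per hook the host offers.
   A NULL entry means that no module registered the hook. */
struct hook_table {
    void (*finish_function)(const char *fn_name);
};

/* State touched by the module's callback, so its effect is observable. */
int functions_seen = 0;

/* Module side: the callback the module wants the host to call. */
void count_function(const char *fn_name) {
    assert(fn_name != NULL);
    functions_seen++;
}

/* Module side: initialization function that registers the module's hooks. */
void module_init(struct hook_table *hooks) {
    assert(hooks != NULL);
    hooks->finish_function = count_function;
}

/* Host side: invoke a hook only if some module registered it. */
void host_finish_function(const struct hook_table *hooks, const char *fn_name) {
    if (hooks->finish_function != NULL)
        hooks->finish_function(fn_name);
}
```

Before module_init() runs, host_finish_function() does nothing; afterwards every call reaches count_function(). Keeping unregistered hooks as NULL pointers lets the host skip them cheaply.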

In addition to the compiler hooks, GEM provides macros and functions that simplify extension development. In this chapter we first introduce the hooks that the GEM framework adds to GCC, then describe typical issues in extension programming.

The project home page is at http://research.alexeysmirnov.name/gem

GEM adds several hooks throughout the GCC source code. New hooks are added to GEM as necessary.

Take home: GEM hooks are defined mostly at the AST level; a few are defined at the assembly level. New hooks are added as necessary.

Traversing an AST

Once a function's AST has been constructed, it can be instrumented. GEM's gem_finish_function hook receives the AST of a function; the idea is to traverse it and instrument its nodes as necessary. The function walk_tree() takes a pointer to the AST, the callback function, optional data that is passed through to the callback (NULL by default), and an optional hash table used to avoid visiting a node more than once (NULL by default). The callback is called for each node of the AST before that node's operands are traversed. It receives a pointer to the node, a pointer to the walk_subtrees flag, and the data. If the callback sets *walk_subtrees to 0, the operands of the current node are not processed; if it returns a non-NULL tree, the traversal stops.

The following code demonstrates the idea:

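  /* Called by walk_tree() for every node, before the node's operands. */
  static tree walk_tree_callback(tree *tp, int *walk_subtrees, void *data) {
    tree t = *tp;
    enum tree_code code = TREE_CODE(t);
    switch (code) {
    case CALL_EXPR:
      instrument_call_expr(t);
      break;
    case MODIFY_EXPR:
      instrument_modify_expr(t);
      break;
    default:
      break;
    }
    /* Returning NULL_TREE continues the walk; set *walk_subtrees = 0
       here to skip the operands of the current node. */
    return NULL_TREE;
  }

  /* t_body: the AST of the function being processed. */
  walk_tree(&t_body, walk_tree_callback, NULL, NULL);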

Take home: The function walk_tree() traverses an AST, applying a user-defined callback function to each tree node.
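
To make the traversal order and the role of the walk_subtrees flag concrete, here is a toy model that compiles without GCC's headers. It is only a sketch: struct node, toy_walk and count_cb are hypothetical stand-ins for tree, walk_tree() and a user callback, and the node layout (a code plus at most two operands) is an assumption made for brevity.

```c
#include <assert.h>
#include <stddef.h>

/* Toy node: a code plus up to two operands (NULL when absent). */
struct node {
    int code;
    struct node *op[2];
};

/* Callback type: set *walk_subtrees to 0 to skip the node's operands;
   return non-NULL to stop the whole walk. */
typedef struct node *(*walk_fn)(struct node **np, int *walk_subtrees, void *data);

/* Pre-order walk: the callback runs on a node before its operands. */
struct node *toy_walk(struct node **np, walk_fn fn, void *data) {
    int walk_subtrees = 1;
    struct node *result;
    int i;

    assert(np != NULL && fn != NULL);
    if (*np == NULL)
        return NULL;
    result = fn(np, &walk_subtrees, data);
    if (result != NULL)
        return result;          /* callback asked to stop */
    if (!walk_subtrees)
        return NULL;            /* callback asked to skip the operands */
    for (i = 0; i < 2; i++) {
        result = toy_walk(&(*np)->op[i], fn, data);
        if (result != NULL)
            return result;
    }
    return NULL;
}

/* Example callback: counts visited nodes and skips operands of code 99. */
struct node *count_cb(struct node **np, int *walk_subtrees, void *data) {
    int *count = data;
    (*count)++;
    if ((*np)->code == 99)
        *walk_subtrees = 0;
    return NULL;
}
```

For a root whose first operand has code 99 and one child of its own, and whose second operand is a leaf, the callback runs on the root, the code-99 node and the leaf, but not on the child below the code-99 node.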

Instrumenting an AST

In this section we describe functions that create new tree nodes and how to add the new nodes to an AST.

The walk_tree callback function can instrument the AST. The functions build1() and build() construct new tree nodes. The former takes one operand; the latter takes more than one. The following code computes the address of the operand, like the C '&' operator. Note that the type of the result is a pointer to the operand's type:

  t = build1(ADDR_EXPR, build_pointer_type(TREE_TYPE(t)), t);

The following example refers to an array element arr[0]:

  t = build(ARRAY_REF, integer_type_node, arr, integer_zero_node);

The following example builds an integer constant:

  t = build_int_2(value, 0);

Building a string constant is more involved. The following example demonstrates the idea:

  tree gem_build_string_literal(int len, const char *str) {
     tree t, elem, index, type;
     t = build_string (len, str);
     /* Element type: const char. */
     elem = build_type_variant (char_type_node, 1, 0);
     /* Array type: const char[len], indexed 0 .. len-1. */
     index = build_index_type (build_int_2(len-1, 0));
     type = build_array_type (elem, index);
     TREE_TYPE(t) = type;
     TREE_READONLY(t)=1;
     TREE_STATIC(t)=1;
     TREE_CONSTANT(t)=1;
     /* Take the address of the array, then convert it to char *. */
     type=build_pointer_type (type);
     t = build1 (ADDR_EXPR, type, t);
     t = build1 (NOP_EXPR, build_pointer_type(char_type_node), t);
     return t;
  }

To build a function call one needs to build the list of arguments. Then the CALL_EXPR is constructed:

  t_arg1 = build_tree_list(NULL_TREE, arg1);
  t_arg2 = build_tree_list(NULL_TREE, arg2);
  ...
  TREE_CHAIN(t_arg1)=t_arg2;
  ...
  TREE_CHAIN(t_argn)=NULL_TREE;
  t_call = build_function_call(t_func_decl, t_arg1);

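The constructed tree node is added to the AST either as a statement or as an expression. A statement is spliced into the linked list of statements using TREE_CHAIN. Link the new statement to the successor of the current one first, so that the rest of the list is not lost:

  t_stmt=build_stmt (EXPR_STMT, t_call);
  TREE_CHAIN(t_stmt)=TREE_CHAIN(t_cur);
  TREE_CHAIN(t_cur)=t_stmt;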
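
The pointer manipulation behind such a splice can be tried outside GCC on a plain singly linked list. In this sketch, struct stmt and insert_after are hypothetical names, and the chain field plays the role of TREE_CHAIN; the point is the order of the two assignments, which keeps the statements that follow the insertion point.

```c
#include <assert.h>
#include <stddef.h>

/* Toy statement node; `chain` plays the role of TREE_CHAIN. */
struct stmt {
    int id;
    struct stmt *chain;
};

/* Insert new_stmt right after cur, keeping cur's old successor. */
void insert_after(struct stmt *cur, struct stmt *new_stmt) {
    assert(cur != NULL && new_stmt != NULL);
    new_stmt->chain = cur->chain;  /* link the rest of the list first */
    cur->chain = new_stmt;         /* then hook the new node in */
}
```

Swapping the two assignments would overwrite cur->chain before it is copied, which drops every statement after cur.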

Adding a tree node as an expression is the same as using the C comma operator ','. The new expression is added as the first operand of the operator; it is evaluated and its value is discarded. The result of the operator is the value of its second operand:

  t_res = build(COMPOUND_EXPR, TREE_TYPE(t), t_call, t);
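
At the source level, a COMPOUND_EXPR behaves like C's comma operator, which can be checked with ordinary C. The sketch below uses hypothetical names (note_call, instrumented_value, calls): the first operand stands in for the inserted call and the second for the original expression t.

```c
#include <assert.h>

/* Counts how often the stand-in for the inserted call ran. */
int calls = 0;

/* Stand-in for the inserted call; its return value is discarded. */
int note_call(void) {
    calls++;
    return -1;
}

/* Evaluates note_call() first, then yields x: the C-level analogue of a
   COMPOUND_EXPR with the call as first operand and x as second. */
int instrumented_value(int x) {
    return (note_call(), x);
}
```

The surrounding expression still sees the value of x, so the call can be inserted without changing what the original expression computes.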

If you want to add the call after t, you need to build a compound statement instead. This is equivalent to using curly braces in C:

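  /* Open a scope: a COMPOUND_STMT whose body starts with a scope-begin marker. */
  static tree gen_start_scope(void) {
    tree t_hdr, ss;
    t_hdr = build_stmt (COMPOUND_STMT, NULL_TREE);
    ss = build_stmt (SCOPE_STMT, NULL_TREE);
    SCOPE_BEGIN_P(ss)=1;
    SCOPE_PARTIAL_P(ss)=0;
    TREE_CHAIN(t_hdr)=NULL_TREE;
    TREE_OPERAND(t_hdr, 0)=ss;
    return t_hdr;
  }
  /* Build the matching scope-end marker. */
  static tree gen_end_scope(void) {
    tree ss;
    ss = build_stmt (SCOPE_STMT, NULL_TREE);
    SCOPE_BEGIN_P(ss)=0;
    SCOPE_PARTIAL_P(ss)=0;
    return ss;
  }
  /* Wrap the statement chain that starts at t in a new scope. */
  static tree scope_stmt(tree t) {
    tree t_res;
    t_res=gen_start_scope();
    TREE_CHAIN(TREE_OPERAND(t_res, 0)) = t;
    /* Find the last statement of the chain and append the end marker. */
    while (TREE_CHAIN(t)) t = TREE_CHAIN(t);
    TREE_CHAIN(t)=gen_end_scope();
    return t_res;
  }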

Take home: Use GCC's functions to build new nodes and add them to the AST, either as a statement or as an expression.

When to Instrument

In this section we describe when each of the GEM hooks is used.

Function Prolog/Epilog

Instructions added to a function's prolog or epilog are written directly to the assembly output file, asm_out_file:

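  /* Write one instruction, preceded by a tab, to the assembly file.
     The string is written verbatim, so '%' must not be doubled. */
  #define OUTPUT_ASM_INST(inst)                   \
    do {                                          \
      const char *p = (inst);                     \
      putc('\t', asm_out_file);                   \
      while (*p) putc(*p++, asm_out_file);        \
      putc('\n', asm_out_file);                   \
    } while (0)
  OUTPUT_ASM_INST("pushl %eax");
  OUTPUT_ASM_INST("popl %eax");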

Take home: Assembly instructions are added to the function prolog and epilog using the hooks gem_output_asm_insn and gem_final_start_function.
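
The same output logic can be written as a function, which avoids macro pitfalls and can be tested against any stdio stream. This is a sketch under assumptions: emit_asm_line is a hypothetical helper, not part of GEM or GCC, and the stream is passed in explicitly where a GEM extension would pass asm_out_file.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write one instruction as "\t<inst>\n" to out.
   The text is written verbatim; returns 0 on success, -1 on a write error. */
int emit_asm_line(FILE *out, const char *inst) {
    size_t len;

    assert(out != NULL && inst != NULL);
    len = strlen(inst);
    if (putc('\t', out) == EOF)
        return -1;
    if (fwrite(inst, 1, len, out) != len)
        return -1;
    if (putc('\n', out) == EOF)
        return -1;
    return 0;
}
```

Inside an extension this would be called as emit_asm_line(asm_out_file, "pushl %eax"). Returning an error code lets the caller notice a failed write, which the macro form silently ignores.
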
This article is issued from Wikibooks. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.