A Java Compiler Front End

A compiler reads a source program and translates it into an executable set of instructions (e.g., Java byte codes, native machine instructions). The Bear Products International compiler is itself internally structured as a set of translations. The BPI Java front end uses a parser generated from an LR grammar by the ANTLR parser generator. The parser reads the Java source and outputs an Abstract Syntax Tree (AST). The tree structure is carefully chosen to make later passes over the tree simpler.

The LR grammar, embedded C++ actions and comments consist of about 4,000 lines. This grammar is based heavily on the Java grammar developed by Terence Parr, John Lilley, John Mitchell and Scott Stanchfield. This grammar is part of the ANTLR release. Although the grammar is relatively large, it does not report all syntactically incorrect Java programs. Following the parser pass, there is a pass that does additional syntactic checking. For a partial list of these checks, click here

Compiling a language into native code can be thought of as the process of throwing out information during translation until the correct instruction sequence is created. This process of information distillation happens in all compiler phase, including the front end. To make AST processing easier, the AST does not provide an exact mirror of the Java source. Syntactic elements like parentheses, which are needed in the language to demarcate a production like an argument list, can be discarded in the AST.

Although it is rarely discussed, a great deal of work goes into clearly representing a language in AST form. The AST trees generated for each language production must be built so that they still contain enough information so that elements of the production can be recognized by later passes which walk the trees. In the case of the BPI Java front end, this AST is created by the C++ code associated with the grammar productions.

The output of the Java parser produces an intermediate that represents declarations (e.g., interfaces, classes, methods, and class members). These declarations are entered into the symbol table. Processing local declarations requires first processing imported class packages and classes (java.lang is always imported by default, for example). A Java file source will contain either implicit or explicit import statements (java.lang.* is implicitly imported into all Java source files). These define types, including the base type Object, that are referenced by local objects defined in the source.

The Java front end resolves any objects that are not defined locally from the import paths. The front end searches for .class files for the undefined objects. If a .class file for the object is not found, the front end searches for the associated .java file and compiles it into a .class file. If a class file is found, the Java front end check to see if the time stamp on the class file is later than the time stamp on any associated source. If the source time stamp is newer, the class file must be recompiled. The requirement that the front end rebuild Java class files as necessary removes the need for a program like UNIX's make or Windows nmake. But it moves this burden into the front end.

When the class file for a object is obtained, the Java front end reads the symbol information stored in the Java class file (see Javad: a class file disassembler).

The abstract syntax tree is composed of non-terminal nodes, like operators and terminal nodes like identifiers and constants. When all external symbols are read, the local symbol can be resolved. This allows all AST nodes to be annotated with the associated symbol or type information.

After resolving all symbols and types the declarations are "pruned" from the AST. Then the Java front end makes a semantic checking pass over the AST. For example, the Java front end verifies that an assignment statement with a new expression on the right hand side has a reference variable on the left. If no semantic errors are found, the AST is now considered a representation of a correct Java program. The declarations are pruned from the decorated AST. This pruned AST is the output of the Java front end. This serves as the input to the tree to tree transformation pass.

Ian Kaplan, March 5, 2000 revised: April 11, 2000


back to Java page