Symbol Table Management

last updated 5/11/07

 

Central part of compiler

The symbol table is accessed by most phases of a compiler, beginning with the lexical analysis to optimization.

The symbol table carries the collected information about each named object in the program to other phases of the compiler. As soon as a named token is found, depending on the token type, a call to the symbol table routines must be made.

Tokens:

Keywords of the language, special characters/punctuation, user defined identifiers used for different meanings, constants

Identifiers:

Identifiers are user defined

Need a global structure to store identifiers and their associated meanings (attributes) for future reference. This structure provides more information that the grammar and syntax directed translation of the language would not normally be able to carry.

Reserved words

These may be stored here too.  You have to look up an identifier and reserved words have the same syntax as an identifier.

Scoping rules

The symbol table must be able to handle the scoping rules of the language.

 


Symbol Table Requirements

fast lookup. The parser of a compiler is linear in speed. The table lookup needs to be as fast as possible.

flexible in structure. The entries must contain all necessary information, depending on usage of identifier

efficient use of space. This will require runtime allocation (dynamic) of the space.

handle characteristics of language (e.g., scoping, implicit declaration)

 

 

 

 


Program components

Declarative statements- define identifiers and their attributes

Imperative statements - uses identifiers assuming their attributes

Languages and their explicit declarative statements

Pascal, Ada VAR statement
C, C++, Java, PL/I statements beginning with a type
FORTRAN TYPE, DIMENSION statement
COBOL DATA DIVISION



Implicit Declarations

Some languages have implicit declarations

FORTRAN or PL/I SUM = SUM + I (first encounter assumes declaration with default attributes)

APL, BASIC, LISP (most interpreted languages)


Actions on declarations

Declarative statements generally do not translate to (or directly associated with) any executable code. They may allocate space at designated times, however.

Symbols can be monitored, especially in languages that require declaration statements
FIRST ENCOUNTER
SUBSEQUENT
ENCOUNTER
CONCLUSION
Declaration
Reference (in imperative stmt)
Continue--this is the expected case
Declaration
Declaration
Multiple declarations error unless appropriate within scoping rules
Declaration
None
Unused warning message
Reference (in imperative statement)
NA
Undefined error

 

 

 

 

 


String Management

How to efficiently store names of identifiers

1. Restrict length of identifier

2. Separate String Space

 


3. Dynamic allocation of strings

Advantages: don't need predefined space for strings

Disadvantages: complexity

Extend to other parts of the symbol table

 


Name Searching

Functions:

Declaration of name - enter new name into table
- return error if already there
- add new attributes as found
Use - expect name to be found
- return position of entry in table
Entry of new scope - allow new declarations or redefined declarations
Exit of scope - delete entries

Goals:

  1. efficiency of declaration entry, insertion of new name
  2. efficiency of retrieval, lookup
  3. sorted list of symbol table dump

Most important would be #2


Approaches:

n is the number of entries in the symbol table for the analysis below

n' is the number of entries in the current [local] block

1. Linear access

 

 


2. Binary search access

 

 

 

 


3. Tree access

 


4. Hash tables

 

 

 


Single pass vs multipass compiler: 

single pass can discard the entries while multipass will need to retain temporarily inaccessible entries