Wednesday 20 October 2010

Turbo Pascal Internals

Do you still remember Borland Turbo Pascal? The most successful Pascal compiler ever? If the answer is yes then you probably remember that Turbo Pascal was indeed Turbo. A very fast compiler which generated quite fast executable code. It featured integrated development environment (IDE) where you could edit, run and debug your programs. Originally created as Blue Label Pascal by Anders Hejlsberg, it was licensed by Borland as Turbo Pascal in early 1980s. Last version for DOS was released in 1992 as Turbo Pascal 7.0. Borland also released similar compiler for Windows and later a brand new product Delphi--Rapid Application Development tool for Windows.

Have you ever wondered what makes Turbo Pascal a fast compiler? Well, if you are interested in Turbo Pascal internals here are described some basic units consisting this compiler. Turbo Pascal design is not conventional as taught in compiler design books. Its design is oriented toward speed. The parser is tightly connected with code generator. There is some low-level intermediate code, but most of the raw code is generated by the parser.


A great example of data structures and algorithms used in Turbo Pascal is TPC16. TPC16 is a Turbo Pascal compatible compiler. It generates unit and executable files compatible with Turbo Pascal 7.0 command line compiler. TPC16 is written in Turbo Pascal. It is consisted of many units described below.

Common Variables
Here are declared all global variables that are used by the compiler. Declarations are grouped into sections. Some sections declare variables which hold data for particular module which is compiled and these variables are saved (pushed) during processing of used units.

Reserved Words
This unit declares a symbol table holding all reserved words. It is separated from other units because the file includes also hash values for reserved words which are generated by a separate program.

Type Definitions
This unit defines data structures that define basic Pascal types (integer, real, extended, Boolean, char, string, file, set, array, object, etc). Unit also contains functions to test type compatibility, symbol table type storage and type processing.

I/O Utilities
Routines for reading and writing files are located in the I/O Utilities unit. This unit also contains procedures for error handling and error reporting.

Symbol Table Management
Symbol table management is one of the most important internal operations in every compiler. This unit contains procedures to create symbol table, insert identifiers into symbol table, search symbol table for some identifier, search unit, procedure or record scope for particular identifier, and all other operations with symbol tables.

Scanner
Here are located procedures that scan source file and generate a stream of tokens. Token is the smallest element of the source which is processed by the parser. Scanner processes source files, skips comments, processes compiler directives, and generates tokens.

Parser
This is the brain of the compiler. It reads tokens, checks language syntax and generates intermediate code. One of the reasons for fast compilation lies in parser. Since it generates code which is close to executable code, the next steps of compilation are pretty fast.

Statements
This unit processes Pascal statements (for, while, repeat, while, etc.), begin-end blocks and assembler blocks.

System Functions
This unit processes system functions like abs, arctan, chr, int, odd, ofs, etc. and generates code for them.

System procedures
This unit processes system procedures like exit, fillchar, writeln, inc, val, etc. and generates code for them.

Expressions
This is the biggest unit and contains functions and procedures to process expressions. Expression is anything that needs to be calculated--from a simple identifier to complex expression in multiple parentheses. There are many cases that need to be tested and processed. Procedures in this unit also generate the majority of the code.

Calculator
This unit contains procedures to process various calculations. Calculation is some operation with one or two expressions (addition, subtraction, multiplication, etc.)

Assembler
This is the inline assembler that processes instruction in an asm-end block.

Assembler Types
This unit declares data structures that are needed for inline assembler.

Code Generator
This unit processes intermediate code and generates executable code and reference data needed in linker. Code generator also performs code optimizations.

OMF Import
This unit imports OMF object files and processes OMF records.

Linker
This unit generates the final executable code and creates output files. Before the code is generated each referenced item is recursively processed and marked. Unmarked items are not executed and therefore not included in the final executable code.


There are many excellent books on compiler design and implementation. However, the best book on compiler design is the compiler itself. If you are interested in Turbo Pascal internals or just need a source code of some real compiler then you should examine the Turbo Pascal compiler source code - Turbo Pascal compiler written in Turbo Pascal. This source code shows all the beauty of the Pascal programming language and reveals all the tricks needed to build a fast and compact compiler for any language, not just Pascal.

No comments:

Post a Comment