Everyone and their pet cats know about C, but relatively few people realize that C was indirectly derived from a language called BCPL, which stands for "Basic CPL"; CPL in turn stands for "Combined Programming Language".
I first encountered BCPL when I was at Cambridge University, the home of the language and its designer, Martin Richards. BCPL was the systems programming language of choice at Cambridge, because it generated code better than the IBM Fortran compilers while not requiring the agony of Assembler. In turn, Computer Science students were taught it as their first "serious" language (interestingly enough, not by Martin Richards). For many (myself not included) it was their first introduction to anything more advanced than Fortran 66.
After graduation, I went to work for a local manufacturer of Z80-based systems that did almost all of its systems programming in BCPL. Only after the move to Unix did we in turn move to C. As a result, I find myself with several years of practical BCPL programming experience.
When the topic came up once again on the net, I wrote this summary of the language for some readers who already know C. It is in no sense a comparison or a historical survey but simply a high-speed description. Feedback on this description is welcome; I have no doubt that there are parts which are still unduly terse and would benefit from expansion, but I'd like to know which !
BCPL implementations are available from Martin Richards, or from the old BCPL distribution.
The memory of a BCPL program is divided into cells. All cells have the same size. A cell stores a single value which can be treated as any of:
Every cell has an address. Under certain circumstances, two or more cells are required to be adjacent in memory. In this case, their addresses, if treated as integers, are consecutive.
In addition, a group of adjacent cells may be treated as a character
buffer. There are at least
for every cell (though possibly more).
Cells have a storage duration, which can be static, dynamic (associated with a scope), or heap.
Integer constants are decimal; prefix
# means octal,
FALSE are reserved words.
? is a constant with an indeterminate value (which
may be different each time it is evaluated); any constant expression
containing an evaluated
? is itself a
Character constants are of the form
'a'; there are
no multi-character constants.
String constants are of
"abc"; they contain between
0 and 255 characters.
Escape sequences in both string and character constants are
(representing the second character) and
*B (backspace), and
In addition, a string can be extended over more than one line; the
sequence *<white space>* is ignored within a string.
A string constant is replaced by the address of a large enough number of contiguous cells initialized so that:
"abc" % 0 = 3 "abc" % 1 = 'a' "abc" % 2 = 'b' "abc" % 3 = 'c'The cells have static storage duration.
A constant expression may use any operator, any constants other than string constants, and any identifier declared as a manifest.
Floating-point constants are of the form
! (dyadic) % :: highest precedence @ ! (monadic) * / REM #* #/ + - #+ #- (dyadic and monadic) = ~= < <= > >= #= #~= #< #<= #> #>= << >> ~ & | EQV NEQV -> TABLE VALOF lowest precedence SLCT precedence not described in R&W-S
Operators associate left-to-right except for -> and TABLE.
Except where stated, an identifier on the left of an assignment means that the result of the right hand side is to be stored in the cell named by that identifier, while an identifier in any other location represents the content of the cell it names. Statements like "a is an address" or "the integer a" mean "the value of the expression a, now taken as an address" or "the value of the expression a, now taken as an integer".
BITSPERBYTEbits of the result of the right side. In all other contexts, the value is that of the bth character padded to the left with zeroes (that is, characters are unsigned).
@may only be used in this way or in the following two special ways).
*b and a
#*b and a
<<b and a
OFb selects a bitfield from within
!b to the bottom x bits of the right hand side. The bits set are bits y to y+x-1, where bit 0 is the rightmost bit; if x is 0, they are all bits except the rightmost y.
OFb is equivalent to
TABLEk1,k2,...kn, where all the ki are constant expressions,
RESULTIScommand is reached, that controls the value of the expression. Otherwise the value is unspecified.
Logical operations are evaluated according to one of two rules. If the result of the operator is in "truth context", then "truth rules" are used. Otherwise the operator takes bit-patterns as argument(s) and operates on each bit separately. Truth context is:
FALSE. For dyadic operators, the left operand is evaluated. If and only if this does not determine the result, then the right operand is evaluated.
FALSE have values such that logical
operations not using truth rules evaluate correctly to
e1 () e1 (e2, e3, ..., ex)and associates from left to right. e1 to ex are all evaluated, and e1 must result in a procedure designator; the procedure is called with the appropriate arguments. If the procedure is a function, the value of the call is that of the evaluated function body; if it is a routine, the value is
?. The number of arguments does not have to match the number of parameters of the procedure. Arguments are passed strictly by value.
$)optionally followed by a tag - which is a sequence of letters, digits, and dots (not necessarily beginning with a letter) - without an intervening space. If the tag of the closing bracket does not match that of the opening one, it also closes any intermediate open section. For example, the following is valid BCPL:
$(1 some code $(1.1 some code $)1.1 $(1.2 some code $( some code $)1 // Also closes 1.2 and the untagged block
c REPEAT c REPEATWHILE e c REPEATUNTIL e IF e THEN c UNLESS e DO c TEST e THEN c1 ELSE c2 WHILE e DO c UNTIL e DO c FOR i = e1 TO e2 BY e3 DO c RESULTIS e SWITCHON e INTO c GOTO e FINISH RETURN BREAK LOOP ENDCASE
Assignments have the form:
a1, a2, ..., an := e1, e2, ..., enwhere the lists on the two sides must have the same length, and ai must be valid on the left side. The order of the assignments is unspecified; they may be interleaved. Most recent implementations allow the assignment symbol to be prefixed with any dyadic operator, as in:
a1, a2, a3 <<:= e1, e2, e3This is equivalent to:
a1, a2, a3 := a1 << e1, a2 << e2, a3 << e3except that the expressions
a3are evaluated only once, and the
<<has lower precedence than any operators in the six expressions.
A call has the same form as in an expression, except that any result is ignored.
The contained command of a
REPEAT command is the smallest
possible; in particular, in:
I := VALOF F () REPEATWHILE Jthe repeated command is the call to
F. The unconditional
REPEATcommand repeats forever.
REPEATcommands always execute the contained command at least once.
BY e3 part is optional (equivalent
BY 1); e3 must be a
The identifier i is declared as a new dynamic cell
whose scope is the contained command. The command
is executed zero or more times with the cell taking values
+2*e3, ... so long as
the value is less than (greater if e3 is negative)
e2. The expressions are evaluated before the
first iteration, and not re-evaluated each time.
SWITCHON command, the contained command must be a
block which cannot contain declarations, but can contain case
labels. These have the form
CASE e: DEFAULT:where e is a constant expression. Note that case labels can only be in that block, and not in inner blocks (except for nested
FINISH terminates the program;
the current procedure (with an undefined result if it is a function);
BREAK terminates, and
LOOP ends the
current iteration of, the textually surrounding
ENDCASE similarly exits the textually
terminate the current procedure.
Each declaration that creates a cell with dynamic storage duration causes a new cell to be created each time the declaration is executed; the cell is (notionally) destroyed when the block containing the declaration terminates. The order that the cells declared in the same simultaneous declaration are initialized is undefined; they may be interleaved.
There are three types of section declarations. In each case, vi stands for an identifier not declared at the current level, and ki for a constant expression.
; ... ;vn
; ... ;vn
; ... ;vn
Simultaneous declarations take the form:
There are four types of declaration that can be used; they are shown here with
AND ... ANDdn
LET, but of course can also occur with
AND. The types may be freely mixed within one simultaneous declaration.
GLOBALdeclaration for v, then it continues to designate the cell of the global vector. Otherwise it designates a cell with static storage duration. Before the program starts, the cell is initialized to a procedure designator that invokes the appropriate body.
The first form generates a routine (which has no result), and the second a function (which does). In effect, the following are equivalent:
The first two forms may only occur within blocks.
This isn't a formal constraint written in the
BCPL book by Richards and
Whitby-Strevens, but I've never
LET declarations of variables. In
particular, if I wanted a static or global array, I would always
declare the array within
START, and then assign it to
the static or global variable.
The parameters of a function or routine designate contiguous cells, with adjacent parameters being adjacent. It is implementation defined whether the first parameter has the lowest or highest address. It is implementation-defined whether the values of arguments in excess of the number of parameters are stored in further contiguous cells or are discarded.
Functions and routines may be declared inside blocks. A function or routine may not refer to an identifier which designates a cell with dynamic storage duration and is not declared in that function or routine.
GLOBALdeclaration of the identifier is in scope, that cell is used) which is initialized, before the program starts, to a jump target referring to the location of the label. A label is in scope within the smallest of:
A jump closure is a value which refers to
a particular invocation of a function or routine. The closure
of the current invocation can be determined by calling the library
LEVEL(); this value is no longer valid when
the invocation terminates.
There are two ways to jump to a label:
and calling the library routine
t is an expression which evaluates to a jump target,
and c is an expression evaluating to a jump closure.
The jump must be made either from the scope of the target label,
or from a function or routine called (possibly indirectly) from
within that scope.
In the latter case, c must evaluate to the closure of
an invocation of the function containing the label and which is
GLOBALdeclarations in the header
LIBHDR. The allocation of routines and functions to particular cells is implementation defined, but only cells less than
Note that input and output goes via the designators held in
the global vector cells designated by
Altering those cells will alter the behaviour of all
LIBHDR contains the manifests:
BITSPERBYTE BYTESPERWORD ENDSTREAMCH FIRSTFREEGLOBAL
ENDSTREAMCHat end of file.
RDCHand returns it.
WRCH. Padding is done with spaces on the left.
WRCH. The format is a string which can contain:
%% call WRCH('%') %S call WRITES with the next argument %C call WRCH with the next argument %O1 call WRITEOCT with the next argument %X1 call WRITEHEX with the next argument %I1 call WRITED with the next argument %N call WRITEN with the next argumentwhere
1can be any single base-36 digit (i.e.
Bmeans 11, and so on), and is passed as the second argument of the call.
Other formats are implementation-defined. Note that these calls go
via the global vector (normally twice, since each
s%i; used when
%is not supported.
s%i:=c; used when
%is not supported.
0 TO v!0 DO s%i
v!iexcept that it returns the offset from
vof the cell holding
0 TO s%0 DO v!i
(x*y/z)evaluated without overflow if the final result fits in a cell.
\\to end of line, or from
\*to the corresponding close symbol (the two characters reversed); comments do not nest, and all comment symbols other than the required close symbol are ignored in a comment.
DOare interchangeable in all circumstances, as are
DOmay be omitted if followed by another keyword or a block.
GETfollowed by a string constant causes the file indicated by the string to be textually included at this point. The keyword and string (possibly followed by a comment) should form a source line of their own.
The following parts of the language are not supported by all compilers:
%operator (but string constants are part of the core language)