Forth Lesson 7
Review
In the previous lesson, we learned how to:
- Use variables and values to store numbers
- Use "create" and "," to make named data structure instances
- Perform simple address arithmetic
- Use "constant" to make efficient symbolic names for constants
Control Structures
Conditionals
ok hex ok : hex-digit ( n -- ascii ) \ Convert low nibble of n to ASCII hex ] h# f and ( low-nibble ) ] dup 9 <= if ( low-nibble ) ] h# 30 + ( ascii ) ] else ( low-nibble ) ] h# 37 + ( ascii ) ] then ( ascii ) ] ; ok 6 hex-digit . \ . displays the numerical value 36 ok 6 hex-digit emit \ Emit displays the number as an ASCII character 6 ok f hex-digit . 46 ok f hex-digit emit F
This simple example illustrates quite a few points. First, observe that you can continue definitions across lines. When you are doing so interactively, the prompt changes from "ok" to "]" to remind you that you are in the middle of a definition.
Also note the commenting style of listing the stack contents at the end of each line. This simple expedient is extremely helpful in making Forth code easy to write, debug, analyze, and maintain.
Finally, let's focus on the conditional structure "if ... else ... then", the main topic of this lesson.
These names are unfortunate, causing immediate confusion. The names should have been "if ... else ... endif", but for some reason, the name "then" was chosen as the end of the structure.
The general usage is
<code_to_compute_a_flag> if <true_clause> else <false_clause> then
"<code_to_compute_a_flag>" can be anything that leaves a number on the top of the stack. "<true_clause>" can be any sequence of Forth code; it is executed if the flag (the number on top of the stack when you get to "if") is nonzero. "<false_clause>" is executed if the flag is zero.
You can think of "if" as a conditional branch to just after the "else" and "else" as an unconditional branch to just after the "then". In fact that is essentially how it is implemented.
The "else <false_clause>" is optional; you can just write "if ... then".
Comparison Operators
The "flag" value that "if" tests can be an ordinary number (0 is considered false, nonzero true) or a well-formed flag. A well-formed flag is either 0 (false) or -1 (true). In either case, it is just a number on the top of the stack. There are quite a few operators that perform comparisons, typically used to create flags for control structures. All of these operators return well-formed (0 or -1) flags.
= ( n1 n2 -- flag ) \ True if n1 = n2 <> ( n1 n2 -- flag ) \ True if n1 != n2 < ( n1 n2 -- flag ) \ True if n1 < n2 (signed) u< ( u1 u2 -- flag ) \ True if n1 < n2 (unsigned) > ( n1 n2 -- flag ) \ True if n1 > n2 (signed) u> ( u1 u2 -- flag ) \ True if n1 > n2 (unsigned) <= ( n1 n2 -- flag ) \ True if n1 <= n2 (signed) u<= ( u1 u2 -- flag ) \ True if n1 <= n2 (unsigned) >= ( n1 n2 -- flag ) \ True if n1 >= n2 (signed) u>= ( u1 u2 -- flag ) \ True if n1 >= n2 (unsigned) 0= ( n -- flag ) \ True if n = 0 (logical not, like '!' in C) 0<> ( n -- flag ) \ True if n != 0 0< ( n -- flag ) \ True if n < 0 0> ( n -- flag ) \ True if n > 0 0<= ( n -- flag ) \ True if n <= 0 0>= ( n -- flag ) \ True if n >= 0 true ( -- -1 ) \ Pushes true (-1) false ( -- 0 ) \ Pushes false (0) between ( n low high -- flag ) \ True if low <= n <= high within ( n low high -- flag ) \ True if low <= n < high
Let's break down the "if" line in the example above. It reads "dup 9 <= if". Before this line starts executing, the top of the stack is a number that is the low nibble of the argument value, i.e. a number from 0..f, as indicated by the stack comment at the end of the "h# f and" line.
( low-nibble ) dup ( low-nibble low-nibble ) 9 ( low-nibble low-nibble 9 ) <= ( low-nibble flag ) if ( low-nibble )
The "dup" makes a copy of the number, because we will still need that number after we do the comparison. "9" pushes the number nine. "<=" pops two numbers from the stack (low-nibble and 9), performs the comparsion, and pushes a flag that represents the result. "if" pops the flag. If the flag was nonzero, the code between "if" and "else" is executed, otherwise the code between "else" and "then" is executed. In either case, low-nibble is on the stack ready for the addition. (The "+" could be moved out of the conditional, after the "then".)
Conditional Loops
Conditional looping works in a similar fashion; a flag on the stack
controls whether or not looping continues. There are several forms:
begin again \ Executes forever
begin ( flag ) until \ Executes until the flag is true
begin <code1> ( flag ) while <code2> repeat \ See below
"begin <code> again" is a forever loop. Essentially, "again" is an
unconditional branch back to just after "begin". "<code>" can be
any sequence of Forth words.
- PROTIP: If you find you have accidentally created an infinite loop, you may interrupt the OLPC Open Firmware interpreter by pressing the frame key at the upper right corner of the OLPC keyboard
"begin <code> until" is an "exit at the end" loop. It executes
<code>, which should leave a flag on the top of the stack.
"until" pops that flag and if it is zero, "until" branches back to just
after the "begin".
"begin <code1> while <code2> repeat" is an "exit at the beginning"
loop. "while" pops a flag from the stack (presumably left by <code1>).
If the flag is nonzero, loop execution continues with <code2> and
"repeat" branches back to just after the "begin". If the flag is zero,
"while" branches to just after the "repeat". Normally <code1> does
the conditional test and <code2> does the "work", but you can put
anything you want in <code1> so long as it leaves a flag value on
the stack when it is done.
Note that the sense of the flag is different for "while" and "until".
"while" loops while the flag is true, whereas "until" loops until the
flag is true (i.e. while it is false).
Here are some concrete examples:
ok 0 begin dup . 1+ dup 5 = until
0 1 2 3 4
Note that in Open Firmware control structures can be used either interactively as shown above
or inside colon definitions as shown in the "if" example. (By the way,
there is a better way to do counting loops, as we will see soon.)
ok begin 5 . key? until
5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 ...
That will keep displaying 5 until you type a key. "key? ( -- flag )"
tests the input device to see if a character is available.
"begin ... key? until" is a handy way to beat on hardware registers;
just replace ... with code to do whatever you want. It will run
until you stop it by typing a key.
ok 5 begin dup while dup . 1- repeat
5 4 3 2 1
ok 0 begin dup while dup . 1- repeat
ok █
Do Loops
ok 7 2 do 9 . loop
9 9 9 9 9
ok 7 2 do i . loop
2 3 4 5 6
ok 7 2 do i . 3 +loop
2 5
"( limit start ) do <code> loop" iterates over integers beginning with
start (the top of the stack when "do" is first encountered) and ending when
the index reaches limit (second on stack). Note that the loop stops when
limit is reached, so the index value is never equal to limit inside the loop.
ok 5 0 do i . loop
0 1 2 3 4
If "start" is 0 and "limit" is n, the body of the loop will execute n times.
"i" pushes the loop index onto the stack. If you nest do-loops and need to
"reach out" and get the index of the enclosing loop, that is "j", and then
"k" for triply-nested loops. That is as far as it goes.
"( n ) +loop" adds n to the loop index instead of 1.
"do" always executes the body at least once, even if start and limit
are the same. "?do" is like "do" except that it won't execute the
body if start and limit are the same limit.
To specify the loop boundaries as "( start length )" instead of
"( limit start )", use "bounds" to do the conversion:
bounds ( start length -- limit start )
Here is an example:
ok 10000 5 bounds do i . loop
10000 10001 10002 10003 10004
You can run a loop backwards, with a negative increment value to "+loop".
The stopping condition is somewhat surprising:
ok 1 5 do i . -1 +loop
5 4 3 2 1
Why did it display 1, since up-counting loops don't execute when the
index is equal to the limit value? The rule is this: The loop stops
when the index crosses the boundary between limit-1 and limit. So on
the way up, the last index value is less than limit, while on the way
down, the last index is equal to limit. Takes a bit of getting used
to, but the rule is precise, and there is really no way to do it
better, considering that "+loop" cannot know in advance which way the
loop is going (because the increment value can be computed at run time).
You can terminate a do loop prematurely, before the limit has
been reached, with "leave".
ok 1000000 0 do i . key? if leave then loop
0 1 2 3 4 ...
That would count up to 1000000, stopping early if you type a key.
You can terminate a do loop prematurely before exit in a definition:
: counting 1000000 0 do i . key? if unloop exit then loop ." finished!" ;
That would count up to 1000000, stopping early if you type a key, but if you did not then it would print "finished!".
How Control Structures Work
You might find it difficult to reconcile the behavior of control
structures with what previous lessons have said about how the compiler
works. In compilation state, the interpreter/compiler is supposed
to simply add the behavior of each word to the new definition. So
how do the various parts of control structures "hook up" with each
other? For example, how does "if" find the target of its forward
branch?
The answer is that a few words are specially marked with an "immediate" bit.
Those "immediate" words are not compiled directly, but instead are
executed when encountered in compilation state. They can be thought
of as parts of the compiler mechanism. The compiler mechanism is thus
extensible, because you can create your own immediate words.
Most of the control structure words are immediate. Consider the
sequence "if ... then" inside a definition. Since "if" is immediate,
the compiler doesn't compile it in the usual way, but instead executes
it. When "if" executes, it uses compiler primitives to compile a
conditional branch token into the current definition, followed by
a dummy branch offset (since the true branch offset is not yet known).
Then it pushes the address of that branch offset onto the stack, thus
remembering it for later. "if" then returns to the compiler, which
proceeds to compile the body of the control structure in the normal
way. The compiler eventually gets to "then", which is also immediate,
so it gets executed too. "then" gets the address of the dummy branch
offset from the stack and fixes up the offset, since the branch distance
is now known.
Other control structure words work in similar fashion, executing during
compilation to compile conditional or unconditional branches, resolving
the offsets as they become known. All of the stack activity surrounding
the branch resolution occurs during compilation. When the new definition
is later executed, the only stack activity related to the conditional
control structures is the popping of the flag value that controls which
path is taken.
There is an additional wrinkle with do loops. They have to manage the
loop index and the limit value, neither of which is necessarily known
at compile time. Do loops use the return stack (the auxiliary stack
that holds subroutine return addresses) for storing loop control parameters.
This is possible because do loops must be properly nested with respect
to definitions, i.e. a do loop must finish in a controlled fashion
before you can return from the word that contains it.
How Interpreted Control Structures Work
As mentioned previously, in Open Firmware you can use control structures interactively,
without compiling them inside a colon definition. How does that work,
in light of the preceding description about how they are compiled?
When a control structure starting word (e.g. "begin") is executed but
the interpreter state is "interpretation" instead of "compilation",
it calls a word that starts a temporary anonymous colon definition,
and then everything works as before. When that control structure
is completed, the ending word (e.g. "until") notices that, and calls
a word that executes then discards the temporary definition. So
"interpreted" control structures execute at full compiled speed.
Note that in ANS Standard Forth, such as gforth, control structures are only allowed inside of definitions, not at the interactive command line.
Thus endeth the lesson.