Forth Lesson 7

From OLPC
Revision as of 17:17, 3 November 2016 by WilliamBulley (talk | contribs) (fix minor typo: is --> it)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to: navigation, search
Mitch Bradley's Forth and
Open Firmware Lessons:

Review

In the previous lesson, we learned how to:

  • Use variables and values to store numbers
  • Use "create" and "," to make named data structure instances
  • Perform simple address arithmetic
  • Use "constant" to make efficient symbolic names for constants

Control Structures

Conditionals

 ok hex
 ok : hex-digit  ( n -- ascii )  \ Convert low nibble of n to ASCII hex
  ]    h# f and         ( low-nibble )
  ]    dup 9 <=  if     ( low-nibble )
  ]       h# 30 +       ( ascii )
  ]    else             ( low-nibble )
  ]       h# 37 +       ( ascii )
  ]    then             ( ascii )
  ] ;
 ok 6 hex-digit .       \ . displays the numerical value
 36
 ok 6 hex-digit emit    \ Emit displays the number as an ASCII character
 6
 ok f hex-digit .
 46
 ok f hex-digit emit
 F

This simple example illustrates quite a few points. First, observe that you can continue definitions across lines. When you are doing so interactively, the prompt changes from "ok" to "]" to remind you that you are in the middle of a definition.

Also note the commenting style of listing the stack contents at the end of each line. This simple expedient is extremely helpful in making Forth code easy to write, debug, analyze, and maintain.

Finally, let's focus on the conditional structure "if ... else ... then", the main topic of this lesson.

These names are unfortunate, causing immediate confusion. The names should have been "if ... else ... endif", but for some reason, the name "then" was chosen as the end of the structure.

The general usage is

 <code_to_compute_a_flag>  if  <true_clause>  else  <false_clause>  then

"<code_to_compute_a_flag>" can be anything that leaves a number on the top of the stack. "<true_clause>" can be any sequence of Forth code; it is executed if the flag (the number on top of the stack when you get to "if") is nonzero. "<false_clause>" is executed if the flag is zero.

You can think of "if" as a conditional branch to just after the "else" and "else" as an unconditional branch to just after the "then". In fact that is essentially how it is implemented.

The "else <false_clause>" is optional; you can just write "if ... then".

Comparison Operators

The "flag" value that "if" tests can be an ordinary number (0 is considered false, nonzero true) or a well-formed flag. A well-formed flag is either 0 (false) or -1 (true). In either case, it is just a number on the top of the stack. There are quite a few operators that perform comparisons, typically used to create flags for control structures. All of these operators return well-formed (0 or -1) flags.

 =    ( n1 n2 -- flag )  \ True if n1 = n2
 <>   ( n1 n2 -- flag )  \ True if n1 != n2
 <    ( n1 n2 -- flag )  \ True if n1 < n2  (signed)
 u<   ( u1 u2 -- flag )  \ True if n1 < n2  (unsigned)
 >    ( n1 n2 -- flag )  \ True if n1 > n2  (signed)
 u>   ( u1 u2 -- flag )  \ True if n1 > n2  (unsigned)
 <=   ( n1 n2 -- flag )  \ True if n1 <= n2  (signed)
 u<=  ( u1 u2 -- flag )  \ True if n1 <= n2  (unsigned)
 >=   ( n1 n2 -- flag )  \ True if n1 >= n2  (signed)
 u>=  ( u1 u2 -- flag )  \ True if n1 >= n2  (unsigned)
 0=   ( n -- flag )      \ True if n = 0   (logical not, like '!' in C)
 0<>  ( n -- flag )      \ True if n != 0
 0<   ( n -- flag )      \ True if n < 0
 0>   ( n -- flag )      \ True if n > 0
 0<=  ( n -- flag )      \ True if n <= 0
 0>=  ( n -- flag )      \ True if n >= 0
 true   ( -- -1 )        \ Pushes true (-1)
 false  ( -- 0 )         \ Pushes false (0)
 between  ( n low high -- flag )  \ True if low <= n <= high
 within   ( n low high -- flag )  \ True if low <= n <  high

Let's break down the "if" line in the example above. It reads "dup 9 <= if". Before this line starts executing, the top of the stack is a number that is the low nibble of the argument value, i.e. a number from 0..f, as indicated by the stack comment at the end of the "h# f and" line.

         ( low-nibble )
 dup     ( low-nibble low-nibble )
 9       ( low-nibble low-nibble 9 )
 <=      ( low-nibble flag )
 if      ( low-nibble )

The "dup" makes a copy of the number, because we will still need that number after we do the comparison. "9" pushes the number nine. "<=" pops two numbers from the stack (low-nibble and 9), performs the comparsion, and pushes a flag that represents the result. "if" pops the flag. If the flag was nonzero, the code between "if" and "else" is executed, otherwise the code between "else" and "then" is executed. In either case, low-nibble is on the stack ready for the addition. (The "+" could be moved out of the conditional, after the "then".)

Conditional Loops

Conditional looping works in a similar fashion; a flag on the stack controls whether or not looping continues. There are several forms:

 begin    again            \ Executes  forever
 begin    ( flag ) until   \ Executes until the flag is true
 begin  <code1> ( flag ) while  <code2>  repeat   \ See below

"begin <code> again" is a forever loop. Essentially, "again" is an unconditional branch back to just after "begin". "<code>" can be any sequence of Forth words.

PROTIP: If you find you have accidentally created an infinite loop, you may interrupt the OLPC Open Firmware interpreter by pressing the frame key at the upper right corner of the OLPC keyboard Key frame.jpg

"begin <code> until" is an "exit at the end" loop. It executes <code>, which should leave a flag on the top of the stack. "until" pops that flag and if it is zero, "until" branches back to just after the "begin".

"begin <code1> while <code2> repeat" is an "exit at the beginning" loop. "while" pops a flag from the stack (presumably left by <code1>). If the flag is nonzero, loop execution continues with <code2> and "repeat" branches back to just after the "begin". If the flag is zero, "while" branches to just after the "repeat". Normally <code1> does the conditional test and <code2> does the "work", but you can put anything you want in <code1> so long as it leaves a flag value on the stack when it is done.

Note that the sense of the flag is different for "while" and "until". "while" loops while the flag is true, whereas "until" loops until the flag is true (i.e. while it is false).

Here are some concrete examples:

 ok  0  begin  dup .  1+  dup 5 =  until
 0 1 2 3 4

Note that in Open Firmware control structures can be used either interactively as shown above or inside colon definitions as shown in the "if" example. (By the way, there is a better way to do counting loops, as we will see soon.)

 ok begin  5 .  key? until
 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 ...

That will keep displaying 5 until you type a key. "key? ( -- flag )" tests the input device to see if a character is available. "begin ... key? until" is a handy way to beat on hardware registers; just replace ... with code to do whatever you want. It will run until you stop it by typing a key.

 ok  5  begin  dup  while  dup .  1-  repeat
 5 4 3 2 1
 ok  0  begin  dup  while  dup .  1-  repeat
 ok █

Do Loops

 ok 7 2  do  9 .  loop
 9 9 9 9 9 
 ok 7 2  do  i .  loop
 2 3 4 5 6
 ok 7 2  do  i .  3 +loop
 2 5 

"( limit start ) do <code> loop" iterates over integers beginning with start (the top of the stack when "do" is first encountered) and ending when the index reaches limit (second on stack). Note that the loop stops when limit is reached, so the index value is never equal to limit inside the loop.

 ok 5 0  do  i .  loop
 0 1 2 3 4

If "start" is 0 and "limit" is n, the body of the loop will execute n times.

"i" pushes the loop index onto the stack. If you nest do-loops and need to "reach out" and get the index of the enclosing loop, that is "j", and then "k" for triply-nested loops. That is as far as it goes.

"( n ) +loop" adds n to the loop index instead of 1.

"do" always executes the body at least once, even if start and limit are the same. "?do" is like "do" except that it won't execute the body if start and limit are the same limit.

To specify the loop boundaries as "( start length )" instead of "( limit start )", use "bounds" to do the conversion:

 bounds  ( start length -- limit start )

Here is an example:

 ok  10000 5  bounds  do  i .  loop
 10000 10001 10002 10003 10004 

You can run a loop backwards, with a negative increment value to "+loop". The stopping condition is somewhat surprising:

 ok  1 5  do  i .  -1 +loop
 5 4 3 2 1

Why did it display 1, since up-counting loops don't execute when the index is equal to the limit value? The rule is this: The loop stops when the index crosses the boundary between limit-1 and limit. So on the way up, the last index value is less than limit, while on the way down, the last index is equal to limit. Takes a bit of getting used to, but the rule is precise, and there is really no way to do it better, considering that "+loop" cannot know in advance which way the loop is going (because the increment value can be computed at run time).

You can terminate a do loop prematurely, before the limit has been reached, with "leave".

 ok  1000000 0  do  i .  key?  if  leave  then   loop
 0 1 2 3 4 ...

That would count up to 1000000, stopping early if you type a key.

You can terminate a do loop prematurely before exit in a definition:

: counting  1000000 0  do  i .  key?  if  unloop exit  then   loop  ." finished!"  ;

That would count up to 1000000, stopping early if you type a key, but if you did not then it would print "finished!".

How Control Structures Work

You might find it difficult to reconcile the behavior of control structures with what previous lessons have said about how the compiler works. In compilation state, the interpreter/compiler is supposed to simply add the behavior of each word to the new definition. So how do the various parts of control structures "hook up" with each other? For example, how does "if" find the target of its forward branch?

The answer is that a few words are specially marked with an "immediate" bit. Those "immediate" words are not compiled directly, but instead are executed when encountered in compilation state. They can be thought of as parts of the compiler mechanism. The compiler mechanism is thus extensible, because you can create your own immediate words.

Most of the control structure words are immediate. Consider the sequence "if ... then" inside a definition. Since "if" is immediate, the compiler doesn't compile it in the usual way, but instead executes it. When "if" executes, it uses compiler primitives to compile a conditional branch token into the current definition, followed by a dummy branch offset (since the true branch offset is not yet known). Then it pushes the address of that branch offset onto the stack, thus remembering it for later. "if" then returns to the compiler, which proceeds to compile the body of the control structure in the normal way. The compiler eventually gets to "then", which is also immediate, so it gets executed too. "then" gets the address of the dummy branch offset from the stack and fixes up the offset, since the branch distance is now known.

Other control structure words work in similar fashion, executing during compilation to compile conditional or unconditional branches, resolving the offsets as they become known. All of the stack activity surrounding the branch resolution occurs during compilation. When the new definition is later executed, the only stack activity related to the conditional control structures is the popping of the flag value that controls which path is taken.

There is an additional wrinkle with do loops. They have to manage the loop index and the limit value, neither of which is necessarily known at compile time. Do loops use the return stack (the auxiliary stack that holds subroutine return addresses) for storing loop control parameters. This is possible because do loops must be properly nested with respect to definitions, i.e. a do loop must finish in a controlled fashion before you can return from the word that contains it.

How Interpreted Control Structures Work

As mentioned previously, in Open Firmware you can use control structures interactively, without compiling them inside a colon definition. How does that work, in light of the preceding description about how they are compiled?

When a control structure starting word (e.g. "begin") is executed but the interpreter state is "interpretation" instead of "compilation", it calls a word that starts a temporary anonymous colon definition, and then everything works as before. When that control structure is completed, the ending word (e.g. "until") notices that, and calls a word that executes then discards the temporary definition. So "interpreted" control structures execute at full compiled speed.

Note that in ANS Standard Forth, such as gforth, control structures are only allowed inside of definitions, not at the interactive command line.

Thus endeth the lesson.

Next Lesson